Proxmox VE 4.x Cluster
Revision as of 19:44, 14 December 2016
Introduction
A Proxmox VE 4.x (and all later versions) cluster enables central management of multiple physical servers. A Proxmox VE cluster consists of several nodes (up to 32 physical nodes; more may be possible, depending on network latency).
Main features
- Centralized web management, including secure console
- Support for multiple authentication sources (e.g. local, MS ADS, LDAP, ...)
- Role-based permission management for all objects (VMs, storages, nodes, etc.)
- Creates multi-master clusters
- Proxmox Cluster file system (pmxcfs): database-driven file system for storing configuration files, replicated in real time to all nodes using corosync (maximum size 30 MB)
- Migration of Virtual Machines between physical hosts
- Cluster-wide logging
- RESTful web API
- Self-fencing as the out-of-the-box method (power- or network-fencing is also possible)
- Fast deployment
- Cluster-wide Firewall
- Linux Container migration
Requirements
NOTE: It is not possible to mix Proxmox VE 3.x (or earlier) nodes with Proxmox VE 4.x nodes in one cluster
- All nodes must be in the same network, as corosync uses IP multicast to communicate between nodes (see also Corosync Cluster Engine). Note: some switches do not support IP multicast by default and it must be enabled manually first. See the multicast notes for more information about multicast.
- Date and time have to be synchronized.
- An SSH tunnel on TCP port 22 is used between nodes.
- If you are also interested in High Availability, you must have at least 3 active nodes at all times for a reliable quorum (all nodes should run the same version).
- If shared storage is used, a dedicated NIC for that traffic is needed.
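Multicast connectivity between the nodes can be verified with omping, the tool the multicast notes refer to. The sketch below is a hedged dry run: the node names are placeholders, and it only prints the command, since the real test needs the actual nodes (run the printed command on all nodes at the same time).

```shell
# Dry-run sketch: build the omping invocation used to test IP multicast
# between cluster nodes. hp1/hp2/hp3 are placeholder node names.
nodes="hp1 hp2 hp3"
cmd="omping -c 600 -i 1 -q $nodes"
echo "$cmd"
```

If multicast works, each node reports near-zero loss for both unicast and multicast; persistent multicast loss usually points at IGMP snooping on the switch.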
Proxmox VE Cluster
First, install Proxmox VE on all nodes; see Installation. Make sure that each Proxmox VE node is installed with its final hostname and IP configuration. Changing the hostname or IP address is not possible after cluster creation.
Currently, cluster creation has to be done on the console; you can log in to the Proxmox VE node via SSH.
All settings can be done via "pvecm", the Proxmox VE Cluster manager toolkit.
Create the cluster
Login via SSH to the first Proxmox VE node. Use a unique name for your cluster; this name cannot be changed later.
Create:
hp1# pvecm create YOUR-CLUSTER-NAME
To check the state of cluster:
hp1# pvecm status
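For scripting, the quorum state can be extracted from the status output. This is an illustrative sketch only: a sample line stands in for the real output, which on a node you would capture with status=$(pvecm status).

```shell
# Sketch: check the "Quorate" flag programmatically. On a real node,
# replace the sample line with: status=$(pvecm status)
status="Quorate:          Yes"
case "$status" in
  *"Quorate:"*"Yes"*) quorate="yes" ;;
  *)                  quorate="no"  ;;
esac
echo "cluster quorate: $quorate"
```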
Adding nodes to the Cluster
Login via SSH to the other Proxmox VE nodes. Please note that the nodes cannot hold any VMs yet; otherwise you will get conflicts with identical VMIDs (as a workaround, use vzdump to back them up and restore them to a different VMID after the cluster configuration).

WARNING: Adding a node to the cluster will delete its current /etc/pve/storage.cfg. If you have VMs stored on the node, be prepared to add your storage locations back if necessary. Even though the storage locations disappear from the GUI, your data is still there.
Add the current node to the cluster:
hp2# pvecm add IP-ADDRESS-CLUSTER
For IP-ADDRESS-CLUSTER, use the IP address of an existing cluster node.
To check the state of cluster:
hp2# pvecm status
Display the state of cluster:
hp2# pvecm status
Quorum information
------------------
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
Display the nodes of cluster:
hp2# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4
Remove a cluster node
Read the whole procedure carefully before proceeding, as it may not be what you want or need.
Move all virtual machines off the node; just use the central web-based management to migrate or delete all VMs. Make sure you have no local backups you want to keep, or save them accordingly.
Log in to one remaining node via SSH. Issue a pvecm nodes (or pvecm status) command to identify the node ID:
hp1# pvecm status
Quorum information
------------------
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.15.91 (local)
0x00000002          1 192.168.15.92
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94
ATTENTION: at this point you must power off the node to be removed and make sure that it will not power on again (in the network) as it is.
hp1# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3
Log in to one remaining node via ssh. Issue the delete command (here deleting node hp2):
hp1# pvecm delnode hp2
If the operation succeeds, no output is returned; just check the node list again with 'pvecm nodes' (or simply 'pvecm n'). You should see something like:
hp1# pvecm status
Quorum information
------------------
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1992
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.15.90 (local)
0x00000002          1 192.168.15.91
0x00000003          1 192.168.15.92
ATTENTION: as said above, it is very important to power off the node before removal, and make sure that it will not power on again (in the network) as it is.
If you power on the node as it is, your cluster will end up in a broken state, and it can be difficult to restore a clean cluster state.
If, for whatever reason, you want this server to join the same cluster again, you have to
- reinstall Proxmox VE on it from scratch
- install it as a new node
- and then join it, as described in the previous section.
Referring to the above example, you can create a new hp5 node, which will then join the cluster.
Re-installing a cluster node
Prepare the node for re-install
Remove all virtual machines from the node by either transferring them to other nodes or creating a backup on external storage and deleting them from the node.
Stop the following services:
systemctl stop pvestatd.service
systemctl stop pvedaemon.service
systemctl stop pve-cluster.service
Backup node and cluster configuration
Backup /var/lib/pve-cluster/
tar -czf /root/pve-cluster-backup.tar.gz /var/lib/pve-cluster
Backup /root/.ssh/. There are two symlinks here to the shared pve config, authorized_keys and authorized_keys.orig; don't worry about these two yet, as their targets are stored in /var/lib/pve-cluster/.
tar -czf /root/ssh-backup.tar.gz /root/.ssh
Backup /etc/corosync/
tar -czf /root/corosync-backup.tar.gz /etc/corosync
Backup /etc/hosts
cp /etc/hosts /root/
Backup /etc/network/interfaces
cp /etc/network/interfaces /root/
If applicable, do not forget the settings related to iSCSI and multipath (/etc/iscsi/initiatorname.iscsi, /etc/iscsi/iscsid.conf and /etc/multipath.conf are files you would like to keep for future reference, to help with the configuration of the new installation).
Backup the files to your client machine via SCP (or a pen drive).
List of files to copy:
- /root/pve-cluster-backup.tar.gz
- /root/ssh-backup.tar.gz
- /root/corosync-backup.tar.gz
- /root/hosts
- /root/interfaces
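The backup-and-restore pattern used above (archive the directory with tar, later wipe it and extract the archive back) can be exercised safely before you rely on it. The sketch below is only an illustration: it runs the round-trip inside a throwaway temporary directory, so nothing on the real system is touched; the comments show the corresponding real commands.

```shell
# Sketch: tar round-trip mirroring the pve-cluster backup/restore,
# confined to a temporary directory so the real system is untouched.
set -e
fakeroot=$(mktemp -d)
mkdir -p "$fakeroot/var/lib/pve-cluster"
echo "dummy" > "$fakeroot/var/lib/pve-cluster/config.db"

# backup (real node: tar -czf /root/pve-cluster-backup.tar.gz /var/lib/pve-cluster)
tar -czf "$fakeroot/pve-cluster-backup.tar.gz" -C "$fakeroot" var/lib/pve-cluster

# simulate the re-install wiping the directory, then restore
# (real node: rm -rf /var/lib/pve-cluster ; cd / ; tar -xzf /root/pve-cluster-backup.tar.gz)
rm -rf "$fakeroot/var/lib/pve-cluster"
tar -xzf "$fakeroot/pve-cluster-backup.tar.gz" -C "$fakeroot"

cat "$fakeroot/var/lib/pve-cluster/config.db"
```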
Re-install the node
- Shutdown the server
- If you are using several Ethernet interfaces with LACP or some other kind of load balancing, you should configure the first switch port in single mode (no LACP) to allow the standard network configuration in Proxmox to connect to the network.
- Re-install. Make sure the hostname is the same as it was before you continue.
- Activate the license again, if you have one.
- Install updates, to get the same patch level as the other nodes.
Restore node and cluster configuration
Copy the config files to the folder /root via SCP or from the pen drive.
Restore /etc/hosts
cp /root/hosts /etc/hosts
Restore /etc/network/interfaces
cp /root/interfaces /etc/network/interfaces
IMPORTANT
Make sure that you have the right switch configuration in case you're using VLANs, specific port assignments or LACP!
If you are using OVS (Open vSwitch), you have to install the package before rebooting.
apt-get install openvswitch-switch
Reboot the server
Stop the following services:
systemctl stop pvestatd.service
systemctl stop pvedaemon.service
systemctl stop pve-cluster.service
Restore the files in /root/.ssh/
cd / ; tar -xzf /root/ssh-backup.tar.gz
Replace /var/lib/pve-cluster/ with your backup copy
rm -rf /var/lib/pve-cluster
cd / ; tar -xzf /root/pve-cluster-backup.tar.gz
Replace /etc/corosync/ with your backup copy
rm -rf /etc/corosync
cd / ; tar -xzf /root/corosync-backup.tar.gz
Start pve-cluster
systemctl start pve-cluster.service
Restore the two ssh symlinks:
ln -sf /etc/pve/priv/authorized_keys /root/.ssh/authorized_keys
ln -sf /etc/pve/priv/authorized_keys /root/.ssh/authorized_keys.orig
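You can sanity-check the symlink layout with readlink. The sketch below reproduces the pattern in a temporary directory (the paths are stand-ins for /etc/pve/priv and /root/.ssh); on the real node the equivalent check is readlink /root/.ssh/authorized_keys.

```shell
# Sketch: recreate the two authorized_keys symlinks in a temp dir and
# verify both point at the same shared file, as they must on the node.
set -e
d=$(mktemp -d)
mkdir -p "$d/etc/pve/priv" "$d/root/.ssh"
touch "$d/etc/pve/priv/authorized_keys"

ln -sf "$d/etc/pve/priv/authorized_keys" "$d/root/.ssh/authorized_keys"
ln -sf "$d/etc/pve/priv/authorized_keys" "$d/root/.ssh/authorized_keys.orig"

# on the real node: readlink /root/.ssh/authorized_keys
readlink "$d/root/.ssh/authorized_keys"
```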
Start the rest of the services:
systemctl start pvestatd.service
systemctl start pvedaemon.service
Re-add the node to the cluster to update the keys and then update the certs
pvecm add xxx.xxx.xxx.xxx -force
pvecm updatecerts
Accept ssh keys (again) from other nodes
This may not always be required, but in some cases it is needed to make things work without errors!
Additionally, you'll need to establish an SSH connection from every other cluster node to the re-installed node, to accept the new host key.
If you have several subnets configured in your nodes make sure that you're accessing the correct ip via ssh.
ssh xxx.xxx.xxx.xxx
........
yes
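As an alternative to answering "yes" interactively on every node, the new host key can be recorded non-interactively with ssh-keyscan (a standard OpenSSH tool). This is a hedged dry run: the IP stays the placeholder from above, and the command is only printed, not executed.

```shell
# Dry-run sketch: fetch the re-installed node's new host key from each
# remaining node without an interactive prompt. Replace the IP first.
reinstalled="xxx.xxx.xxx.xxx"
cmd="ssh-keyscan -H $reinstalled >> /root/.ssh/known_hosts"
echo "$cmd"
```

Remove any stale entry for the node first (ssh-keygen -R xxx.xxx.xxx.xxx), otherwise SSH will still refuse to connect because of the changed key.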
Working with the Proxmox VE Cluster
Now you can start creating virtual machines on your cluster nodes by using the Central Web-based Management on any node.
Troubleshooting
General
- Date and time have to be synchronized (check with "ntpdc -p")
- Check that /etc/hosts contains the actual IP address of the system