Proxmox VE 4.x Cluster

Introduction

A Proxmox VE 4.x (and all later versions) cluster enables central management of multiple physical servers. A Proxmox VE cluster consists of several nodes (up to 32 physical nodes, possibly more, depending on network latency).

Main features

  • Centralized web management, including secure console
  • Support for multiple authentication sources (e.g. local, MS ADS, LDAP, ...)
  • Role-based permission management for all objects (VMs, storages, nodes, etc.)
  • Creates multi-master clusters
  • Proxmox Cluster file system (pmxcfs): Database-driven file system for storing configuration files, replicated in real time on all nodes using corosync (maximum size 30 MB); see the example after this list
  • Migration of Virtual Machines between physical hosts
  • Cluster-wide logging
  • RESTful web API
  • Self-fencing as the out-of-the-box method (power- or network-fencing is also possible)
  • Fast deployment
  • Cluster-wide Firewall
  • Linux Container migration
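
The pmxcfs mentioned above is mounted at /etc/pve on every node, so the cluster-wide configuration can be inspected with ordinary shell tools. The exact entries depend on your setup and version, but a listing will look roughly like:

hp1# ls /etc/pve
authkey.pub  corosync.conf  datacenter.cfg  nodes  priv  storage.cfg  user.cfg  vzdump.cron

Any file written below /etc/pve is replicated to all cluster members in real time.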

Requirements

NOTE: It is not possible to mix Proxmox VE 3.x (and earlier) nodes with Proxmox VE 4.x nodes in one cluster.

  • All nodes must be on the same network, as corosync uses IP multicast to communicate between the nodes (see also Corosync Cluster Engine). Note: some switches do not support IP multicast by default and it must be enabled manually first. See the multicast notes for more information, and the test example after this list.
  • Date and time have to be synchronized.
  • An SSH tunnel on TCP port 22 between the nodes is used.
  • If you are also interested in High Availability, you must have at least 3 active nodes at all times for a reliable quorum (and all nodes should run the same version).
  • If shared storage is used, a dedicated NIC for the storage traffic is needed.
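
Since corosync relies on multicast, it is worth verifying multicast connectivity before creating the cluster. One common way (also referenced in the multicast notes) is omping; the following is only a sketch and assumes the omping package is installed and the command is started on all nodes at roughly the same time, using the host names from the example cluster in this article:

hp1# omping -c 600 -i 1 -q hp1 hp2 hp3 hp4

If multicast works, the multicast loss reported at the end should be close to 0%.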

Proxmox VE Cluster

First, install Proxmox VE on all nodes, see Installation. Make sure that each Proxmox VE node is installed with its final hostname and IP configuration; changing the hostname or IP is not possible after cluster creation.

Currently, cluster creation has to be done on the console; you can log in to the Proxmox VE node via ssh.

All settings can be done via "pvecm", the Proxmox VE Cluster manager toolkit.
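
To get an overview of the available subcommands, you can use the built-in help (or the pvecm man page):

hp1# pvecm help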

Create the cluster

Log in via ssh to the first Proxmox VE node. Use a unique name for your cluster; this name cannot be changed later.

Create:

hp1# pvecm create YOUR-CLUSTER-NAME

To check the state of cluster:

hp1# pvecm status
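
Right after creation the cluster has only one member, so the output (in the same format shown further down in this article) should report roughly:

Nodes:            1
Expected votes:   1
Total votes:      1
Quorate:          Yes

with a single entry in the membership list.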

Adding nodes to the Cluster

Log in via ssh to the other Proxmox VE nodes. Please note that a node must not hold any VMs when it is added. (Otherwise you will get conflicts with identical VMIDs; to work around this, use vzdump to back up the VMs and restore them to different VMIDs after the cluster configuration is done.)

Add a node:

hp2# pvecm add IP-ADDRESS-CLUSTER

For IP-ADDRESS-CLUSTER use an IP from an existing cluster node.
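
For example, using the addresses from the example cluster shown below (hp1 is 192.168.15.91):

hp2# pvecm add 192.168.15.91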

To check the state of the cluster:

hp2# pvecm status
Quorum information
------------------
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.15.91
0x00000002          1 192.168.15.92 (local)
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94

Display the nodes of the cluster:

hp2# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 hp1
         2          1 hp2 (local)
         3          1 hp3
         4          1 hp4

Remove a cluster node

Read the procedure carefully before proceeding, as it may not be what you want or need.

Move all virtual machines off the node: use the central web-based management to migrate or delete all VMs. Make sure you have no local backups you want to keep, or save them accordingly.
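
Migration can also be done from the command line; a minimal sketch, assuming a VM with VMID 100 on the node to be removed (here hp4) that should be moved to hp1:

hp4# qm migrate 100 hp1 --online

The --online flag performs a live migration of a running VM; omit it to migrate a stopped VM.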

Log in to one of the remaining nodes via ssh. Issue a pvecm status command (the node list can also be shown with pvecm nodes) to identify the node ID of the node you want to remove:

hp1# pvecm status

Quorum information
------------------
Date:             Mon Apr 20 12:30:13 2015
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1928
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           2  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.15.91 (local)
0x00000002          1 192.168.15.92
0x00000003          1 192.168.15.93
0x00000004          1 192.168.15.94

ATTENTION: at this point you must power off the node to be removed and make sure that it will not power on again (in the network) as it is.

hp1# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 hp1 (local)
         2          1 hp2
         3          1 hp3 


Log in to one of the remaining nodes via ssh. Issue the delete command (here deleting node hp4):

hp1# pvecm delnode hp4

If the operation succeeds, no output is returned. Check the cluster state again with 'pvecm status' (or the node list with 'pvecm nodes', or just 'pvecm n'); you should see something like:

hp1# pvecm status
Quorum information
------------------
Date:             Mon Apr 20 12:44:28 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1992
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           3  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.15.91 (local)
0x00000002          1 192.168.15.92
0x00000003          1 192.168.15.93

ATTENTION: as said above, it is very important to power off the node before removal, and make sure that it will not power on again (in the network) as it is.

If you power on the node as it is, your cluster will be screwed up and it could be difficult to restore a clean cluster state.

If for whatever reason you want this server to join the same cluster again, you have to

  • reinstall Proxmox VE on it from scratch,
  • set it up as a new node,
  • and then join it, as described in the previous section.

Referring to the above example, you can create a new hp5 node, which will then join the cluster.
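
For example (the name hp5 and the IP of hp1 follow the example cluster used throughout this article):

hp5# pvecm add 192.168.15.91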

Re-installing a cluster node

Remove all virtual machines from the node.

Stop the following services:

systemctl stop pvestatd.service
systemctl stop pvedaemon.service
systemctl stop pve-cluster.service

Back up /var/lib/pve-cluster/:

tar -czf /root/pve-cluster-backup.tar.gz /var/lib/pve-cluster

Back up /root/.ssh/. There are two symlinks here to the shared pve config, authorized_keys and authorized_keys.orig; you do not need to worry about these two yet, as the files they point to are stored in /var/lib/pve-cluster/.

tar -czf /root/ssh-backup.tar.gz /root/.ssh

Shut the server down and re-install Proxmox VE. Make sure the hostname is the same as before you continue.

Stop the following services:

systemctl stop pvestatd.service
systemctl stop pvedaemon.service
systemctl stop pve-cluster.service

Restore the files in /root/.ssh/

cd / ; tar -xzf /root/ssh-backup.tar.gz

Replace /var/lib/pve-cluster/ with your backup copy

rm -rf /var/lib/pve-cluster
cd / ; tar -xzf /root/pve-cluster-backup.tar.gz

Start pve-cluster

systemctl start pve-cluster.service

Restore the two ssh symlinks:

ln -sf /etc/pve/priv/authorized_keys /root/.ssh/authorized_keys
ln -sf /etc/pve/priv/authorized_keys /root/.ssh/authorized_keys.orig

Start the rest of the services:

systemctl start pvestatd.service
systemctl start pvedaemon.service
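
Finally, you can check that the node shows up in the cluster again with the same commands used earlier in this article:

pvecm status
pvecm nodes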

Working with the Proxmox VE Cluster

Now you can start creating virtual machines on your cluster nodes by using the Central Web-based Management on each node.
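
Virtual machines can also be created from the command line with qm; a minimal sketch (the VMID 100, the name, the memory size and the bridge vmbr0 are just example values):

hp1# qm create 100 --name testvm --memory 1024 --net0 virtio,bridge=vmbr0

A disk and installation media still need to be added (via the web interface or further qm options). Because the configuration lives on the cluster file system, the new VM is immediately visible on all nodes.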

Troubleshooting

General

  • Date and time have to be synchronized (check "ntpdc -p")
  • Check /etc/hosts for the actual IP address of the system (see the example below)
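
A node's own entry in /etc/hosts should point at its real network IP, not at 127.0.1.1; for example (the domain is a placeholder):

192.168.15.91 hp1.example.com hp1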

Video Tutorials