Separate Cluster Network
Introduction
It is good practice to use a separate network for corosync, which handles the cluster communication in Proxmox VE. It is one of the most important parts of a fault-tolerant (HA) system, and it is sensitive to latency, so other traffic on the same network can disturb it. Storage communication should never be on the same network as corosync!
It is also good practice to add redundancy to your cluster network. This can be done by using RRP (Redundant Ring Protocol) in combination with two physically separated networks. Besides the obvious benefit that the cluster keeps working when a switch fails, maintenance of your systems also becomes easier: a firmware upgrade of a switch, for example, can be done on a running cluster with no downtime, as the other ring handles the traffic in the meantime.
This article shows one way to use a completely separated corosync network in Proxmox VE 4.0. Version 4.0-23 or later of the pve-cluster package is recommended.
Prerequisites
This HowTo uses a three-node cluster with the nodes called 'one', 'two' and 'three'.
A dedicated NIC and a dedicated switch (gigabit, although 100 Mbit should be sufficient) are used for corosync. The NIC is configured as the eth1 interface and the network is 10.10.1.0/24.
Reading through the corosync.conf manual page is a good idea to get some hints and to see what each option does:
man corosync.conf
We distinguish two cases, depending on when the separate network is set up:
- from the beginning, i.e. at cluster creation time
- on an already running cluster
Note: back up /etc/pve/corosync.conf (it does not exist if no cluster has been created yet) and /etc/hosts on each node, so that you can revert if something goes wrong. Changes to /etc/pve/corosync.conf immediately propagate to all nodes and trigger a corosync config reload; if the reload fails, the old config remains in use.
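A minimal sketch of the backup step as a bash function. The function name and the backup directory are hypothetical. Each copy is prefixed with the node's hostname so that backups from several nodes can be collected in one place, and a missing corosync.conf (no cluster yet) is skipped rather than treated as an error. Keep the backups outside /etc/pve, because that directory is shared by the whole cluster.

```shell
#!/usr/bin/env bash
# Copy the given files into a backup directory, prefixing each copy with the
# node's hostname. Files that do not exist are skipped with a warning.
backup_cluster_config() {
    local dest=$1
    shift
    mkdir -p "$dest" || return 1
    local f
    for f in "$@"; do
        if [ -e "$f" ]; then
            # -p keeps mode and timestamps; ${f##*/} is the file's base name
            cp -p "$f" "$dest/${HOSTNAME:-node}-${f##*/}" || return 1
        else
            echo "skipping missing file: $f" >&2
        fi
    done
}

# Usage on each node (hypothetical backup location):
# backup_cluster_config /root/cluster-backup /etc/pve/corosync.conf /etc/hosts
```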
Configure interfaces
Configure a static address on the new interface by editing /etc/network/interfaces; see the example for one node below:
auto eth1
iface eth1 inet static
    address 10.10.1.151
    netmask 255.255.255.0
Do this on every node, adapting the address accordingly (in this example we use *.151, *.152 and *.153, as they mirror the last octets of the nodes' IPs on the interface/VM traffic network).
Restart the network and check that you can ping each node on the new network. Make sure that multicast works and is not blocked by a firewall.
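Since only the last octet differs between the nodes, the stanza can be generated per node. This bash sketch assumes the eth1 interface and the 10.10.1.0/24 example network from this article; the function name is hypothetical. Review the output before appending it to /etc/network/interfaces on the matching node.

```shell
#!/usr/bin/env bash
# Print the eth1 stanza for one node, given the last octet of its corosync address.
# Interface name and network (10.10.1.0/24) are the example values of this article.
corosync_iface_stanza() {
    local octet=$1
    case $octet in
        ''|*[!0-9]*) echo "usage: corosync_iface_stanza <last octet>" >&2; return 1 ;;
    esac
    printf 'auto eth1\niface eth1 inet static\n    address 10.10.1.%s\n    netmask 255.255.255.0\n' "$octet"
}

# Usage on node 'two', after checking the output:
# corosync_iface_stanza 152 >> /etc/network/interfaces
```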
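A quick reachability check, sketched as a bash function (the name is hypothetical). It sends one ping to each given address and reports the ones that do not answer. Note that ping only exercises unicast; multicast has to be tested separately, for example with a tool such as omping.

```shell
#!/usr/bin/env bash
# Send one ping to each address and report the ones that do not answer.
# Returns non-zero if at least one address is unreachable.
check_reachable() {
    local rc=0 host
    for host in "$@"; do
        # -c 1: a single echo request; -W 2: wait at most 2 seconds for the reply
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "reachable: $host"
        else
            echo "unreachable: $host" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage on each node, with the example addresses of this article:
# check_reachable 10.10.1.151 10.10.1.152 10.10.1.153
```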
Configure hosts file
Now configure the /etc/hosts file so that hostnames can be used in the corosync config. This isn't strictly necessary, as you can also set the addresses directly, but it helps to keep the overview and is considered good practice. Note that entries for the other nodes are added too; this isn't strictly necessary if DNS can resolve those names, but it is good practice, as local entries resolve faster.
127.0.0.1 localhost.localdomain localhost
192.168.15.151 one.proxmox.com one pvelocalhost
# corosync network hosts
10.10.1.151 one-corosync.proxmox.com one-corosync
10.10.1.152 two-corosync.proxmox.com two-corosync
10.10.1.153 three-corosync.proxmox.com three-corosync
# The following lines are desirable for IPv6 capable hosts
[...]
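To confirm that the new names resolve on a node, each one can be looked up with getent, which goes through the system's name service configuration and therefore honors /etc/hosts. A bash sketch (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Look up each hostname and print the address it resolves to.
# Returns non-zero if at least one name cannot be resolved.
check_resolves() {
    local rc=0 name addr
    for name in "$@"; do
        # getent prints "<address> <names...>"; keep only the first address
        addr=$(getent hosts "$name" | awk '{ print $1; exit }')
        if [ -n "$addr" ]; then
            echo "$name -> $addr"
        else
            echo "cannot resolve: $name" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage on each node:
# check_resolves one-corosync two-corosync three-corosync
```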
Setup at Cluster Creation
Since version 4.0-23 of the pve-cluster package, there is built-in support for creating the cluster with separate corosync ring(s) on their own networks. If you're running an earlier version, please update your system first.
bindnetaddr
This specifies the network address the corosync executive should bind to. bindnetaddr should be an IP address configured on the system, or a network address. For example, if the local interface is 192.168.5.151 with netmask 255.255.255.0, you should set bindnetaddr to 192.168.5.151 or 192.168.5.0. If the local interface is 192.168.5.151 with netmask 255.255.255.192, set bindnetaddr to 192.168.5.151 or 192.168.5.128, and so forth. This may also be an IPv6 address, in which case IPv6 networking will be used. In this case, the exact address must be specified and there is no automatic selection of the network interface within a specific subnet as with IPv4.
Note that an FQDN or hostname isn't allowed here; use a real IP address.
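The network address is the bitwise AND of the interface address and the netmask. A small bash sketch of that calculation (the function name is hypothetical; IPv4 with a dotted-decimal netmask is assumed) reproduces the examples above:

```shell
#!/usr/bin/env bash
# Compute the IPv4 network address by ANDing address and netmask octet by octet.
network_address() {
    local IFS=.
    # Split "a.b.c.d" into arrays on the dots (IFS is local to this function)
    local -a ip=($1) mask=($2)
    echo "$(( ip[0] & mask[0] )).$(( ip[1] & mask[1] )).$(( ip[2] & mask[2] )).$(( ip[3] & mask[3] ))"
}

# network_address 192.168.5.151 255.255.255.192   prints 192.168.5.128
```

For the example network of this article, network_address 10.10.1.151 255.255.255.0 yields 10.10.1.0; either that or 10.10.1.151 can be used as bindnetaddr.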
ringX_addr
The hostname (or IP address) of this node on corosync ring X (X can be 0 or 1). There can also be two rings; see Redundant Ring Protocol for setup instructions.
Normally, this is the corosync hostname defined in the /etc/hosts file for that node.
Final Command
In our example, the following parameters would be used when creating the cluster on the node named 'one':
- bindnetaddr: 10.10.1.151
- ring0_addr: one-corosync
pvecm create <clustername> -bindnet0_addr 10.10.1.151 -ring0_addr one-corosync
Setup on a Running Cluster
This requires pve-cluster version 4.0-23 or later to work properly.
Note that applying these changes to a running cluster requires a reboot of the whole cluster.
Configure corosync
- First copy the current corosync config:
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
- Then edit the copy with your favorite editor, or use nano, which is available on every Proxmox VE node by default:
nano /etc/pve/corosync.conf.new
- In the editor, adapt the following attributes:
- if not already present, add a "name: <nodename>" entry to each node {} section.
- ring0_addr in every node entry: change it to the newly defined hostnames from /etc/hosts.
- bindnetaddr in the interface entry of the totem section: change it to the matching IP from the separate network (e.g. in our case, take the node with nodeid 1 and change 192.168.15.151 to 10.10.1.151).
- config_version: increase it. This is very important; you can write any number higher than the current one, but you must increase it.
Here is an example of how it could look:
logging {
  debug: off
  to_syslog: yes
}
nodelist {
  node {
    name: two
    nodeid: 2
    quorum_votes: 1
    ring0_addr: two-corosync
  }
  node {
    name: one
    nodeid: 1
    quorum_votes: 1
    ring0_addr: one-corosync
  }
  node {
    name: three
    nodeid: 3
    quorum_votes: 1
    ring0_addr: three-corosync
  }
}
quorum {
  provider: corosync_votequorum
}
totem {
  cluster_name: testcluster
  config_version: 8
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.1.151
    ringnumber: 0
  }
}
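Forgetting to increase config_version is an easy mistake. The following bash sketch (the function name is hypothetical) raises the first config_version entry in the given file by one. It is meant for the corosync.conf.new working copy and assumes the value is a plain integer on its own line, as in the example above; the file is rewritten in place so that the final mv still applies it.

```shell
#!/usr/bin/env bash
# Increase the first "config_version: N" line in a corosync config by one.
# Fails (and leaves the file untouched) if no config_version line exists.
bump_config_version() {
    local file=$1 tmp
    tmp=$(mktemp) || return 1
    awk '
        /^[ \t]*config_version:/ && !done {
            match($0, /^[ \t]*/)                  # keep the original indentation
            indent = substr($0, 1, RLENGTH)
            print indent "config_version: " ($2 + 1)
            done = 1
            next
        }
        { print }
        END { if (!done) exit 1 }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Overwrite the contents instead of moving the temp file over the original
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage:
# bump_config_version /etc/pve/corosync.conf.new
```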
- Rename the config file:
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
- Reboot the first node and check the logs to make sure that corosync does not throw errors and could form a healthy cluster by itself on the new network.
If something failed, see the Troubleshooting section.
- Then reboot every other node, one after the other. If HA is enabled, reboot the node that is the current HA master last, to speed up the process.
Another, unsupported way to apply the changes is to restart all related services:
systemctl restart corosync.service
systemctl restart pve-cluster.service
systemctl restart pvedaemon.service
systemctl restart pveproxy.service
Note that a reboot is cleaner and strongly recommended.
Adding nodes in the future
If you add a new node to the cluster in the future, first configure its corosync interface as described above and edit its /etc/hosts file. You do not need to edit any corosync config file.
Then use the standard pvecm command with one important addition:
pvecm add <IP address of a cluster member> -ring0_addr <new node's ring0 address>
This sets the correct ring address in the config. Otherwise, you could run into problems that require manual intervention.
Redundant Ring Protocol
To stay safe when the switch used for corosync fails, and to get higher throughput for the cluster communication (which may help on big setups with many nodes), you can use redundant rings. These rings must run on two physically separated networks; otherwise you gain nothing on the high availability side.
To use it, first configure another interface and hostnames for your second ring as described above.
RRP modes
The mode is set with the rrp_mode option in the totem section and can be active, passive or none. Active replication offers slightly lower latency from transmit to delivery in faulty network environments, but with less performance. Passive replication may nearly double the speed of the totem protocol if the protocol doesn't become CPU bound. The final option is none, in which case only one network interface will be used to operate the totem protocol.
On Cluster Creation
The pvecm create command provides the additional parameters '-bindnet1_addr', '-ring1_addr' and '-rrp_mode', which can be used for RRP configuration.
See the bindnetaddr and ringX_addr sections for information about the addresses.
Note that if you only set the ring 1 addresses, ring 0 will be set to the default values (the local IP address and the node name).
On Running Cluster
Use the same steps described in the Configure corosync section to edit the corosync config.
In the editor, adapt the following attributes:
- add a new interface section to the totem section of the config.
- there, add "ringnumber: 1" and "bindnetaddr: <ring1bindnet_address>".
- add "ring1_addr: <ring1_hostname>" entries to each node section.
- set rrp_mode in the totem section (e.g. "rrp_mode: passive") and increase config_version.
It should look something like:
totem {
  cluster_name: tweak
  config_version: 2
  ip_version: ipv4
  rrp_mode: passive
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.1.62
    ringnumber: 0
  }
  interface {
    bindnetaddr: 10.10.3.62
    ringnumber: 1
  }
}
nodelist {
  node {
    name: pvecm62
    nodeid: 1
    quorum_votes: 1
    ring0_addr: coro0-62
    ring1_addr: coro1-62
  }
  node {
    name: pvecm63
    nodeid: 2
    quorum_votes: 1
    ring0_addr: coro0-63
    ring1_addr: coro1-63
  }
  [...] # other cluster nodes here
}
[...] # other config sections here
- Rename the config file:
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
- Reboot the first node and check the logs to make sure that corosync does not throw errors and could form a healthy cluster by itself on the new network. If something failed, see the Troubleshooting section.
- Then reboot every other node, one after the other. If HA is enabled, reboot the node that is the current HA master last, to speed up the process.
Troubleshooting
Known issues
quorum.expected_votes must be configured
If the logs show something like:
[...]
corosync[1647]: [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
corosync[1647]: [SERV ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
[...]
This means that the hosts file entry for the corosync hostname and the ring0_addr in corosync.conf do not match, or that the name could not be resolved.
Fix them and reboot (or restart the services). If you need to change something in corosync.conf but have no write permission, see Write config when not quorate.
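To find the offending entry, the ring addresses can be extracted from the config and looked up one by one. A bash sketch (the function name is hypothetical) that checks every ringX_addr hostname with getent and skips entries that are literal IPv4 addresses:

```shell
#!/usr/bin/env bash
# Check that every ringX_addr hostname in a corosync config resolves on this node.
# Returns 1 if a name does not resolve, 2 if the config cannot be read.
check_ring_addrs() {
    local conf=${1:-/etc/pve/corosync.conf} rc=0 name
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }
    for name in $(awk '$1 ~ /^ring[0-9]_addr:$/ { print $2 }' "$conf"); do
        case $name in
            *[!0-9.]*) ;;                       # contains letters: a hostname
            *) echo "literal IPv4 address: $name"; continue ;;
        esac
        if getent hosts "$name" >/dev/null; then
            echo "ok: $name"
        else
            echo "cannot resolve: $name" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage on each node:
# check_ring_addrs /etc/pve/corosync.conf
```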
crit: cpg_send_message failed: 9
- If this pops up on only one node restart the pve-cluster service with:
systemctl restart pve-cluster.service
- If that does not solve the problem, or it happens on all nodes, check your firewall and switch; they may block or not support multicast.
Unknown issues
Ask for support. In the meantime, revert to the backed-up corosync.conf: follow 'Write config when not quorate', then overwrite the config with the backup on each node, increase the config_version inside it and make sure that the version is the same on all nodes. Then reboot the cluster.
Write config when not quorate
If you need to change /etc/pve/corosync.conf on a node without quorum, and you know what you are doing, use:
systemctl stop pve-cluster
pmxcfs -l
to start pmxcfs in local mode. You now have write access, so be very careful with your changes!
After pve-cluster is restarted, the filesystem should merge the changes, unless there is a big merge conflict, which could result in a split-brain situation.