Separate Cluster Network
Latest revision as of 13:24, 20 August 2019
Note: This article covers the old Proxmox VE 4.x and 5.x releases. Starting with Proxmox VE 6.x, this topic is part of the reference documentation; see https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_redundancy
Introduction
It is good practice to use a separate network for corosync, which handles the cluster communication in Proxmox VE. Corosync is one of the most important parts of a fault-tolerant (HA) system, and other network traffic may disturb it. Storage communication should never be on the same network as corosync!
It is also good practice to add redundancy to your cluster network. This can be done by using RRP in combination with two physically separated networks. Besides the obvious benefit that the cluster keeps working when a switch fails, maintenance of your systems also becomes easier. A firmware upgrade of a switch, for example, can be done on a running cluster with no downtime, as the other ring handles the traffic in the meantime.
This article shows a way to use a completely separate corosync network in Proxmox VE 4.0. Version 4.0-23 of the pve-cluster package is recommended.
Prerequisites
This HowTo uses a three-node cluster with the nodes called 'one', 'two' and 'three'.
A dedicated NIC and a dedicated switch (gigabit, although 100 Mbit should be sufficient) are used for corosync. The NIC is configured as the eth1 interface and the network is 10.10.1.0/24.
Reading through the corosync.conf manual entry is a good idea to get some hints and to see what each option does.
man corosync.conf
We distinguish two cases for setting up a separate network:
- from the beginning, i.e. at cluster creation time
- when we already have a running cluster
Shared Steps
Note: back up /etc/pve/corosync.conf (it does not exist if the cluster has not been created yet) and /etc/hosts on each node, so that you can revert if something goes wrong. Changes to /etc/pve/corosync.conf propagate immediately to all nodes and trigger a corosync config reload. If the reload fails, the old config remains in use.
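The backup step above could be scripted roughly as follows. This is only a sketch: the function name backup_files and the backup directory are assumptions, not part of Proxmox VE. Missing files are skipped, since corosync.conf does not exist before the cluster is created.

```shell
#!/bin/sh
# Hypothetical helper (not a Proxmox VE tool): copy every existing file
# from the argument list into the given backup directory.
backup_files() {
    dest="$1"
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        # Skip files that do not exist, e.g. corosync.conf on a fresh node
        if [ -f "$f" ]; then
            cp -p "$f" "$dest/" || return 1
        fi
    done
}

# Example call on a node (the target directory is an assumption):
# backup_files "/root/corosync-backup-$(date +%Y%m%d-%H%M%S)" /etc/pve/corosync.conf /etc/hosts
```

Run it on every node before touching the config, and keep the backup outside of /etc/pve.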
Configure interfaces
Set up a static network by editing /etc/network/interfaces; see the example for one node below.
auto eth1
iface eth1 inet static
    address 10.10.1.151
    netmask 255.255.255.0
Do that on every node and change the address accordingly. (In this example we use *.151, *.152 and *.153, as they mirror the last octets of the IPs used for interface/VM traffic.)
Restart the network and check that you can ping each node on the new network. Make sure that multicast works and is not blocked by the firewall.
Configure hosts file
Now configure the /etc/hosts file so that we can use hostnames in the corosync config. This isn't strictly necessary, as you can also set the addresses directly, but it helps to keep an overview and is considered good practice. Note that I added entries for the other nodes too. This isn't necessary either, but it is good practice, as the names can then be resolved faster.
127.0.0.1 localhost.localdomain localhost
192.168.15.151 one.proxmox.com one pvelocalhost

# corosync network hosts
10.10.1.151 one-corosync.proxmox.com one-corosync
10.10.1.152 two-corosync.proxmox.com two-corosync
10.10.1.153 three-corosync.proxmox.com three-corosync

# The following lines are desirable for IPv6 capable hosts
[...]
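Since the same corosync entries go into /etc/hosts on every node, they can be generated once and pasted in. The sketch below assumes the naming scheme of this example (<node>-corosync.<domain>). The function name corosync_hosts_entries is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print /etc/hosts lines for the corosync network.
# Arguments: network prefix, domain, then one "name=lastoctet" pair per node.
corosync_hosts_entries() {
    prefix="$1"
    domain="$2"
    shift 2
    for pair in "$@"; do
        name=${pair%%=*}     # part before the "="
        octet=${pair#*=}     # part after the "="
        printf '%s.%s %s-corosync.%s %s-corosync\n' \
            "$prefix" "$octet" "$name" "$domain" "$name"
    done
}

# Prints the three corosync lines used in this article
corosync_hosts_entries 10.10.1 proxmox.com one=151 two=152 three=153
```

The output is meant to be reviewed and appended to /etc/hosts by hand. The script does not modify any file itself.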
Setup at Cluster Creation
Since version 4.0-23 of the pve-cluster package there is built-in support for creating the cluster with separate corosync ring(s) on their own networks. If you're running an earlier version, please update your system first.
bindnetaddr
This specifies the network address the corosync executive should bind to. bindnetaddr should be an IP address configured on the system, or a network address. For example, if the local interface is 192.168.5.151 with netmask 255.255.255.0, you should set bindnetaddr to 192.168.5.151 or 192.168.5.0. If the local interface is 192.168.5.151 with netmask 255.255.255.192, set bindnetaddr to 192.168.5.151 or 192.168.5.128, and so forth. This may also be an IPV6 address, in which case IPV6 networking will be used. In this case, the exact address must be specified and there is no automatic selection of the network interface within a specific subnet as with IPv4.
Note that a FQDN/hostname isn't allowed here, use a 'real' IP address.
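Note: if you are setting up a cluster with unicast, in most situations the /24 network mask will cause an error. See "Use unicast (UDPU) instead of multicast, if all else fails" in the article Troubleshooting multicast, quorum and cluster issues.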
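The network address mentioned above is the bitwise AND of the interface address and the netmask. As a small worked example, the following sketch computes it for IPv4 in plain POSIX shell. The function name network_address is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the IPv4 network address for an address/netmask pair.
network_address() {
    ip="$1"
    mask="$2"
    old_ifs="$IFS"
    IFS=.
    # Split both dotted quads into the positional parameters $1..$8
    set -- $ip $mask
    IFS="$old_ifs"
    # AND each address octet with the matching netmask octet
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

network_address 192.168.5.151 255.255.255.0     # prints 192.168.5.0
network_address 192.168.5.151 255.255.255.192   # prints 192.168.5.128
```

Both results match the values given in the text, so either the interface address itself or the computed network address can be used as bindnetaddr for IPv4.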
ringX_addr
Hostname (or IP) of the corosync ringX address of this node (X can be 0 or 1). There can also be two rings, see Redundant Ring Protocol for setup instructions.
Normally you use the corosync hostname defined in the /etc/hosts file for that.
Final Command
In our example the following parameters would be used when creating the cluster on the node named 'one':
- bindnetaddr: 10.10.1.151
- ring0_addr: one-corosync
pvecm create <clustername> -bindnet0_addr 10.10.1.151 -ring0_addr one-corosync
Setup on a Running Cluster
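This requires pve-cluster version 4.0-23 to work properly.
Note that a reboot of the whole cluster is needed to apply these changes on a running cluster. For a way without a reboot, see the unsupported service-restart method at the end of the Configure corosync section.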
Configure corosync
- First copy the current corosync config:
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
- Then edit the copied file with your favorite editor, or use nano, which is available on every Proxmox VE node by default:
nano /etc/pve/corosync.conf.new
- In the editor adapt the following attributes:
  - name: if not already there, add a "name: <nodename>" entry to each node {} section.
  - ring0_addr: in every node entry, change it to the newly defined hostname from /etc/hosts.
  - bindnetaddr: in the totem section, change it to the matching IP from the separate network (e.g. in our case I use the node with nodeid 1 and change 192.168.15.151 to 10.10.1.151). If you are using unicast, remember to check "Use unicast (UDPU) instead of multicast, if all else fails" in the article Troubleshooting multicast, quorum and cluster issues.
  - config_version: increase it. This is very important: you can write any number that is higher than the current one, but you need to increase it.
Here is an example of how it could look:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: two
    nodeid: 2
    quorum_votes: 1
    ring0_addr: two-corosync
  }
  node {
    name: one
    nodeid: 1
    quorum_votes: 1
    ring0_addr: one-corosync
  }
  node {
    name: three
    nodeid: 3
    quorum_votes: 1
    ring0_addr: three-corosync
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: testcluster
  config_version: 8
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.1.151
    ringnumber: 0
  }
}
- Rename the config file:
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
- Reboot the first node and check the logs to confirm that corosync throws no errors and can form a healthy cluster by itself on the new network.
If something failed, look at the Troubleshooting section.
- Then reboot every other node, one after the other. If HA is enabled, reboot the node that is the current HA master last to speed up the process.
Another, but unsupported, way to bring the changes into effect would be to restart all related services, like:
systemctl restart corosync.service
systemctl restart pve-cluster.service
systemctl restart pvedaemon.service
systemctl restart pveproxy.service
Note that a reboot is cleaner and really recommended.
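Forgetting to increase config_version is an easy mistake when editing the copied config by hand. The following sketch shows one way to bump it on the copy before renaming it. The function name bump_config_version is hypothetical, and the snippet assumes the file contains a single "config_version: <number>" line as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: increase config_version by one in the given file
# and print the new value. Intended for the .new copy, not the live config.
bump_config_version() {
    file="$1"
    # Read the current numeric value of config_version
    current=$(sed -n 's/^[[:space:]]*config_version:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$file" | head -n 1)
    [ -n "$current" ] || return 1
    next=$((current + 1))
    # Rewrite only the number, keeping the original indentation
    sed "s/^\([[:space:]]*config_version:[[:space:]]*\)[0-9][0-9]*/\1$next/" "$file" > "$file.tmp" \
        && mv "$file.tmp" "$file"
    echo "$next"
}

# Example call on the copied config:
# bump_config_version /etc/pve/corosync.conf.new
```

Any higher number is accepted, so adding one is just the smallest valid change. Review the file afterwards before renaming it to corosync.conf.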
Adding nodes in the future
If you add a new node to the cluster in the future, first configure its own corosync interface the way described above, and edit the /etc/hosts file. You do not need to edit any corosync config file.
Second, use the standard pvecm command with one important addition:
pvecm add <IP addr of a cluster member> -ring0_addr <new nodes ring addr>
This sets the correct ring address in the config. Otherwise you could run into trouble and need to intervene manually.
Redundant Ring Protocol
To stay safe when the switch used for corosync fails, and to get faster throughput for the cluster communication (which may be helpful on big setups with many nodes), you can use redundant rings. The rings must run on two physically separated networks, otherwise you gain nothing on the High Availability side.
To use it, first configure another interface and hostnames for your second ring as described above.
RRP modes
Note: Active mode is not completely stable yet. Always use passive mode in production.
Citing the corosync.conf man page:
Active replication offers slightly lower latency from transmit to delivery in faulty network environments but with less performance. Passive replication may nearly double the speed of the totem protocol if the protocol doesn't become CPU bound. The final option is none, in which case only one network interface will be used to operate the totem protocol.
On Cluster Creation
The pvecm create command provides the additional parameters '-bindnet1_addr', '-ring1_addr' and '-rrp_mode', which can be used for RRP configuration.
See the bindnetaddr and ringX_addr sections for information about the addresses.
Note: when you only set the ring 1 addresses, ring 0 is set to the default values (local IP address and node name).
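Putting the parameters named above together, a two-ring cluster creation could look like the command printed by the sketch below. The helper rrp_create_command only prints the command line and does not run it. The second-ring address 10.10.3.151 and the hostname one-corosync2 are assumptions for illustration. Passive mode is used because it is the mode recommended above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a pvecm create command for two rings.
# Arguments: cluster name, ring 0 bind address, ring 0 hostname,
#            ring 1 bind address, ring 1 hostname.
rrp_create_command() {
    printf 'pvecm create %s -bindnet0_addr %s -ring0_addr %s -bindnet1_addr %s -ring1_addr %s -rrp_mode passive\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Ring 1 values are made up for this example; adapt them to your second network
rrp_create_command testcluster 10.10.1.151 one-corosync 10.10.3.151 one-corosync2
```

Printing the command first makes it easy to double-check both rings before running it on the node that creates the cluster.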
On Running Cluster
Use the same steps described in the Configure corosync section to edit the corosync config.
In the editor adapt the following attributes:
- add a new interface section to the totem section of the config.
  - in it, add "ringnumber: 1" and "bindnetaddr: <ring1bindnet_address>".
- add "ring1_addr: <ring1_hostname>" entries to each node section.
It should look something like:
totem {
  cluster_name: tweak
  config_version: 2
  ip_version: ipv4
  rrp_mode: passive
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.1.62
    ringnumber: 0
  }
  interface {
    bindnetaddr: 10.10.3.62
    ringnumber: 1
  }
}

nodelist {
  node {
    name: pvecm62
    nodeid: 1
    quorum_votes: 1
    ring0_addr: coro0-62
    ring1_addr: coro1-62
  }
  node {
    name: pvecm63
    nodeid: 2
    quorum_votes: 1
    ring0_addr: coro0-63
    ring1_addr: coro1-63
  }
  [...] # other cluster nodes here
}

[...] # other config sections here
- Rename the config file:
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
- Reboot the first node and check the logs to confirm that corosync throws no errors and can form a healthy cluster by itself on the new network. If something failed, look at the Troubleshooting section.
- Then reboot every other node, one after the other. If HA is enabled, reboot the node that is the current HA master last to speed up the process.
Troubleshooting
Known issues
quorum.expected_votes must be configured
If the logs show something like:
[...]
corosync[1647]: [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
corosync[1647]: [SERV ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
[...]
This means that your hosts file entry for the corosync hostname and the one in ring0_addr from corosync.conf do not match, or that the hostname could not be resolved.
Fix them and reboot/restart. If you need to change something in corosync.conf but have no write permissions, see Write config when not quorate.
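A quick way to spot such a mismatch is to compare the ringX_addr values in corosync.conf with the names in the hosts file. The sketch below is an assumption-laden illustration: unresolved_ring_addrs is a hypothetical helper, it only looks at the given hosts file (not DNS), and it skips values that are plain IPv4 addresses.

```shell
#!/bin/sh
# Hypothetical helper: print every ringX_addr hostname from a corosync.conf
# that has no matching name in the given hosts file.
unresolved_ring_addrs() {
    conf="$1"
    hosts="$2"
    sed -n 's/^[[:space:]]*ring[0-9]_addr:[[:space:]]*\([^[:space:]]*\).*/\1/p' "$conf" |
    sort -u |
    while read -r name; do
        # Values made only of digits and dots are IP addresses, nothing to resolve
        case "$name" in
            *[!0-9.]*) ;;
            *) continue ;;
        esac
        # Look for the name as a whole field, ignoring comments
        if ! awk -v n="$name" '
            $1 ~ /^#/ { next }
            { for (i = 2; i <= NF; i++) { if ($i ~ /^#/) break; if ($i == n) found = 1 } }
            END { exit !found }
        ' "$hosts"; then
            echo "$name"
        fi
    done
}

# Example call on a node:
# unresolved_ring_addrs /etc/pve/corosync.conf /etc/hosts
```

An empty output means every ring hostname has a hosts entry. For a check that also covers DNS, `getent hosts <name>` can be run for each reported name.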
crit: cpg_send_message failed: 9
- If this pops up on only one node, restart the pve-cluster service with:
systemctl restart pve-cluster.service
- If that does not solve the problem, or if it occurs on all nodes, check your firewall and switch; they may block or not support multicast.
You may also have a switch with IGMP snooping enabled but no active multicast querier in the network. Install such a multicast querier or disable IGMP snooping on the switch. Installing an IGMP querier is recommended, as it boosts the performance of the network and of multicast itself.
Unknown issues
Ask for support. In the meantime, revert to the backed-up corosync.conf: see 'Write config when not quorate', then overwrite the config with the backup on each node, increase the config version inside it, and make sure that the version is the same on all nodes. Then reboot the cluster.
Write config when not quorate
If you need to change /etc/pve/corosync.conf on a node with no quorum, and you know what you are doing, use:
pvecm expected 1
to set the expected vote count to 1. This makes the cluster quorate so that you can fix your config or revert it to the backup.
If that wasn't enough (e.g. corosync is dead), use:
systemctl stop pve-cluster
pmxcfs -l
to start pmxcfs in local mode. You now have write access, so be very careful with changes!
After a restart the filesystem should merge the changes, provided there is no big merge conflict that could result in a split brain.