Open vSwitch: Difference between revisions

From Proxmox VE
(update config to ifupdown2 "auto ...")
 
(19 intermediate revisions by 6 users not shown)
Line 1: Line 1:
Open vSwitch is an alternative to Linux native bridges, bonds, and vlan interfaces. It is designed with virtualized environments in mind and is recommended to ease deployments.
Open vSwitch (openvswitch, OVS) is an alternative to Linux native bridges, bonds, and vlan interfaces.  
Open vSwitch supports most of the features you would find on a physical switch, providing some advanced features like RSTP support, VXLANs, OpenFlow, and supports multiple vlans on a single bridge.
If you need these features, it makes sense to switch to Open vSwitch.


== Installation ==
== Installation ==
* Install the Open vSwitch packages
Update the package index and then install the Open vSwitch packages by executing:
<nowiki>
  apt update
apt-get install openvswitch-switch
  apt install openvswitch-switch
</nowiki>


== Configuration ==
== Configuration ==
Official reference here, though a bit bare: http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=debian/openvswitch-switch.README.Debian;hb=HEAD
Official reference here, though a bit bare: https://github.com/openvswitch/ovs/blob/master/debian/openvswitch-switch.README.Debian


=== Overview ===
=== Overview ===
Line 17: Line 18:


It should be noted that it is recommended that the bridge is bound to a trunk port with no untagged vlans; this means that your bridge itself will never have an ip address.  If you need to work with untagged traffic coming into the bridge, it is recommended you tag it (assign it to a vlan) on the originating interface before entering the bridge (though you can assign an IP address on the bridge directly for that untagged data, it is not recommended).  You can split out your tagged VLANs using virtual interfaces (OVSIntPort) if you need access to those vlans from your local host.  Proxmox will assign the guest VMs a tap interface associated with a vlan, so you do NOT need a bridge per vlan (such as classic linux networking requires).  You should think of your OVSBridge much like a physical hardware switch.
It should be noted that it is recommended that the bridge is bound to a trunk port with no untagged vlans; this means that your bridge itself will never have an ip address.  If you need to work with untagged traffic coming into the bridge, it is recommended you tag it (assign it to a vlan) on the originating interface before entering the bridge (though you can assign an IP address on the bridge directly for that untagged data, it is not recommended).  You can split out your tagged VLANs using virtual interfaces (OVSIntPort) if you need access to those vlans from your local host.  Proxmox will assign the guest VMs a tap interface associated with a vlan, so you do NOT need a bridge per vlan (such as classic linux networking requires).  You should think of your OVSBridge much like a physical hardware switch.
When configuring a bridge, in /etc/network/interfaces, prefix the bridge interface definition with allow-ovs $iface.  For instance, a simple bridge containing a single interface would look like:
<nowiki>
auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  ovs_ports eth0
</nowiki>


Remember, if you want to split out vlans with ips for use on the local host, you should use OVSIntPorts, see sections to follow.
Remember, if you want to split out vlans with ips for use on the local host, you should use OVSIntPorts, see sections to follow.


However, any interfaces (Physical, OVSBonds, or OVSIntPorts) associated with a bridge should have their definitions prefixed with allow-$brname $iface, e.g.  allow-vmbr0 bond0


'''NOTE''': All interfaces must be listed under ovs_ports that are part of the bridge even if you have a port definition (e.g. OVSIntPort) that cross-references the bridge!!!
'''NOTE''': All interfaces must be listed under ovs_ports that are part of the bridge even if you have a port definition (e.g. OVSIntPort) that cross-references the bridge!!!
Line 38: Line 29:
When configuring a bond, it is recommended to use LACP (aka 802.3ad) for link aggregation.  This requires switch support on the other end.  A simple bond using eth0 and eth1 that will be part of the vmbr0 bridge might look like this.
When configuring a bond, it is recommended to use LACP (aka 802.3ad) for link aggregation.  This requires switch support on the other end.  A simple bond using eth0 and eth1 that will be part of the vmbr0 bridge might look like this.
  <nowiki>
  <nowiki>
allow-vmbr0 bond0
auto bond0
iface ovsbond inet manual
iface bond0 inet manual
   ovs_bridge vmbr0
   ovs_bridge vmbr0
   ovs_type OVSBond
   ovs_type OVSBond
Line 51: Line 42:
In order for the host (e.g. proxmox host, not VMs themselves!) to utilize a vlan within the bridge, you must create OVSIntPorts.  These split out a virtual interface in the specified vlan that you can assign an ip address to (or use DHCP).  You need to set ovs_options tag=$VLAN  to let OVS know what vlan the interface should be a part of.  In the switch world, this is commonly referred to as an RVI (Routed Virtual Interface), or IRB (Integrated Routing and Bridging) interface.
In order for the host (e.g. proxmox host, not VMs themselves!) to utilize a vlan within the bridge, you must create OVSIntPorts.  These split out a virtual interface in the specified vlan that you can assign an ip address to (or use DHCP).  You need to set ovs_options tag=$VLAN  to let OVS know what vlan the interface should be a part of.  In the switch world, this is commonly referred to as an RVI (Routed Virtual Interface), or IRB (Integrated Routing and Bridging) interface.


'''IMPORTANT''': These OVSIntPorts you create MUST also show up in the actual bridge definition under ovs_ports.  If they do not, they will NOT be brought up even though you specified an ovs_bridge.  You also need to prefix the definition with allow-$bridge $iface
'''IMPORTANT''': These OVSIntPorts you create MUST also show up in the actual bridge definition under ovs_ports.  If they do not, they will NOT be brought up even though you specified an ovs_bridge.   


Setting up this vlan port would look like this in /etc/network/interfaces:
Setting up this vlan port would look like this in /etc/network/interfaces:
  <nowiki>
  <nowiki>
allow-vmbr0 vlan50
auto vlan50
iface vlan50 inet static
iface vlan50 inet static
   ovs_type OVSIntPort
   ovs_type OVSIntPort
   ovs_bridge vmbr0
   ovs_bridge vmbr0
   ovs_options tag=50
   ovs_options tag=50
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
   address 10.50.10.44
   address 10.50.10.44
   netmask 255.255.255.0
   netmask 255.255.255.0
Line 66: Line 56:
</nowiki>
</nowiki>


==== Spanning Tree (STP) ====
==== Rapid Spanning Tree (RSTP) ====
Open vSwitch supports the classic Spanning Tree Protocol, but is disabled by default.  Spanning Tree is a network protocol used to prevent loops in a bridged Ethernet local area network. The newer Rapid Spanning Tree Protocol (RSTP) is not yet supported in the version of Open vSwitch shipped with ProxMox (v2.3), that feature was added in v2.4.
Open vSwitch supports the Rapid Spanning Tree Protocol, but is disabled by default.  Rapid Spanning Tree is a network protocol used to prevent loops in a bridged Ethernet local area network.  
 
'''WARNING:''' The stock PVE 4.4 kernel panics, must use a 4.5 or higher kernel for stability. Also, the Intel i40e driver is known to not work, older generation Intel NICs that use ixgbe are fine, as are Mellanox adapters that use the mlx5 driver.


In order to configure a bridge for STP support, you must use an "up" script as the "ovs_options" and "ovs_extras" options do not emit the proper commands.  An example would be to add this to your "vmbr0" interface configuration:
In order to configure a bridge for RSTP support, you must use an "up" script as the "ovs_options" and "ovs_extras" options do not emit the proper commands.  An example would be to add this to your "vmbr0" interface configuration:
  <nowiki>up ovs-vsctl set Bridge ${IFACE} stp_enable=true</nowiki>
  <nowiki>up ovs-vsctl set Bridge ${IFACE} rstp_enable=true</nowiki>
It may be wise to also set a "post-up" script that sleeps for 20 or so seconds waiting on STP convergence before boot continues.
It may be wise to also set a "post-up" script that sleeps for 10 or so seconds waiting on RSTP convergence before boot continues.


Other options that may be set are:
Other bridge options that may be set are:
* other_config:stp-priority=  Configures the root bridge priority, the lower the value the more likely to become the root bridge.  It is recommended to set this to the maximum value of 0xFFFF to prevent Open vSwitch from becoming the root bridge.  The default value is 0x8000
* other_config:rstp-priority=  Configures the root bridge priority, the lower the value the more likely to become the root bridge.  It is recommended to set this to the maximum value of 0xFFFF to prevent Open vSwitch from becoming the root bridge.  The default value is 0x8000
* other_config:stp-forward-delay= The amount of time the bridge will sit in learning mode before entering a forwarding state.  Range is 4-30, Default 15
* other_config:rstp-forward-delay= The amount of time the bridge will sit in learning mode before entering a forwarding state.  Range is 4-30, Default 15
* other_config:stp-max-age= Range is 6-40, Default 20
* other_config:rstp-max-age= Range is 6-40, Default 20
* other_config:stp-hello-time= Range is 1-10


You should also consider adding a cost value to all interfaces that are part of a bridge.  You can do so in the ethX interface configuration:
You should also consider adding a cost value to all interfaces that are part of a bridge.  You can do so in the ethX interface configuration:
  <nowiki>ovs_options other_config:stp-path-cost=100</nowiki>
  <nowiki>ovs_options other_config:rstp-path-cost=20000</nowiki>
 
Interface options that may be set via ovs_options are:
* other_config:rstp-path-cost= Default 2000 for 10GbE, 20000 for 1GbE
* other_config:rstp-port-admin-edge= Set to False if this is known to be connected to a switch running RSTP to prevent entering forwarding state if no BDPUs are detected
* other_config:rstp-port-auto-edge= Set to False if this is known to be connected to a switch running RSTP to prevent entering a forwarding state if no BDPUs are detected
* other_config:rstp-port-mcheck= Set to True if the other end is known to be using RSTP and not STP, will broadcast BDPUs immediately on link detection


You can look at the 'state' of each interface which should indicate if the port is in forwarding or blocking mode for STP via:
You can look at the RSTP status for an interface via:
  <nowiki>ovs-ofctl show vmbr0</nowiki>
  <nowiki>ovs-vsctl get Port eth0 rstp_status</nowiki>


NOTE: Open vSwitch does not currently allow a bond to participate in STP.
NOTE: Open vSwitch does not currently allow a bond to participate in RSTP.


==== Note on MTU ====
==== Note on MTU ====
If you plan on using a MTU larger than the default of 1500, you need to mark any physical interfaces, bonds, and bridges with a larger MTU by adding an mtu setting to the definition such as  mtu 9000 otherwise it will be disallowed. However, you should NOT create definitions for your physical interfaces that are part of a bond, instead at the bond layer, you should use a pre-up script such as
If you plan on using a MTU larger than the default of 1500, you need to mark any physical interfaces, bonds, and bridges with a larger MTU by adding an mtu setting to the definition such as  mtu 9000 otherwise it will be disallowed.  
 
  <nowiki>
  <nowiki>
pre-up ( ifconfig eth0 mtu 9000 && ifconfig eth1 mtu 9000 )
 
auto eth0
iface eth0 inet manual
    ovs_mtu 9000
 
#auto eth1
auto eth1
iface eth1 inet manual
    ovs_mtu 9000
 
# Interface bond0
 
auto bond0
iface bond0 inet manual
    ovs_bridge vmbr0
    ovs_type OVSBond
    ovs_bonds eth0 eth1
    ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
    ovs_mtu 9000
 
auto vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  ovs_ports bond0
  ovs_mtu 9000
 
</nowiki>
</nowiki>


If you instead create entries in /etc/network/interfaces for those physical interfaces and set the MTU there, then that MTU will propagate to EVERY child.  That means you wouldn't be able to configure OVSIntPorts with an mtu of 1500.


'''Odd Note''': Some newer Intel Gigabit NICs have a hardware limitation which means the maximum MTU they can support is 8996 (instead of 9000).  If your interfaces aren't coming up and you are trying to use 9000, this is likely the reason and can be difficult to debug.  Try setting all your MTUs to 8996 and see if it resolves your issues.
'''Odd Note''': Some newer Intel Gigabit NICs have a hardware limitation which means the maximum MTU they can support is 8996 (instead of 9000).  If your interfaces aren't coming up and you are trying to use 9000, this is likely the reason and can be difficult to debug.  Try setting all your MTUs to 8996 and see if it resolves your issues.
Line 110: Line 131:
# also attach to this bridge)
# also attach to this bridge)
auto vmbr0
auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
iface vmbr0 inet manual
   ovs_type OVSBridge
   ovs_type OVSBridge
Line 117: Line 137:
   #      kind of cross-referencing but it won't work without it!
   #      kind of cross-referencing but it won't work without it!
   ovs_ports eth0 vlan1 vlan55
   ovs_ports eth0 vlan1 vlan55
   mtu 9000
   ovs_mtu 9000


# Physical interface for traffic coming into the system.  Retag untagged
# Physical interface for traffic coming into the system.  Retag untagged
# traffic into vlan 1, but pass through other tags.
# traffic into vlan 1, but pass through other tags.
auto eth0
auto eth0
allow-vmbr0 eth0
iface eth0 inet manual
iface eth0 inet manual
   ovs_bridge vmbr0
   ovs_bridge vmbr0
Line 130: Line 149:
# you could use:
# you could use:
# ovs_options tag=1 vlan_mode=native-untagged trunks=10,20,30,40
# ovs_options tag=1 vlan_mode=native-untagged trunks=10,20,30,40
   mtu 9000
   ovs_mtu 9000


# Virtual interface to take advantage of originally untagged traffic
# Virtual interface to take advantage of originally untagged traffic
allow-vmbr0 vlan1
auto vlan1
iface vlan1 inet static
iface vlan1 inet static
   ovs_type OVSIntPort
   ovs_type OVSIntPort
   ovs_bridge vmbr0
   ovs_bridge vmbr0
   ovs_options tag=1
   ovs_options tag=1
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
   address 10.50.10.44
   address 10.50.10.44
   netmask 255.255.255.0
   netmask 255.255.255.0
   gateway 10.50.10.1
   gateway 10.50.10.1
   mtu 1500
   ovs_mtu 1500


# Ceph cluster communication vlan (jumbo frames)
# Ceph cluster communication vlan (jumbo frames)
allow-vmbr0 vlan55
auto vlan55
iface vlan55 inet static
iface vlan55 inet static
   ovs_type OVSIntPort
   ovs_type OVSIntPort
   ovs_bridge vmbr0
   ovs_bridge vmbr0
   ovs_options tag=55
   ovs_options tag=55
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
   address 10.55.10.44
   address 10.55.10.44
   netmask 255.255.255.0
   netmask 255.255.255.0
   mtu 9000
   ovs_mtu 9000
</nowiki>
</nowiki>


Line 166: Line 183:


# Bond eth0 and eth1 together
# Bond eth0 and eth1 together
allow-vmbr0 bond0
auto eth0
iface eth0 inet manual
    ovs_mtu 9000
 
auto eth1
iface eth1 inet manual
    ovs_mtu 9000
 
auto bond0
iface bond0 inet manual
iface bond0 inet manual
   ovs_bridge vmbr0
   ovs_bridge vmbr0
   ovs_type OVSBond
   ovs_type OVSBond
   ovs_bonds eth0 eth1
   ovs_bonds eth0 eth1
  # Force the MTU of the physical interfaces to be jumbo-frame capable.
  # This doesn't mean that any OVSIntPorts must be jumbo-capable. 
  # We cannot, however set up definitions for eth0 and eth1 directly due
  # to what appear to be bugs in the initialization process.
  pre-up ( ifconfig eth0 mtu 9000 && ifconfig eth1 mtu 9000 )
   ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
   ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
   mtu 9000
   ovs_mtu 9000


# Bridge for our bond and vlan virtual interfaces (our VMs will
# Bridge for our bond and vlan virtual interfaces (our VMs will
# also attach to this bridge)
# also attach to this bridge)
auto vmbr0
auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
iface vmbr0 inet manual
   ovs_type OVSBridge
   ovs_type OVSBridge
Line 189: Line 208:
   #      kind of cross-referencing but it won't work without it!
   #      kind of cross-referencing but it won't work without it!
   ovs_ports bond0 vlan50 vlan55
   ovs_ports bond0 vlan50 vlan55
   mtu 9000
   ovs_mtu 9000


# Proxmox cluster communication vlan
# Proxmox cluster communication vlan
allow-vmbr0 vlan50
auto vlan50
iface vlan50 inet static
iface vlan50 inet static
   ovs_type OVSIntPort
   ovs_type OVSIntPort
   ovs_bridge vmbr0
   ovs_bridge vmbr0
   ovs_options tag=50
   ovs_options tag=50
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
   address 10.50.10.44
   address 10.50.10.44
   netmask 255.255.255.0
   netmask 255.255.255.0
   gateway 10.50.10.1
   gateway 10.50.10.1
   mtu 1500
   ovs_mtu 1500


# Ceph cluster communication vlan (jumbo frames)
# Ceph cluster communication vlan (jumbo frames)
allow-vmbr0 vlan55
auto vlan55
iface vlan55 inet static
iface vlan55 inet static
   ovs_type OVSIntPort
   ovs_type OVSIntPort
   ovs_bridge vmbr0
   ovs_bridge vmbr0
   ovs_options tag=55
   ovs_options tag=55
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
   address 10.55.10.44
   address 10.55.10.44
   netmask 255.255.255.0
   netmask 255.255.255.0
   mtu 9000
   ovs_mtu 9000
</nowiki>
</nowiki>


Line 225: Line 242:


# Bond eth0 and eth1 together
# Bond eth0 and eth1 together
allow-vmbr0 bond0
auto bond0
iface bond0 inet manual
iface bond0 inet manual
ovs_bridge vmbr0
ovs_bridge vmbr0
Line 235: Line 252:
# also attach to this bridge)
# also attach to this bridge)
auto vmbr0
auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
iface vmbr0 inet manual
ovs_type OVSBridge
ovs_type OVSBridge
Line 241: Line 257:


# Virtual interface to take advantage of originally untagged traffic
# Virtual interface to take advantage of originally untagged traffic
allow-vmbr0 vlan1
auto vlan1
iface vlan1 inet static
iface vlan1 inet static
ovs_type OVSIntPort
ovs_type OVSIntPort
ovs_bridge vmbr0
ovs_bridge vmbr0
ovs_options vlan_mode=access
ovs_options vlan_mode=access
ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
address 192.168.3.5
address 192.168.3.5
netmask 255.255.255.0
netmask 255.255.255.0
Line 253: Line 268:


==== Example 4: Rapid Spanning Tree (RSTP) - 1Gbps uplink, 10Gbps interconnect ====
==== Example 4: Rapid Spanning Tree (RSTP) - 1Gbps uplink, 10Gbps interconnect ====
'''WARNING:''' The stock PVE 4.4 kernel panics, must use a 4.5 or higher kernel for stability.
'''WARNING:''' The stock PVE 4.4 kernel panics, must use a 4.5 or higher kernel for stability. Also, the Intel i40e driver is known to not work, older generation Intel NICs that use ixgbe are fine, as are Mellanox adapters that use the mlx5 driver.


This example shows how you can use Rapid Spanning Tree (RSTP) to interconnect your ProxMox nodes inexpensively, and uplinking to your core switches for external traffic, all while maintaining a fully fault-tolerant interconnection scheme.  This means VM<->VM access (or possibly Ceph<->Ceph) can operate at the speed of the network interfaces directly attached in a star or ring topology.  In this example, we are using 10Gbps to interconnect our 3 nodes (direct-attach), and uplink to our core switches at 1Gbps.  Spanning Tree configured with the right cost metrics will prevent loops and activate the optimal paths for traffic.  Obviously we are using this topology because 10Gbps switch ports are very expensive so this is strictly a cost-savings manoeuvre.  You could obviously use 40Gbps ports instead of 10Gbps ports, but the key thing is the interfaces used to interconnect the nodes are higher-speed than the interfaces used to connect to the core switches.
This example shows how you can use Rapid Spanning Tree (RSTP) to interconnect your ProxMox nodes inexpensively, and uplinking to your core switches for external traffic, all while maintaining a fully fault-tolerant interconnection scheme.  This means VM<->VM access (or possibly Ceph<->Ceph) can operate at the speed of the network interfaces directly attached in a star or ring topology.  In this example, we are using 10Gbps to interconnect our 3 nodes (direct-attach), and uplink to our core switches at 1Gbps.  Spanning Tree configured with the right cost metrics will prevent loops and activate the optimal paths for traffic.  Obviously we are using this topology because 10Gbps switch ports are very expensive so this is strictly a cost-savings manoeuvre.  You could obviously use 40Gbps ports instead of 10Gbps ports, but the key thing is the interfaces used to interconnect the nodes are higher-speed than the interfaces used to connect to the core switches.


This assumes you are using Open vSwitch 2.5, older versions did not support Rapid Spanning Tree, but only Spanning Tree which had some issues.
This assumes you are using Open vSwitch 2.5+, older versions did not support Rapid Spanning Tree, but only Spanning Tree which had some issues.


To better explain what we are accomplishing, look at this ascii-art representation below:
To better explain what we are accomplishing, look at this ascii-art representation below:
Line 299: Line 314:
iface lo inet loopback
iface lo inet loopback


allow-vmbr0 eth0
auto eth0
# 1Gbps link to core switch
# 1Gbps link to core switch
iface eth0 inet manual
iface eth0 inet manual
Line 305: Line 320:
   ovs_type OVSPort
   ovs_type OVSPort
   # Use cost 20000, 40000, 60000 for node 1, 2, 3 (primary 1G port on each node, in preference order)
   # Use cost 20000, 40000, 60000 for node 1, 2, 3 (primary 1G port on each node, in preference order)
   ovs_options tag=1 vlan_mode=native-untagged other_config:rstp-enable=true other_config:rstp-path-cost=20000
   ovs_options tag=1 vlan_mode=native-untagged other_config:rstp-enable=true other_config:rstp-path-cost=20000 other_config:rstp-port-admin-edge=false other_config:rstp-port-auto-edge=false other_config:rstp-port-mcheck=true
   mtu 8996
   ovs_mtu 8996


allow-vmbr0 eth1
auto eth1
# 1Gbps link to secondary core switch
# 1Gbps link to secondary core switch
iface eth1 inet manual
iface eth1 inet manual
Line 314: Line 329:
   ovs_type OVSPort
   ovs_type OVSPort
   # Use cost 21000, 41000, 61000 for node 1, 2, 3 (secondary 1G port on each node, in preference order)
   # Use cost 21000, 41000, 61000 for node 1, 2, 3 (secondary 1G port on each node, in preference order)
   ovs_options tag=1 vlan_mode=native-untagged other_config:rstp-enable=true other_config:rstp-path-cost=21000
   ovs_options tag=1 vlan_mode=native-untagged other_config:rstp-enable=true other_config:rstp-path-cost=21000 other_config:rstp-port-admin-edge=false other_config:rstp-port-auto-edge=false other_config:rstp-port-mcheck=true
   mtu 8996
   ovs_mtu 8996


allow-vmbr0 eth2
auto eth2
# 10Gbps link to another proxmox/ceph node
# 10Gbps link to another proxmox/ceph node
iface eth2 inet manual
iface eth2 inet manual
   ovs_bridge vmbr0
   ovs_bridge vmbr0
   ovs_type OVSPort
   ovs_type OVSPort
   ovs_options tag=1 vlan_mode=native-untagged other_config:rstp-enable=true other_config:rstp-path-cost=2000
   ovs_options tag=1 vlan_mode=native-untagged other_config:rstp-enable=true other_config:rstp-path-cost=2000 other_config:rstp-port-admin-edge=false other_config:rstp-port-auto-edge=false other_config:rstp-port-mcheck=true
   mtu 8996
   ovs_mtu 8996


allow-vmbr0 eth3
auto eth3
# 10Gbps link to another proxmox/ceph node
# 10Gbps link to another proxmox/ceph node
iface eth3 inet manual
iface eth3 inet manual
   ovs_bridge vmbr0
   ovs_bridge vmbr0
   ovs_type OVSPort
   ovs_type OVSPort
   ovs_options tag=1 vlan_mode=native-untagged other_config:rstp-enable=true other_config:rstp-path-cost=2100
   ovs_options tag=1 vlan_mode=native-untagged other_config:rstp-enable=true other_config:rstp-path-cost=2100 other_config:rstp-port-admin-edge=false other_config:rstp-port-auto-edge=false other_config:rstp-port-mcheck=true
   mtu 8996  
   ovs_mtu 8996  


auto vmbr0
auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
iface vmbr0 inet manual
   ovs_type OVSBridge
   ovs_type OVSBridge
Line 344: Line 358:
   #      options.
   #      options.
   up ovs-vsctl set Bridge ${IFACE} rstp_enable=true other_config:rstp-priority=32768 other_config:rstp-forward-delay=4 other_config:rstp-max-age=6
   up ovs-vsctl set Bridge ${IFACE} rstp_enable=true other_config:rstp-priority=32768 other_config:rstp-forward-delay=4 other_config:rstp-max-age=6
   mtu 8996
   ovs_mtu 8996
   # Wait for spanning-tree convergence
   # Wait for spanning-tree convergence
   post-up sleep 10
   post-up sleep 10


# Proxmox cluster communication vlan
# Proxmox cluster communication vlan
allow-vmbr0 vlan50
auto vlan50
iface vlan50 inet static
iface vlan50 inet static
   ovs_type OVSIntPort
   ovs_type OVSIntPort
   ovs_bridge vmbr0
   ovs_bridge vmbr0
   ovs_options tag=50
   ovs_options tag=50
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
   address 10.50.30.44
   address 10.50.30.44
   netmask 255.255.255.0
   netmask 255.255.255.0
   gateway 10.50.30.1
   gateway 10.50.30.1
   mtu 1500
   ovs_mtu 1500


# Ceph cluster communication vlan (jumbo frames)
# Ceph cluster communication vlan (jumbo frames)
allow-vmbr0 vlan55
auto vlan55
iface vlan55 inet static
iface vlan55 inet static
   ovs_type OVSIntPort
   ovs_type OVSIntPort
   ovs_bridge vmbr0
   ovs_bridge vmbr0
   ovs_options tag=55
   ovs_options tag=55
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
   address 10.55.30.44
   address 10.55.30.44
   netmask 255.255.255.0
   netmask 255.255.255.0
   mtu 8996
   ovs_mtu 8996
</nowiki>
</nowiki>


Line 378: Line 390:
set protocols rstp max-age 6
set protocols rstp max-age 6
# ProxMox 1
# ProxMox 1
set protocols rstp interface ge-0/0/2 cost 20000
set protocols rstp interface ge-0/0/2 cost 20000 no-root-port
set protocols rstp interface ge-1/0/2 cost 21000
set protocols rstp interface ge-1/0/2 cost 21000 no-root-port
# ProxMox 2
# ProxMox 2
set protocols rstp interface ge-0/0/3 cost 40000
set protocols rstp interface ge-0/0/3 cost 40000 no-root-port
set protocols rstp interface ge-1/0/3 cost 41000
set protocols rstp interface ge-1/0/3 cost 41000 no-root-port
# ProxMox 3
# ProxMox 3
set protocols rstp interface ge-0/0/4 cost 60000
set protocols rstp interface ge-0/0/4 cost 60000 no-root-port
set protocols rstp interface ge-1/0/4 cost 61000
set protocols rstp interface ge-1/0/4 cost 61000 no-root-port
</nowiki>
</nowiki>



Latest revision as of 16:31, 11 January 2022

Open vSwitch (openvswitch, OVS) is an alternative to Linux native bridges, bonds, and VLAN interfaces. It supports most of the features you would find on a physical switch, including advanced features such as RSTP, VXLANs, OpenFlow, and multiple VLANs on a single bridge. If you need these features, it makes sense to switch to Open vSwitch.

Installation

Update the package index and then install the Open vSwitch packages by executing:

 apt update
 apt install openvswitch-switch

Configuration

Official reference here, though a bit bare: https://github.com/openvswitch/ovs/blob/master/debian/openvswitch-switch.README.Debian

Overview

Open vSwitch and Linux bonding, bridging, or VLANs MUST NOT be mixed. For instance, do not attempt to add a VLAN to an OVSBond, or add a Linux bond to an OVSBridge, or vice versa. Open vSwitch is specifically tailored to function within virtualized environments, so there is no reason to use the native Linux functionality.
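
As a rough aid for spotting accidental mixing, the shell sketch below flags native Linux bridge, bond, and VLAN keywords in an interfaces file that also defines OVS interfaces. The function name find_native_keywords and the keyword list are assumptions made for illustration, not part of any Proxmox or OVS tool. A match is only a hint to review that stanza, because the check does not verify whether the flagged device is actually attached to an OVS bridge or bond.

```shell
# find_native_keywords FILE
# Hypothetical helper: if FILE defines any OVS interface (ovs_type), print
# the numbered lines that use native Linux bridge/bond/vlan keywords.
find_native_keywords() {
    if grep -q '^[[:space:]]*ovs_type' "$1"; then
        grep -nE '^[[:space:]]*(bridge[-_]ports|bond[-_]slaves|vlan[-_]raw[-_]device)[[:space:]]' "$1"
    fi
}

# Example: find_native_keywords /etc/network/interfaces
```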

Bridges

A bridge is another term for a switch. It directs traffic to the appropriate interface based on MAC address. Open vSwitch bridges should contain raw ethernet devices, along with virtual interfaces such as OVSBonds or OVSIntPorts. These bridges can carry multiple VLANs, and be broken out into 'internal ports' to be used as VLAN interfaces on the host.

It is recommended that the bridge be bound to a trunk port with no untagged VLANs; this means that the bridge itself will never have an IP address. If you need to work with untagged traffic coming into the bridge, tag it (assign it to a VLAN) on the originating interface before it enters the bridge. You can assign an IP address directly on the bridge for that untagged data, but this is not recommended. You can split out your tagged VLANs using virtual interfaces (OVSIntPort) if you need access to those VLANs from your local host. Proxmox will assign the guest VMs a tap interface associated with a VLAN, so you do NOT need a bridge per VLAN (as classic Linux networking requires). Think of your OVSBridge much like a physical hardware switch.

Remember, if you want to split out VLANs with IPs for use on the local host, you should use OVSIntPorts; see the sections that follow.


NOTE: All interfaces that are part of the bridge must be listed under ovs_ports, even if you have a port definition (e.g. OVSIntPort) that cross-references the bridge!
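
Because a missing ovs_ports entry is easy to overlook, a small consistency check can help. The following is a minimal sketch, assuming each ovs_ports list sits on a single line and that every definition starts with an iface line; the function name check_ovs_ports is hypothetical. It reports every interface that names a bridge in ovs_bridge but is absent from that bridge's ovs_ports.

```shell
# check_ovs_ports FILE
# Hypothetical helper: print each interface whose ovs_bridge points at a
# bridge that does not list it under ovs_ports. Returns 1 if any is found.
check_ovs_ports() {
    awk '
        $1 == "iface"      { cur = $2 }
        $1 == "ovs_ports"  { for (i = 2; i <= NF; i++) listed[cur, $i] = 1 }
        $1 == "ovs_bridge" { bridge_of[cur] = $2 }
        END {
            status = 0
            for (port in bridge_of) {
                if (!((bridge_of[port], port) in listed)) {
                    printf "%s: not listed in ovs_ports of %s\n", port, bridge_of[port]
                    status = 1
                }
            }
            exit status
        }
    ' "$1"
}

# Example: check_ovs_ports /etc/network/interfaces
```

The check only reads the file, so it can be run before applying a new network configuration.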

Bonds

Bonds are used to join multiple network interfaces together to act as a single unit. Bonds must refer to raw ethernet devices (e.g. eth0, eth1).

When configuring a bond, it is recommended to use LACP (aka 802.3ad) for link aggregation. This requires switch support on the other end. A simple bond using eth0 and eth1 that will be part of the vmbr0 bridge might look like this:

auto bond0
iface bond0 inet manual
  ovs_bridge vmbr0
  ovs_type OVSBond
  ovs_bonds eth0 eth1
  ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast

NOTE: The interfaces that are part of a bond do not need to have their own configuration section.

VLANs Host Interfaces

In order for the host (i.e. the Proxmox host, not the VMs themselves!) to utilize a VLAN within the bridge, you must create OVSIntPorts. These split out a virtual interface in the specified VLAN that you can assign an IP address to (or use DHCP). You need to set ovs_options tag=$VLAN to let OVS know which VLAN the interface should be a part of. In the switch world, this is commonly referred to as an RVI (Routed Virtual Interface), or IRB (Integrated Routing and Bridging) interface.

IMPORTANT: These OVSIntPorts you create MUST also show up in the actual bridge definition under ovs_ports. If they do not, they will NOT be brought up even though you specified an ovs_bridge.

Setting up this vlan port would look like this in /etc/network/interfaces:

auto vlan50
iface vlan50 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=50
  address 10.50.10.44
  netmask 255.255.255.0
  gateway 10.50.10.1
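
When several host VLAN interfaces are needed, the definition above can be generated rather than copied by hand. This is only a sketch: the helper name ovs_intport_stanza and its argument order are assumptions for illustration, and it omits the gateway line, which usually belongs on a single interface only.

```shell
# ovs_intport_stanza NAME BRIDGE VLAN ADDRESS NETMASK
# Hypothetical helper: print an OVSIntPort definition in the format above.
ovs_intport_stanza() {
    printf 'auto %s\n' "$1"
    printf 'iface %s inet static\n' "$1"
    printf '  ovs_type OVSIntPort\n'
    printf '  ovs_bridge %s\n' "$2"
    printf '  ovs_options tag=%s\n' "$3"
    printf '  address %s\n' "$4"
    printf '  netmask %s\n' "$5"
}

# Example values; remember to add the new port to the bridge's ovs_ports too.
ovs_intport_stanza vlan55 vmbr0 55 10.55.10.44 255.255.255.0
```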

Rapid Spanning Tree (RSTP)

Open vSwitch supports the Rapid Spanning Tree Protocol, but it is disabled by default. Rapid Spanning Tree is a network protocol used to prevent loops in a bridged Ethernet local area network.

WARNING: The stock PVE 4.4 kernel panics; you must use a 4.5 or higher kernel for stability. Also, the Intel i40e driver is known not to work. Older generation Intel NICs that use ixgbe are fine, as are Mellanox adapters that use the mlx5 driver.

In order to configure a bridge for RSTP support, you must use an "up" script, as the "ovs_options" and "ovs_extra" options do not emit the proper commands. An example would be to add this to your "vmbr0" interface configuration:

up ovs-vsctl set Bridge ${IFACE} rstp_enable=true

It may be wise to also set a "post-up" script that sleeps for 10 or so seconds waiting on RSTP convergence before boot continues.

Other bridge options that may be set are:

  • other_config:rstp-priority= Configures the root bridge priority; the lower the value, the more likely the bridge is to become the root bridge. The value is set in multiples of 4096, so it is recommended to set this to the maximum value of 61440 (0xF000) to prevent Open vSwitch from becoming the root bridge. The default value is 0x8000 (32768)
  • other_config:rstp-forward-delay= The amount of time the bridge will sit in learning mode before entering a forwarding state. Range is 4-30, Default 15
  • other_config:rstp-max-age= Range is 6-40, Default 20
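
The two timer ranges above are easy to get wrong when tuning for fast convergence. As a small illustration, the sketch below checks both values against the ranges listed above and, if they pass, prints (but does not run) the corresponding ovs-vsctl command. The helper names in_range and rstp_timer_cmd are hypothetical and exist only for this example.

```shell
# in_range VALUE MIN MAX: succeeds when MIN <= VALUE <= MAX.
in_range() {
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# rstp_timer_cmd BRIDGE FORWARD_DELAY MAX_AGE
# Hypothetical helper: validates both timers against the ranges listed above
# and prints the matching ovs-vsctl command instead of executing it.
rstp_timer_cmd() {
  in_range "$2" 4 30 || { echo "rstp-forward-delay $2 is outside 4-30" >&2; return 1; }
  in_range "$3" 6 40 || { echo "rstp-max-age $3 is outside 6-40" >&2; return 1; }
  echo "ovs-vsctl set Bridge $1 other_config:rstp-forward-delay=$2 other_config:rstp-max-age=$3"
}
```

For example, rstp_timer_cmd vmbr0 4 6 prints a command using the lowest allowed value for each timer, while rstp_timer_cmd vmbr0 3 6 is rejected.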

You should also consider adding a cost value to all interfaces that are part of a bridge. You can do so in the ethX interface configuration:

ovs_options other_config:rstp-path-cost=20000

Interface options that may be set via ovs_options are:

  • other_config:rstp-path-cost= Default 2000 for 10GbE, 20000 for 1GbE
  • other_config:rstp-port-admin-edge= Set to False if this is known to be connected to a switch running RSTP to prevent entering forwarding state if no BDPUs are detected
  • other_config:rstp-port-auto-edge= Set to False if this is known to be connected to a switch running RSTP to prevent entering a forwarding state if no BDPUs are detected
  • other_config:rstp-port-mcheck= Set to True if the other end is known to be using RSTP and not STP, will broadcast BDPUs immediately on link detection
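
The default path costs listed above (2000 for 10GbE, 20000 for 1GbE) are consistent with the commonly used IEEE 802.1D-2004 recommendation of 20,000,000 divided by the link speed in Mb/s. Assuming that rule, a cost for other link speeds can be worked out as sketched below; rstp_path_cost is a hypothetical helper name.

```shell
# rstp_path_cost SPEED_MBPS
# Path cost under the assumed rule: 20,000,000 / link speed in Mb/s.
rstp_path_cost() {
  echo $(( 20000000 / $1 ))
}

rstp_path_cost 1000    # 1GbE
rstp_path_cost 10000   # 10GbE
```

The two calls print 20000 and 2000, matching the defaults above. You are free to deviate from these values when you want to prefer one link over another of the same speed.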

You can look at the RSTP status for an interface via:

ovs-vsctl get Port eth0 rstp_status

NOTE: Open vSwitch does not currently allow a bond to participate in RSTP.

Note on MTU

If you plan on using an MTU larger than the default of 1500, you need to mark every physical interface, bond, and bridge with the larger MTU by adding an MTU setting such as ovs_mtu 9000 to its definition; otherwise the larger MTU will be disallowed.


auto eth0
iface eth0 inet manual
  ovs_mtu 9000

auto eth1
iface eth1 inet manual
  ovs_mtu 9000

# Interface bond0
auto bond0
iface bond0 inet manual
  ovs_bridge vmbr0
  ovs_type OVSBond
  ovs_bonds eth0 eth1
  ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
  ovs_mtu 9000

auto vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  # The bond must be listed here as well as naming the bridge via ovs_bridge
  ovs_ports bond0
  ovs_mtu 9000



Odd Note: Some newer Intel Gigabit NICs have a hardware limitation which means the maximum MTU they can support is 8996 (instead of 9000). If your interfaces aren't coming up and you are trying to use 9000, this is likely the reason and can be difficult to debug. Try setting all your MTUs to 8996 and see if it resolves your issues.
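
To confirm that jumbo frames really pass end to end, you can ping a peer with the Don't Fragment bit set and a payload sized to exactly fill the MTU. The payload is the MTU minus 28 bytes (20 bytes of IPv4 header without options plus 8 bytes of ICMP header). This is a minimal sketch assuming the Linux iputils ping; the function names are hypothetical.

```shell
# icmp_payload_for_mtu MTU
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header, no options) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
  echo $(( $1 - 20 - 8 ))
}

# test_jumbo_path HOST MTU
# Sends three pings with the Don't Fragment bit set (iputils ping on Linux).
test_jumbo_path() {
  ping -M do -c 3 -s "$(icmp_payload_for_mtu "$2")" "$1"
}
```

For an MTU of 9000 the payload is 8972 bytes, and for 8996 it is 8968 bytes. If test_jumbo_path fails against a peer on the jumbo vlan while a normal ping works, some interface, bond, bridge, or switch port along the path is probably still using a smaller MTU.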

Examples

Example 1: Bridge + Internal Ports + Untagged traffic

The example below shows you how to create a bridge with one physical interface and 2 vlan interfaces split out, and how to tag untagged traffic coming in on eth0 into vlan 1.

This is a complete and working /etc/network/interfaces listing:

# Loopback interface
auto lo
iface lo inet loopback

# Bridge for our eth0 physical interfaces and vlan virtual interfaces (our VMs will
# also attach to this bridge)
auto vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  # NOTE: we MUST mention eth0, vlan1, and vlan55 even though each
  #       of them lists ovs_bridge vmbr0!  Not sure why it needs this
  #       kind of cross-referencing but it won't work without it!
  ovs_ports eth0 vlan1 vlan55
  ovs_mtu 9000

# Physical interface for traffic coming into the system.  Retag untagged
# traffic into vlan 1, but pass through other tags.
auto eth0
iface eth0 inet manual
  ovs_bridge vmbr0
  ovs_type OVSPort
  ovs_options tag=1 vlan_mode=native-untagged
# Alternatively if you want to also restrict what vlans are allowed through
# you could use:
# ovs_options tag=1 vlan_mode=native-untagged trunks=10,20,30,40
  ovs_mtu 9000

# Virtual interface to take advantage of originally untagged traffic
auto vlan1
iface vlan1 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=1
  address 10.50.10.44
  netmask 255.255.255.0
  gateway 10.50.10.1
  ovs_mtu 1500

# Ceph cluster communication vlan (jumbo frames)
auto vlan55
iface vlan55 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=55
  address 10.55.10.44
  netmask 255.255.255.0
  ovs_mtu 9000

Example 2: Bond + Bridge + Internal Ports

The below example shows you a combination of all the above features. 2 NICs are bonded together and added to an OVS Bridge. 2 vlan interfaces are split out in order to provide the host access to vlans with different MTUs.

This is a complete and working /etc/network/interfaces listing:

# Loopback interface
auto lo
iface lo inet loopback

# Bond eth0 and eth1 together
auto eth0
iface eth0 inet manual
    ovs_mtu 9000

auto eth1
iface eth1 inet manual
    ovs_mtu 9000

auto bond0
iface bond0 inet manual
  ovs_bridge vmbr0
  ovs_type OVSBond
  ovs_bonds eth0 eth1
  ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
  ovs_mtu 9000

# Bridge for our bond and vlan virtual interfaces (our VMs will
# also attach to this bridge)
auto vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  # NOTE: we MUST mention bond0, vlan50, and vlan55 even though each
  #       of them lists ovs_bridge vmbr0!  Not sure why it needs this
  #       kind of cross-referencing but it won't work without it!
  ovs_ports bond0 vlan50 vlan55
  ovs_mtu 9000

# Proxmox cluster communication vlan
auto vlan50
iface vlan50 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=50
  address 10.50.10.44
  netmask 255.255.255.0
  gateway 10.50.10.1
  ovs_mtu 1500

# Ceph cluster communication vlan (jumbo frames)
auto vlan55
iface vlan55 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=55
  address 10.55.10.44
  netmask 255.255.255.0
  ovs_mtu 9000

Example 3: Bond + Bridge + Internal Ports + Untagged traffic + No LACP

The example below shows you a combination of all the above features. 2 NICs are bonded together and added to an OVS Bridge. This example imitates the default Proxmox network configuration, but uses a bond instead of a single NIC. The bond uses balance-slb mode, so it works without a managed switch that supports LACP.

This is a complete and working /etc/network/interfaces listing:

# Loopback interface
auto lo
iface lo inet loopback

# Bond eth0 and eth1 together
auto bond0
iface bond0 inet manual
	ovs_bridge vmbr0
	ovs_type OVSBond
	ovs_bonds eth0 eth1
	ovs_options bond_mode=balance-slb vlan_mode=native-untagged

# Bridge for our bond and vlan virtual interfaces (our VMs will
# also attach to this bridge)
auto vmbr0
iface vmbr0 inet manual
	ovs_type OVSBridge
	ovs_ports bond0 vlan1

# Virtual interface to take advantage of originally untagged traffic
auto vlan1
iface vlan1 inet static
	ovs_type OVSIntPort
	ovs_bridge vmbr0
	ovs_options vlan_mode=access
	address 192.168.3.5
	netmask 255.255.255.0
	gateway 192.168.3.254

Example 4: Rapid Spanning Tree (RSTP) - 1Gbps uplink, 10Gbps interconnect

WARNING: The stock PVE 4.4 kernel panics; you must use a 4.5 or higher kernel for stability. Also, the Intel i40e driver is known not to work. Older-generation Intel NICs that use ixgbe are fine, as are Mellanox adapters that use the mlx5 driver.

This example shows how you can use Rapid Spanning Tree (RSTP) to interconnect your Proxmox nodes inexpensively and uplink to your core switches for external traffic, all while maintaining a fully fault-tolerant interconnection scheme. This means VM<->VM traffic (or possibly Ceph<->Ceph traffic) can run at the speed of the directly attached network interfaces in a star or ring topology. In this example, we use 10Gbps links to interconnect our 3 nodes (direct-attach) and uplink to our core switches at 1Gbps. Spanning Tree configured with the right cost metrics will prevent loops and activate the optimal paths for traffic. We use this topology because 10Gbps switch ports are very expensive, so this is strictly a cost-saving manoeuvre. You could use 40Gbps ports instead of 10Gbps ports; the key point is that the interfaces used to interconnect the nodes are faster than the interfaces used to connect to the core switches.

This assumes you are using Open vSwitch 2.5 or newer. Older versions did not support Rapid Spanning Tree, only Spanning Tree, which had some issues.

To better explain what we are accomplishing, look at the ASCII-art representation below:

 X     = 10Gbps port
 G     = 1Gbps port
 B     = Blocked via Spanning Tree
 R     = Spanning Tree Root
 PM1-3 = Proxmox hosts 1-3
 SW1-2 = Juniper Switches (stacked) 1-2
 * NOTE: Open vSwitch cannot do STP on bonded links, otherwise the links to the core
         switches would be bonded in this diagram :/

 |-----------------------------|
 | G           G           G   | SW1
 |-|-----------|-----------|---| R
 |-+-----------+-----------+---|
 | | G         | G         | G | SW2
 |-+-|---------+-|---------+-|-|
   | |         | |         | |
   | |         | |         | |
   | B         B B         B B
   | |         | |         | |
|--|-|--|      | |      |--|-|--|
|  G GX--------+-+--------XG G  |
|     X |      | |      | X     |
|------\|      | |      |/------|
   PM1  \      | |      /  PM3
         \     | |     B
          \    | |    /
           \|--|-|--|/
            \  G G  /
            |X     X|
            |-------|
               PM2 

This is a complete and working /etc/network/interfaces listing:

auto lo
iface lo inet loopback

auto eth0
# 1Gbps link to core switch
iface eth0 inet manual
   ovs_bridge vmbr0
   ovs_type OVSPort
   # Use cost 20000, 40000, 60000 for node 1, 2, 3 (primary 1G port on each node, in preference order)
   ovs_options tag=1 vlan_mode=native-untagged other_config:rstp-enable=true other_config:rstp-path-cost=20000 other_config:rstp-port-admin-edge=false other_config:rstp-port-auto-edge=false other_config:rstp-port-mcheck=true
   ovs_mtu 8996

auto eth1
# 1Gbps link to secondary core switch
iface eth1 inet manual
   ovs_bridge vmbr0
   ovs_type OVSPort
   # Use cost 21000, 41000, 61000 for node 1, 2, 3 (secondary 1G port on each node, in preference order)
   ovs_options tag=1 vlan_mode=native-untagged other_config:rstp-enable=true other_config:rstp-path-cost=21000 other_config:rstp-port-admin-edge=false other_config:rstp-port-auto-edge=false other_config:rstp-port-mcheck=true
   ovs_mtu 8996

auto eth2
# 10Gbps link to another proxmox/ceph node
iface eth2 inet manual
   ovs_bridge vmbr0
   ovs_type OVSPort
   ovs_options tag=1 vlan_mode=native-untagged other_config:rstp-enable=true other_config:rstp-path-cost=2000 other_config:rstp-port-admin-edge=false other_config:rstp-port-auto-edge=false other_config:rstp-port-mcheck=true
   ovs_mtu 8996

auto eth3
# 10Gbps link to another proxmox/ceph node
iface eth3 inet manual
   ovs_bridge vmbr0
   ovs_type OVSPort
   ovs_options tag=1 vlan_mode=native-untagged other_config:rstp-enable=true other_config:rstp-path-cost=2100 other_config:rstp-port-admin-edge=false other_config:rstp-port-auto-edge=false other_config:rstp-port-mcheck=true
   ovs_mtu 8996 

auto vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  ovs_ports eth0 eth1 eth2 eth3 vlan50 vlan55

  # Lower settings for shorter convergence times, we're on a fast network.
  # Set the priority high so that it won't be promoted to the STP root
  # NOTE: ovs_options and ovs_extra do *not* work for some reason to set the STP
  #       options.
  up ovs-vsctl set Bridge ${IFACE} rstp_enable=true other_config:rstp-priority=32768 other_config:rstp-forward-delay=4 other_config:rstp-max-age=6
  ovs_mtu 8996
  # Wait for spanning-tree convergence
  post-up sleep 10

# Proxmox cluster communication vlan
auto vlan50
iface vlan50 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=50
  address 10.50.30.44
  netmask 255.255.255.0
  gateway 10.50.30.1
  ovs_mtu 1500

# Ceph cluster communication vlan (jumbo frames)
auto vlan55
iface vlan55 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=55
  address 10.55.30.44
  netmask 255.255.255.0
  ovs_mtu 8996

On our Juniper core switches, we put in place this configuration:

set protocols rstp bridge-priority 0
set protocols rstp forward-delay 4
set protocols rstp max-age 6
# ProxMox 1
set protocols rstp interface ge-0/0/2 cost 20000 no-root-port
set protocols rstp interface ge-1/0/2 cost 21000 no-root-port
# ProxMox 2
set protocols rstp interface ge-0/0/3 cost 40000 no-root-port
set protocols rstp interface ge-1/0/3 cost 41000 no-root-port
# ProxMox 3
set protocols rstp interface ge-0/0/4 cost 60000 no-root-port
set protocols rstp interface ge-1/0/4 cost 61000 no-root-port
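
The uplink costs used in the listings above follow a simple pattern: each node's primary 1Gbps uplink costs 20000 times the node number, and the secondary uplink costs 1000 more. Since RSTP prefers the lowest-cost path, this gives a fixed preference order across nodes and ports. The arithmetic can be sketched as follows; uplink_cost is a hypothetical helper name, and the pattern is inferred from the values in this example rather than required by Open vSwitch.

```shell
# uplink_cost NODE PORT
# NODE is the node number (1, 2, 3); PORT is 1 for the primary 1Gbps uplink
# and 2 for the secondary one.
uplink_cost() {
  echo $(( $1 * 20000 + ($2 - 1) * 1000 ))
}

uplink_cost 1 1   # node 1, primary uplink
uplink_cost 2 2   # node 2, secondary uplink
```

The two calls print 20000 and 41000, matching the values used for eth0 on node 1 and eth1 on node 2.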

Inspecting:

# Get Bridge info
ovs-vsctl get Bridge vmbr0 rstp_status
# Get Per-Port info
for port in eth0 eth1 eth2 eth3; do
  echo "==================================="
  echo "PORT $port :"
  echo ""
  ovs-vsctl get Port "$port" rstp_status
done

Multicast

Right now Open vSwitch doesn't do anything with regard to multicast. Where you would typically tell Linux to enable the multicast querier on the bridge, you should instead set up the querier on your router or switch. Please refer to the Multicast_notes wiki page for more information.

Using Open vSwitch in Proxmox

Using Open vSwitch isn't much different from using normal Linux bridges. The main difference is that instead of having a bridge per vlan, you have a single bridge containing all your vlans. When configuring the network interface for a VM, you select the bridge (probably the only bridge you have) and enter the VLAN Tag of the VLAN you want the VM to be a part of. Now there is zero effort when adding or removing VLANs!
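
The same bridge and VLAN Tag selection can also be made from the command line with qm set. The sketch below only builds the value for the net0 option; vm_net_arg is a hypothetical helper, and the virtio NIC model, the VM ID 100, and vlan 50 are assumptions chosen for illustration.

```shell
# vm_net_arg BRIDGE TAG
# Builds a net[n] option value that attaches a virtio NIC to BRIDGE in vlan TAG.
# The virtio model is an assumption; use whichever model your guest needs.
vm_net_arg() {
  echo "virtio,bridge=$1,tag=$2"
}

# Example (VM ID 100 is hypothetical):
#   qm set 100 --net0 "$(vm_net_arg vmbr0 50)"
```

Note that setting net0 this way replaces the existing definition of that network device, so review the VM's current configuration before applying it.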