Infiniband
Introduction
Infiniband can be used with DRBD to speed up replication. This article covers setting up IP over Infiniband (IPoIB).
Subnet Manager
Infiniband requires a subnet manager to function. Many Infiniband switches have a built in subnet manager that can be enabled. When using multiple switches you can enable a subnet manager on all of them for redundancy.
If your switch does not have a subnet manager, or if you are not using a switch, then you need to run a subnet manager on your node(s). The opensm package in Debian Squeeze and up should be sufficient if you need a subnet manager.
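To see whether a node can already reach a subnet manager, one option is to look at the "SM lid" field that ibstat prints for each port. The helper below is only a sketch: the function name is made up for this example, it reads ibstat-style text on stdin, and it assumes that an SM lid of 0 means no subnet manager has been seen on that port.

```shell
#!/bin/sh
# Succeeds if any port in ibstat-style output (read from stdin) reports a
# non-zero "SM lid". Assumption: an SM lid of 0 means no subnet manager
# has been seen on that port.
has_subnet_manager() {
    awk -F': *' '/SM lid:/ && $2 + 0 != 0 { found = 1 } END { exit !found }'
}

# Typical use on a node with infiniband-diags installed:
#   ibstat | has_subnet_manager && echo "subnet manager found" \
#       || echo "no subnet manager seen; consider installing opensm"
```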
Sockets Direct Protocol (SDP)
SDP can be used with a preload library to speed up TCP/IP communications over Infiniband. DRBD supports SDP, which offers some performance gains.
The Linux Kernel does not include the SDP module. If you want to use SDP you need to install OFED. Thus far I have been unable to get OFED to compile for Proxmox 2.0.
IPoIB
IP over Infiniband allows sending IP packets over the Infiniband fabric.
Proxmox 1.X Prerequisites
Debian Lenny network scripts do not work well with Infiniband interfaces. This can be corrected by installing the following packages from Debian squeeze:
ifenslave-2.6_1.1.0-17_amd64.deb
net-tools_1.60-23_amd64.deb
ifupdown_0.6.10_amd64.deb
Proxmox 2.0
Nothing special is needed with Proxmox 2.0; everything seems to work out of the box.
Note (rob f, 2013-07-13): as far as I know, the opensm package below is needed. Update 2013-08-02: we now have a subnet manager running on the IB switch, so we uninstalled opensm. TBD: is it needed under some circumstances?
aptitude install opensm
Proxmox 3.x
See directions for 2.0. Nothing has changed in 3 that warrants noting here.
Create IPoIB Interface
Bonding
It is not possible to bond Infiniband to increase throughput. If you want to use bonding for redundancy, create a bonding interface.
Edit /etc/modprobe.d/aliases-bond.conf
alias bond0 bonding
options bond0 mode=1 miimon=100 downdelay=200 updelay=200 max_bonds=2
Infiniband interfaces are named ib0, ib1, etc.
Edit /etc/network/interfaces
auto bond0
iface bond0 inet static
    address 192.168.1.1
    netmask 255.255.255.0
    slaves ib0 ib1
    bond_miimon 100
    bond_mode active-backup
    pre-up modprobe ib_ipoib
    pre-up echo connected > /sys/class/net/ib0/mode
    pre-up echo connected > /sys/class/net/ib1/mode
    pre-up modprobe bond0
    mtu 65520
To bring up the interface:
ifup bond0
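Once bond0 is up, the kernel bonding driver reports its state in /proc/net/bonding/bond0. The sketch below pulls out the two lines that matter for an active-backup setup. The function name is made up for this example, and the optional path argument exists only so the function can be tried against a saved copy of the file.

```shell
#!/bin/sh
# Print the bonding mode and the currently active slave of a bond.
# $1: path to the bonding status file (defaults to the one for bond0).
bond_status() {
    file="${1:-/proc/net/bonding/bond0}"
    grep -E '^(Bonding Mode|Currently Active Slave):' "$file"
}

# Typical use after "ifup bond0":
#   bond_status
```

With the configuration above, the mode line should mention active-backup and the active slave should be ib0 or ib1.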
Without Bonding
Edit /etc/network/interfaces
auto ib0
iface ib0 inet static
    address 192.168.1.1
    netmask 255.255.255.0
    pre-up modprobe ib_ipoib
    pre-up echo connected > /sys/class/net/ib0/mode
    mtu 65520
To bring up the interface:
ifup ib0
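The pre-up lines switch the interface to connected mode before the large MTU is applied, so it is worth confirming that both settings took effect. The sketch below reads them back from sysfs. The function name and the second argument (an alternative sysfs directory, useful only for testing) are assumptions made for this example.

```shell
#!/bin/sh
# Report the IPoIB mode and MTU of an interface by reading sysfs.
# $1: interface name, e.g. ib0
# $2: sysfs net directory (hypothetical override, defaults to /sys/class/net)
# Returns success only if the mode is "connected" and the MTU is 65520.
ipoib_check() {
    iface="$1"
    base="${2:-/sys/class/net}"
    mode=$(cat "$base/$iface/mode") || return 1
    mtu=$(cat "$base/$iface/mtu") || return 1
    echo "$iface: mode=$mode mtu=$mtu"
    [ "$mode" = "connected" ] && [ "$mtu" -eq 65520 ]
}

# Typical use after "ifup ib0":
#   ipoib_check ib0 || echo "ib0 is not in connected mode with MTU 65520"
```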
TCP/IP Tuning
These settings performed best on my servers; your mileage may vary.
Edit /etc/sysctl.conf
#Infiniband Tuning
net.ipv4.tcp_mem=1280000 1280000 1280000
net.ipv4.tcp_wmem = 32768 131072 1280000
net.ipv4.tcp_rmem = 32768 131072 1280000
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.rmem_default=16777216
net.core.wmem_default=16777216
net.core.optmem_max=1524288
net.ipv4.tcp_sack=0
net.ipv4.tcp_timestamps=0
To apply the changes now:
sysctl -p
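One detail that is easy to miss: net.ipv4.tcp_mem is counted in memory pages, while tcp_rmem and tcp_wmem are in bytes. The small helper below (a made-up name, not part of any package) converts a tcp_mem value to bytes so you can judge whether it fits your RAM. The 4096-byte page size in the example call is an assumption; leave the second argument out to use your system's page size.

```shell
#!/bin/sh
# Convert a net.ipv4.tcp_mem value (counted in pages) to bytes.
# $1: number of pages
# $2: page size in bytes (defaults to this system's page size)
tcp_mem_bytes() {
    pages="$1"
    page_size="${2:-$(getconf PAGESIZE)}"
    echo $((pages * page_size))
}

tcp_mem_bytes 1280000 4096   # prints 5242880000 (about 4.9 GiB)
```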
iperf speed tests
This is the first time I've used iperf; there are probably better options to use for the iperf command.
Install iperf on the systems to test:
aptitude install iperf
Run iperf as a server on one system. In this example the server uses IP 10.0.99.8.
iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
On a client:
# iperf -c 10.0.99.8
------------------------------------------------------------
Client connecting to 10.0.99.8, TCP port 5001
TCP window size: 646 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.99.30 port 38629 connected with 10.0.99.8 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 8.98 GBytes 7.71 Gbits/sec
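The Transfer and Bandwidth columns use different unit bases, which can be confusing when checking the numbers by hand. The sketch below redoes the arithmetic, assuming iperf counts a GByte as 1024^3 bytes and a Gbit as 10^9 bits; the figures in the output above are consistent with that assumption. The function name is made up for this example.

```shell
#!/bin/sh
# Convert an iperf transfer into a bit rate.
# $1: transfer in GBytes (assumed 1 GByte = 1024^3 bytes)
# $2: duration in seconds
# Prints Gbits/sec (1 Gbit = 10^9 bits), rounded to two decimals.
gbits_per_sec() {
    awk -v gbytes="$1" -v secs="$2" \
        'BEGIN { printf "%.2f\n", gbytes * 1024 * 1024 * 1024 * 8 / secs / 1e9 }'
}

gbits_per_sec 8.98 10   # prints 7.71
```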
I want to see the infiniband interface exposed in my VMs - can I do that?
The short answer is no. The long answer is that you can use manual routes to pass traffic through the virtio interface to your IB card. This will inflict a potentially enormous penalty on the transfer rates but it does work.
In the long run, what is needed is a special KVM driver, similar to the Virtio driver, that would allow the IB card to be abstracted and presented to all your VMs as a separate device.
Using IB for cluster networking
IB can be used for cluster communications.
Edit /etc/hosts and change the host names/IPs to your IB network. Reboot each host, and make sure ssh can connect to all hosts from each host over IB.
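The ssh check can be scripted so it is easy to repeat on every host after the reboot. This is a rough sketch: the function name and the host names in the usage line are hypothetical, and the SSH variable is only a hook for substituting another command while testing. BatchMode makes ssh fail instead of prompting for a password, so key-based login must already work.

```shell
#!/bin/sh
# Try a non-interactive ssh login to each host given as an argument and
# report the result. Returns non-zero if any host failed.
# SSH is a hypothetical override for the ssh command (used for testing).
check_ssh_hosts() {
    failed=0
    for host in "$@"; do
        if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Typical use, run on every node (host names are placeholders):
#   check_ssh_hosts node1 node2 node3
```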
admin
Install infiniband-diags, then check the documentation shipped in the package and at http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-4.html
apt-get install infiniband-diags
aptitude show infiniband-diags
Package: infiniband-diags
New: yes
State: installed
Automatically installed: no
Version: 1.4.4-20090314-1.2
Priority: extra
Section: net
Maintainer: OFED and Debian Developement and Discussion <pkg-ofed-devel@lists.alioth.debian.org>
Architecture: amd64
Uncompressed Size: 472 k
Depends: libc6 (>= 2.3), libibcommon1, libibmad1, libibumad1, libopensm2, perl
Description: InfiniBand diagnostic programs
 InfiniBand is a switched fabric communications link used in high-performance computing and enterprise data centers. Its features include high throughput, low latency, quality of service and failover, and it is designed to be scalable.

 This package provides diagnostic programs and scripts needed to diagnose an InfiniBand subnet.
Homepage: http://www.openfabrics.org
ibstat: in the following output, a cable was not fully connected to the IB card.
# ibstat
CA 'mthca0'
    CA type: MT25208
    Number of ports: 2
    Firmware version: 5.3.0
    Hardware version: a0
    Node GUID: 0x0002c90200277c9c
    System image GUID: 0x0002c90200277c9f
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 10
        Base lid: 18
        LMC: 0
        SM lid: 3
        Capability mask: 0x02510a68
        Port GUID: 0x0002c90200277c9d
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02510a68
        Port GUID: 0x0002c90200277c9e
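On a node with several ports it can be quicker to filter the ibstat output than to read it in full. The sketch below prints only the ports whose State is not Active. The function name is made up for this example, and it reads ibstat-style text on stdin. With more than one CA the port numbers repeat, so treat the output as a hint and look at the full ibstat listing for details.

```shell
#!/bin/sh
# Read ibstat-style output on stdin and print every port whose State
# is not "Active", in the form "Port N: <state>".
inactive_ports() {
    awk '
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:$/, "", port) }
        /^[ \t]*State:/       { if ($2 != "Active") print "Port " port ": " $2 }
    '
}

# Typical use:
#   ibstat | inactive_ports
```

For the output shown above this would report Port 2 as Down, which matches the loose cable.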