Infiniband
Latest revision as of 10:28, 23 August 2023
Introduction
Infiniband can be used with DRBD to speed up replication. This article covers setting up IP over Infiniband (IPoIB).
Subnet Manager
Infiniband requires a subnet manager to function. Many Infiniband switches have a built-in subnet manager that can be enabled. When using multiple switches, you can enable a subnet manager on all of them for redundancy.
If your switch does not have a subnet manager, or if you are not using a switch, you need to run a subnet manager on your node(s). The opensm package in Debian Squeeze and later should be sufficient if you need a subnet manager.
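You can check whether any subnet manager is active by looking at the port state. A port whose physical link is up but whose logical state stays at INIT has not yet been configured by a subnet manager. The sketch below assumes the kernel's sysfs layout under /sys/class/infiniband/<device>/ports/<n>/, with its state and phys_state files. The helper name ib_ports_without_sm is hypothetical and not part of any package. The sysfs root is a parameter so the function can be tried against a fake directory.

```shell
#!/usr/bin/env bash
# Sketch: list InfiniBand ports whose link is up but which no subnet
# manager has configured yet (logical state still INIT).
# Assumes /sys/class/infiniband/<dev>/ports/<n>/{state,phys_state};
# pass a different root directory as $1 for testing.
ib_ports_without_sm() {
    local root="${1:-/sys/class/infiniband}"
    local port_dir dev port state phys
    for port_dir in "$root"/*/ports/*; do
        [ -d "$port_dir" ] || continue                  # glob matched nothing
        dev=$(basename "$(dirname "$(dirname "$port_dir")")")
        port=$(basename "$port_dir")
        state=$(cat "$port_dir/state" 2>/dev/null)      # e.g. "2: INIT"
        phys=$(cat "$port_dir/phys_state" 2>/dev/null)  # e.g. "5: LinkUp"
        case "$phys" in
            *LinkUp*)
                case "$state" in
                    *INIT*) echo "$dev port $port: link up, no subnet manager" ;;
                esac
                ;;
        esac
    done
}
```

If it prints a port, enable the subnet manager on the switch or run opensm on one of the nodes.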
Sockets Direct Protocol (SDP)
SDP can be used with a preload library to speed up TCP/IP communications over Infiniband. DRBD supports SDP and offers some performance gains.
The Linux kernel does not include the SDP module; if you want to use SDP, you need to install OFED. Thus far I have been unable to get OFED to compile for Proxmox 2.0.
IPoIB
IP over Infiniband allows sending IP packets over the Infiniband fabric.
Proxmox 1.X Prerequisites
Debian Lenny network scripts do not work well with Infiniband interfaces. This can be corrected by installing the following packages from Debian Squeeze:
ifenslave-2.6_1.1.0-17_amd64.deb
net-tools_1.60-23_amd64.deb
ifupdown_0.6.10_amd64.deb
Proxmox 2.0
Nothing special is needed with Proxmox 2.0; everything seems to work out of the box.
Subnet manager
At least one subnet manager instance is required to keep the switch routing tables up to date. A subnet manager is often part of a managed switch's MLNX-OS image, or it can be installed on any bare-metal host.
Proxmox 3.x
See the directions for 2.0. Nothing has changed in 3.x that warrants noting here.
Create IPoIB Interface
Bonding
It is not possible to bond Infiniband interfaces to increase throughput; the bonding driver supports only active-backup mode for IPoIB slaves. If you want to use bonding for redundancy, create a bonding interface.
Create /etc/modprobe.d/aliases-bond.conf:
alias bond0 bonding
options bond0 mode=1 miimon=100 downdelay=200 updelay=200 max_bonds=2
Infiniband interfaces are named ib0, ib1, etc.
Edit /etc/network/interfaces
auto bond0
iface bond0 inet static
    address 192.168.1.1
    netmask 255.255.255.0
    slaves ib0 ib1
    bond_miimon 100
    bond_mode active-backup
    pre-up modprobe ib_ipoib
    pre-up echo connected > /sys/class/net/ib0/mode
    pre-up echo connected > /sys/class/net/ib1/mode
    pre-up modprobe bond0
    mtu 65520
To bring up the interface:
ifup bond0
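Once bond0 is up, the bonding driver reports which slave is carrying traffic in /proc/net/bonding/bond0, on a line of the form "Currently Active Slave: ib0". Below is a minimal sketch that extracts it. The helper name bond_active_slave is hypothetical. The status file path is a parameter, so the function can also be run against a saved copy.

```shell
#!/usr/bin/env bash
# Sketch: print the currently active slave of an active-backup bond.
# Reads the bonding driver's status file (default /proc/net/bonding/bond0).
# Returns non-zero if no "Currently Active Slave:" line is present.
bond_active_slave() {
    local status_file="${1:-/proc/net/bonding/bond0}"
    awk -F': ' '/^Currently Active Slave:/ { print $2; found = 1 }
                END { exit (found ? 0 : 1) }' "$status_file"
}
```

With the configuration above, it should normally print ib0 or ib1. If the active port loses its link, running it again should show the other slave.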
Without Bonding
Edit /etc/network/interfaces
auto ib0
iface ib0 inet static
    address 192.168.1.1
    netmask 255.255.255.0
    pre-up modprobe ib_ipoib
    pre-up echo connected > /sys/class/net/ib0/mode
    mtu 65520
To bring up the interface:
ifup ib0
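To confirm that the pre-up lines took effect, check the interface's mode and MTU in sysfs. In datagram mode the MTU is limited to the much smaller Infiniband link MTU. A mismatch here therefore usually means the "echo connected" step did not run. This is a sketch. The name check_ipoib is hypothetical, and the sysfs root is a parameter so the function can be tested against a fake directory.

```shell
#!/usr/bin/env bash
# Sketch: verify that an IPoIB interface is in connected mode with MTU 65520.
# Reads <root>/<iface>/mode and <root>/<iface>/mtu (default root /sys/class/net).
check_ipoib() {
    local iface="${1:-ib0}" root="${2:-/sys/class/net}"
    local mode mtu
    mode=$(cat "$root/$iface/mode" 2>/dev/null) || { echo "$iface: not found"; return 1; }
    mtu=$(cat "$root/$iface/mtu")
    if [ "$mode" = "connected" ] && [ "$mtu" = "65520" ]; then
        echo "$iface: ok (mode=$mode mtu=$mtu)"
    else
        echo "$iface: unexpected (mode=$mode mtu=$mtu)"
        return 1
    fi
}
```

For the bonded setup, run it once for each of ib0 and ib1.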
TCP/IP Tuning
These settings performed best on my servers; your mileage may vary.
Edit /etc/sysctl.conf:
#Infiniband Tuning
net.ipv4.tcp_mem=1280000 1280000 1280000
net.ipv4.tcp_wmem = 32768 131072 1280000
net.ipv4.tcp_rmem = 32768 131072 1280000
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.rmem_default=16777216
net.core.wmem_default=16777216
net.core.optmem_max=1524288
net.ipv4.tcp_sack=0
net.ipv4.tcp_timestamps=0
To apply the changes now:
sysctl -p
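It can be useful to check later, for example after a reboot or an upgrade, that the running kernel still uses these values. The sketch below compares each "key = value" line of the file with the matching file under /proc/sys. The helper name sysctl_diff is hypothetical. Whitespace is normalized before comparing, because the kernel prints multi-value keys such as net.ipv4.tcp_mem separated by tabs.

```shell
#!/usr/bin/env bash
# Sketch: report sysctl.conf settings that differ from the running values.
# $1 = config file (default /etc/sysctl.conf), $2 = proc root (default /proc/sys).
# Returns 1 if any setting differs, 0 otherwise.
sysctl_diff() {
    local conf="${1:-/etc/sysctl.conf}" proc_root="${2:-/proc/sys}"
    local line key want have path status=0
    while IFS= read -r line || [ -n "$line" ]; do
        line="${line%%#*}"                          # strip comments
        case "$line" in *=*) ;; *) continue ;; esac  # skip non-settings
        key=$(printf '%s' "${line%%=*}" | tr -d '[:space:]')
        want=$(printf '%s\n' "${line#*=}" | xargs)  # trim, collapse blanks
        path="$proc_root/${key//.//}"               # net.core.x -> net/core/x
        if [ -r "$path" ]; then
            have=$(xargs < "$path")
        else
            have="(not available)"
        fi
        if [ "$want" != "$have" ]; then
            echo "$key: configured '$want', running '$have'"
            status=1
        fi
    done < "$conf"
    return "$status"
}
```

Run without arguments, it checks /etc/sysctl.conf against the live kernel and prints nothing when everything matches.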
iperf speed tests
This is the first time I have used iperf; there are probably better options to use with the iperf command.
Install iperf on the systems to be tested:
aptitude install iperf
On one system, run it as a server (in this example it uses IP 10.0.99.8):
iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
On a client:
# iperf -c 10.0.99.8
------------------------------------------------------------
Client connecting to 10.0.99.8, TCP port 5001
TCP window size: 646 KByte (default)
------------------------------------------------------------
[  3] local 10.0.99.30 port 38629 connected with 10.0.99.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  8.98 GBytes  7.71 Gbits/sec
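As a sanity check on the output above, you can recompute the bandwidth column from the transfer column. The figures are consistent with iperf (version 2) counting transfer in binary units (1 GByte = 2^30 bytes) and bandwidth in decimal units (1 Gbit = 10^9 bits): 8.98 × 2^30 × 8 bits / 10 s ≈ 7.71 × 10^9 bit/s. Below is a small awk-based helper for that arithmetic. Its name, gbytes_to_gbits_per_sec, is hypothetical.

```shell
#!/usr/bin/env bash
# Worked example: convert an iperf "Transfer" figure (GBytes, 2^30 bytes)
# over a given duration into Gbit/s (10^9 bits per second).
gbytes_to_gbits_per_sec() {
    local gbytes="$1" seconds="$2"
    awk -v g="$gbytes" -v s="$seconds" \
        'BEGIN { printf "%.2f\n", g * 1073741824 * 8 / s / 1e9 }'
}

gbytes_to_gbits_per_sec 8.98 10.0   # prints 7.71
```

Because the transfer figure is rounded, recomputed values can differ from iperf's in the last digit. For longer or parallel runs, iperf's -t (duration in seconds) and -P (number of parallel client streams) options may give a more representative figure, e.g. iperf -c 10.0.99.8 -t 30 -P 4. For comparison, a 4x SDR link (reported by ibstat as Rate: 10) carries at most 8 Gbit/s of data after 8b/10b encoding.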
I want to see the Infiniband interface exposed in my VMs - can I do that?
The short answer is no. The long answer is that you can use manual routes to pass traffic through the virtio interface to your IB card. This can impose a large penalty on transfer rates, but it does work.
In the long run, what is needed is a special KVM driver, similar to the Virtio driver, that would allow the IB card to be abstracted and presented to all your VMs as a separate device.
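The manual-route workaround comes down to two steps:

1. Let the Proxmox host forward IP traffic.
2. Give each VM a route to the IPoIB subnet via the host's address on the VM bridge.

Below is a sketch with hypothetical addresses and hypothetical function names. It assumes 192.168.1.0/24 is the IPoIB network, as in the examples above, and that the host is 10.0.0.1 on the VM bridge. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch of manual routing from VMs to the IPoIB network via the host.
# Hypothetical addresses: 192.168.1.0/24 = IPoIB network,
# 10.0.0.1 = Proxmox host on the VM bridge. DRY_RUN=1 only prints commands.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# On the Proxmox host: forward packets between the VM bridge and ib0.
host_enable_forwarding() {
    run sysctl -w net.ipv4.ip_forward=1
}

# Inside each VM: send IPoIB-bound traffic to the host over virtio.
guest_route_to_ib() {
    local ib_net="${1:-192.168.1.0/24}" host_ip="${2:-10.0.0.1}"
    run ip route add "$ib_net" via "$host_ip"
}
```

The other IB hosts also need a route back to the VMs' network via the Proxmox host's IPoIB address (for example ip route add 10.0.0.0/24 via 192.168.1.1), or the host has to masquerade the VM traffic. None of these settings persist across reboots unless you also add them to /etc/sysctl.conf and /etc/network/interfaces.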
Using IB for cluster networking
IB can be used for cluster communications.
Edit /etc/hosts and change the host names/IPs to your IB network. Reboot each host, and make sure ssh can connect to all hosts from each host over IB.
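Before rebooting, it can help to confirm that every cluster node name in /etc/hosts points at an address on the IB network. Below is a sketch. The helper name hosts_not_on_ib, the node names and the 192.168.1. prefix are all hypothetical; substitute your own.

```shell
#!/usr/bin/env bash
# Sketch: print node names whose /etc/hosts entry is missing or not on the
# IB network. Usage: hosts_not_on_ib <hosts-file> <address-prefix> <name>...
hosts_not_on_ib() {
    local hosts_file="$1" prefix="$2"; shift 2
    local name addr
    for name in "$@"; do
        # Address of the first non-comment line listing this name.
        addr=$(awk -v n="$name" '$1 !~ /^#/ {
                   for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit }
               }' "$hosts_file")
        case "$addr" in
            "$prefix"*) ;;                        # on the IB network: fine
            *) echo "$name -> ${addr:-missing}" ;;
        esac
    done
}
```

For example, run hosts_not_on_ib /etc/hosts 192.168.1. node1 node2 node3 on each host. After the reboot, a loop such as for h in node1 node2 node3; do ssh -o BatchMode=yes "$h" true && echo "$h: ok"; done checks the ssh connectivity described above.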
admin
Install the infiniband-diags package, and check the documentation shipped with the package and at http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-4.html
apt-get install infiniband-diags
aptitude show infiniband-diags
Package: infiniband-diags
New: yes
State: installed
Automatically installed: no
Version: 1.4.4-20090314-1.2
Priority: extra
Section: net
Maintainer: OFED and Debian Developement and Discussion <pkg-ofed-devel@lists.alioth.debian.org>
Architecture: amd64
Uncompressed Size: 472 k
Depends: libc6 (>= 2.3), libibcommon1, libibmad1, libibumad1, libopensm2, perl
Description: InfiniBand diagnostic programs
 InfiniBand is a switched fabric communications link used in high-performance computing and enterprise data centers. Its
 features include high throughput, low latency, quality of service and failover, and it is designed to be scalable.
 This package provides diagnostic programs and scripts needed to diagnose an InfiniBand subnet.
Homepage: http://www.openfabrics.org
- ibstat: in the following output, a cable was not fully connected to the IB card (port 2 shows State: Down, Physical state: Polling).
# ibstat
CA 'mthca0'
    CA type: MT25208
    Number of ports: 2
    Firmware version: 5.3.0
    Hardware version: a0
    Node GUID: 0x0002c90200277c9c
    System image GUID: 0x0002c90200277c9f
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 10
        Base lid: 18
        LMC: 0
        SM lid: 3
        Capability mask: 0x02510a68
        Port GUID: 0x0002c90200277c9d
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02510a68
        Port GUID: 0x0002c90200277c9e
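On hosts with several cards or ports, it can help to condense ibstat output to one line per port. That way a problem such as the unseated cable above (Down/Polling) stands out. Below is a sketch that reads ibstat output on stdin. The name ibstat_summary is hypothetical. It relies only on the field layout shown above, with State, Physical state and Rate appearing in that order for each port.

```shell
#!/usr/bin/env bash
# Sketch: summarize ibstat output (on stdin) as
# "<CA> port <n>: <State>/<Physical state> rate <Rate>", one line per port.
ibstat_summary() {
    awk '
        $1 == "CA" && $2 != "type:" { ca = $2; gsub(/[^A-Za-z0-9_]/, "", ca) }
        $1 == "Port" && $2 ~ /^[0-9]+:$/ { port = $2; sub(/:$/, "", port) }
        $1 == "State:" { state = $2 }
        $1 == "Physical" && $2 == "state:" { phys = $3 }
        $1 == "Rate:" { printf "%s port %s: %s/%s rate %s\n", ca, port, state, phys, $2 }
    '
}
```

Usage: ibstat | ibstat_summary. For the output above, this should report port 1 as Active/LinkUp and port 2 as Down/Polling.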