iSCSI Multipath
Introduction
The main purpose of multipath connectivity is to provide redundant access to storage devices, i.e. to keep access to a storage device when one or more components in a path fail. Another advantage of multipathing is increased throughput by way of load balancing. A common example for the use of multipathing is an iSCSI SAN connected storage device: you get redundancy and maximum performance.
If you use iSCSI, multipath is recommended; it works without any configuration on the switches. (If you use NFS or CIFS, use bonding instead, e.g. 802.3ad.)
The connection from the Proxmox VE host through the iSCSI SAN to a storage device is referred to as a path. When multiple paths exist to a storage device (LUN) on a storage subsystem, this is referred to as multipath connectivity. You therefore need to make sure that you have at least two NICs dedicated to iSCSI, using separate networks (and separate switches, to be protected against switch failures).
This is a generic how-to. Please consult the storage vendor documentation for vendor specific settings.
Update your iSCSI configuration
It is important to start all required iSCSI connections at boot time. You can do that by setting 'node.startup' to 'automatic'.
The default 'node.session.timeo.replacement_timeout' is 120 seconds. We recommend using a much smaller value of 15 seconds instead.
You can set those values in '/etc/iscsi/iscsid.conf' (defaults). If you are already connected to the iSCSI target, you also need to modify the target-specific defaults in '/etc/iscsi/nodes/<TARGET>/<PORTAL>/default'.
A modified 'iscsid.conf' file contains the following lines:
node.startup = automatic
node.session.timeo.replacement_timeout = 15
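If the target is already discovered, you can alternatively update the stored node records with iscsiadm instead of editing the files by hand. Replace <TARGET> and <PORTAL> with your values:

# iscsiadm -m node -T <TARGET> -p <PORTAL> --op update -n node.startup -v automatic
# iscsiadm -m node -T <TARGET> -p <PORTAL> --op update -n node.session.timeo.replacement_timeout -v 15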
Please configure your iSCSI storage in the GUI if you have not done so already ("Datacenter/Storage: Add iSCSI target").
Install multipath tools
The default installation does not include this package, so you first need to install the multipath-tools package:
# apt-get update
# apt-get install multipath-tools
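After the installation, you can check that the multipath daemon is actually running. On systemd-based releases (PVE 4.x and higher) this can be done with:

# systemctl status multipath-tools.service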
Multipath configuration
Then you need to create the multipath configuration file '/etc/multipath.conf'. You can find details about all settings on the manual page:
# man multipath.conf
We recommend using 'wwid' to identify disks (World Wide Identification). You can use the 'scsi_id' command to get the 'wwid' for a specific device. For example, the following command returns the 'wwid' for the device '/dev/sda':
# /lib/udev/scsi_id -g -u -d /dev/sda
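The command prints the wwid on a single line. The output looks similar to the following (this is the example wwid used throughout this page):

3600144f028f88a0000005037a95d0001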
We normally blacklist all devices, and only allow specific devices using 'blacklist_exceptions':
blacklist {
        wwid .*
}

blacklist_exceptions {
        wwid "3600144f028f88a0000005037a95d0001"
        wwid "3600144f028f88a0000005037a95d0002"
}
We also use the 'alias' directive to name the device, but this is optional:
multipaths {
        multipath {
                wwid "3600144f028f88a0000005037a95d0001"
                alias mpath0
        }
        multipath {
                wwid "3600144f028f88a0000005037a95d0002"
                alias mpath1
        }
}
The wwids also have to be added to /etc/multipath/wwids. To do this, run e.g. the following commands:
# multipath -a 3600144f028f88a0000005037a95d0001
# multipath -a 3600144f028f88a0000005037a95d0002
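To make the running daemon pick up the new wwids immediately, you can force a reload of the device maps:

# multipath -r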
And finally you need reasonable defaults. We normally use the following multibus configuration:
Note PVE 4.x and higher
defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        uid_attribute           ID_SERIAL
        rr_min_io               100
        failback                immediate
        no_path_retry           queue
        user_friendly_names     yes
}
Note PVE 3.x
defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        rr_min_io               100
        failback                immediate
        no_path_retry           queue
}
Note If you run multipath on V2.3 and before, you need to adapt your multipath.conf, as 'path_selector' was called 'selector' there.
Also check your SAN vendor documentation.
To activate those settings you need to restart the multipath daemon with:
Note PVE 3.x and before (SysVinit)
# service multipath-tools restart
Note PVE 4.x and higher (systemd)
# systemctl restart multipath-tools.service
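If you want to verify that the daemon came back up cleanly, you can check its log; on systemd-based systems (PVE 4.x and higher) the journal is one place to look:

# journalctl -u multipath-tools.service -b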
Example multipath.conf
Edit or create the following file with your preferred text editor.
# cat /etc/multipath.conf
Note PVE 4.x
defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        uid_attribute           ID_SERIAL
        rr_min_io               100
        failback                immediate
        no_path_retry           queue
        user_friendly_names     yes
}

blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^(td|hd)[a-z]"
        devnode "^dcssblk[0-9]*"
        devnode "^cciss!c[0-9]d[0-9]*"
        device {
                vendor  "DGC"
                product "LUNZ"
        }
        device {
                vendor  "EMC"
                product "LUNZ"
        }
        device {
                vendor  "IBM"
                product "Universal Xport"
        }
        device {
                vendor  "IBM"
                product "S/390.*"
        }
        device {
                vendor  "DELL"
                product "Universal Xport"
        }
        device {
                vendor  "SGI"
                product "Universal Xport"
        }
        device {
                vendor  "STK"
                product "Universal Xport"
        }
        device {
                vendor  "SUN"
                product "Universal Xport"
        }
        device {
                vendor  "(NETAPP|LSI|ENGENIO)"
                product "Universal Xport"
        }
}

blacklist_exceptions {
        wwid "3600144f028f88a0000005037a95d0001"
}

multipaths {
        multipath {
                wwid "3600144f028f88a0000005037a95d0001"
                alias test
        }
}
Note PVE 3.x
defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        rr_min_io               100
        failback                immediate
        no_path_retry           queue
}

blacklist {
        wwid .*
}

blacklist_exceptions {
        wwid "3600144f028f88a0000005037a95d0001"
}

multipaths {
        multipath {
                wwid "3600144f028f88a0000005037a95d0001"
                alias nexenta0
        }
}
Query device status
You can view the status with:
# multipath -ll
mpath0 (3600144f028f88a0000005037a95d0001) dm-3 NEXENTA,NEXENTASTOR
size=64G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=2 status=active
  |- 5:0:0:0 sdb 8:16 active ready running
  `- 6:0:0:0 sdc 8:32 active ready running
To get more information about the used devices, use:
# multipath -v3
Performance test with fio
In order to check the performance, you can use fio.
Example read test:
fio --filename=/dev/mapper/mpath0 --direct=1 --rw=read --bs=1m --size=20G --numjobs=200 --runtime=60 --group_reporting --name=file1
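Note that this test only reads from the multipath device, which is safe; a write test against '/dev/mapper/mpath0' would destroy any data on the LUN. If you also want to test small-block random reads, a possible variant (the block size and job count here are just example values) is:

fio --filename=/dev/mapper/mpath0 --direct=1 --rw=randread --bs=4k --size=20G --numjobs=8 --runtime=60 --group_reporting --name=file1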
Vendor specific settings
Please add vendor specific recommendations here.
Dell
You need to load the Dell-specific module scsi_dh_rdac permanently. In order to do this, just edit:
nano /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
# Parameters can be specified after the module name.

scsi_dh_rdac
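To load the module immediately, without rebooting, you can additionally run a standard modprobe:

# modprobe scsi_dh_rdac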
An example multipath.conf for e.g. a Dell MD32xxi then looks like this:

defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        rr_min_io               100
        failback                immediate
        no_path_retry           queue
}

blacklist {
        wwid .*
}

blacklist_exceptions {
        wwid 3690b22c00008da2c000008a35098b0dc
}

devices {
        device {
                vendor                  "DELL"
                product                 "MD32xxi"
                path_grouping_policy    group_by_prio
                prio                    rdac
                polling_interval        5
                path_checker            rdac
                path_selector           "round-robin 0"
                hardware_handler        "1 rdac"
                failback                immediate
                features                "2 pg_init_retries 50"
                no_path_retry           30
                rr_min_io               100
        }
}

multipaths {
        multipath {
                wwid 3690b22c00008da2c000008a35098b0dc
                alias md3200i
        }
}
You also need to configure a suitable filter in /etc/lvm/lvm.conf in order to avoid error messages.
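What the filter has to look like depends on your local disks, so the following is only a sketch: it accepts the multipath alias from the example above and the local system disk, and rejects everything else. Adapt the device names to your setup:

filter = [ "a|/dev/mapper/md3200i|", "a|/dev/sda.*|", "r|.*|" ]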
Reduxio HX550
Configuration
You need to configure the following:
- Update /etc/iscsi/iscsid.conf
- Create/update /etc/multipath.conf
- Create /etc/udev/rules.d/99-reduxio.rules
/etc/iscsi/iscsid.conf
Add or update the following parameters:
node.startup = automatic

# The length of time to wait before retrying a failed IO. Can be reduced
# to a minimum, since multipath detects the failure and immediately fails
# over to another path. The value is in seconds and the default is
# typically 120.
node.session.timeo.replacement_timeout = 5

# The time to wait for an iSCSI login to complete. The value is in
# seconds and the default is 15.
node.conn[0].timeo.login_timeout = 15

# The time to wait for a logout to complete. The value is in seconds and
# the default is 15.
node.conn[0].timeo.logout_timeout = 15

# The time interval to wait on a connection before sending a ping.
node.conn[0].timeo.noop_out_interval = 5

# The time to wait for a Nop-out response before failing the connection.
# Failing the connection will cause IO to be failed back to the SCSI
# layer. If using dm-multipath, this will cause the IO to be failed to
# the multipath layer.
node.conn[0].timeo.noop_out_timeout = 5

# This retry count, together with node.conn[0].timeo.login_timeout,
# determines the maximum amount of time iscsid will try to establish the
# initial login: node.session.initial_login_retry_max is multiplied by
# node.conn[0].timeo.login_timeout to determine that maximum.
node.session.initial_login_retry_max = 8
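Settings for sessions that are already logged in only take effect after the session is re-established. One way to do that is to log all sessions out and back in again; be aware that this briefly interrupts the iSCSI connections, so only do it while no guest is using the storage:

# iscsiadm -m node --logoutall=all
# iscsiadm -m node --loginall=all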
/etc/udev/rules.d/99-reduxio.rules
Create the following file:
# /etc/udev/rules.d/99-reduxio.rules
SUBSYSTEM=="block", ACTION=="change", ATTRS{model}=="TCAS", ATTRS{vendor}=="REDUXIO", RUN+="/bin/sh -c '/usr/sbin/iscsiadm -m session -R '"
SUBSYSTEM=="block", ACTION=="change", ATTRS{model}=="TCAS", ATTRS{vendor}=="REDUXIO", ATTR{size}=="0", RUN+="/bin/sh -c 'echo 1 > /sys$DEVPATH/../../delete '"
SUBSYSTEM=="block", ACTION=="change", ATTRS{model}=="TCAS", ATTRS{vendor}=="REDUXIO", RUN+="/bin/sh -c 'service multipathd reload || service multipath-tools reload '"
SUBSYSTEM=="block", ACTION=="change", ATTRS{model}=="TCAS", ATTRS{vendor}=="REDUXIO", RUN+="/bin/sh -c '/usr/sbin/multipath -r $DEVNAME '"
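Depending on the system, a newly created rules file may not be picked up automatically. To be safe, you can reload the rules and re-trigger events with the standard udevadm commands:

# udevadm control --reload-rules
# udevadm trigger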
/etc/multipath.conf
Create or update /etc/multipath.conf. This is required for correct high-availability behavior:
devices {
        device {
                vendor                  "REDUXIO"
                product                 "TCAS"
                revision                "2300"
                path_grouping_policy    "group_by_prio"
                path_checker            "tur"
                hardware_handler        "1 alua"
                path_selector           "round-robin 0"
                prio                    "alua"
                failback                "immediate"
                features                "0"
                rr_weight               "uniform"
                no_path_retry           "72"
                queue_without_daemon    "no"
                rr_min_io_rq            10
                rr_min_io               10
                user_friendly_names     "yes"
                fast_io_fail_tmo        "10"
        }
}

blacklist {
        # Note: it is highly recommended to blacklist by wwid or vendor
        # instead of device name
        devnode "^sd[a]$"
}
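After writing the configuration, the running daemon has to pick it up. Restarting it as described above works; depending on your release, a reload (as also used in the udev rules above) may be enough:

# service multipath-tools reload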