Multipath

Introduction

The main purpose of multipath connectivity is to provide redundant access to a storage device, i.e., to have access to the storage device when one or more of the components in a path fail. Another advantage of multipathing is the increased throughput by way of load balancing.

This article serves as a guideline for setting up multipath connectivity to a logical disk that is managed by a SAN and accessed via iSCSI or Fibre Channel (FC), or accessed directly via direct-attached SAS. It also provides some advanced and/or vendor-specific configuration tweaks, and covers steps necessary to use a LUN as an LVM Physical Volume. This article does not apply to network filesystems such as NFS or CIFS storages. If you use NFS or CIFS and want to ensure redundant access in case of network failures, use a Linux bond.

In the following, the term LUN will refer to a logical disk. A LUN can be uniquely identified by its WWID (also called WWN). One particular connection from a Proxmox VE host to a LUN is referred to as a path. With multipath connectivity, not just one but multiple paths exist to a LUN. To obtain redundancy, you have to ensure that the paths are not part of the same failure domain, so that the Proxmox VE host can still access the storage device if one path fails. For instance, if you use iSCSI with two paths, the two paths should use at least two dedicated NICs, separate networks, and separate switches (to protect against switch failures).

Under Linux, multipath connectivity can be set up using multipath-tools. This is done in two steps:

  1. Set up multiple paths to the LUNs. With multiple paths to a specific LUN, a Linux host will see one block device per path, all with the same WWID. However, you should not use any of these block devices directly: Each block device corresponds to one particular path, and if that path fails, the block device will stop working. To obtain redundancy, you need to set up multipath-tools.
  2. Set up multipath-tools. By checking the WWID of each block device, multipath-tools can recognize that multiple block devices point to the same LUN. It then creates a multipath device for each LUN and takes care of delegating I/O to the actual paths. You can use the multipath device, and if one of the paths fails, multipath-tools will take care of falling back to the remaining paths.

Afterwards, you can optionally set up an LVM Physical Volume (PV) on top of the LUN, and create an LVM Volume Group (VG) with that PV.

This is a generic how-to. Please consult the storage vendor documentation for vendor-specific settings.

Set up multiple paths to the LUNs

The steps differ between LUNs accessed via iSCSI and LUNs accessed via FC/SAS: iSCSI requires additional setup steps on the Proxmox VE host, while FC/SAS usually does not.

iSCSI

An iSCSI target provides access to LUNs, and an iSCSI initiator connects to the iSCSI target via one or more portals. An iSCSI target is conceptually similar to a server, and an iSCSI initiator to a client. The iSCSI target can advertise multiple portals, and the iSCSI initiator may connect via one or more portals. In the context of multipath connectivity, each connection from the iSCSI initiator to a LUN on an iSCSI target via a specific portal can be regarded as one path. In other words, multiple portals correspond to multiple paths.
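
If you want to see which portals a target advertises before setting up the storage, you can run a sendtargets discovery against one known portal (the IP below is a placeholder for one of your SAN's portals):

iscsiadm -m discovery -t sendtargets -p 10.1.1.127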

When using the default iSCSI storage (storage type iscsi) in Proxmox VE, the SAN acts as the iSCSI target and open-iscsi running on the Proxmox VE host acts as the initiator.

We recommend making some changes to the Open-iSCSI configuration by editing the defaults in /etc/iscsi/iscsid.conf: The default node.session.timeo.replacement_timeout is 120 seconds; we recommend a much smaller value of 15 seconds instead. The modified iscsid.conf file should contain the following line:

node.session.timeo.replacement_timeout = 15
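
Note that this default only applies to targets discovered after the change. For targets that have already been discovered, the value is stored in the node records; you can update all of them explicitly with iscsiadm's node mode (a sketch; the new value takes effect on the next login):

iscsiadm -m node -o update -n node.session.timeo.replacement_timeout -v 15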

Then, configure your iSCSI storage on the GUI ("Datacenter->Storage->Add->iSCSI"). In the "Add: iSCSI" dialog, enter the IP of an arbitrary portal. Usually, the iSCSI target advertises all available portals back to the iSCSI initiator, and in the default configuration, Proxmox VE will try to connect to all advertised portals.

You can check the active Open-iSCSI sessions (connections to a particular portal) with the following command:

iscsiadm -m session
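
Example output with two sessions (illustrative only; the portals and target IQN depend on your setup):

# iscsiadm -m session
tcp: [1] 10.2.1.127:3262,1 iqn.2005-03.org.open-iscsi.iscsi:iscsivolumea (non-flash)
tcp: [2] 10.1.1.127:3261,1 iqn.2005-03.org.open-iscsi.iscsi:iscsivolumea (non-flash)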

You can see the LUNs and corresponding block devices for each session with the following command:

iscsiadm -m session -P3

Example output with 2 portals and 3 LUNs (some parts are omitted for brevity):

# iscsiadm -m session -P3
iSCSI Transport Class version 2.0-870
version 2.1.8
Target: iqn.2005-03.org.open-iscsi.iscsi:iscsivolumea (non-flash)
	Current Portal: 10.2.1.127:3262,1
	Persistent Portal: 10.2.1.127:3262,1
	[...]
		scsi4 Channel 00 Id 0 Lun: 0
			Attached scsi disk sdd		State: running
		scsi4 Channel 00 Id 0 Lun: 1
			Attached scsi disk sdh		State: running
		scsi4 Channel 00 Id 0 Lun: 2
			Attached scsi disk sdg		State: running
	Current Portal: 10.1.1.127:3261,1
	Persistent Portal: 10.1.1.127:3261,1
	[....]
		Host Number: 5	State: running
		scsi5 Channel 00 Id 0 Lun: 0
			Attached scsi disk sdc		State: running
		scsi5 Channel 00 Id 0 Lun: 1
			Attached scsi disk sdf		State: running
		scsi5 Channel 00 Id 0 Lun: 2
			Attached scsi disk sde		State: running

In this example, the LUN with Channel 00, Id 0, Lun 0 is reachable via two paths (two portals), and the two corresponding block devices are /dev/sdd and /dev/sdc. Each of the other two LUNs is also reachable via two paths.

FC/SAS

With FC or SAS, you usually do not need to perform any specific setup steps on the Proxmox VE hosts. Once you have established the connection, you can check that the host sees one block device per path and LUN by running:

ls -l /dev/disk/by-path

Example output with one LUN and two paths:

# ls -l /dev/disk/by-path
total 0
lrwxrwxrwx 1 root root  9 Aug  7 09:57 pci-0000:27:00.0-fc-0x5000001231231234-lun-0 -> ../../sdb
lrwxrwxrwx 1 root root  9 Aug  7 09:57 pci-0000:27:00.1-fc-0x5000001231231234-lun-0 -> ../../sdc
[...]

Set up multipath-tools

The default installation does not include the multipath-tools package, so you first need to install it:

apt update
apt install multipath-tools

We recommend using WWIDs to identify LUNs. You can use the scsi_id command to query the WWID for a specific device /dev/sdX:

/lib/udev/scsi_id -g -u -d /dev/sdX

Example if sdb and sdc correspond to two paths to the same LUN with WWID 3600144f028f88a0000005037a95d0001:

# /lib/udev/scsi_id -g -u -d /dev/sdb
3600144f028f88a0000005037a95d0001
# /lib/udev/scsi_id -g -u -d /dev/sdc
3600144f028f88a0000005037a95d0001
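
To spot paths that belong to the same LUN, you can print the WWID for every SCSI disk in one go (a small shell sketch; it assumes all relevant disks are named /dev/sda through /dev/sdz):

# print the WWID for each disk; devices with the same WWID are paths to the same LUN
for dev in /dev/sd?; do
    echo -n "$dev: "
    /lib/udev/scsi_id -g -u -d "$dev"
done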

Add WWIDs to the WWIDs file

By default, you need to add the WWIDs of all LUNs with multipath connectivity to the file /etc/multipath/wwids. To do this, run the following command with the appropriate WWID:

multipath -a WWID

An example for the WWID 3600144f028f88a0000005037a95d0001:

# multipath -a 3600144f028f88a0000005037a95d0001

To activate these settings, you need to run:

multipath -r

WWIDs must be added explicitly due to the recommended default setting of find_multipaths "strict", see #Configuration.

Configuration

You can now create a multipath configuration file /etc/multipath.conf. You can find details about each setting in the multipath.conf manpage.

The optimal multipath configuration highly depends on the SAN you're using. Please check your SAN vendor's documentation for recommendations on how to set up multipath-tools on Linux.

If your SAN vendor does not provide recommendations, you should start with an empty /etc/multipath.conf. If the file is empty, multipath-tools falls back to its default configuration and a predefined list of device-specific defaults.
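
For illustration, a minimal /etc/multipath.conf that only pins the recommended find_multipaths setting could look as follows (a sketch, not a vendor-tuned configuration; "strict" is also the built-in default):

defaults {
    find_multipaths "strict"
}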

After making changes to /etc/multipath.conf, you have to reload the configuration by running

multipath -r

Additional recommendations:

  • We recommend keeping the find_multipaths option at its default value "strict". With this setting, multipath devices are only set up for LUNs whose WWIDs are listed in /etc/multipath/wwids (see #Add WWIDs to the WWIDs file). We recommend this setting because it prevents multipath from automatically setting up unwanted or unnecessary multipath devices.
  • You can optionally use the alias directive to provide a name for the device:
    multipaths {
      multipath {
            wwid "3600144f028f88a0000005037a95d0001"
            alias mpath0
      }
    }
    

    If no explicit name is given, multipath-tools will use the WWID as the name.

Check your SAN vendor's documentation for additional information! You can also check the multipath documentation by Red Hat and Ubuntu.

Note that in a setup with LVM on top of an FC or SAS LUN, it is recommended to additionally install multipath-tools-boot. Then, you also need to regenerate the initramfs after each change to /etc/multipath/wwids. See #FC/SAS-specific configuration for more information.

Query device status

You can view the status with:

multipath -ll

An example:

# multipath -ll
mpath0 (3600144f028f88a0000005037a95d0001) dm-3 NEXENTA,NEXENTASTOR
size=64G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=2 status=active
  |- 5:0:0:0 sdb 8:16 active ready running
  `- 6:0:0:0 sdc 8:32 active ready running

To get more information about the devices in use, run:

multipath -v3

Multipath setup in a Proxmox VE cluster

If you have a Proxmox VE cluster, you have to perform the setup steps above on each cluster node. Any changes to the multipath configuration must be performed on each cluster node. Multipath configuration is not replicated between cluster nodes.
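
For example, after changing the configuration on one node, you could copy the relevant files to the other nodes and reload multipathd there (a sketch; pve2 and pve3 are placeholder node names):

for node in pve2 pve3; do
    scp /etc/multipath.conf root@$node:/etc/multipath.conf
    scp /etc/multipath/wwids root@$node:/etc/multipath/wwids
    ssh root@$node multipath -r
done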

LVM on top of a LUN

You can set up a storage for guest disks on top of an iSCSI/FC/SAS LUN by creating an LVM Physical Volume (PV) and LVM Volume Group (VG) on top of the LUN. After setting up multipath for the LUN, there are two possibilities to create a corresponding LVM storage.

  • Via the GUI (only applicable for iSCSI LUNs): First, create an iSCSI storage in the GUI as described above. Then, navigate to "Datacenter->Storage->Add->LVM". Choose the iSCSI storage as the "Base Storage" and the LUN as the "Base Volume", enter the name of the new VG, and click "Add". Creating the LVM storage this way has the advantage that Proxmox VE will activate the iSCSI storage before trying to activate the LVM storage. See the admin guide for more information on the base property.
  • Via the CLI: The following commands create a PV and VG on top of the LUN (replace DEVICE_NAME with the name of the multipath device, and VG_NAME with the desired name for the VG):
    pvcreate /dev/mapper/DEVICE_NAME
    vgcreate VG_NAME /dev/mapper/DEVICE_NAME

    You can then create an LVM storage for that VG in the Proxmox GUI.
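
    For example, with the alias mpath0 from the configuration example above and a placeholder VG name vg_san:

    pvcreate /dev/mapper/mpath0
    vgcreate vg_san /dev/mapper/mpath0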

If multiple Proxmox VE cluster nodes can access the LUN, you can inform Proxmox VE about this fact by marking the storage as "shared".
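
On the CLI, you can set this flag with pvesm (assuming a storage named myvg; replace it with your storage ID):

pvesm set myvg --shared 1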

In a Proxmox VE cluster, the new PV and VG may not be immediately visible on all nodes. To make them visible on all nodes, you can either reboot all nodes or run the following command on all nodes:

pvscan --cache

FC/SAS-specific configuration

When using LVM on top of an FC or SAS LUN, we recommend installing the multipath-tools-boot package:

apt install multipath-tools-boot

With this package installed, /etc/multipath.conf and /etc/multipath/wwids are copied to the initramfs. Hence, after making any change to these files (e.g. by running multipath -a WWID), you need to regenerate the initramfs:

update-initramfs -u -k all
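
To verify that the files were actually included, you can list the initramfs contents (lsinitramfs is part of initramfs-tools on Debian; this checks the image for the running kernel):

lsinitramfs /boot/initrd.img-$(uname -r) | grep multipath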

Some background: If this package is installed, multipath will be set up earlier in the boot sequence (in early userspace). This is intended to avoid various issues with LVM autoactivation in early userspace in combination with multipath. Block devices corresponding to FC/SAS LUNs (in contrast to iSCSI LUNs) are already visible in early userspace, so LVM may already recognize the PVs on top of them. However, multipath devices are not set up yet, so LVM will wrongly use a block device corresponding to one specific path instead of the multipath device. This can cause issues later on, e.g. when this particular path fails. Also, LVM commands print "Device mismatch" warnings:

WARNING: Device mismatch detected for VG/vm-100-disk-0 which is accessing /dev/sdc instead of /dev/mapper/3600144f028f88a0000005037a95d0001.

Another issue can occur if the LVM PV is not created directly on top of the LUN, but on a partition of the LUN. In that case, multipath may fail to set up the multipath device at all, and multipath -ll will not show any output. See this forum post for more information.

If multipath-tools-boot is installed, multipath is already set up in early userspace and /etc/multipath/wwids is available in the initramfs. LVM will read that file in early userspace and ignore any block devices with WWIDs listed in that file until a multipath device is available. This will prevent the issues outlined above.

Troubleshooting

Block devices for iSCSI/FC LUNs not visible

If you do not see block devices for iSCSI/FC LUNs in the output of lsblk, check the permission settings on the SAN. Make sure the host (the iSCSI/FC initiator) has the permissions required to access the LUNs.
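
After fixing the permissions on the SAN, you can trigger a rescan without rebooting. For iSCSI, rescan the active sessions; for FC, rescan the SCSI hosts (hostX below is a placeholder for the FC host number):

iscsiadm -m session --rescan
echo "- - -" > /sys/class/scsi_host/hostX/scan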

LVM storage on top of LUN is greyed out in the GUI

Check whether you see the block devices for the individual paths in lsblk. If you see the paths, check whether the VG is listed in the output of vgs. If it is not listed, check the output of vgscan -vvv.

Often, the issue is that multipath-tools is not correctly set up. Make sure that you follow the instructions in #Set up multipath-tools. Check that the multipath device shows up in the output of multipath -ll.

"Device mismatch detected" warnings

See #FC/SAS-specific configuration.

Multipath device for FC LUN not visible after boot

Check whether you see the block devices for the individual paths in lsblk. If you see them but multipath -ll shows no multipath device, check the output of multipath -v3.

If multipath -v3 shows errors like the following:

1550.188682 | libdevmapper: ioctl/libdm-iface.c(1980): device-mapper: reload ioctl on 3690b22c00008da2c000008a35098b0dc (252:6) failed: Device or resource busy

the issue is likely due to an unwanted interaction between multipath and LVM. See #FC/SAS-specific configuration for instructions on how to work around the issue.