Storage: ZFS over iSCSI

Technology and features

As of Proxmox 3.3 the ZFS storage plugin is fully supported, which means you can use external ZFS-based storage via iSCSI. The plugin seamlessly integrates ZFS storage as a backend for creating VMs with the normal VM creation wizard in Proxmox.

When Proxmox creates a raw disk image, it uses the plugin to create a ZFS volume that holds the image. A separate ZFS volume is created for every disk image, for example tank/vm-100-disk-1. Because each disk is a native ZFS volume, Proxmox can offer live snapshots and cloning of VMs using ZFS's native snapshot and volume reference features.
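To make the naming scheme concrete, here is a minimal sketch that builds a volume name and prints (without running) the kind of ZFS commands that correspond to creating, snapshotting and cloning a disk. The helper names, the 32G size, the snapshot name snap1 and the target VM ID 101 are illustrative assumptions; the plugin's actual invocations may differ.

```shell
#!/bin/sh
# Build the volume name used for a VM disk: <pool>/vm-<vmid>-disk-<n>.
zvol_name() {
    pool=$1 vmid=$2 disk=$3
    printf '%s/vm-%s-disk-%s\n' "$pool" "$vmid" "$disk"
}

# Print (do not run) ZFS commands that roughly correspond to creating,
# snapshotting and cloning a disk. Assumption: size, snapshot name and
# flags are placeholders, not what the plugin necessarily issues.
show_zfs_commands() {
    vol=$(zvol_name "$1" "$2" "$3")
    clone=$(zvol_name "$1" "$4" 1)
    echo "zfs create -b 4k -V 32G $vol"
    echo "zfs snapshot $vol@snap1"
    echo "zfs clone $vol@snap1 $clone"
}

show_zfs_commands tank 100 1 101
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without ZFS installed.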

Since ZFS is available on several platforms that use different iSCSI target implementations, the plugin has a number of helper modules, each providing the iSCSI functionality needed for a specific platform. For now, iSCSI modules exist for the following platforms:

  • Solaris-based platforms using Comstar. Tested on OmniOS and NexentaStor. For a GUI, use napp-it or Nexenta.
  • BSD-based platforms using istgt. Tested on FreeBSD 8.3, 9.0 and 9.1. For a GUI, use zfsguru.
  • Linux-based platforms with zfsonlinux using IET. Tested on Debian Wheezy. I have no knowledge of available GUIs. Edit 2013-10-30: I have begun developing a ZFS plugin for OpenMediaVault in collaboration with the OpenMediaVault team. A beta release of the plugin is scheduled for the end of next month (November 2013).

A word of caution: for enterprise use cases I would only recommend Solaris-based platforms with Comstar. Linux-based platforms can, IMHO, be used in a non-enterprise setup that requires working HA. I do not recommend BSD-based platforms for enterprise and/or HA setups due to limitations in the current iSCSI target implementation. istgt requires a restart of the daemon every time a LUN is deleted or updated, which means dropping all current connections. Work has begun on a native iSCSI target for FreeBSD 10, which will hopefully solve this inconvenience. NOTE: This is fixed in FreeBSD 10.x.

Platform notes

  • On all ZFS storage nodes the following should be added to /etc/ssh/sshd_config:

For the older SSH shipped with Solaris-based operating systems:

LookupClientHostnames no 
VerifyReverseMapping no 
GSSAPIAuthentication no

For operating systems that use OpenSSH:

UseDNS no
GSSAPIAuthentication no
  • For all storage platforms, root's SSH key is distributed through Proxmox's cluster-wide file system, which means you have to create this folder: /etc/pve/priv/zfs. In this folder you place the SSH key to use for each ZFS storage, and the key name follows this naming scheme: <portal>_id_rsa. The portal is the value entered in the portal field of the GUI wizard, so if a ZFS storage is referenced via the IP 192.168.1.1, that IP is entered in the portal field and the key is named 192.168.1.1_id_rsa. Creating the key is simple. As root, do the following:
mkdir /etc/pve/priv/zfs
ssh-keygen -f /etc/pve/priv/zfs/192.168.1.1_id_rsa
ssh-copy-id -i /etc/pve/priv/zfs/192.168.1.1_id_rsa.pub root@192.168.1.1
  • Log in once to the ZFS SAN from each Proxmox node to accept its host key:
ssh -i /etc/pve/priv/zfs/192.168.1.1_id_rsa root@192.168.1.1

The authenticity of host '192.168.1.1 (192.168.1.1)' can't be established.
RSA key fingerprint is 8c:f9:46:5e:40:65:b4:91:be:41:a0:25:ef:7f:80:5f.
Are you sure you want to continue connecting (yes/no)? yes

If you can log in without errors, you are ready to use your storage.

  • The key only needs to be created once per portal, so if the same portal provides several targets that are used for several storages in Proxmox, you only create one key.
  • Solaris: Apart from performing the steps above, nothing else needs to be done.
  • BSD: Apart from performing the steps above, the following is required: since istgt must have at least one LUN before a target can be enabled, you have to create one LUN manually. The size is irrelevant, so a LUN referencing a volume of 1MB is sufficient, but remember to give the volume a name that does not follow the Proxmox naming scheme so that it does not show up in the Proxmox content GUI.
  • Linux: Apart from performing the steps above, nothing else needs to be done.
  • Nexenta: Apart from performing the steps above, the following is required: rm /root/.bash_profile. This prevents root logins from entering the nmc console by default.
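The key naming scheme and the login check described above can be wrapped in a small helper. This is only a sketch: the function names are hypothetical, and it assumes an OpenSSH client on the Proxmox node (BatchMode and ConnectTimeout are standard OpenSSH client options).

```shell
#!/bin/sh
# Directory and naming scheme taken from the platform notes above.
KEY_DIR=/etc/pve/priv/zfs

# Print the private key path expected for a given portal.
key_path() {
    printf '%s/%s_id_rsa\n' "$KEY_DIR" "$1"
}

# Try a non-interactive login. Returns non-zero if the key file is missing,
# the host key has not been accepted yet, or authentication fails.
check_portal_login() {
    key=$(key_path "$1")
    [ -f "$key" ] || { echo "missing key: $key" >&2; return 1; }
    ssh -o BatchMode=yes -o ConnectTimeout=5 -i "$key" "root@$1" true
}

# Example (portal address as used in the text above):
#   check_portal_login 192.168.1.1 && echo "portal reachable"
```

Because BatchMode disables interactive prompts, the check fails instead of hanging when the host key has not yet been accepted on that node.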

Proxmox configuration

Use the GUI (Datacenter/Storage: Add ZFS), which adds configuration like the following to /etc/pve/storage.cfg:

zfs: solaris
	blocksize 4k
	target iqn.2010-08.org.illumos:02:b00c9870-6a97-6f0b-847e-bbfb69d2e581:tank1
	pool tank
	iscsiprovider comstar
	portal 192.168.3.101
	content images

zfs: BSD
	blocksize 4k
	target iqn.2007-09.jp.ne.peach.istgt:tank1
	pool tank
	iscsiprovider istgt
	portal 192.168.3.114
	content images

zfs: linux
	blocksize 4k
	target iqn.2001-04.com.example:tank1
	pool tank
	iscsiprovider iet
	portal 192.168.3.196
	content images

You can then create disks through the Proxmox GUI.
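Since every portal listed in storage.cfg needs a matching key under /etc/pve/priv/zfs, it can be useful to cross-check the two. The following is a rough sketch with hypothetical helper names; it assumes the stanza layout shown above, with a header at column 0 and indented properties.

```shell
#!/bin/sh
# Print the portal of every "zfs:" stanza in a storage.cfg-style file.
list_zfs_portals() {
    awk '
        /^[a-z]+:/ { in_zfs = ($1 == "zfs:") }   # a new stanza starts
        in_zfs && $1 == "portal" { print $2 }
    ' "$1"
}

# Report portals whose key file is missing from the key directory.
missing_keys() {
    cfg=$1 key_dir=$2
    list_zfs_portals "$cfg" | sort -u | while read -r portal; do
        [ -f "$key_dir/${portal}_id_rsa" ] || echo "$portal"
    done
}

# Example:
#   missing_keys /etc/pve/storage.cfg /etc/pve/priv/zfs
```

An empty result from missing_keys means every configured portal has a key file in place. It does not verify that the key is actually authorized on the storage node.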

Extra configuration

  • Thin provision: When this option is checked, volumes only use the space actually written and grow as needed until the configured size limit is reached.
  • Write cache: When this option is unchecked, the iSCSI write cache is disabled. Disabling the write cache makes every write to the LUN synchronous, which reduces write performance but ensures that data is persisted after each flush request made by the VM (if the volume has sync disabled, data is only flushed to the log!). If the write cache is enabled, data persistence is left to the ZFS volume's sync setting, which decides when data is flushed to disk. When the iSCSI write cache is enabled, your volume should have sync=standard or sync=always to guard against data loss. The write cache is only configurable with Comstar. For istgt and IET the write cache is disabled in the driver and cannot be enabled.
  • Host group and target group: If your storage node is configured to restrict access through host groups and target groups, this is where you enter the required information.
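As a sketch of how the sync recommendation for the write cache option could be checked on the storage node, the helper below classifies a sync value and queries a volume with zfs get. The function names are hypothetical, and the volume name in the example is only the one used earlier in this article.

```shell
#!/bin/sh
# Return 0 if a ZFS sync value is safe to combine with an enabled iSCSI
# write cache, per the recommendation above (standard or always).
sync_is_safe() {
    case $1 in
        standard|always) return 0 ;;
        *) return 1 ;;
    esac
}

# Query a volume and warn if its sync setting is unsafe.
# Intended to run on the ZFS host itself.
check_volume_sync() {
    value=$(zfs get -H -o value sync "$1") || return 2
    sync_is_safe "$value" || { echo "$1: sync=$value" >&2; return 1; }
}

# Example:
#   check_volume_sync tank/vm-100-disk-1
#   zfs set sync=standard tank/vm-100-disk-1
```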

Note: iSCSI multipath does not work yet, so only the portal IP is used for the iSCSI connection.