OpenVZ on iSCSI

This is fairly straightforward to accomplish and also allows (offline) migration between Proxmox cluster nodes.

Offline migration means there is an outage of about 5 seconds while the container is relocated to another node in the cluster. The phase 1 sync is done while the container is online; the container is then shut down, the phase 2 sync is completed, and the container is brought up on the other node.

Of course this depends on how long the services take to shut down and how much data there is in the phase 2 sync, so don't quote me on the times :) (The time mentioned was for a DNS, DHCP and file server I tested personally.)

This assumes you have already set up a cluster and a SAN. This example is for a 3 node cluster, so if you have more nodes or only 1, just use your brains to work it out from this example.

NOTE: OpenVZ cannot share the same LUN on different nodes. So you need ONE LUN PER CLUSTER NODE. You can however have as many containers as you like per node/LUN.

For example:

Master node0 connects to - LUN1 - 10 containers
Cluster node1 connects to - LUN2 - 5 containers
Cluster node2 connects to - LUN3 - 40 containers
Etc..........

You can also run KVM instances at the same time; however, they require their OWN LUNs as well.

Add the iSCSI target to the master server

Go to the web interface for Proxmox and add the iSCSI target under storage.

If you have multiple targets, ensure you add every target that holds a LUN you need.

Set iSCSI to automatic connection

Open an SSH connection to the Proxmox Master Node.

First run fdisk -l so we have a before and after view of the system devices.

You will end up with something similar to the following. Make a note of your local system devices so you know what is there already. This applies whether you have existing SAN connections or not: it's best to know what the system looks like so you can see which devices are added and easily reference them later.

e.g. /dev/sda1, /dev/sda2 etc.

	prox:~# fdisk -l

		Disk /dev/sda: 8589 MB, 8589934592 bytes
		255 heads, 63 sectors/track, 1044 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		   Device Boot      Start         End      Blocks   Id  System
		/dev/sda1   *           1          66      524288   83  Linux
		Partition 1 does not end on cylinder boundary.
		/dev/sda2              66        1044     7861610   8e  Linux LVM

		Disk /dev/dm-0: 1073 MB, 1073741824 bytes
		255 heads, 63 sectors/track, 130 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		Disk /dev/dm-0 doesn't contain a valid partition table

		Disk /dev/dm-1: 2147 MB, 2147483648 bytes
		255 heads, 63 sectors/track, 261 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		Disk /dev/dm-1 doesn't contain a valid partition table

		Disk /dev/dm-2: 3758 MB, 3758096384 bytes
		255 heads, 63 sectors/track, 456 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		Disk /dev/dm-2 doesn't contain a valid partition table
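One way to keep the "before" view for later comparison is to save just the disk names from fdisk -l to a file. The following is a minimal sketch, not part of the original procedure; the helper names list_disks and new_disks and the /root/disks.* file names are made up for illustration.

```shell
#!/bin/sh
# Print the disk device names found in `fdisk -l` style output read from stdin.
# Only "Disk /dev/xxx: ..." header lines are matched, so the "Disk identifier"
# and "doesn't contain a valid partition table" lines are ignored.
list_disks() {
    sed -n 's|^[[:space:]]*Disk \(/dev/[^:]*\):.*|\1|p' | sort -u
}

# Print the devices listed in the second snapshot file but not in the first.
# Both files must be sorted, which list_disks already guarantees.
new_disks() {
    comm -13 "$1" "$2"
}

# Example (run as root):
#   fdisk -l 2>/dev/null | list_disks > /root/disks.before
#   ... connect the iSCSI target ...
#   fdisk -l 2>/dev/null | list_disks > /root/disks.after
#   new_disks /root/disks.before /root/disks.after
```

Anything printed by new_disks is a device that appeared after the first snapshot was taken.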

Now edit the iSCSI node config file for each target/LUN instance you have added. I have a single target, with all the LUNs used by OpenVZ available on that single target. If you prefer one target per LUN, that also works; just make sure you edit each node config file as follows:

nano -w /etc/iscsi/nodes/iqn_for_your_node_here/IP_Address_and_port_here/default

You should end up with something like this (you can of course hit tab to complete the path as you enter it):

nano -w /etc/iscsi/nodes/iqn.2006-01.com.openfiler\:tsn.672802aca9d8/10.5.0.6\,3260\,1/default

Near the top of the file change node.startup to automatic

node.startup = automatic

Near the bottom change node.conn[0].startup to automatic

node.conn[0].startup = automatic

Exit and save the file.
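If you have several node config files to change, the same two edits can be scripted. This is a rough sketch that assumes GNU sed (as shipped with Debian); the function name set_iscsi_automatic is made up for illustration, and the path in the usage comment is the example path from above.

```shell
#!/bin/sh
# Set both startup options to "automatic" in one open-iscsi node config file.
# $1 = path to the node's "default" file. All other lines are left untouched.
set_iscsi_automatic() {
    cfg=$1
    sed -i \
        -e 's|^node\.startup[[:space:]]*=.*|node.startup = automatic|' \
        -e 's|^node\.conn\[0\]\.startup[[:space:]]*=.*|node.conn[0].startup = automatic|' \
        "$cfg"
}

# Example (run as root, adjust the path to your own target and portal):
#   set_iscsi_automatic '/etc/iscsi/nodes/iqn.2006-01.com.openfiler:tsn.672802aca9d8/10.5.0.6,3260,1/default'
```

The same change can also be made without editing the file by hand, using iscsiadm's update operation, e.g. iscsiadm -m node -T <target> -p <portal> -o update -n node.startup -v automatic.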

Restart the open-iscsi service with the following command:

prox1:# /etc/init.d/open-iscsi restart

Disconnecting iSCSI targets:Logging out of session [sid: 1, target: iqn.2006-01.com.openfiler:tsn.672802aca9d8, portal:10.5.0.6,3260]
Logout of [sid: 1, target: iqn.2006-01.com.openfiler:tsn.672802aca9d8, portal: 10.5.0.6,3260]: successful
Stopping iSCSI initiator service:.
Starting iSCSI initiator service: iscsid.
Setting up iSCSI targets:
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:tsn.672802aca9d8, portal: 10.5.0.6,3260]
Login to [iface: default, target: iqn.2006-01.com.openfiler:tsn.672802aca9d8, portal: 10.5.0.6,3260]: successful

Mounting network filesystems:.

NOTE: I have noticed that this setting is sometimes changed back to manual automatically. It must be set to automatic if you have any servers set to start at boot. It is also important to check this file on ALL cluster nodes, because the file is replicated with the setting as manual. So this needs to be completed on each cluster node!

This config file is also where you set up authentication for iSCSI targets if you are using CHAP authentication (not recommended).

Setting up the LUN for OpenVZ
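Since the setting can silently revert, a quick check on each node can save a surprise at the next reboot. The following is a small sketch; find_manual_nodes is a made-up helper name, and it assumes the node files live under /etc/iscsi/nodes as in the example path above.

```shell
#!/bin/sh
# List every node config file that still has a startup option set to "manual".
# $1 = directory to scan (defaults to /etc/iscsi/nodes).
# Exits non-zero when no such file is found.
find_manual_nodes() {
    grep -rlE '^node\.(conn\[0\]\.)?startup[[:space:]]*=[[:space:]]*manual' \
        "${1:-/etc/iscsi/nodes}"
}

# Example (run on each cluster node):
#   find_manual_nodes || echo "all iSCSI nodes are set to automatic"
```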

First confirm with fdisk -l that the LUN you want to use for this cluster node is now available to the system.

Your output should now look something like the following:

prox:# fdisk -l

		Disk /dev/sda: 8589 MB, 8589934592 bytes
		255 heads, 63 sectors/track, 1044 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		   Device Boot      Start         End      Blocks   Id  System
		/dev/sda1   *           1          66      524288   83  Linux
		Partition 1 does not end on cylinder boundary.
		/dev/sda2              66        1044     7861610   8e  Linux LVM

		Disk /dev/dm-0: 1073 MB, 1073741824 bytes
		255 heads, 63 sectors/track, 130 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		Disk /dev/dm-0 doesn't contain a valid partition table

		Disk /dev/dm-1: 2147 MB, 2147483648 bytes
		255 heads, 63 sectors/track, 261 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		Disk /dev/dm-1 doesn't contain a valid partition table

		Disk /dev/dm-2: 3758 MB, 3758096384 bytes
		255 heads, 63 sectors/track, 456 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		Disk /dev/dm-2 doesn't contain a valid partition table

		Disk /dev/sdb: 148.0 GB, 148049946624 bytes
		64 heads, 32 sectors/track, 141290 cylinders
		Units = cylinders of 2048 * 512 = 1048576 bytes
		Disk identifier: 0x22ea0421

		Disk /dev/sdb doesn't contain a valid partition table

		Disk /dev/sdc: 148.0 GB, 148049946624 bytes
		64 heads, 32 sectors/track, 141290 cylinders
		Units = cylinders of 2048 * 512 = 1048576 bytes
		Disk identifier: 0x79266f52

		Disk /dev/sdc doesn't contain a valid partition table

		Disk /dev/sdd: 148.0 GB, 148049946624 bytes
		64 heads, 32 sectors/track, 141290 cylinders
		Units = cylinders of 2048 * 512 = 1048576 bytes
		Disk identifier: 0x2480ea13

		Disk /dev/sdd doesn't contain a valid partition table

As you can see from the above output, /dev/sdb, /dev/sdc and /dev/sdd have been added to my system. These are the 3 LUNs (1 per node) that we will be adding.

To make it clear they will be used as follows:

          /dev/sdb - prox  (cluster node 1)
          /dev/sdc - prox1 (cluster node 2)
          /dev/sdd - prox2 (cluster node 3)

Now that we know which device we will be using for node 1, we do the following to create a partition on it:

prox1:# fdisk /dev/sdb
			
			Type 'n' and press enter to create a new partition
			Type 'p' and enter for primary partition
			Type '1' and enter for the 1st (and only) partition
			Press enter to accept the default start cylinder
			Press enter to accept the default end cylinder
			Type 't' and enter to set the system type
			Type '83' and enter to set it as Linux
			Type 'w' and enter to save changes and exit
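If you would rather not type the answers by hand on every node, the same keystrokes can be piped into fdisk. This is only a sketch: the function name is made up, it assumes fdisk asks exactly the questions listed above (newer fdisk versions may ask extra questions, for example about an existing signature), and it will overwrite the partition table of whatever device you point it at.

```shell
#!/bin/sh
# Print the answers from the list above, one per line:
# n (new), p (primary), 1 (partition number), two empty lines to accept the
# default start and end, t (set type), 83 (Linux), w (write and exit).
fdisk_single_partition_keys() {
    printf '%s\n' n p 1 '' '' t 83 w
}

# Example (DESTRUCTIVE, double-check the device name first):
#   fdisk_single_partition_keys | fdisk /dev/sdb
```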

Now create the file system on the partition you just created by running:

prox1:# mkfs.ext3 /dev/sdb1 (obviously change /dev/???1 to whatever the device is on your system)

You should see output similar to the following:

prox1:# mkfs.ext3 /dev/sdb1
		mke2fs 1.41.3 (12-Oct-2008)
		Filesystem label=
		OS type: Linux
		Block size=4096 (log=2)
		Fragment size=4096 (log=2)
		10756096 inodes, 43005997 blocks
		2150299 blocks (5.00%) reserved for the super user
		First data block=0
		Maximum filesystem blocks=4294967296
		1313 block groups
		32768 blocks per group, 32768 fragments per group
		8192 inodes per group
		Superblock backups stored on blocks: 
			32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
			4096000, 7962624, 11239424, 20480000, 23887872

		Writing inode tables: done                            
		Creating journal (32768 blocks): done
		Writing superblocks and filesystem accounting information: done

		This filesystem will be automatically checked every 38 mounts or
		180 days, whichever comes first.  Use tune2fs -c or -i to override.

Mounting the filesystem

Now you can mount the filesystem you have just created. As we want the iSCSI LUN to hold the OpenVZ containers, we will have to remount the local copy to another folder.

So first create a folder

# mkdir /var/lib/vz1 (or whatever name you want to give it)

Now we open up fstab, edit the local mount and add our iSCSI mount point

# nano -w /etc/fstab

You will see a line similar to, or the same as, this near the top of the file:

/dev/pve/data /var/lib/vz ext3 defaults 0 1

CHANGE that to point to the new folder you just created - for example:

/dev/pve/data /var/lib/vz1 ext3 defaults 0 1

Near the bottom of the file you will need to add in the following line:

/dev/sdb1 /var/lib/vz ext3 defaults,auto,_netdev 0 0

Note: Obviously change /dev/???1 to whatever the device is on your system

Now save the file, exit, and run mount -a
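The two fstab changes described above could also be scripted, which helps when repeating them on several nodes. This is a minimal sketch assuming GNU sed and an fstab line that starts with /dev/pve/data as shown; the function name update_fstab is made up, and you should work on a copy of the file first.

```shell
#!/bin/sh
# $1 = fstab file, $2 = iSCSI partition (e.g. /dev/sdb1),
# $3 = new folder for the local volume (e.g. /var/lib/vz1)
update_fstab() {
    fstab=$1
    dev=$2
    local_dir=$3
    # Repoint the local /dev/pve/data volume from /var/lib/vz to the new folder.
    sed -i "s|^\(/dev/pve/data[[:space:]]\{1,\}\)/var/lib/vz\([[:space:]]\)|\1${local_dir}\2|" "$fstab"
    # Add the iSCSI mount on /var/lib/vz unless the device already has an entry.
    grep -q "^${dev}[[:space:]]" "$fstab" ||
        printf '%s /var/lib/vz ext3 defaults,auto,_netdev 0 0\n' "$dev" >> "$fstab"
}

# Example:
#   cp /etc/fstab /root/fstab.bak
#   update_fstab /etc/fstab /dev/sdb1 /var/lib/vz1
```

Running it a second time changes nothing, because the first sed pattern no longer matches and the device line is already present.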

# mount -a
		#

Note: You don't want to see any errors or other output at this point; if you do, you've done something wrong.

If successful you can now look at the filesystems mounted on your server with:

prox1:# df -h

You will see something similar to this:

prox1:/etc/qemu-server# df -h
		Filesystem            Size  Used Avail Use% Mounted on
		/dev/mapper/pve-root   17G  5.3G   11G  34% /
		tmpfs                  16G     0   16G   0% /lib/init/rw
		udev                   10M  616K  9.4M   7% /dev
		tmpfs                  16G     0   16G   0% /dev/shm
		/dev/mapper/pve-data   38G  4.7G   34G  13% /var/lib/vz1
		/dev/sda1             504M   31M  448M   7% /boot
		172.15.241.29:/mnt/luns/nfs/ISO
				       24G  172M   23G   1% /mnt/pve/ISO
		/dev/sdb1             162G  188M  154G   1% /var/lib/vz

Notice how the local disk is now mounted at the new location and the iSCSI LUN is mounted at the default OpenVZ folder.
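To check this from a script rather than by eye, you can pull the device for a given mount point out of the df output. The following is a small sketch; mount_source is a made-up helper name, and it assumes df is run with -P so each filesystem stays on one line and that mount points contain no spaces.

```shell
#!/bin/sh
# Read `df -P` output on stdin and print the device mounted on $1.
mount_source() {
    awk -v mp="$1" 'NR > 1 && $6 == mp { print $1 }'
}

# Example: on the node set up above the first command should print the
# iSCSI partition and the second the local /dev/mapper/pve-data volume.
#   df -P | mount_source /var/lib/vz
#   df -P | mount_source /var/lib/vz1
```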

Copying your containers and booting up

Now that the filesystem is mounted and set up correctly, you can copy the local OpenVZ files, along with any existing containers, into the /var/lib/vz folder. Complete this step even if you have no existing containers, as the file structure and system files must exist BEFORE you start adding containers as you normally would via the Proxmox web interface.

prox1:# cp -vpr /var/lib/vz1/. /var/lib/vz/.

This will recursively copy all existing data across to the SAN.

If you want to confirm that this is actually happening, simply run df -h and you will see the Used and Avail sizes changing for /dev/???1.

Setting up the cluster nodes

Repeat steps 2 to 5 (everything from setting iSCSI to automatic connection through copying your containers) on each of your other cluster nodes. Just ensure that, for each node you set up, you change the relevant commands and config files to point to the correct device for that node.

In this example I have 3 nodes in total, so I will use the following:

		/dev/sdb1 - lun0 - master cluster node 0
		/dev/sdc1 - lun1 - cluster node 1
		/dev/sdd1 - lun2 - cluster node 2
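If you script any of the steps, a small lookup keeps the node-to-device mapping in one place. This sketch uses the host names prox, prox1 and prox2 and the devices from this example; the function name lun_for_node is made up. Keep in mind that /dev/sdX names are not guaranteed to be the same on every node or after every reboot, so verify them with fdisk -l on each node.

```shell
#!/bin/sh
# Print the iSCSI partition assigned to a cluster node in this example setup.
# $1 = node host name. Returns non-zero for an unknown node.
lun_for_node() {
    case $1 in
        prox)  echo /dev/sdb1 ;;
        prox1) echo /dev/sdc1 ;;
        prox2) echo /dev/sdd1 ;;
        *)     echo "no LUN assigned to node: $1" >&2; return 1 ;;
    esac
}

# Example:
#   dev=$(lun_for_node "$(hostname)") && echo "this node uses $dev"
```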

Testing migration out

Once you have set up all nodes successfully, you can start migrating to your heart's content.

Only offline migrations work, but as stated there is minimal downtime even for offline migrations.

Things to consider

1. There is significant network and CPU overhead while migrating containers because this is NOT a shared filesystem as in KVM. So ALL data for the container you are migrating is copied from one LUN to the other LUN via the host nodes you are migrating from and to.

2. I'm not sure what script is used to do the migration, or whether a single command handles it, but after the migration the /var/lib/vz/private/CTID folder for the migrated container is deleted from the source host.

In my mind rsync (which is what is used during the migration process), being a powerful command, should be more than capable of leaving both folders in place and only syncing the differences.

That way you could manually copy "frequently migrated" containers to all nodes and run sync jobs after hours. If you then "need" to migrate a container during the day for whatever reason, you would reduce the network and CPU load and the impact on other guests, because it would ONLY be migrating changed data, NOT the entire container each and every time.

NOTE: I have posted in the Proxmox forum regarding this, so I will update this page when I hear anything else.
