OpenVZ on ISCSI howto

From Proxmox VE
Note: Article about the old stable Proxmox VE 3.x releases

OpenVZ on iSCSI

This is fairly straightforward to accomplish and also allows (offline) migration between Proxmox cluster nodes.

Offline migration means there is an outage of about 5 seconds while the container is relocated to another node in the cluster. The phase 1 sync is done while the container is online; the container is then shut down, the phase 2 sync is completed, and the container is brought up on the other node.

Of course, this depends on how long the container's services take to shut down and how much data has to be copied in the phase 2 sync, so don't quote me on the times :) (The time mentioned is for a DNS, DHCP and file server I have tested personally.)

This assumes you have already set up a cluster and a SAN. This setup example is for a 3-node cluster, so if you have more nodes (or only 1), just adapt the example accordingly.

NOTE: OpenVZ cannot share the same LUN (Logical Unit Number) between different nodes, so you need ONE LUN PER CLUSTER NODE. You can, however, have as many containers as you like on each node/LUN.

For example:

  • Master node0 connects to - LUN1 - 10 containers
  • Cluster node1 connects to - LUN2 - 5 containers
  • Cluster node2 connects to - LUN3 - 40 containers
  • Etc.

You can also run KVM guests at the same time; however, they require their OWN LUNs.

Add the iSCSI target to the master server

  • Before you do this, let's take a look at our system devices.
  • Open an SSH connection to the Proxmox Master Node.
  • Run fdisk -l so we have a before and after view of system devices.

You will end up with something similar to the following. Make a note of your local system devices so you know what is already there. This applies whether or not you have existing SAN connections: it's best to know what the system looks like so you can see which devices are added and easily reference them later.

i.e. /dev/sda1, /dev/sda2, etc.

	prox:~# fdisk -l

		Disk /dev/sda: 8589 MB, 8589934592 bytes
		255 heads, 63 sectors/track, 1044 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		   Device Boot      Start         End      Blocks   Id  System
		/dev/sda1   *           1          66      524288   83  Linux
		Partition 1 does not end on cylinder boundary.
		/dev/sda2              66        1044     7861610   8e  Linux LVM

		Disk /dev/dm-0: 1073 MB, 1073741824 bytes
		255 heads, 63 sectors/track, 130 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		Disk /dev/dm-0 doesn't contain a valid partition table

		Disk /dev/dm-1: 2147 MB, 2147483648 bytes
		255 heads, 63 sectors/track, 261 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		Disk /dev/dm-1 doesn't contain a valid partition table

		Disk /dev/dm-2: 3758 MB, 3758096384 bytes
		255 heads, 63 sectors/track, 456 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		Disk /dev/dm-2 doesn't contain a valid partition table
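If you'd rather not compare the before and after listings by eye, a small helper can reduce fdisk -l output to a sorted list of disk names; take the first snapshot before adding the target. This is a minimal sketch: the function name list_disks and the snapshot file names are made up for illustration, and it assumes the output format shown above (one "Disk /dev/...: size" header line per disk).

```shell
# Hypothetical helper: read `fdisk -l` output on stdin and print one whole-disk
# device name per line, sorted, so that two snapshots can be compared with comm(1).
list_disks() {
    # Match headers like "Disk /dev/sda: 8589 MB, ..." but not
    # "Disk /dev/dm-0 doesn't contain a valid partition table" (no colon after the name).
    awk '$1 == "Disk" && $2 ~ /^\/dev\/.*:$/ { sub(/:$/, "", $2); print $2 }' | sort
}

# Example usage (file names are only examples):
#   fdisk -l 2>/dev/null | list_disks > /root/disks-before.txt   # before adding the target
#   fdisk -l 2>/dev/null | list_disks > /root/disks-after.txt    # after adding the target
#   comm -13 /root/disks-before.txt /root/disks-after.txt        # disks that appeared
```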
  • Go to the web interface for Proxmox and add the iSCSI target under storage.
  • If you have multiple targets, make sure you add every target that holds LUNs you need.

Set iSCSI to automatic connection

  • Now edit the iSCSI node config file for each target/LUN you have added. In my case I have a single target, with all the LUNs used by OpenVZ available on it. If you prefer one target per LUN, that works too; just make sure you edit each node config file as follows:
nano -w /etc/iscsi/nodes/iqn_for_your_target_here/IP_Address_and_port_here/default

You should end up with something like this (tab completion helps while you enter the path):

nano -w /etc/iscsi/nodes/iqn.2006-01.com.openfiler\:tsn.672802aca9d8/10.5.0.6\,3260\,1/default
  • Near the top of the file change node.startup to automatic

node.startup = automatic

  • Near the bottom, change node.conn[0].startup to automatic

node.conn[0].startup = automatic

  • Save the file and exit.
  • Restart the open-iscsi service with the following command:
prox1:#/etc/init.d/open-iscsi restart 

Disconnecting iSCSI targets:Logging out of session [sid: 1, target: iqn.2006-01.com.openfiler:tsn.672802aca9d8, portal:10.5.0.6,3260]
Logout of [sid: 1, target: iqn.2006-01.com.openfiler:tsn.672802aca9d8, portal: 10.5.0.6,3260]: successful
Stopping iSCSI initiator service:.
Starting iSCSI initiator service: iscsid.
Setting up iSCSI targets:
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:tsn.672802aca9d8, portal: 10.5.0.6,3260]
Login to [iface: default, target: iqn.2006-01.com.openfiler:tsn.672802aca9d8, portal: 10.5.0.6,3260]: successful

Mounting network filesystems:.

NOTE: I have noticed that this setting sometimes gets changed back to manual automatically. It is important to have it set to automatic if you have any containers or VMs set to start at boot. It is also important to check this file on ALL cluster nodes, because the other nodes get their own copy of this file with the setting as manual. So this needs to be done on each cluster node!
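Because this has to be checked on every node (and re-checked if it flips back to manual), a small script can make the two edits above non-interactively. This is a sketch assuming the /etc/iscsi/nodes/<target>/<portal>/default layout shown above and GNU sed; set_iscsi_autostart is a made-up name. If you prefer not to edit the file directly, open-iscsi's own iscsiadm -m node -T <target> -p <portal> -o update -n node.startup -v automatic updates the same record.

```shell
# Hypothetical helper: force node.startup and node.conn[0].startup to "automatic"
# in one open-iscsi node record file, then print both settings for checking.
set_iscsi_autostart() {
    local f=$1
    [ -f "$f" ] || { echo "no such file: $f" >&2; return 1; }
    # Only these two values are rewritten; every other line is left untouched.
    sed -i \
        -e 's/^node\.startup = .*/node.startup = automatic/' \
        -e 's/^node\.conn\[0\]\.startup = .*/node.conn[0].startup = automatic/' \
        "$f"
    grep -E '^node\.(conn\[0\]\.)?startup = ' "$f"
}

# Example usage on each cluster node (glob assumes the layout shown above):
#   for f in /etc/iscsi/nodes/*/*/default; do set_iscsi_autostart "$f"; done
#   /etc/init.d/open-iscsi restart
```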


  • This config file is also where you set up authentication for iSCSI targets if you are using CHAP authentication (not recommended).


Setting up the LUN for OpenVZ

  • First confirm with fdisk -l that the LUN you want to use for this cluster node is now available to the system.

Your output should now look something like the following:

prox:# fdisk -l

		Disk /dev/sda: 8589 MB, 8589934592 bytes
		255 heads, 63 sectors/track, 1044 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		Device Boot      Start         End      Blocks   Id  System
		/dev/sda1   *           1          66      524288   83  Linux
		Partition 1 does not end on cylinder boundary.
		/dev/sda2              66        1044     7861610   8e  Linux LVM

		Disk /dev/dm-0: 1073 MB, 1073741824 bytes
		255 heads, 63 sectors/track, 130 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		Disk /dev/dm-0 doesn't contain a valid partition table

		Disk /dev/dm-1: 2147 MB, 2147483648 bytes
		255 heads, 63 sectors/track, 261 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		Disk /dev/dm-1 doesn't contain a valid partition table

		Disk /dev/dm-2: 3758 MB, 3758096384 bytes
		255 heads, 63 sectors/track, 456 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0x00000000

		Disk /dev/dm-2 doesn't contain a valid partition table

		Disk /dev/sdb: 176.1 GB, 176160768000 bytes
		255 heads, 63 sectors/track, 21416 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0xf2cc6ee0

		Disk /dev/sdb doesn't contain a valid partition table

		Disk /dev/sdc: 176.1 GB, 176160768000 bytes
		255 heads, 63 sectors/track, 21416 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0xf8fa13a9

		Disk /dev/sdc doesn't contain a valid partition table


		Disk /dev/sdd: 176.1 GB, 176160768000 bytes
		255 heads, 63 sectors/track, 21416 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0xd1403fdb

		Disk /dev/sdd doesn't contain a valid partition table

As you can see from the above output, /dev/sdb, /dev/sdc and /dev/sdd have been added to my system. These are the 3 LUNs (1 per node) that we will be using.

NOTE: The disk identifiers MAY also appear as 0x00000000 until you create the partitions and file system as detailed below.
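To see which /dev/sdX each iSCSI LUN was attached as, you can also ask open-iscsi directly: iscsiadm -m session -P 3 lists the attached SCSI disks for each session. The parser below is a sketch that assumes the "Target:", "... Lun: N" and "Attached scsi disk sdX" lines open-iscsi prints at that print level; the exact layout can vary between versions, and iscsi_disks is a hypothetical name.

```shell
# Hypothetical helper: read `iscsiadm -m session -P 3` output on stdin and
# print "<target-iqn> <lun> <device>" for every attached disk.
iscsi_disks() {
    awk '/^Target: /            { target = $2 }
         / Lun: [0-9]+/         { lun = $NF }
         /Attached scsi disk /  { print target, lun, "/dev/" $4 }'
}

# Example usage:
#   iscsiadm -m session -P 3 | iscsi_disks
```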

To be clear, they will be used as follows:

          /dev/sdb - prox  (cluster node 1)
          /dev/sdc - prox1 (cluster node 2)
          /dev/sdd - prox2 (cluster node 3)
  • Now that we know which device we will be using for node 1, do the following to create a partition on it:
prox1:#fdisk /dev/sdb
			
			Type 'n' and press enter to create a new partition
			Type 'p' and enter for primary partition
			Type '1' and enter for the 1st (and only) partition
			Press enter to accept the default start cylinder
			Press enter to accept the default end cylinder
			Type 't' and enter to set the system type
			Type '83' and enter to set it as Linux
			Type 'w' and enter to save changes and exit
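If you need to do this for several LUNs, the keystrokes above can be piped into fdisk. This is a sketch only: fdisk prompts can differ between versions (newer versions may, for example, ask whether to remove an existing signature), so read the output, and only run it against a LUN whose disk identifier you have checked. partition_lun is a hypothetical helper name.

```shell
# Hypothetical helper: feed fdisk the keystrokes listed above
# (n, p, 1, Enter, Enter, t, 83, w). Refuses anything that is not a block device.
partition_lun() {
    local dev=$1
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 1
    fi
    # Two empty lines accept the default start and end cylinders.
    printf 'n\np\n1\n\n\nt\n83\nw\n' | fdisk "$dev"
}

# Example usage (DESTROYS the partition table on that disk):
#   partition_lun /dev/sdb
```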
  • Now create the file system on the partition you just created by running:

prox1:# mkfs.ext3 /dev/sdb1

(Set /dev/sdb1 to whatever the device is on your system.)

You should see output similar to the following:

prox1:# mkfs.ext3 /dev/sdb1
		mke2fs 1.41.3 (12-Oct-2008)
		Filesystem label=
		OS type: Linux
		Block size=4096 (log=2)
		Fragment size=4096 (log=2)
		10756096 inodes, 43005997 blocks
		2150299 blocks (5.00%) reserved for the super user
		First data block=0
		Maximum filesystem blocks=4294967296
		1313 block groups
		32768 blocks per group, 32768 fragments per group
		8192 inodes per group
		Superblock backups stored on blocks: 
			32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
			4096000, 7962624, 11239424, 20480000, 23887872

		Writing inode tables: done                            
		Creating journal (32768 blocks): done
		Writing superblocks and filesystem accounting information: done

		This filesystem will be automatically checked every 38 mounts or
		180 days, whichever comes first.  Use tune2fs -c or -i to override.
  • Repeat the partitioning and filesystem creation for the other LUNs you will be using for OpenVZ on the other cluster nodes. Doing this now gives each LUN its disk identifier, so you can easily tell which disk is used by which node.
  • Running fdisk -l now should show something like this for the 3 example LUNs:
		Disk /dev/sdb: 176.1 GB, 176160768000 bytes
		255 heads, 63 sectors/track, 21416 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0xf2cc6ee0

		   Device Boot      Start         End      Blocks   Id  System
		/dev/sdb1               1       21416   172023988+  83  Linux

		Disk /dev/sdc: 176.1 GB, 176160768000 bytes
		255 heads, 63 sectors/track, 21416 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0xf8fa13a9

		   Device Boot      Start         End      Blocks   Id  System
		/dev/sdc1               1       21416   172023988+  83  Linux

		Disk /dev/sdd: 176.1 GB, 176160768000 bytes
		255 heads, 63 sectors/track, 21416 cylinders
		Units = cylinders of 16065 * 512 = 8225280 bytes
		Disk identifier: 0xd1403fdb

		   Device Boot      Start         End      Blocks   Id  System
		/dev/sdd1               1       21416   172023988+  83  Linux

NOTE: Make a note of the disk identifiers, as these will be the same on all 3 nodes, while the /dev/sdX device names can differ from node to node.
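Since the disk identifier is the reliable key, a small lookup can tell you which /dev/sdX a given LUN has on the node you are logged into. A sketch assuming the fdisk -l output format shown above; disk_by_id is a hypothetical name.

```shell
# Hypothetical helper: read `fdisk -l` output on stdin and print the disk
# whose "Disk identifier:" line matches $1 (for example 0xf2cc6ee0).
disk_by_id() {
    awk -v want="$1" '
        # Remember the most recent disk header, e.g. "Disk /dev/sdb: 176.1 GB, ..."
        $1 == "Disk" && $2 ~ /^\/dev\/.*:$/ { disk = $2; sub(/:$/, "", disk) }
        # The identifier line belongs to the disk header just above it.
        $1 == "Disk" && $2 == "identifier:" && $3 == want { print disk }'
}

# Example usage on any node:
#   fdisk -l 2>/dev/null | disk_by_id 0xf2cc6ee0
```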


Mounting the filesystem

  • Now you can mount the filesystem you have just created. As we want the iSCSI LUN to hold the OpenVZ containers, we have to remount the local storage, which is currently mounted on /var/lib/vz, to another folder.

So first create a folder:

# mkdir /var/lib/vz1   (or whatever name you want to give it; just use the same name in fstab below)
  • Now open fstab, change the local mount and add the iSCSI mount point:
# nano -w /etc/fstab
  • You will see a line similar or identical to this near the top of the file:

/dev/pve/data /var/lib/vz ext3 defaults 0 1


CHANGE that to point to the new folder you just created - for example:

/dev/pve/data /var/lib/vz1 ext3 defaults 0 1

  • At the bottom of the file you will need to add in the following line:

/dev/sdb1 /var/lib/vz ext3 defaults,auto,_netdev 0 0

Note: Change /dev/sdb1 to whatever the device is on your system.
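The two fstab changes can also be scripted, which helps when repeating this on each node. A minimal sketch: update_fstab is a made-up name, it assumes GNU sed and the /dev/pve/data line shown above, and it keeps a backup in fstab.bak. Because /dev/sdX names can differ between nodes, you may prefer to pass UUID=<uuid> instead of a device name, using the UUID printed by blkid -s UUID -o value /dev/sdb1.

```shell
# Hypothetical helper: apply both fstab edits to the file given as $1.
#   $2 is the iSCSI source, e.g. /dev/sdb1 or UUID=<uuid>.
update_fstab() {
    local fstab=$1 src=$2
    [ -f "$fstab" ] && [ -n "$src" ] || return 1
    cp -p "$fstab" "$fstab.bak" || return 1
    # Move the local data volume to /var/lib/vz1, keeping the original spacing
    # and all fields after the mount point. A second run does not match (vz1).
    sed -i 's|^/dev/pve/data\([[:space:]]\{1,\}\)/var/lib/vz\([[:space:]]\)|/dev/pve/data\1/var/lib/vz1\2|' "$fstab"
    # Append the iSCSI line unless an uncommented entry already mounts /var/lib/vz.
    if ! awk '$1 !~ /^#/ && $2 == "/var/lib/vz" { found = 1 } END { exit !found }' "$fstab"; then
        printf '%s /var/lib/vz ext3 defaults,auto,_netdev 0 0\n' "$src" >> "$fstab"
    fi
}

# Example usage:
#   update_fstab /etc/fstab /dev/sdb1
```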

  • Save the file and exit.
  • At this point ENSURE there are NO VZ containers running.
  • Now run umount /var/lib/vz and then mount -a:
		prox1:# umount /var/lib/vz
		prox1:# mount -a
		prox1:#

Note: Neither command should print anything. If you see errors or any other output at this point, something has gone wrong.
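A guard around the remount makes it harder to pull /var/lib/vz out from under running containers. A sketch assuming vzlist from the vzctl tools, which lists only running containers by default (-H suppresses the header, -o ctid prints only the IDs); remount_vz is a hypothetical name.

```shell
# Hypothetical helper: swap the /var/lib/vz mounts only if no container is running.
remount_vz() {
    local running
    if ! command -v vzlist >/dev/null 2>&1; then
        echo "vzlist not found; is this an OpenVZ host?" >&2
        return 1
    fi
    # Without -a, vzlist shows running containers only.
    running=$(vzlist -H -o ctid 2>/dev/null)
    if [ -n "$running" ]; then
        echo "Stop these containers first:" $running >&2
        return 1
    fi
    umount /var/lib/vz && mount -a
}

# Example usage:
#   remount_vz
```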

  • If successful, you can now look at the filesystems mounted on your server with:
prox1:#df -h

You will see something similar to this:

prox1:# df -h
		Filesystem            Size  Used Avail Use% Mounted on
		/dev/mapper/pve-root   17G  5.3G   11G  34% /
		tmpfs                  16G     0   16G   0% /lib/init/rw
		udev                   10M  616K  9.4M   7% /dev
		tmpfs                  16G     0   16G   0% /dev/shm
		/dev/mapper/pve-data   38G  4.7G   34G  13% /var/lib/vz1
		/dev/sda1             504M   31M  448M   7% /boot
		172.15.241.29:/mnt/luns/nfs/ISO
				       24G  172M   23G   1% /mnt/pve/ISO
		/dev/sdb1             162G  188M  154G   1% /var/lib/vz

Notice how the local data volume (/dev/mapper/pve-data) is now mounted at the new location, /var/lib/vz1, and the iSCSI LUN is mounted at the default OpenVZ folder, /var/lib/vz.
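Instead of reading through df -h, you can check each mount point's source device directly with findmnt (part of util-linux). A sketch; check_mount is a hypothetical name, and the expected devices in the usage lines are the ones from this example.

```shell
# Hypothetical helper: succeed only if mount point $1 is mounted from device $2.
check_mount() {
    local mnt=$1 want=$2 got
    got=$(findmnt -n -o SOURCE "$mnt") || { echo "$mnt is not mounted" >&2; return 1; }
    if [ "$got" != "$want" ]; then
        echo "$mnt is mounted from $got, expected $want" >&2
        return 1
    fi
    echo "$mnt OK ($got)"
}

# Example usage:
#   check_mount /var/lib/vz  /dev/sdb1
#   check_mount /var/lib/vz1 /dev/mapper/pve-data
```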


Copying your containers and booting up

  • Now that the filesystem is mounted and set up correctly, copy the local OpenVZ files, along with any existing containers, into the /var/lib/vz folder. Complete this step even if you have no existing containers, as the file structure and system files must exist BEFORE you start adding containers as you normally would via the Proxmox web interface.
prox1:# cp -av /var/lib/vz1/. /var/lib/vz/

This recursively copies all existing data across to the SAN. The -a option preserves permissions, ownership, timestamps, symlinks and hard links.

If you want to confirm that this is actually happening, run df -h in another session and you will see the Used and Avail sizes changing for your iSCSI device (/dev/sdb1 in this example).
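To double-check the copy once it has finished, you can compare metadata listings of both trees. This sketch assumes GNU find; it compares file type, permissions, numeric owner, file size and path (not file contents, and not directory sizes, which legitimately differ between filesystems), and skips lost+found, which mkfs created on the new filesystem. Because it never reads file contents, device nodes inside container private areas are never opened. tree_listing and verify_copy are hypothetical names.

```shell
# Hypothetical helper: sorted metadata listing of a tree, one entry per line.
# Directories: type, mode, owner, path. Everything else: type, mode, owner, size, path.
tree_listing() {
    (cd "$1" && find . -mindepth 1 -path ./lost+found -prune \
        -o -type d -printf '%y %m %U:%G %P\n' \
        -o -printf '%y %m %U:%G %s %P\n') | LC_ALL=C sort
}

# Hypothetical helper: print any differences and return non-zero if the listings differ.
verify_copy() {
    local a b rc
    [ -d "$1" ] && [ -d "$2" ] || return 1
    a=$(mktemp) && b=$(mktemp) || return 1
    tree_listing "$1" > "$a"
    tree_listing "$2" > "$b"
    if diff "$a" "$b"; then echo "listings match"; rc=0; else rc=1; fi
    rm -f "$a" "$b"
    return $rc
}

# Example usage:
#   verify_copy /var/lib/vz1 /var/lib/vz
```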


Setting up the cluster nodes

  • As we have already partitioned and created the filesystem for the other LUNs (in "Setting up the LUN for OpenVZ"), on each of your other cluster nodes you only need to repeat "Mounting the filesystem" and "Copying your containers and booting up", after checking that the node's iSCSI config is set to connect automatically. Just make sure that on each node the commands and config files point to the correct device for that node.

NOTE: If you are adding new LUNs then of course you will need to repeat all steps

Each server MIGHT detect the devices in a different order or under different names, SO ALWAYS CHECK THE DISK IDENTIFIER TO ENSURE YOU ARE FORMATTING OR MOUNTING THE CORRECT LUN.

In this example I have 3 nodes in total, so I will use the following

		/dev/sdb1 - lun0 - master cluster node 0
		/dev/sdc1 - lun1 - cluster node 1
		/dev/sdd1 - lun2 - cluster node 2

Testing migration

  • Once you have set up all nodes successfully, you can start migrating to your heart's content.
  • Only offline migration works, but as stated above, the downtime is minimal even for offline migration.
  • Things to consider

1. There is significant network and CPU overhead while migrating containers, because this is NOT a shared filesystem as it is with KVM. If you migrate via the Proxmox GUI, ALL the data for the container is copied from one LUN to the other LUN through the source and destination host nodes.

2. vzmigrate is the script used to migrate containers from one node to another; by default it removes the source container from the current host after migrating. Instead of using the GUI, you can run rsync manually from a custom script that takes a source and a destination and runs rsync, then vzctl stop, then a final rsync, then vzctl start. This is effectively what an offline vzmigrate does, but if you keep the old copy and do this manually, you are only synchronising changes rather than the entire folder, which significantly reduces the network and CPU load.

NOTE: This is only worthwhile for containers you might frequently want to move from one cluster node to another, for example less critical services whose load may spike at times and impact more critical services on that node. If you sync the containers that are candidates for frequent migration after hours, then migrating one during critical times could take only a matter of seconds (with little to no impact on other, more critical services) instead of minutes (with severe impact on other servers due to high network and CPU load).
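The manual approach from point 2 could look roughly like the sketch below. It is not vzmigrate and not a Proxmox tool; it assumes container private areas in /var/lib/vz/private/<CTID> on every node, root SSH between nodes by node name, and the Proxmox VE 3.x cluster config layout /etc/pve/nodes/<node>/openvz/<CTID>.conf. All function names are hypothetical. Set DRY_RUN=1 to print the commands instead of running them.

```shell
# Hypothetical helpers for a manual offline migration.
# ASSUMPTIONS (verify on your own cluster before use):
#  - private areas live in /var/lib/vz/private/<CTID> on every node
#  - root on the source node can ssh to the other nodes by node name
#  - container configs live in /etc/pve/nodes/<node>/openvz/<CTID>.conf (PVE 3.x)

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Phase 1 only: copy the private area to the target node while the container runs.
# Can be run after hours so that a later move has little left to copy.
ct_presync() {
    local ctid=$1 dst=$2
    run rsync -aH --numeric-ids --delete \
        "/var/lib/vz/private/$ctid/" "root@$dst:/var/lib/vz/private/$ctid/"
}

# Full move, run on the source node: sync, stop, final sync, hand over config, start.
ct_move() {
    local ctid=$1 src=$2 dst=$3
    ct_presync "$ctid" "$dst" || return 1
    run vzctl stop "$ctid" || return 1
    ct_presync "$ctid" "$dst" || return 1   # phase 2: only changes since phase 1
    run mv "/etc/pve/nodes/$src/openvz/$ctid.conf" "/etc/pve/nodes/$dst/openvz/" || return 1
    run ssh "root@$dst" mkdir -p "/var/lib/vz/root/$ctid" || return 1
    run ssh "root@$dst" vzctl start "$ctid"
}

# Example usage on node "prox", moving CT 101 to "prox1":
#   ct_presync 101 prox1                # after hours
#   DRY_RUN=1 ct_move 101 prox prox1    # review the commands first
#   ct_move 101 prox prox1
```

Unlike vzmigrate, this leaves the old private area on the source node. It is stale after the move, so run the presync in the other direction before moving the container back.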