OpenVZ on iSCSI howto
This is fairly straightforward to set up, and it also allows (offline) migration between Proxmox cluster nodes.
Offline migration means there is an outage of about 5 seconds while the container is relocated to another node in the cluster. The phase 1 sync is done while the container is online; the container is then shut down, the phase 2 sync is completed, and the container is brought up on the other node.
Of course, this depends on how long the services take to shut down and how much data there is to transfer in the phase 2 sync, so don't quote me on the times :) (The time mentioned was for a DNS, DHCP and file server I've tested personally.)
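The two-phase sequence described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script Proxmox actually runs: it assumes the standard OpenVZ paths (/var/lib/vz/private/CTID and /etc/vz/conf/CTID.conf), rsync on both nodes and passwordless root SSH to the target node. The function names `run` and `migrate_ct` are hypothetical.

```bash
# Hypothetical sketch of a two-phase ("offline") OpenVZ container migration.
# Assumes standard OpenVZ paths, rsync on both nodes and passwordless root
# SSH to the target node. NOT the script Proxmox actually uses.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

migrate_ct() {
    local ctid=$1 target=$2
    local priv=/var/lib/vz/private/$ctid
    # Phase 1: bulk copy while the container is still running.
    run rsync -aH --numeric-ids "$priv/" "root@$target:$priv/" || return
    # Outage starts: stop the container so its files stop changing.
    run vzctl stop "$ctid" || return
    # Phase 2: transfer only what changed since phase 1 (including deletions).
    run rsync -aH --numeric-ids --delete "$priv/" "root@$target:$priv/" || return
    run scp "/etc/vz/conf/$ctid.conf" "root@$target:/etc/vz/conf/" || return
    # Outage ends once the container is running on the target node.
    run ssh "root@$target" vzctl start "$ctid"
    # The source copy is left in place here; removing it would be a separate step.
}
```

For example, `DRY_RUN=1 migrate_ct 101 prox1` prints the five commands in order, which makes it clear that only the phase 2 rsync and the stop/start fall inside the outage window.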
This assumes you have already set up a cluster and a SAN. This example is also for a 3-node cluster, so if you have more nodes (or only 1), just use your brains and adapt it from this example.
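NOTE: OpenVZ nodes cannot share the same LUN, because each LUN holds an ordinary ext3 filesystem that must only be mounted by one node at a time. So you need ONE LUN PER CLUSTER NODE.
You can, however, put as many containers as you like on each node/LUN.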
For example:
- Master node0 connects to - LUN1 - 10 containers
- Cluster node1 connects to - LUN2 - 5 containers
- Cluster node2 connects to - LUN3 - 40 containers
- etc.
You can also run KVM instances at the same time; however, they require their OWN LUNs.
Add the iSCSI target to the master server
- In the Proxmox web interface, add the iSCSI target under Storage.
- If you have multiple targets, make sure you add every target that holds a LUN you need.
Set iSCSI to connect automatically
- Open an SSH connection to the Proxmox Master Node.
- First run fdisk -l so that you have a before-and-after view of the system devices.
You will see something similar to the following. Make a note of your local system devices (i.e. /dev/sda1, /dev/sda2, etc.) so you know what is already there. This applies whether or not you have existing SAN connections: it's best to know what the system looks like beforehand, so you can see which devices are added and easily reference them later.
prox:~# fdisk -l

Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          66      524288   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              66        1044     7861610   8e  Linux LVM

Disk /dev/dm-0: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

Disk /dev/dm-0 doesn't contain a valid partition table

Disk /dev/dm-1: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

Disk /dev/dm-1 doesn't contain a valid partition table

Disk /dev/dm-2: 3758 MB, 3758096384 bytes
255 heads, 63 sectors/track, 456 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

Disk /dev/dm-2 doesn't contain a valid partition table
- Now edit the iSCSI node config file for each target/LUN you have added. In my case I have a single target, with all the LUNs used by OpenVZ available on that one target. If you prefer one target per LUN that works too, but make sure you edit each node config file as follows:
nano -w /etc/iscsi/nodes/iqn_of_your_target_here/IP_address,port,tpgt_here/default
You should end up with something like the following (you can of course press Tab to complete the path as you type it):
nano -w /etc/iscsi/nodes/iqn.2006-01.com.openfiler\:tsn.672802aca9d8/10.5.0.6\,3260\,1/default
- Near the top of the file change node.startup to automatic
node.startup = automatic
- Near the bottom, change node.conn[0].startup to automatic
node.conn[0].startup = automatic
- Save the file and exit.
- Restart the open-iscsi service with the following command:
prox1:# /etc/init.d/open-iscsi restart
Disconnecting iSCSI targets:Logging out of session [sid: 1, target: iqn.2006-01.com.openfiler:tsn.672802aca9d8, portal: 10.5.0.6,3260]
Logout of [sid: 1, target: iqn.2006-01.com.openfiler:tsn.672802aca9d8, portal: 10.5.0.6,3260]: successful
Stopping iSCSI initiator service:.
Starting iSCSI initiator service: iscsid.
Setting up iSCSI targets:
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:tsn.672802aca9d8, portal: 10.5.0.6,3260]
Login to [iface: default, target: iqn.2006-01.com.openfiler:tsn.672802aca9d8, portal: 10.5.0.6,3260]: successful
Mounting network filesystems:.
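If you would rather not make the two edits by hand on every node, they can be scripted. The following is a minimal sketch that assumes the record file has the `key = value` layout shown above; `set_iscsi_autostart` is a hypothetical helper, and it should be run before restarting open-iscsi. As a tool-based alternative, open-iscsi's `iscsiadm -m node -T <target IQN> -p <portal> -o update -n node.startup -v automatic` updates the same setting without editing the file directly.

```bash
# Hypothetical helper: set node.startup and node.conn[0].startup to
# "automatic" in an open-iscsi node record file (the .../default file above).
set_iscsi_autostart() {
    local rec=$1
    [ -w "$rec" ] || return 2
    cp -p "$rec" "$rec.bak"          # keep a backup of the original record
    sed -i -E \
        -e 's/^(node\.startup[[:space:]]*=[[:space:]]*).*/\1automatic/' \
        -e 's/^(node\.conn\[0\]\.startup[[:space:]]*=[[:space:]]*).*/\1automatic/' \
        "$rec"
}
# Usage (path taken from the example above):
#   set_iscsi_autostart '/etc/iscsi/nodes/iqn.2006-01.com.openfiler:tsn.672802aca9d8/10.5.0.6,3260,1/default'
#   /etc/init.d/open-iscsi restart
```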
NOTE: I have noticed that this setting sometimes gets changed back to manual automatically. It is important to have it set to automatic if you have any guests set to start at boot. It is also important to check this file on ALL cluster nodes, because the file is replicated to the other nodes with the setting as manual. SO this needs to be done on each cluster node!
- This config file is also where you set up authentication for iSCSI targets if you are using CHAP authentication (not recommended).
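Because the setting can revert to manual, a quick check on each node is useful. This is a minimal sketch assuming the `/etc/iscsi/nodes/<target>/<portal>/default` layout used in this howto (other open-iscsi versions may lay the records out differently); `check_iscsi_autostart` is a hypothetical helper.

```bash
# Hypothetical helper: print any startup key in a node record file that is
# not set to "automatic". Returns 0 if none are found, 1 if some are,
# 2 if the file cannot be read. (A record with no startup keys returns 0.)
check_iscsi_autostart() {
    local rec=$1
    [ -r "$rec" ] || return 2
    # First grep keeps node.startup / node.conn[N].startup lines;
    # second grep prints those whose value is not "automatic".
    if grep -E '^node\.(conn\[[0-9]+\]\.)?startup[[:space:]]*=' "$rec" |
        grep -Ev '=[[:space:]]*automatic[[:space:]]*$'; then
        return 1
    fi
    return 0
}
# Usage on each cluster node:
#   for rec in /etc/iscsi/nodes/*/*/default; do
#       check_iscsi_autostart "$rec" || echo "not automatic: $rec"
#   done
```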
Setting up the LUN for OpenVZ
- First confirm with fdisk -l that the LUN you want to use for this cluster node is now visible to the system.
Your output should now look something like the following:
prox:# fdisk -l

Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          66      524288   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              66        1044     7861610   8e  Linux LVM

Disk /dev/dm-0: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

Disk /dev/dm-0 doesn't contain a valid partition table

Disk /dev/dm-1: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

Disk /dev/dm-1 doesn't contain a valid partition table

Disk /dev/dm-2: 3758 MB, 3758096384 bytes
255 heads, 63 sectors/track, 456 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

Disk /dev/dm-2 doesn't contain a valid partition table

Disk /dev/sdb: 148.0 GB, 148049946624 bytes
64 heads, 32 sectors/track, 141290 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Disk identifier: 0x22ea0421

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/sdc: 148.0 GB, 148049946624 bytes
64 heads, 32 sectors/track, 141290 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Disk identifier: 0x79266f52

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 148.0 GB, 148049946624 bytes
64 heads, 32 sectors/track, 141290 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Disk identifier: 0x2480ea13

Disk /dev/sdd doesn't contain a valid partition table
As you can see from the output above, /dev/sdb, /dev/sdc and /dev/sdd have been added to my system. These are the 3 LUNs (1 per node) that we will be using.
To make it clear they will be used as follows:
/dev/sdb - prox (cluster node 1)
/dev/sdc - prox1 (cluster node 2)
/dev/sdd - prox2 (cluster node 3)
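The before/after comparison done by eye above can also be automated if you save both fdisk -l outputs to files. This is a minimal sketch; `new_disks` is a hypothetical helper, and it only looks at header lines of the form `Disk /dev/...:` as printed by fdisk.

```bash
# Hypothetical helper: print the disks listed in the "after" fdisk -l output
# that were not present in the "before" output, one device path per line.
new_disks() {
    local before=$1 after=$2
    # Only header lines such as "Disk /dev/sdb: 148.0 GB, ..." match; lines like
    # "Disk identifier: ..." or "Disk /dev/dm-0 doesn't contain ..." do not.
    comm -13 \
        <(grep -Eo '^Disk /dev/[^: ]+:' "$before" | sort -u) \
        <(grep -Eo '^Disk /dev/[^: ]+:' "$after" | sort -u) |
        sed -E 's/^Disk (.*):$/\1/'
}
# Usage (file names are arbitrary):
#   fdisk -l > /root/fdisk-before.txt   # before adding the iSCSI target
#   fdisk -l > /root/fdisk-after.txt    # after logging in to the target
#   new_disks /root/fdisk-before.txt /root/fdisk-after.txt
```

With the outputs shown above, this would list /dev/sdb, /dev/sdc and /dev/sdd.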
- Now that we know which device node 1 will use, create a partition on it:
prox1:# fdisk /dev/sdb
Type 'n' and press Enter to create a new partition.
Type 'p' and press Enter for a primary partition.
Type '1' and press Enter for the 1st (and only) partition.
Press Enter to accept the default start cylinder.
Press Enter to accept the default end cylinder.
Type 't' and press Enter to set the system type.
Type '83' and press Enter to set it to Linux.
Type 'w' and press Enter to save the changes and exit.
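The keystrokes above can be fed to fdisk on standard input instead of being typed. This is a sketch only: `fdisk_keys` is a hypothetical helper, and fdisk's prompts differ between versions (newer ones may ask extra questions, for example about removing an existing signature), in which case the sequence gets out of step. sfdisk is the tool designed for non-interactive partitioning if you need something more robust.

```bash
# Hypothetical helper: the keystrokes from the interactive session above,
# one per line. Empty lines accept the default start/end cylinder.
fdisk_keys() {
    printf '%s\n' n p 1 '' '' t 83 w
}
# Usage (DESTRUCTIVE: double-check the device name first!):
#   fdisk_keys | fdisk /dev/sdb
```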
- Now create the file system on the partition you just created by running:
prox1:# mkfs.ext3 /dev/sdb1
(Obviously, replace /dev/???1 with the partition on your own system.)
You should see output similar to the following:
prox1:# mkfs.ext3 /dev/sdb1
mke2fs 1.41.3 (12-Oct-2008)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
10756096 inodes, 43005997 blocks
2150299 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
1313 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
Mounting the filesystem
- Now you can mount the filesystem you have just created. Because we want the iSCSI LUN to hold the OpenVZ containers, the local volume currently mounted on /var/lib/vz has to be remounted on another folder.
So first create a folder:
# mkdir /var/lib/vz1 (or whatever name you want to give it)
- Now open fstab, change the local mount and add the iSCSI mount point:
# nano -w /etc/fstab
- Near the top of the file you will see a line similar or identical to this:
/dev/pve/data /var/lib/vz ext3 defaults 0 1
CHANGE that to point to the new folder you just created - for example:
/dev/pve/data /var/lib/vz1 ext3 defaults 0 1
- At the bottom of the file, add the following line:
/dev/sdb1 /var/lib/vz ext3 defaults,auto,_netdev 0 0
Note: Obviously change the /dev/???1 to whatever device is in your system
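The two fstab changes can also be applied with a small script, which helps when repeating them on every node. This is a minimal sketch; `update_fstab` is a hypothetical helper that assumes the `/dev/pve/data /var/lib/vz` line looks like the one above. As an alternative to a /dev/sdX name, it accepts a `UUID=...` specifier: iSCSI disks are named in the order they are discovered, so /dev/sdb1 is not guaranteed to keep its name across reboots, whereas the filesystem UUID (readable with `blkid`) stays the same.

```bash
# Hypothetical helper: apply the two fstab edits described above.
#   $1 = fstab file, $2 = iSCSI partition (e.g. /dev/sdb1 or UUID=...)
update_fstab() {
    local fstab=$1 dev=$2
    cp -p "$fstab" "$fstab.bak"                     # backup first
    # Move the local pve/data volume from /var/lib/vz to /var/lib/vz1.
    sed -i -E 's|^(/dev/pve/data[[:space:]]+)/var/lib/vz([[:space:]])|\1/var/lib/vz1\2|' "$fstab"
    # Append the iSCSI mount unless an entry for this device already exists.
    if ! awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$fstab"; then
        printf '%s /var/lib/vz ext3 defaults,auto,_netdev 0 0\n' "$dev" >> "$fstab"
    fi
}
# Usage:
#   update_fstab /etc/fstab /dev/sdb1
# or, by filesystem UUID:
#   uuid=$(blkid -s UUID -o value /dev/sdb1)
#   update_fstab /etc/fstab "UUID=$uuid"
```

Running it twice is harmless: the sed pattern no longer matches once the mount point is /var/lib/vz1, and the iSCSI line is only appended once.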
- Now save the file, exit and run mount -a:
# mount -a
#
Note: mount -a should return silently. Any error, or any other output at all, means something has gone wrong.
- If successful, you can now look at the filesystems mounted on your server with:
prox1:# df -h
You will see something similar to this:
prox1:/etc/qemu-server# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/pve-root   17G  5.3G   11G  34% /
tmpfs                  16G     0   16G   0% /lib/init/rw
udev                   10M  616K  9.4M   7% /dev
tmpfs                  16G     0   16G   0% /dev/shm
/dev/mapper/pve-data   38G  4.7G   34G  13% /var/lib/vz1
/dev/sda1             504M   31M  448M   7% /boot
172.15.241.29:/mnt/luns/nfs/ISO
                       24G  172M   23G   1% /mnt/pve/ISO
/dev/sdb1             162G  188M  154G   1% /var/lib/vz
Notice that the local disk is now mounted on the new location (/var/lib/vz1) and the iSCSI LUN is mounted on the default OpenVZ folder (/var/lib/vz).
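Before copying anything, it is worth confirming in a script (or at boot) that /var/lib/vz really is the iSCSI LUN and not the local volume. This is a minimal sketch that reads /proc/mounts; `mounted_from` is a hypothetical helper, and the optional second argument exists only so it can be tried against a file in the same format.

```bash
# Hypothetical helper: print the device mounted on a given mount point.
# Reads /proc/mounts by default; if several entries match, the last one
# (the mount currently visible) wins. Returns 1 if nothing is mounted there.
mounted_from() {
    local mp=$1 table=${2:-/proc/mounts}
    awk -v mp="$mp" '$2 == mp { dev = $1 } END { if (dev == "") exit 1; print dev }' "$table"
}
# Usage:
#   [ "$(mounted_from /var/lib/vz)" = /dev/sdb1 ] ||
#       echo "WARNING: /var/lib/vz is not on the iSCSI LUN"
```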
Copying your containers and booting up
- Now that the filesystem is mounted and set up correctly, copy the local OpenVZ files, along with any existing containers, into the /var/lib/vz folder. Complete this step even if you have no existing containers: the directory structure and system files must exist BEFORE you start adding containers through the Proxmox web interface as you normally would.
prox1:# cp -vpr /var/lib/vz1/. /var/lib/vz/.
This recursively copies all existing data across to the SAN, preserving permissions, ownership and timestamps (-p).
To confirm that the copy is actually progressing, run df -h (for example from another session) and watch the Used and Avail figures change for /dev/???1.
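Once cp has finished, a rough sanity check is to compare how many entries (files, directories, links, device nodes) exist on each side. This is a minimal sketch; `count_entries` and `compare_trees` are hypothetical helpers. The lost+found directory that mkfs creates is skipped on both sides, and matching counts only show that nothing was obviously missed, not that every file's contents are identical.

```bash
# Hypothetical helpers: compare the number of entries in two directory trees,
# ignoring a top-level lost+found directory. Matching counts do NOT prove
# that the contents are identical.
count_entries() {
    (cd "$1" && find . -path ./lost+found -prune -o -print | wc -l | tr -d ' ')
}

compare_trees() {
    local src=$1 dst=$2 a b
    a=$(count_entries "$src") || return 2
    b=$(count_entries "$dst") || return 2
    echo "$src: $a entries, $dst: $b entries"
    [ "$a" -eq "$b" ]
}
# Usage, after the copy:
#   compare_trees /var/lib/vz1 /var/lib/vz
```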
Setting up the cluster nodes
- Repeat steps 2 to 5 (from setting iSCSI to connect automatically through copying your containers) on each of your other cluster nodes. Just make sure that on each node the commands and config files point to the correct device for that node.
In this example I have 3 nodes in total, so I will use the following:
/dev/sdb1 - lun0 - master cluster node 0
/dev/sdc1 - lun1 - cluster node 1
/dev/sdd1 - lun2 - cluster node 2
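When the same setup script is run on every node, the node-to-partition mapping can be kept in one place. This is a sketch using the hostnames from the mapping given earlier (prox, prox1, prox2); `lun_for_node` is a hypothetical helper. One caveat: each node names iSCSI disks in its own discovery order, so /dev/sdc on one node is not guaranteed to be the same LUN as /dev/sdc on another. The udev links under /dev/disk/by-path/, which include the portal, target IQN and LUN number, are a more reliable way to check which LUN a device really is.

```bash
# Hypothetical helper: map a node's short hostname to the partition it
# should mount on /var/lib/vz, following the example layout in this howto.
lun_for_node() {
    case $1 in
        prox)  echo /dev/sdb1 ;;
        prox1) echo /dev/sdc1 ;;
        prox2) echo /dev/sdd1 ;;
        *)     echo "unknown node: $1" >&2; return 1 ;;
    esac
}
# Usage on each node:
#   dev=$(lun_for_node "$(hostname -s)") && echo "this node uses $dev"
#   ls -l /dev/disk/by-path/ | grep iscsi    # check which LUN each sdX is
```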
Testing migration
- Once all nodes are set up successfully, you can start migrating containers to your heart's content.
- Only offline migrations work, but as stated above, downtime is minimal even for offline migrations.
- Things to consider
1. Migrating a container causes significant network and CPU overhead, because the containers are NOT on shared storage (unlike KVM guests). ALL data for the container being migrated is copied from one LUN to the other, via the host nodes you are migrating from and to.
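The extra load can be estimated with some rough arithmetic. Assume (hypothetically) that the iSCSI traffic and the migration traffic share a single network link with usable bandwidth $B$, and that the container holds $S$ bytes of data. Each byte is read from the SAN by the source node, sent to the target node, and written back to the SAN by the target node, so roughly three times the container's size crosses that link, and the heavy load lasts about:

```latex
T \approx \frac{3S}{B}, \qquad
\text{e.g. } S = 20\,\text{GB},\; B = 100\,\text{MB/s}:\quad
T \approx \frac{3 \times 20\,000\,\text{MB}}{100\,\text{MB/s}} = 600\,\text{s} = 10\,\text{min}
```

The figures are illustrative only and ignore protocol overhead. If iSCSI runs on a separate storage network, the two SAN legs and the node-to-node copy load different links.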
2. I'm not sure which script (or command) performs the migration, but once a container has been migrated, its /var/lib/vz/private/CTID folder is deleted from the source host.
In my view, rsync (which is what the migration process uses) is more than capable of leaving both folders in place and syncing only the differences.
That way you could manually copy "frequently migrated" containers to all nodes and run sync jobs after hours. Then, if you "need" to migrate a container during the day for whatever reason, only the changed data would be transferred, NOT the entire container every time, which would reduce the network and CPU load and the impact on other guests during the migration.
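The after-hours sync job suggested above could look roughly like this. It is a sketch of the idea only: the stock migration, as described above, deletes the source copy and transfers everything, so it will not necessarily take advantage of (or even accept) a copy already present on the target. It assumes passwordless root SSH between nodes and the standard /var/lib/vz/private/CTID layout; `presync_cts` and the script path in the cron example are hypothetical. A copy taken while a container is running is not consistent on its own; it only reduces what a later sync with the container stopped has to transfer.

```bash
# Hypothetical after-hours pre-sync: push the private area of each listed
# container from this node to another node, so a later sync only has to
# transfer what changed. Set DRY_RUN=1 to print the commands instead.
#   $1 = target node, remaining arguments = container IDs
presync_cts() {
    local target=$1; shift
    local ctid priv
    for ctid in "$@"; do
        priv=/var/lib/vz/private/$ctid
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo rsync -aH --delete --numeric-ids "$priv/" "root@$target:$priv/"
        else
            rsync -aH --delete --numeric-ids "$priv/" "root@$target:$priv/" || return
        fi
    done
}
# Example /etc/cron.d entry (01:30 as root; the script path is hypothetical):
#   30 1 * * * root /usr/local/sbin/presync-cts.sh prox1 101 102
```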
NOTE: I have posted in the Proxmox forum about this, so I will update this page when I hear anything more.