DRBD9: Difference between revisions

== Introduction ==

DRBD® refers to block devices designed as a building block for high availability (HA) clusters. This is done by mirroring a whole block device over a dedicated network. DRBD can be understood as network-based RAID-1. For detailed information please visit [http://www.linbit.com Linbit].
DRBD9 has been removed from the Proxmox VE core distribution since version 4.4 and is now maintained directly by Linbit, due to a [https://forum.proxmox.com/threads/drbdmanage-license-change.30404/ license change].
 
Main features of the integration in Proxmox VE:
 
*drbd9/drbdmanage; drbd devices on top of LVM
*All VM disks (LVM volumes on the DRBD device) can be replicated in real time on several Proxmox VE nodes via the network.
*Ability to live migrate running machines within a few seconds and without downtime, without the need for a SAN (iSCSI, FC, NFS), as the data is already on both the source and the target node.
*LXC containers can use DRBD9 storage
 
'''Note:'''
 
DRBD9 integration was introduced in Proxmox VE 4.0 as a technology preview.
 
== System requirements ==
 
You need 3 identical Proxmox VE servers (V4.0 or higher) with the following extra hardware:
 
*Extra NIC (dedicated for DRBD traffic)
*Second disk, SSD, Flash card or raid volume (e.g. /dev/sdb) for DRBD
*Use a hardware RAID controller with BBU to eliminate performance issues concerning internal metadata (see [http://fghaas.wordpress.com/2009/08/20/internal-metadata-and-why-we-recommend-it/ Florian's blog]).
*A functional Proxmox VE Cluster (V4.0 or higher)
*At least 2GB RAM in each node
 
== VM settings when running on top of DRBD ==
*  DRBD supports only the ''raw disk'' format at the moment.
*  You need to change the VM disk cache mode from the default 'none' to 'writethrough'. Do not use write cache for any virtual drives on top of DRBD, as this can cause out-of-sync blocks. Follow the link for more information: http://forum.proxmox.com/threads/18259-KVM-on-top-of-DRBD-and-out-of-sync-long-term-investigation-results?p=93126
* Consider doing [[#Integrity checking|integrity checking]] periodically to be sure DRBD is consistent
 
=== Network ===
 
Configure the NIC dedicated for DRBD traffic (eth1 in the current example) on all nodes with a fixed private IP address via the web interface and reboot each server.
 
For better understanding, here is an /etc/network/interfaces example from the first node called pve1, after the reboot:
<pre>cat /etc/network/interfaces
# network interface settings
auto lo
iface lo inet loopback
 
iface eth0 inet manual
 
auto eth1
iface eth1 inet static
        address  10.0.15.81
        netmask  255.255.255.0
 
auto vmbr0
iface vmbr0 inet static
        address  192.168.15.81
        netmask  255.255.255.0
        gateway  192.168.15.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0</pre>
And from the second node, called pve2:
<pre># network interface settings
auto lo
iface lo inet loopback
 
iface eth0 inet manual
 
auto eth1
iface eth1 inet static
        address  10.0.15.82
        netmask  255.255.255.0
 
auto vmbr0
iface vmbr0 inet static
        address  192.168.15.82
        netmask  255.255.255.0
        gateway  192.168.15.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0</pre>
 
And finally from the third node pve3: 
<pre># network interface settings
auto lo
iface lo inet loopback
 
iface eth0 inet manual
 
auto eth1
iface eth1 inet static
        address  10.0.15.83
        netmask  255.255.255.0
 
auto vmbr0
iface vmbr0 inet static
        address  192.168.15.83
        netmask  255.255.255.0
        gateway  192.168.15.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0</pre>
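
The DRBD interfaces of all three nodes need to sit in the same network for replication to work. As a minimal sketch, assuming the 255.255.255.0 netmask from the examples, the check below compares the first three octets of two addresses. The function name <code>same_subnet24</code> is a hypothetical helper, not part of Proxmox VE or DRBD.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox VE or DRBD): succeeds when two
# IPv4 addresses share the same first three octets, i.e. the same /24
# network under the 255.255.255.0 netmask used in the examples.
same_subnet24() {
    # ${1%.*} strips the last octet, e.g. 10.0.15.81 -> 10.0.15
    [ "${1%.*}" = "${2%.*}" ]
}

# Addresses of the dedicated DRBD NICs, as used in the drbdmanage commands.
first=10.0.15.81
for ip in 10.0.15.82 10.0.15.83; do
    if same_subnet24 "$first" "$ip"; then
        echo "$ip: same /24 as $first"
    else
        echo "$ip: NOT in the same /24 as $first"
    fi
done
```

The check is purely textual and only valid for a /24 netmask; other netmasks would need a real bitwise comparison.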
 
=== Disk for DRBD ===
DRBD will search for the LVM Volume Group drbdpool.
So you have to create it on all nodes.
 
This example uses /dev/sdb1 for DRBD, so a single big partition has to be created on /dev/sdb. Make sure it exists on all nodes.
 
To prepare the disk for DRBD just run
 
<pre>
parted /dev/sdb mktable gpt
parted /dev/sdb mkpart drbd 1 100%
parted /dev/sdb p
 
Model: ATA Samsung SSD 850 (scsi)
Disk /dev/sdb: 512GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
 
Number  Start  End    Size    File system  Name  Flags
1      1049kB  512GB  512GB                drbd
</pre>
 
Then create the logical volume dedicated to DRBD.
 
<b>NOTE: The logical volumes must be the same size on every node!</b>
<pre>
root@proxmox:~# vgcreate drbdpool /dev/sdb1
  Physical volume "/dev/sdb1" successfully created
  Volume group "drbdpool" successfully created
 
root@proxmox:~# lvcreate -L 511G -n drbdthinpool -T drbdpool
  Logical volume "drbdthinpool" created.
</pre>
 
Remember to leave at least 1 extent available in the volume group. If you don't, drbdmanage will fail with the following error: "Volume group "drbdpool" has insufficient free space (0 extents): 1 required."
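
The free-extent requirement can be checked before running drbdmanage. The following is a sketch only: <code>has_free_extent</code> is a hypothetical helper, and the <code>vgs</code> call shown in the comment assumes the <code>vg_free_count</code> report field of LVM2 and root privileges.

```shell
#!/bin/sh
# Hypothetical helper: given the number of free extents in a volume group,
# succeed only if at least one extent is left.
has_free_extent() {
    [ "$1" -ge 1 ]
}

# On a real node the count could be read from LVM (requires root):
#   free=$(vgs --noheadings -o vg_free_count drbdpool | tr -d ' ')
free=${free:-0}   # placeholder value so the sketch runs without LVM

if has_free_extent "$free"; then
    echo "drbdpool has $free free extent(s)"
else
    echo "drbdpool has no free extents; create the thin pool slightly smaller"
fi
```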
 
As LVM thin provisioning is used at the logical volume level, install the following package:
 
'''Note:''' thin-provisioning-tools are included in Proxmox VE 4.2 and above!
<pre>
apt-get install thin-provisioning-tools
</pre>
 
== DRBD configuration ==
 
=== Software installation ===
 
Install the DRBD user tools on all nodes :
<pre>apt-get install drbdmanage -y</pre>
 
And reboot all nodes.
 
=== Configure DRBD ===
 
First make sure that the SSH host keys of each node are in the "known_hosts" list of all the other nodes. This can easily be ensured by logging in once from each node to every other node:
 
<pre>
root@pve1:~# ssh 10.0.15.82
root@pve1:~# ssh 10.0.15.83
</pre>
and then
<pre>
root@pve2:~# ssh 10.0.15.81
root@pve2:~# ssh 10.0.15.83
</pre>
and finally
<pre>
root@pve3:~# ssh 10.0.15.81
root@pve3:~# ssh 10.0.15.82
</pre>
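
With more nodes, typing every login by hand becomes tedious. The sketch below only prints the commands one node would need; <code>peers_of</code> is a hypothetical helper, and using <code>ssh-keyscan</code> instead of an interactive login is an assumption of this example, not the method described above. Keys collected this way are not verified, so compare the fingerprints before trusting them.

```shell
#!/bin/sh
# Storage-network addresses of pve1, pve2 and pve3 from the examples.
NODES="10.0.15.81 10.0.15.82 10.0.15.83"

# Hypothetical helper: print every address in $NODES except the one given.
peers_of() {
    for ip in $NODES; do
        [ "$ip" = "$1" ] || echo "$ip"
    done
}

# Dry run for pve1: print the commands instead of executing them.
# ssh-keyscan -H prints hashed host keys in known_hosts format.
for peer in $(peers_of 10.0.15.81); do
    echo "ssh-keyscan -H $peer >> ~/.ssh/known_hosts"
done
```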
 
To configure DRBD9 it is only necessary to run the following command on node pve1:
<pre>
drbdmanage init -q 10.0.15.81
  Failed to find logical volume "drbdpool/.drbdctrl_0"
  Failed to find logical volume "drbdpool/.drbdctrl_1"
  Logical volume ".drbdctrl_0" created.
  Logical volume ".drbdctrl_1" created.
initializing activity log
NOT initializing bitmap
Writing meta data...
New drbd meta data block successfully created.
initializing activity log
NOT initializing bitmap
Writing meta data...
New drbd meta data block successfully created.
empty drbdmanage control volume initialized.
empty drbdmanage control volume initialized.
Operation completed successfully
 
</pre>
 
Now add all nodes of the cluster to DRBD with the following commands, still on node pve1
(first check that SSH login as root to these nodes works):
<pre>
root@pve1:~# drbdmanage add-node -q pve2 10.0.15.82
Operation completed successfully
Operation completed successfully
 
Executing join command using ssh.
IMPORTANT: The output you see comes from pve2
IMPORTANT: Your input is executed on pve2
  Failed to find logical volume "drbdpool/.drbdctrl_0"
  Failed to find logical volume "drbdpool/.drbdctrl_1"
  Logical volume ".drbdctrl_0" created.
  Logical volume ".drbdctrl_1" created.
NOT initializing bitmap
initializing activity log
Writing meta data...
New drbd meta data block successfully created.
NOT initializing bitmap
initializing activity log
Writing meta data...
New drbd meta data block successfully created.
Operation completed successfully
</pre>
and then finally
<pre>
root@pve1:~# drbdmanage add-node -q pve3 10.0.15.83
[...]
</pre>
 
Then add a DRBD entry to '''/etc/pve/storage.cfg''' like this:
 
<b>NOTE1:</b> Redundancy <Number> - this number cannot be higher than the total number of nodes in your DRBD cluster.
 
<b>NOTE2:</b> If the file does not exist, try adding some storage in web GUI like a "local directory" one, and pve will create the file for you
 
<b>NOTE3:</b> Each storage entry in that file must be followed by exactly one empty line
<pre>
 
drbd: drbd1
        content images,rootdir
        redundancy 3
</pre>
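
The limit from NOTE1 can be expressed as a small check. This is a sketch: <code>redundancy_ok</code> is a hypothetical helper, and the lower bound of 1 is an assumption of this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the redundancy value is at least 1
# (assumed lower bound) and not higher than the number of DRBD nodes.
redundancy_ok() {
    redundancy=$1
    node_count=$2
    [ "$redundancy" -ge 1 ] && [ "$redundancy" -le "$node_count" ]
}

redundancy_ok 3 3 && echo "redundancy 3 fits a 3-node cluster"
redundancy_ok 4 3 || echo "redundancy 4 is too high for 3 nodes"
```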
 
The node configuration can be verified with:
 
<pre>drbdmanage list-nodes</pre>
 
== Create the first VM on DRBD for testing and live migration ==
 
In the GUI you can see the DRBD storage and use it as virtual disk storage.
 
<b>NOTE:</b> DRBD supports only raw disk format at the moment.
 
Try to live migrate the VM - as all data is available on both nodes it will take just a few seconds. The overall process might take a bit longer if the VM is under load and if there is a lot of RAM involved. But in any case, the downtime is minimal and you will see no interruption at all.
 
== DRBD support ==
 
DRBD can be configured in many different ways and there is a lot of space for optimizations and performance tuning. If you run DRBD in a production environment we highly recommend the [http://www.linbit.com/en/p/products/drbd9 DRBD commercial support] from the DRBD developers. The company behind DRBD is [http://www.linbit.com Linbit].
 
== Recovery from communication failure ==
If the communication between the storage nodes is interrupted but the NIC stays up, the nodes will sync again once they reconnect.
 
If the NIC instead goes down (e.g. the cable is unplugged), the nodes remain isolated even after the NIC and the communication are up again. For example, if pve1 holds resource A as primary and pve2 is its peer, then after reconnecting you will see (# drbdsetup status) "pve2 connection:Connecting" on pve1 and "pve1 connection:StandAlone" on pve2, with its disks flagged as "outdated". In the "StandAlone" state no sync is performed automatically.
To force the reconnection, issue the following command on node pve2:
<pre>
root@pve2:~# drbdadm adjust all
</pre>
The same applies if you create a resource on a node while it is disconnected from the others.
To automate the process you can append a line like the following to the storage NIC definition in /etc/network/interfaces:
<pre>
post-up drbdadm adjust all
</pre>
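
To spot the stuck state described above, the status text can be filtered for StandAlone connections. This is a minimal sketch: <code>standalone_peers</code> is a hypothetical helper, and the "<peer> connection:<state>" layout is assumed from the status strings quoted in this section rather than taken from a format specification.

```shell
#!/bin/sh
# Hypothetical helper: read status text on stdin and print the name of each
# peer whose connection is reported as StandAlone. The layout
# "<peer> connection:<state>" is an assumption based on the quoted output.
standalone_peers() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i == "connection:StandAlone") print $(i - 1)
    }'
}

# Sample input; on a real node this would be:
#   drbdsetup status | standalone_peers
printf '%s\n' 'pve1 connection:StandAlone' 'pve3 connection:Connecting' \
    | standalone_peers
```

If the helper prints any peer name on a node, running <code>drbdadm adjust all</code> on that node is the recovery step described above.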
 
== Integrity checking ==
 
*You can enable "data-integrity-alg" for testing purposes; test for at least a week before production use. Do not use it in production, as it can cause split brain in a dual-primary configuration and it also decreases performance.
*It is a good idea to run "drbdadm verify" once a week (or at least once a month) while the servers are under low load.
<pre># /etc/cron.d/drbdadm-verify-weekly
# Have cron run a verification of all DRBD resources every Monday at 42 minutes past midnight
42 0 * * 1    root    /sbin/drbdadm verify all
</pre>
*Check man drbd.conf, section "NOTES ON DATA INTEGRITY" for more information.
 
== Final considerations ==
 
Now you have fully redundant storage for your VMs without using expensive SAN equipment, configured in about 10 to 30 minutes, starting from bare metal.
 
*Three servers for a redundant SAN
*Three servers for redundant virtualization hosts
 
== Alternative Two storage node setup ==
 
You can also set up a cluster with 2 powerful servers that have storage, and a lightweight PC (e.g. Mitac Pluto 220) as a third node just for quorum.
The differences are (assuming 'pve3', IP 192.168.15.83, is the node without DRBD storage):
* On node pve3 you don't need to configure eth1 (was 10.0.15.83) for DRBD storage communication
* On node pve3 you don't have to configure storage or DRBD
* On pve1 you don't have to run <code>drbdmanage add-node -q pve3 10.0.15.83</code>
* In storage.cfg you will have "redundancy 2" instead of 3, and you should add a line like this to limit the DRBD storage visibility: nodes pve1,pve2
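
Put together, the storage.cfg entry for this two-storage-node variant can be generated as sketched below. <code>drbd_storage_entry</code> is a hypothetical helper written for this example; the storage name drbd1 and the content line are carried over from the three-node entry shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: print a storage.cfg stanza for a DRBD storage.
# Arguments: storage id, redundancy, optional comma-separated node list.
drbd_storage_entry() {
    printf 'drbd: %s\n' "$1"
    printf '        content images,rootdir\n'
    printf '        redundancy %s\n' "$2"
    if [ -n "${3:-}" ]; then
        printf '        nodes %s\n' "$3"
    fi
}

# Entry for the setup where only pve1 and pve2 carry DRBD storage.
drbd_storage_entry drbd1 2 pve1,pve2
```

The function only prints the stanza; review it and remember that each entry in storage.cfg must be followed by an empty line.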
 
The only drawback is that node 3 is still listed when you choose a migration target, although it is not a good choice.
 
[[Category:HOWTO]] [[Category:Technology]]

Latest revision as of 10:04, 5 January 2017
