Difference between revisions of "DRBD9"

From Proxmox VE
(Removed outdated docs, now maintained by Linbit)
 
= Introduction =
  
<b>BETA NOT FOR PRODUCTION</b>
DRBD9 has been removed from the Proxmox VE core distribution as of version 4.4 and is now maintained directly by Linbit, due to a [https://forum.proxmox.com/threads/drbdmanage-license-change.30404/ license change].
 
 
DRBD® refers to block devices designed as a building block for high availability (HA) clusters. This is done by mirroring a whole block device over a dedicated network. DRBD can be understood as network-based RAID-1. For detailed information please visit [http://www.linbit.com Linbit].
 
 
 
Main features of the integration in Proxmox VE:
 
 
 
*drbd9/drbdmanage; drbd devices on top of LVM
 
*All VM disks (LVM volumes on the DRBD device) can be replicated in real time on several Proxmox VE nodes via the network.  
 
*Ability to live migrate running machines without downtime in a few seconds WITHOUT the need for a SAN (iSCSI, FC, NFS), as the data is already on all nodes.
 
 
 
= System requirements  =
 
 
 
You need 3 identical Proxmox VE servers (V4.0 or higher) with the following extra hardware:
 
 
 
*Extra NIC (dedicated for DRBD traffic)
 
*Second disk or raid volume (e.g. /dev/sdb) for DRBD
 
*Use a hardware RAID controller with BBU to eliminate performance issues concerning internal metadata (see [http://fghaas.wordpress.com/2009/08/20/internal-metadata-and-why-we-recommend-it/ Florian's blog]).
 
*A functional Proxmox VE Cluster (V4.0 or higher)
 
 
 
= WARNINGS =
 
* Do not use write cache for any virtual drives on top of DRBD as this can cause out of sync blocks. You need to use 'writethrough' instead of the default 'none'. Follow the link for more information: http://forum.proxmox.com/threads/18259-KVM-on-top-of-DRBD-and-out-of-sync-long-term-investigation-results?p=93126
 
* Consider doing [[#Integrity checking|integrity checking]] periodically to be sure DRBD is consistent
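
The cache warning above can be checked per VM with a small filter. This is only a sketch: it assumes the disk lines look the way <code>qm config</code> prints them (for example <code>virtio0: drbd1:vm-100-disk-1,size=32G</code>), that the DRBD storage is called <code>drbd1</code> as in the storage configuration of this article, and the helper name <code>unsafe_drbd_disks</code> and VM ID 100 are made up for illustration.

```shell
#!/bin/sh
# unsafe_drbd_disks STORAGE_ID
# Reads VM configuration lines on standard input and prints every disk line
# that lives on STORAGE_ID but does not set cache=writethrough.
# (Hypothetical helper; the line format is assumed to match "qm config".)
unsafe_drbd_disks() {
    grep -E "^(ide|sata|scsi|virtio)[0-9]+: $1:" | grep -v 'cache=writethrough'
}

# Example use on a Proxmox VE node (VM ID 100 is only an example):
#   qm config 100 | unsafe_drbd_disks drbd1
```

Any line printed by the filter is a disk that still uses a cache mode other than 'writethrough'.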
 
 
 
== Network  ==
 
 
 
Configure eth1 on all nodes with a fixed private IP address via the web interface and reboot each server.
 
 
 
For better understanding, here is my /etc/network/interfaces file from my first node pve1, after the reboot:
 
<pre>cat /etc/network/interfaces
# network interface settings
auto lo
iface lo inet loopback

iface eth0 inet manual

auto eth1
iface eth1 inet static
        address  10.0.15.81
        netmask  255.255.255.0

auto vmbr0
iface vmbr0 inet static
        address  192.168.15.81
        netmask  255.255.255.0
        gateway  192.168.15.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0</pre>
 
And from the second node pve2:
 
<pre># network interface settings
auto lo
iface lo inet loopback

iface eth0 inet manual

auto eth1
iface eth1 inet static
        address  10.0.15.82
        netmask  255.255.255.0

auto vmbr0
iface vmbr0 inet static
        address  192.168.15.82
        netmask  255.255.255.0
        gateway  192.168.15.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0</pre>
 
 
 
And finally the third node pve3: 
 
<pre># network interface settings
auto lo
iface lo inet loopback

iface eth0 inet manual

auto eth1
iface eth1 inet static
        address  10.0.15.83
        netmask  255.255.255.0

auto vmbr0
iface vmbr0 inet static
        address  192.168.15.83
        netmask  255.255.255.0
        gateway  192.168.15.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0</pre>
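
Since all eth1 interfaces use a 255.255.255.0 netmask, their addresses have to share the first three octets to reach each other directly. The following is a minimal sketch of such a sanity check; the helper names <code>net24</code> and <code>same_net24</code> are made up for this example, and the addresses passed in follow the 10.0.15.0/24 scheme of the first node.

```shell
#!/bin/sh
# net24 ADDRESS
# Print the first three octets of a dotted-quad IPv4 address
# (the network part when the netmask is 255.255.255.0).
net24() {
    echo "${1%.*}"
}

# same_net24 ADDRESS...
# Return 0 if every address shares the /24 prefix of the first one.
same_net24() {
    first=$(net24 "$1")
    for addr in "$@"; do
        [ "$(net24 "$addr")" = "$first" ] || return 1
    done
    return 0
}

if same_net24 10.0.15.81 10.0.15.82 10.0.15.83; then
    echo "DRBD addresses share one /24 network"
else
    echo "DRBD addresses are on different networks" >&2
fi
```

After the reboot, a plain <code>ping</code> from each node to the eth1 address of the others confirms that the dedicated link actually works.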
 
 
 
== Disk for DRBD  ==
 
DRBD will search for the LVM volume group drbdpool, so you have to create it.
 
 
 
I will use /dev/sdb1 for DRBD. Therefore I need to create this single big partition on /dev/sdb - make sure it exists on all nodes.
 
 
 
<b>NOTE: The logical volumes must all have the same size on each node!</b>
 
To prepare the disk for DRBD, run the following on each node:
 
 
 
<pre>
parted /dev/sdb mktable gpt
parted /dev/sdb mkpart drbd 1 100%
parted /dev/sdb p

Model: ATA Samsung SSD 850 (scsi)
Disk /dev/sdb: 512GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End    Size   File system  Name  Flags
 1      1049kB  512GB  512GB               drbd

root@proxmox:~# vgcreate drbdpool /dev/sdb1
  Physical volume "/dev/sdb1" successfully created
  Volume group "drbdpool" successfully created
root@proxmox:~# lvcreate -n  drbdthinpool -L 511G drbdpool
  Logical volume "drbdthinpool" created.
root@proxmox:~# lvconvert --type thin-pool drbdpool/drbdthinpool
  WARNING: Converting logical volume drbdpool/drbdthinpool to pool's data volume.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Do you really want to convert drbdpool/drbdthinpool? [y/n]: y
  Converted drbdpool/drbdthinpool to thin pool.
</pre>
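
Because the logical volumes must have the same size on every node, it can help to compare them once all disks are prepared. The sketch below is only an illustration: it assumes root SSH access between the nodes and the volume names used above, and the function names <code>pool_size</code>, <code>all_equal</code> and <code>check_pool_sizes</code> are made up for this example.

```shell
#!/bin/sh
# pool_size NODE
# Print the size in bytes of drbdpool/drbdthinpool on NODE (queried via ssh).
pool_size() {
    ssh "root@$1" lvs --noheadings --nosuffix --units b -o lv_size \
        drbdpool/drbdthinpool | tr -d ' '
}

# all_equal VALUE...
# Return 0 if all arguments are identical strings.
all_equal() {
    first="$1"
    for value in "$@"; do
        [ "$value" = "$first" ] || return 1
    done
}

# check_pool_sizes NODE...
# Return 0 if the thin pool has the same size on every node,
# 1 if the sizes differ, 2 if a node could not be queried.
check_pool_sizes() {
    sizes=""
    for node in "$@"; do
        size=$(pool_size "$node")
        [ -n "$size" ] || return 2
        sizes="$sizes $size"
    done
    # Intentionally unquoted so that each size becomes one argument.
    all_equal $sizes
}

# Example use (node names as in this article):
#   check_pool_sizes pve1 pve2 pve3 && echo "thin pools match"
```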
 
 
 
= DRBD configuration  =
 
 
 
== Software installation  ==
 
 
 
Install the DRBD user tools on all nodes:
 
<pre>apt-get install drbdmanage -y</pre>
 
 
 
== Configure DRBD  ==
 
To configure DRBD9 it is only necessary to run the following on node pve1:
 
<pre>
drbdmanage init
You are going to initalize a new drbdmanage cluster.
CAUTION! Note that:
  * Any previous drbdmanage cluster information may be removed
  * Any remaining resources managed by a previous drbdmanage installation
    that still exist on this system will no longer be managed by drbdmanage

Confirm:

  yes/no: yes
  Failed to find logical volume "drbdpool/.drbdctrl"
  Logical volume ".drbdctrl" created.
initializing activity log
NOT initializing bitmap
Writing meta data...
New drbd meta data block successfully created.
empty drbdmanage control volume initialized.
Operation completed successfully
</pre>
 
 
 
Now add the remaining nodes of the cluster to DRBD with the following command, run once per node:
 
<pre>
root@pve1:~# drbdmanage new-node pve2 192.168.15.82
Operation completed successfully
Operation completed successfully

Executing join command using ssh.
IMPORTANT: The output you see comes from pve2
IMPORTANT: Your input is executed on pve2
You are going to join an existing drbdmanage cluster.
CAUTION! Note that:
  * Any previous drbdmanage cluster information may be removed
  * Any remaining resources managed by a previous drbdmanage installation
    that still exist on this system will no longer be managed by drbdmanage

Confirm:

  yes/no: yes
  Logical volume ".drbdctrl" successfully removed
  Logical volume ".drbdctrl" created.
You want me to create a v09 style flexible-size internal meta data block.
There appears to be a v09 flexible-size internal meta data block
already in place on /dev/drbdpool/.drbdctrl at byte offset 4190208

Do you really want to overwrite the existing meta-data?
*** confirmation forced via --force option ***

Do you want to proceed?
*** confirmation forced via --force option ***
NOT initializing bitmap
md_offset 4190208
al_offset 4157440
bm_offset 4153344

Found some data

==> This might destroy existing data! <==
initializing activity log
Writing meta data...
New drbd meta data block successfully created.
Operation completed successfully
root@pve1:~# drbdmanage new-node pve3 192.168.15.83
</pre>
 
 
 
Then add the DRBD storage to /etc/pve/storage.cfg like this:
 
<b>NOTE:</b> The value of <code>redundancy</code> is the number of nodes each volume is replicated to. It cannot be higher than the total number of nodes in your cluster.
 
<pre>dir: local
        path /var/lib/vz
        content rootdir,iso,images,vztmpl
        maxfiles 0

drbd: drbd1
        redundancy 3
</pre>
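
The redundancy rule from the note above can be checked before saving the file. This is a minimal sketch under a few assumptions: the storage entry looks like the example shown (a <code>drbd: NAME</code> line followed by an indented <code>redundancy N</code> line), the node count is passed in by hand, and the function names <code>storage_redundancy</code> and <code>redundancy_ok</code> are made up for this example.

```shell
#!/bin/sh
# storage_redundancy FILE STORAGE_ID
# Print the redundancy value of the drbd storage STORAGE_ID found in FILE.
storage_redundancy() {
    awk -v id="$2" '
        # A new section starts with an unindented "type: name" line.
        /^[a-z]+:/ { in_section = ($1 == "drbd:" && $2 == id) }
        in_section && $1 == "redundancy" { print $2 }
    ' "$1"
}

# redundancy_ok REDUNDANCY NODE_COUNT
# Return 0 if REDUNDANCY is at least 1 and not higher than NODE_COUNT.
redundancy_ok() {
    [ "$1" -ge 1 ] && [ "$1" -le "$2" ]
}

# Example use for the three-node cluster of this article:
#   r=$(storage_redundancy /etc/pve/storage.cfg drbd1)
#   redundancy_ok "$r" 3 || echo "redundancy $r exceeds the node count" >&2
```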
 
 
 
= Create the first VM on DRBD for testing and live migration  =
 
 
 
In the GUI you can see the DRBD storage, and you can use it as virtual disk storage.
 
 
 
<b>NOTE:</b> DRBD supports only raw disk format at the moment.
 
 
 
Try to live migrate the VM - as all data is already available on all nodes, it will take just a few seconds. The overall process might take a bit longer if the VM is under load and if there is a lot of RAM involved. In any case, the downtime is minimal and you should not notice any interruption.
 
 
 
= DRBD support  =
 
 
 
DRBD can be configured in many different ways and there is a lot of room for optimization and performance tuning. If you run DRBD in a production environment we highly recommend the [http://www.linbit.com/en/p/products/drbd9 DRBD commercial support] from the DRBD developers. The company behind DRBD is [http://www.linbit.com Linbit].
 
 
 
= Recovery from communication failure  =
 
 
 
= Integrity checking =
 
 
 
*You can enable "data-integrity-alg" for testing purposes and test for at least a week before production use. Do not use it in production, as it can cause split brain in a dual-primary configuration and also decreases performance.
 
*It is a good idea to run "drbdadm verify" once a week (or at least once a month) when the servers are under low load.
 
<pre># /etc/cron.d/drbdadm-verify-weekly
# This makes cron start a verification of all DRBD resources every Monday at 42 minutes past midnight
42 0 * * 1    root    /sbin/drbdadm verify all
</pre>
 
*Check man drbd.conf, section "NOTES ON DATA INTEGRITY" for more information.
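
A verify run only marks blocks as out of sync, so its result still has to be looked at afterwards. The sketch below sums the out-of-sync counters from status output. It assumes that the counters appear as <code>out-of-sync:N</code> fields, as printed by <code>drbdsetup status --verbose --statistics</code> on DRBD 9; the function name <code>total_oos</code> is made up for this example.

```shell
#!/bin/sh
# total_oos
# Read DRBD status text on standard input and print the sum of all
# "out-of-sync:N" fields (0 if there are none).
total_oos() {
    grep -o 'out-of-sync:[0-9]*' | cut -d: -f2 |
        awk '{ sum += $1 } END { print sum + 0 }'
}

# Example use after a verify run has finished:
#   drbdsetup status --verbose --statistics | total_oos
```

A result other than 0 means that at least one resource has blocks that differ between the nodes and needs a resync.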
 
 
 
= Final considerations  =
 
 
 
Now you have fully redundant storage for your VMs without using expensive SAN equipment, configured in about 10 to 30 minutes - starting from bare metal.
 
 
 
*Three servers for a redundant SAN
 
*Three servers for redundant virtualization hosts
 
 
 
[[Category:HOWTO]] [[Category:Technology]]
 

Latest revision as of 12:04, 5 January 2017
