Ceph RBD Mirroring

There are two possible ways to set up mirroring of RBD images to other Ceph clusters. One is using journaling, the other is using snapshots.

The journal based approach will cause more load on your cluster, as each write operation needs to be written twice: once to the actual data and once to the journal. The journal is read by the target cluster and replayed.

When using snapshot mirroring, the source image is snapshotted according to a set schedule, and the target cluster fetches the new snapshots.

Journal based mirroring can run into the situation that the target cluster cannot replay the journal fast enough, either because the network between the two clusters is not fast enough or because the target cluster itself is too slow. This results in the source cluster filling up with journal objects. In such a situation, consider switching over to snapshot based mirroring.

This guide is based on the official Ceph RBD mirror documentation (https://docs.ceph.com/en/latest/rbd/rbd-mirroring/), with specifics in mind for a hyperconverged Proxmox VE + Ceph setup.

Overview

We assume two clusters, site A and site B. The goal of this guide is to set up mirroring from A to B. Adding two-way mirroring means doing most of the steps again, in the other direction.

The pools on both clusters need to be named the same.

To follow this guide, use any node on site A. On site B, run the commands on the node on which the RBD mirror daemon should be running.

Note: Nodes with the RBD mirror daemon must be able to access all Ceph nodes in both clusters!
Note: KRBD does not support journal based mirroring! This means that for LXC containers you need to use snapshot mirroring. For VMs you can disable KRBD in the Proxmox VE storage configuration.
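
As a minimal sketch (the storage ID and pool name are just examples), an RBD storage entry in `/etc/pve/storage.cfg` with KRBD disabled could look like this:

rbd: ceph-vm
        content images
        krbd 0
        pool rbd
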
┌────────────────┐     ┌────────────────┐
│     Site A     │     │     Site B     │
│ ┌────────────┐ │     │ ┌────────────┐ │
│ │   Node 1   │ │     │ │   Node 1   │ │
│ │            │ │     │ │            │ │
│ │            │>──>──>┼─┼─RBD Mirror │ │
│ └────────────┘ │     │ └────────────┘ │
│ ┌────────────┐ │     │ ┌────────────┐ │
│ │   Node 2   │ │     │ │   Node 2   │ │
│ └────────────┘ │     │ └────────────┘ │
│ ┌────────────┐ │     │ ┌────────────┐ │
│ │   Node 3   │ │     │ │   Node 3   │ │
│ └────────────┘ │     │ └────────────┘ │
└────────────────┘     └────────────────┘

The RBD mirror daemon is responsible for fetching the journal or snapshots from the source cluster and applying them on the target cluster. It runs on the target cluster.

Set up users

Site A

At the beginning, we need to set up the needed users. There will be two: one on the source cluster (site A), with which the rbd-mirror daemon on the target cluster (site B) authenticates against site A, and a second one, with which the rbd-mirror daemon authenticates against the target cluster (site B) itself.

Let's first create the user in the source cluster (site A):

root@site-a $ ceph auth get-or-create client.rbd-mirror-peer-a mon 'profile rbd' osd 'profile rbd' -o /etc/pve/priv/site-b.client.rbd-mirror-peer-a.keyring

We need to make this file available over at the other cluster, site B. Either use SCP or copy the contents of the file manually to the following location on site B:

/etc/pve/priv/site-a.client.rbd-mirror-peer-a.keyring

The `site-a` part at the beginning of the filename defines the name by which the target cluster refers to the source cluster! If you use something else, make sure to use the same name throughout the guide!
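
For example, a possible scp invocation (assuming root SSH access to the RBD mirror node in site B; the hostname is a placeholder):

root@site-a $ scp /etc/pve/priv/site-b.client.rbd-mirror-peer-a.keyring root@<rbd_mirror_host_in_site_B>:/etc/pve/priv/site-a.client.rbd-mirror-peer-a.keyring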

Site B

We need to create a local user for the rbd-mirror daemon on the target cluster.

root@site-b $ ceph auth get-or-create client.rbd-mirror.$(hostname) mon 'profile rbd-mirror' osd 'profile rbd' -o /etc/pve/priv/ceph.client.rbd-mirror.$(hostname).keyring


Note: We use `$(hostname)` to match the unique ID to what is used for other Ceph services such as monitors.
Note: You can restrict the permissions to a specific pool if you write 'profile rbd pool=mypool'.
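
For example, a hypothetical pool-restricted variant of the command above (with `mypool` as a placeholder) could look like this:

root@site-b $ ceph auth get-or-create client.rbd-mirror.$(hostname) mon 'profile rbd-mirror' osd 'profile rbd pool=mypool' -o /etc/pve/priv/ceph.client.rbd-mirror.$(hostname).keyring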

Copy Ceph config of site-a to site-b

In order for the rbd-mirror to access the Ceph cluster on site A, we need to copy over the `ceph.conf` file from site A to site B and name it correctly.

We place it in the `/etc/pve` directory to make it available on all nodes and symlink it into the `/etc/ceph` directory.

For example:

root@site-a $ scp /etc/pve/ceph.conf root@<rbd_mirror_host_in_site_B>:/etc/pve/site-a.conf

Switch to the other cluster:

root@site-b $ ln -s /etc/pve/site-a.conf /etc/ceph/site-a.conf

Make sure that the name of the config file matches the name used in the keyring file that stores the authentication info.
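
On the RBD mirror node in site B, the relevant files should then look roughly like this (paths as used in this guide):

/etc/pve/site-a.conf                                     (Ceph config copied from site A)
/etc/ceph/site-a.conf -> /etc/pve/site-a.conf            (symlink)
/etc/pve/priv/site-a.client.rbd-mirror-peer-a.keyring    (keyring created on site A)
/etc/pve/priv/ceph.client.rbd-mirror.<hostname>.keyring  (local rbd-mirror user for site B)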

Enable mirroring on pools

Run the following command on both clusters to enable mirroring:

$ rbd mirror pool enable <pool> <mode>

If you want to use journal based mirroring, you can set `<mode>` to `pool`. This will mirror all images that have the `journaling` feature enabled.
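
To check which features an image currently has and, if needed, enable journaling manually, you can use the following commands (pool and image names are examples):

root@site-a $ rbd info rbd/vm-100-disk-0
root@site-a $ rbd feature enable rbd/vm-100-disk-0 journaling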

For snapshot based mirroring, or if you want to enable mirroring manually per image with journal based mirroring, set `<mode>` to `image`.

For example, to enable mirroring in `image` mode, which allows you to choose between snapshot and journal based mirroring for each image:

$ rbd mirror pool enable <pool> image

Configure peers

Next we need to tell the pool on site B which keyring and Ceph config file it should use to connect to the peer (site A).

root@site-b $ rbd mirror pool peer add <pool> client.rbd-mirror-peer-a@site-a

You can check the settings by running

root@site-b $ rbd mirror pool info <pool>
Mode: image
Site Name: 44d5aca2-d47c-4f1f-bfa8-2c52281619ee

Peer Sites: 

UUID: aa08d6ab-a8a4-4cb4-ba92-6a03c738b8ca
Name: site-a
Mirror UUID: 
Direction: rx-tx
Client: client.rbd-mirror-peer-a

The direction should be `rx-tx` and the client should be set correctly to match the keyring file. The name should also be shown correctly (`site-a` in this example). Should you need to change any of these settings, you can do so with:

rbd mirror pool peer set <pool> <uuid> <property> <value>

The source cluster (site A) does not yet know about the peer, as it hasn't connected yet.

Set up the rbd-mirror daemon

We need to install the `rbd-mirror` first:

root@site-b $ apt install rbd-mirror

Since we have our keyring files stored in the `/etc/pve/priv` directory which can only be read by the user `root`, we need to enable and modify the systemd unit file for the rbd-mirror.

root@site-b $ systemctl enable ceph-rbd-mirror.target
root@site-b $ cp /usr/lib/systemd/system/ceph-rbd-mirror@.service /etc/systemd/system/ceph-rbd-mirror@.service
root@site-b $ sed -i -e 's/setuser ceph.*/setuser root --setgroup root/' /etc/systemd/system/ceph-rbd-mirror@.service

With this, we changed the service so that the rbd-mirror daemon runs as root. Next, we need to enable and start the service. Make sure to name the service instance after the local user for the target cluster that we created earlier, otherwise the daemon won't be able to authenticate against the target cluster (site B).

root@site-b $ systemctl enable --now ceph-rbd-mirror@rbd-mirror.$(hostname).service

If we check the status and logs of the `ceph-rbd-mirror@rbd-mirror.<hostname>.service` service, we should see that it comes up and does not log any authentication errors.
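
For example, to check the service status and recent log output (the unit name contains the hostname of the mirror node):

root@site-b $ systemctl status ceph-rbd-mirror@rbd-mirror.$(hostname).service
root@site-b $ journalctl -u ceph-rbd-mirror@rbd-mirror.$(hostname).service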

The source cluster (site A) should now have a peer configured, and the direction will be `tx-only`:

root@site-a $ rbd mirror pool info <pool>

Configure images

Before we can start mirroring the images, we need to define which images should be mirrored.

The `mode` defines whether the image is mirrored using snapshots or a journal.

To enable the mirroring of an image, run

rbd mirror image enable <pool>/<image> <mode>

This needs to be done on the source, site A.

Snapshot based mirror

To use snapshots, configure the image with `mode` `snapshot`, for example:

root@site-a $ rbd mirror image enable rbd/vm-100-disk-0 snapshot

This command can take a moment or two.

Now, every time we want the current state to be mirrored to the target cluster (site B) we need a snapshot. We can create them manually with:

rbd mirror image snapshot <pool>/<image>
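
For example, using the image from above:

root@site-a $ rbd mirror image snapshot rbd/vm-100-disk-0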

Snapshot schedule

Since it would be cumbersome to always create mirror snapshots manually, we can define a snapshot schedule so they will be taken automatically.

rbd mirror snapshot schedule add --pool <pool> <interval>

For example, every 5 minutes:

root@site-a $ rbd mirror snapshot schedule add 5m

You can also use other suffixes for days (d) or hours (h), and you can restrict the schedule to a single pool with the `--pool <pool>` parameter.

To verify the schedule run:

root@site-a $ rbd mirror snapshot schedule status

It can take a few moments for the newly created schedule to show up!
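
If you want to list the configured schedules themselves (rather than their status), `rbd` also provides an `ls` subcommand:

root@site-a $ rbd mirror snapshot schedule ls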

Journal based mirror

To enable journal based mirroring for an image, run the command with the `journal` mode. For example:

root@site-a $ rbd mirror image enable rbd/vm-100-disk-0 journal

This will automatically enable the `journaling` feature for the image. Compare the output of

root@site-a $ rbd info <pool>/<image>

before and after you enable journal based mirroring for the first time.
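
The features line should now additionally list `journaling`, for example (illustrative output, the other features shown here are just the usual defaults):

features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling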

Journal based mirroring also needs the `exclusive-lock` feature enabled for the images, which should be the default.

Note: KRBD does not support journal based mirroring!

Last steps

Once the rbd-mirror is up and running, you should see a peer configured in the source cluster (site A):

root@site-a $ rbd mirror pool info <pool>
Mode: image
Site Name: ce99d398-91ab-4667-b4f2-307ba0bec358

Peer Sites: 

UUID: 87441fdf-3a61-4840-a869-34b25b47a964
Name: 44d5aca2-d47c-4f1f-bfa8-2c52281619ee
Mirror UUID: 1abf773b-6c95-420c-8ceb-35ee346521db
Direction: tx-only

On the target cluster (site B) you will see the image if you run

root@site-b $ rbd ls --pool <pool>

and if you used snapshot based mirroring, you should see snapshots appearing on the target cluster (site B) very quickly.

rbd snap ls --all --pool <pool> <image>

For example:

root@site-b $ rbd snap ls --all --pool rbd vm-100-disk-0

You can get detailed information about the mirroring by running:

root@site-b $ rbd mirror pool status <pool> --verbose

Failover Recovery

A common scenario is that the source cluster, site A in this guide, will have some kind of failure, and we want to fail over to the other cluster, site B.

You will have to make sure that the VM and container configuration files are synced to the other site yourself. For example, with a recurring rsync job. The container configuration files for each node are located at

/etc/pve/lxc

and for VMs in

/etc/pve/qemu-server
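
A minimal sketch of such a recurring rsync job, assuming root SSH access and a staging directory on a node in site B (the destination path is just an example; do not sync directly into `/etc/pve` of the running cluster):

root@site-a $ rsync -a /etc/pve/qemu-server/ root@<node_in_site_B>:/root/siteA-guest-configs/qemu-server/
root@site-a $ rsync -a /etc/pve/lxc/ root@<node_in_site_B>:/root/siteA-guest-configs/lxc/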

Make sure that no guest has anything configured that is specific to only the source cluster, like an ISO image or a storage used for the disk images.

If you just try to start the guests on the remaining secondary cluster (site B), a container will not start, and a VM might start (if KRBD is disabled) but will report IO errors very quickly. This is because the target images are marked as non-primary and won't allow writes from our guests.

Promote images on site B

By promoting an image or all images in a pool, we can tell Ceph that they are now the primary ones to be used. In a planned failover, we would first demote the images on site A before we promote the images on site B. In a recovery situation with site A down, we need to `--force` the promotion.

To promote a single image, run the following command:

root@site-b $ rbd mirror image promote <pool>/<image> --force

To promote all images in a pool, run the following command:

root@site-b $ rbd mirror pool promote <pool> --force

After this, our guests should start fine.
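
To double-check that an image is now primary on site B before starting its guest, you can for example query its mirror status:

root@site-b $ rbd mirror image status <pool>/<image>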

Resync and switch back to site A

Once site A is back up and operational, we want to plan our switch back. For this, we first need to demote the images on site A.

Note: Do not start guests on site A at this point!

For all images in a pool:

root@site-a $ rbd mirror pool demote <pool>

For specific images:

root@site-a $ rbd mirror image demote <pool>/<image>

We also need to set up an RBD mirror daemon on site A that connects to site B (two-way mirror). If not done yet, now is the time to set this up. The steps are the same, just with the two sites swapping roles.

Once the RBD mirror daemon on site A is up and running, the images need to be flagged for a resync. Until then, the RBD mirror daemon on site A will log problems. Run the following command for each image (or script it, as sketched below):

$ rbd mirror image resync <pool>/<image>
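
A simple way to script this for all images in a pool could look like this (a sketch, assuming every image in the pool is mirrored):

$ for img in $(rbd ls --pool <pool>); do rbd mirror image resync <pool>/${img}; done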

After a short time, the images should be mirrored from site B to site A. You can verify this by running

rbd mirror pool status <pool> --verbose

and checking the `last_update` line for each image.

If you want to move a guest back, make sure that the configuration on site A is still valid and hasn't changed during the time on site B.

Then power down the guest and wait for another successful mirroring to site A. Once we are sure that the disk images have been mirrored after we shut down the guest, we can demote the image(s) on site B and promote them on site A.

root@site-b $ rbd mirror image demote <pool>/<image>

Or for all primary images in a pool:

root@site-b $ rbd mirror pool demote <pool>

Promote single images on site A:

root@site-a $ rbd mirror image promote <pool>/<image>

Promote all non-primary images in a pool:

root@site-a $ rbd mirror pool promote <pool>

After a short time, we should see that the images on site A are now the primary ones and that the images on site B are being mirrored again:

$ rbd mirror pool status <pool> --verbose