Ceph Quincy to Reef: Difference between revisions

Revision as of 08:14, 21 September 2023

Introduction

This article explains how to upgrade Ceph from Quincy (17.2+) to Reef (18.2+) on Proxmox VE 8.

Important Release Notes

Note: Filestore OSDs are deprecated. Before you proceed, destroy your Filestore OSDs and recreate them to be Bluestore OSDs one by one.

A health warning is now reported if the require-osd-release flag is not set to the appropriate release after a cluster upgrade.

For more information, see Release Notes

Assumption

We assume that all nodes are on the latest Proxmox VE 8.0 (or higher) version and Ceph is on version Quincy (17.2.6-pve1+3 or higher). If not, see the Ceph Pacific to Quincy upgrade guide.

Note: While it is possible in theory to upgrade directly from the older Ceph Pacific (16.2+) release to Reef (18.2+), we do not provide builds of Ceph Pacific for Proxmox VE 8, so a direct upgrade is not possible.

The cluster must be healthy and working!

Note: Commands starting with ceph need to be run only once, on any one node in the Ceph cluster.

Enable msgrv2 protocol and update Ceph configuration

If you did not already do so when you upgraded to Nautilus, Octopus or Pacific, you must enable the new v2 network protocol. Issue the following command:

ceph mon enable-msgr2

This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new 3300 v2 protocol port. To see if all monitors have been updated run

ceph mon dump

and verify that each monitor has both a v2: and v1: address listed.
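The check above can be scripted. The following is a minimal sketch, not part of the official procedure: the helper name mons_missing_addr is hypothetical, and the assumed layout of the monitor lines in the ceph mon dump output is stated in the comment.

```shell
# Print the monitor entries from `ceph mon dump` output (read on stdin)
# that do not list both a v2: and a v1: address.
# Assumption: monitor lines look like
#   0: [v2:10.0.0.1:3300/0,v1:10.0.0.1:6789/0] mon.foo
mons_missing_addr() {
    awk '/^[0-9]+: / && !(/v2:/ && /v1:/) { print }'
}

# Usage on a cluster node (no output means every monitor lists both):
#   ceph mon dump 2>/dev/null | mons_missing_addr
```

If the helper prints any line, that monitor has not yet bound to both protocol ports.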

Preparation on each Ceph cluster node

Please note that currently (2023-09-15) Ceph Reef is only available in the test and no-subscription repositories. While we do not know of any issues with the Ceph Reef release, we still recommend that production systems wait until we make the release available in our enterprise repository.

Change the current Ceph repositories from Quincy to Reef.

sed -i 's/quincy/reef/' /etc/apt/sources.list.d/ceph.list

Note that the main repository no longer exists. It is now split into a public no-subscription repository and an enterprise repository, which is the one recommended for production. The latter is accessible with any Proxmox VE subscription.

Your /etc/apt/sources.list.d/ceph.list should now look like this:

deb https://enterprise.proxmox.com/debian/ceph-reef bookworm enterprise

Note, with Proxmox VE 8 we introduced an enterprise repository for Ceph, which is accessible with a valid Proxmox VE subscription. If you do not have a valid subscription you can use the publicly available no-subscription or test repositories, for example:

deb http://download.proxmox.com/debian/ceph-reef bookworm no-subscription
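To see what the sed command would change before touching the file, a dry run can help. This is a minimal sketch; the helper names preview_reef_repo and still_on_quincy are hypothetical and only wrap standard sed and grep calls.

```shell
# Print what a repository file would contain after switching Quincy to Reef,
# without modifying the file. Takes the file path as its only argument.
preview_reef_repo() {
    sed 's/quincy/reef/' "$1"
}

# Succeed if any active (non-comment) line of the file still mentions quincy.
still_on_quincy() {
    grep -v '^[[:space:]]*#' "$1" | grep -q 'quincy'
}

# Usage on a cluster node:
#   preview_reef_repo /etc/apt/sources.list.d/ceph.list
#   still_on_quincy /etc/apt/sources.list.d/ceph.list && echo "still Quincy"
```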

Set the 'noout' flag

Set the noout flag for the duration of the upgrade (optional, but recommended):

ceph osd set noout

Or via the GUI in the OSD tab (Manage Global Flags).

Upgrade on each Ceph cluster node

Upgrade all your nodes with the following commands or by installing the latest updates via the GUI. This upgrades Ceph on the node to Reef.

apt update
apt full-upgrade

After the update, your setup will still be running the old Quincy binaries.

Restart the monitor daemon

Note: You can use the web-interface or the command-line to restart ceph services.

After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.

systemctl restart ceph-mon.target

Do so one node at a time. Wait after each restart and periodically check the status of the cluster:

ceph -s

It should be in HEALTH_OK or

HEALTH_WARN
noout flag(s) set

Once all monitors are up, verify that the monitor upgrade is complete. Look for the Reef string in the mon map. The command

ceph mon dump | grep min_mon_release

should report

min_mon_release 18 (reef)

If it does not, this implies that one or more monitors haven’t been upgraded and restarted, and/or that the quorum doesn't include all monitors.
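For scripted checks, the release number can be pulled out of the same output. A minimal sketch, with a hypothetical helper name mon_min_release; it relies only on the min_mon_release line shown above.

```shell
# Read `ceph mon dump` output on stdin and print the numeric
# min_mon_release value (18 corresponds to Reef in this guide).
mon_min_release() {
    awk '$1 == "min_mon_release" { print $2; exit }'
}

# Usage on a cluster node:
#   [ "$(ceph mon dump 2>/dev/null | mon_min_release)" = "18" ] \
#       && echo "monitor upgrade complete"
```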

Restart the manager daemons on all nodes

If the managers did not automatically restart with the monitors, restart them now on all nodes:

systemctl restart ceph-mgr.target

Verify that the ceph-mgr daemons are running by checking ceph -s

ceph -s

...
 services:
  mon: 3 daemons, quorum foo,bar,baz
  mgr: foo(active), standbys: bar, baz
...

Restart the OSD daemon on all nodes

Restart all OSDs. Only restart OSDs on one node at a time to avoid loss of data redundancy. To restart all OSDs on a node, run the following command:

systemctl restart ceph-osd.target

Wait after each restart and periodically check the status of the cluster:

ceph status

It should be in HEALTH_OK or

HEALTH_WARN
noout flag(s) set
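Waiting for one of these two states between node restarts can be automated. The sketch below is an illustration under assumptions: the helper name health_acceptable is hypothetical, and it assumes that ceph health prints the status and its reasons on a single line.

```shell
# Succeed if a `ceph health` summary line is one of the two states this guide
# accepts during the upgrade: HEALTH_OK, or HEALTH_WARN caused only by noout.
health_acceptable() {
    case "$1" in
        HEALTH_OK) return 0 ;;
        "HEALTH_WARN noout flag(s) set") return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a cluster node (the 10 second interval is an arbitrary choice):
#   until health_acceptable "$(ceph health)"; do sleep 10; done
```

The match is deliberately strict: any additional warning keeps the loop waiting, so check ceph status manually if it does not finish.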

Once all OSDs are running with the latest versions, the following warning can appear:

all OSDs are running reef or later but require_osd_release < reef

Disallow pre-Reef OSDs and enable all new Reef-only functionality

ceph osd require-osd-release reef
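Before running the command above, it can be useful to confirm that no OSD still reports an older release. A minimal sketch: the helper name osds_not_on_reef is hypothetical, and the assumed format of the version strings printed by ceph osd versions is given in the comment.

```shell
# Read `ceph osd versions` output on stdin and print every version line that
# does not mention reef. No output means all reporting OSDs run Reef.
# Assumption: version strings look like
#   "ceph version 18.2.0 (<hash>) reef (stable)": 6
osds_not_on_reef() {
    grep 'ceph version' | grep -v 'reef' || true
}

# Usage on a cluster node:
#   ceph osd versions | osds_not_on_reef
```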

Upgrade all CephFS MDS daemons

For each CephFS file system you need to apply the following steps. Please note that you can list the file systems with ceph fs ls or check the web UI under Node -> Ceph -> CephFS.

Disable standby_replay

ceph fs set <fs_name> allow_standby_replay false

If you have increased the ranks (the maximum number of active MDS instances per CephFS instance) for some CephFS instances, you must reduce all instances to a single rank (set max_mds to 1) before you continue.
Please note that if you plan to restore the rank later, first note down the original number of MDS daemons.
```
ceph status
ceph fs get <fs_name> | grep max_mds
ceph fs set <fs_name> max_mds 1
```
Wait for the cluster to deactivate any extra active MDS (ranks) by periodically checking the status of Ceph:
```
ceph status
```
The number of active MDS should go down to the number of file systems you have, i.e., only one active MDS for each file system.

Alternatively, check in the MDS list in the CephFS panel on the web UI that each Ceph file system has only one active MDS.

Stop all standby MDS daemons.
You can do so either via the CephFS panel on the web UI or with the following CLI command:
```
systemctl stop ceph-mds@ID.service
```
(this stops the single MDS with the given ID; repeat for each standby MDS)
Confirm that only one MDS is online and is on rank 0 for your FS:
```
ceph status
```
Upgrade all remaining (active) MDS daemons and restart the standby ones in one go by restarting the whole systemd MDS-target via CLI:
```
systemctl restart ceph-mds.target
```
If you had a higher rank set, you can now restore the original rank value (max_mds) for the file system instance again:
```
ceph fs set <fs_name> max_mds <original_max_mds>
```
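Recording the original max_mds value before reducing it can be scripted, so that it is available for the restore step. A minimal sketch: the helper name current_max_mds and the fs_name shell variable are hypothetical, and the sketch assumes that ceph fs get prints a line starting with max_mds followed by the value.

```shell
# Read `ceph fs get <fs_name>` output on stdin and print the max_mds value.
# Assumption: the output contains a line of the form "max_mds<whitespace>N".
current_max_mds() {
    awk '$1 == "max_mds" { print $2; exit }'
}

# Usage on a cluster node (fs_name is a placeholder for your file system):
#   original_max_mds=$(ceph fs get "$fs_name" | current_max_mds)
#   ceph fs set "$fs_name" max_mds 1
#   ... remaining MDS upgrade steps ...
#   ceph fs set "$fs_name" max_mds "$original_max_mds"
```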

Unset the 'noout' flag

Once the upgrade process is finished, don't forget to unset the noout flag.

ceph osd unset noout

Or via the GUI in the OSD tab (Manage Global Flags).
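To confirm that the flag is really gone, the cluster flags can be inspected. A minimal sketch: the helper name noout_is_set is hypothetical, and it assumes that ceph osd dump prints a line starting with flags followed by a comma-separated list.

```shell
# Read `ceph osd dump` output on stdin and succeed if noout is among the
# cluster flags.
# Assumption: the output has a line like "flags noout,sortbitwise,...".
noout_is_set() {
    awk '$1 == "flags" { print $2 }' | tr ',' '\n' | grep -qx 'noout'
}

# Usage on a cluster node:
#   ceph osd dump | noout_is_set && echo "noout is still set"
```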

Notes

When restarting a MGR, log lines containing "has missing NOTIFY_TYPES member" can be ignored.
