Ceph Pacific to Quincy
Introduction
This article explains how to upgrade Ceph from Pacific to Quincy (17.2.0 or higher) on Proxmox VE 7.2 and newer 7.x releases.
Important Release Notes
- Filestore OSDs are deprecated. Before you proceed, destroy your Filestore OSDs and recreate them as Bluestore OSDs, one at a time.
- Support for LevelDB has been dropped in Quincy. Bluestore OSDs should always be using RocksDB, but monitors that were set up before Luminous (v12) could still be using LevelDB. To verify this, run
head /var/lib/ceph/mon/*/kv_backend
on your Ceph monitor hosts. The result should be "rocksdb". If it is not, destroy and recreate that monitor.
- The device_health_metrics pool has been renamed to .mgr and now serves as a common store for all ceph-mgr modules. On existing clusters, the pool is renamed automatically after the upgrade to Quincy.
- A health warning is now reported if the require-osd-release flag is not set to the appropriate release after a cluster upgrade.
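The LevelDB check above can be run against every monitor data directory on a host at once. The following is a minimal sketch: the function name check_kv_backend is hypothetical, and it assumes the default monitor data path /var/lib/ceph/mon/ used in the head command above.

```shell
#!/bin/sh
# check_kv_backend DIR...
# Hypothetical helper: for each monitor data directory given, read the
# kv_backend file and report every directory that is not using RocksDB.
# Returns 0 if all directories use rocksdb, 1 otherwise.
check_kv_backend() {
  kv_status=0
  for dir in "$@"; do
    backend=$(head -n 1 "$dir/kv_backend" 2>/dev/null)
    if [ "$backend" != "rocksdb" ]; then
      # An empty value means the file was missing or unreadable.
      echo "$dir: ${backend:-unknown}, destroy and recreate this monitor"
      kv_status=1
    fi
  done
  return "$kv_status"
}

# Usage on a monitor host (assumed default path):
#   check_kv_backend /var/lib/ceph/mon/*
```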
For more information, see the Release Notes.
Assumption
We assume that all nodes are on the latest Proxmox VE 7.2 (or higher) version and Ceph is on version Pacific (16.2.9-pve1 or higher). If not, see the Ceph Octopus to Pacific upgrade guide.
- Read the Known Issues section before you start, so that you can avoid those problems, for example when performing steps not described in this guide.
Note: While in theory it is possible to upgrade from Ceph Octopus to Quincy directly, we strongly recommend upgrading to Pacific first.
The cluster must be healthy and working!
Enable msgrv2 Protocol and Update Ceph Configuration
If you did not already do so when you upgraded to Nautilus, Octopus or Pacific, you must enable the new v2 network protocol. Issue the following command:
ceph mon enable-msgr2
This instructs all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new v2 protocol port 3300. To check whether all monitors have been updated, run
ceph mon dump
and verify that each monitor has both a v2: and v1: address listed.
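To check the mon map without reading it line by line, the output of ceph mon dump can be piped into a small filter. This is a minimal sketch: check_msgr2 is a hypothetical name, and it assumes each monitor is listed on its own line ending in mon.<name>, with its addresses on that same line.

```shell
#!/bin/sh
# check_msgr2: read the output of `ceph mon dump` on stdin and report
# every monitor line (lines naming "mon.<id>") that does not list both
# a v2: and a v1: address. Returns 1 if any monitor is missing one.
check_msgr2() {
  awk '
    / mon\./ {
      if ($0 !~ /v2:/ || $0 !~ /v1:/) {
        print "incomplete addresses: " $0
        bad = 1
      }
    }
    END { if (bad) exit 1 }
  '
}

# Usage:
#   ceph mon dump | check_msgr2 && echo "all monitors listen on v2 and v1"
```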
Preparation on each Ceph Cluster Node
Change the current Ceph repositories from Pacific to Quincy.
sed -i 's/pacific/quincy/' /etc/apt/sources.list.d/ceph.list
Your /etc/apt/sources.list.d/ceph.list should now look like this:
deb http://download.proxmox.com/debian/ceph-quincy bullseye main
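The repository switch and the check that follows it can be combined. The sketch below is hypothetical (the function name switch_ceph_repo is not part of Proxmox VE); it applies the same sed expression as above and then fails if the file still mentions pacific or does not reference ceph-quincy.

```shell
#!/bin/sh
# switch_ceph_repo [FILE]
# Hypothetical helper: rewrite the Ceph repository entry from Pacific to
# Quincy (same sed expression as in this guide) and verify the result.
# FILE defaults to the path used in this guide.
switch_ceph_repo() {
  list=${1:-/etc/apt/sources.list.d/ceph.list}
  sed -i 's/pacific/quincy/' "$list" || return 1
  # A leftover "pacific" means the file had an unexpected layout.
  if grep -q pacific "$list"; then
    echo "$list still references pacific" >&2
    return 1
  fi
  grep -q 'ceph-quincy' "$list"
}

# Usage on each node:
#   switch_ceph_repo && cat /etc/apt/sources.list.d/ceph.list
```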
Set the 'noout' Flag
Set the noout flag for the duration of the upgrade (optional, but recommended):
ceph osd set noout
Alternatively, set it in the GUI via the OSD tab (Manage Global Flags).
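To confirm that the flag is actually set (or, at the end of the upgrade, cleared), you can inspect the cluster-wide flags line of ceph osd dump. This is a minimal sketch that assumes those flags appear comma-separated on a line starting with the word flags; osd_flag_set is a hypothetical name.

```shell
#!/bin/sh
# osd_flag_set FLAG: read `ceph osd dump` output on stdin and succeed if
# FLAG appears in the comma-separated cluster "flags" line. Pool lines
# also contain "flags", but they start with "pool", so they are skipped.
osd_flag_set() {
  awk -v flag="$1" '
    $1 == "flags" {
      n = split($2, f, ",")
      for (i = 1; i <= n; i++) if (f[i] == flag) found = 1
    }
    END { if (!found) exit 1 }
  '
}

# Usage:
#   ceph osd dump | osd_flag_set noout && echo "noout is set"
```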
Upgrade on each Ceph Cluster Node
Upgrade all your nodes with the following commands or by installing the latest updates via the GUI. This upgrades Ceph on each node to Quincy.
apt update
apt full-upgrade
After the update, your setup will still be running the old Pacific binaries until the Ceph daemons are restarted.
Restart the Monitor Daemon
Note: You can use the web interface or the command line to restart Ceph services.
After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.
systemctl restart ceph-mon.target
Once all monitors are up, verify that the monitor upgrade is complete. Look for the Quincy string in the mon map. The command
ceph mon dump | grep min_mon_release
should report
min_mon_release 17 (quincy)
If it does not, one or more monitors have not been upgraded and restarted, or the quorum does not include all monitors.
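The min_mon_release check can be scripted so that it fails loudly instead of relying on a visual grep. A minimal sketch, assuming the min_mon_release line has the form shown above; mon_release_is_quincy is a hypothetical name.

```shell
#!/bin/sh
# mon_release_is_quincy: read `ceph mon dump` output on stdin and succeed
# only if the min_mon_release line reports release 17 (quincy).
mon_release_is_quincy() {
  awk '$1 == "min_mon_release" && $2 == "17" { ok = 1 } END { if (!ok) exit 1 }'
}

# Usage:
#   ceph mon dump | mon_release_is_quincy || echo "not all monitors run Quincy yet"
```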
Restart the Manager Daemons on all Nodes
If the managers did not automatically restart with the monitors, restart them now on all nodes:
systemctl restart ceph-mgr.target
Verify that the ceph-mgr daemons are running by checking ceph -s:
ceph -s
...
  services:
    mon: 3 daemons, quorum foo,bar,baz
    mgr: foo(active), standbys: bar, baz
...
Restart the OSD Daemon on all Nodes
Restart all OSDs. Only restart OSDs on one node at a time to avoid loss of data redundancy. To restart all OSDs on a node, run the following command:
systemctl restart ceph-osd.target
After each restart, wait and periodically check the status of the cluster:
ceph status
It should report HEALTH_OK or
HEALTH_WARN noout flag(s) set
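The waiting step between OSD node restarts can be automated with a polling loop. This is a minimal sketch under stated assumptions: health_ok_for_upgrade and wait_for_health are hypothetical names, the loop uses the one-line summary printed by ceph health, and it accepts only the two states listed above.

```shell
#!/bin/sh
# health_ok_for_upgrade STATUS: succeed if STATUS (the first line of
# `ceph health`) is HEALTH_OK or the expected noout warning.
health_ok_for_upgrade() {
  case "$1" in
    HEALTH_OK|"HEALTH_WARN noout flag(s) set") return 0 ;;
    *) return 1 ;;
  esac
}

# wait_for_health [INTERVAL]: poll `ceph health` every INTERVAL seconds
# (default 10) until health_ok_for_upgrade accepts the result.
wait_for_health() {
  while :; do
    status=$(ceph health 2>&1 | head -n 1)
    if health_ok_for_upgrade "$status"; then
      echo "$status"
      return 0
    fi
    echo "waiting: $status"
    sleep "${1:-10}"
  done
}

# Usage on one node, then move on to the next node:
#   systemctl restart ceph-osd.target; sleep 30; wait_for_health 15
```

Immediately after a restart, Ceph may briefly still report the previous state, which is why the usage line pauses before polling. After the last node, the require_osd_release warning described below can also appear; this sketch does not accept it, so stop the loop and continue with the next step.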
Once all OSDs are running the latest version, the following warning may appear:
all OSDs are running quincy or later but require_osd_release < quincy
Disallow pre-Quincy OSDs and Enable all new Quincy-only Functionality
ceph osd require-osd-release quincy
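To confirm that the setting took effect, the require_osd_release line of ceph osd dump can be checked. A minimal sketch; require_release_is is a hypothetical name, and it assumes the value appears on a line of the form require_osd_release <release>.

```shell
#!/bin/sh
# require_release_is RELEASE: read `ceph osd dump` output on stdin and
# succeed if the require_osd_release line names RELEASE (e.g. quincy).
require_release_is() {
  awk -v rel="$1" '$1 == "require_osd_release" && $2 == rel { ok = 1 } END { if (!ok) exit 1 }'
}

# Usage:
#   ceph osd dump | require_release_is quincy && echo "require_osd_release is quincy"
```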
Upgrade all CephFS MDS Daemons
For each CephFS file system,
- Disable standby_replay:
ceph fs set <fs_name> allow_standby_replay false
- Reduce the number of ranks to 1 (if you plan to restore it later, first note the original value of max_mds):
ceph status
ceph fs get <fs_name> | grep max_mds
ceph fs set <fs_name> max_mds 1
- With max_mds set higher than 1, you will see more than one active MDS for that file system.
- Wait for the cluster to deactivate any non-zero ranks by periodically checking the status of Ceph:
ceph status
- The number of active MDS daemons should drop to the number of file systems you have.
- Alternatively, check in the CephFS panel of the GUI that each Ceph file system has only one active MDS.
- Take all standby MDS daemons offline on the appropriate hosts with:
systemctl stop ceph-mds.target
- Confirm that only one MDS is online and is on rank 0 for your FS:
ceph status
- Upgrade the last remaining MDS daemon by restarting the daemon:
systemctl restart ceph-mds.target
- Restart all standby MDS daemons that were taken offline:
systemctl start ceph-mds.target
- Restore the original value of max_mds for the file system:
ceph fs set <fs_name> max_mds <original_max_mds>
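Noting the original max_mds value before reducing it to 1, and restoring it afterwards, can be scripted so the value is not lost. A minimal sketch: get_max_mds is a hypothetical name, and it assumes ceph fs get prints max_mds as a field name followed by its value on one line, which is the line the grep in the steps above selects. <fs_name> stays a placeholder for your file system.

```shell
#!/bin/sh
# get_max_mds: read `ceph fs get <fs_name>` output on stdin and print the
# value of the max_mds field. Returns 1 if no max_mds line is found.
get_max_mds() {
  awk '$1 == "max_mds" { print $2; found = 1 } END { if (!found) exit 1 }'
}

# Usage, around the MDS steps above:
#   orig=$(ceph fs get <fs_name> | get_max_mds)
#   ceph fs set <fs_name> max_mds 1
#   ... upgrade the MDS daemons ...
#   ceph fs set <fs_name> max_mds "$orig"
```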
Unset the 'noout' Flag
Once the upgrade process is finished, don't forget to unset the noout flag.
ceph osd unset noout
Alternatively, unset it in the GUI via the OSD tab (Manage Global Flags).
Notes
- When restarting a manager (MGR), log lines containing "has missing NOTIFY_TYPES member" can be ignored.
Known Issues
Guest images are stored on pool device_health_metrics
If guest images are stored in the "device_health_metrics" pool, they will be broken after the upgrade, because the pool is renamed to .mgr.
To avoid this, create a new Ceph pool with the "Add Storage" option enabled. Then, before you upgrade to Quincy, use "Disk Action -> Move Storage" for VMs or "Volume Actions -> Move Storage" for containers to move the guest images out of the "device_health_metrics" pool.
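To find out whether the device_health_metrics pool holds any guest images at all, you can list its RBD images with rbd ls -p device_health_metrics. The helper below is a hypothetical, minimal sketch that reads such a list (one image name per line) and fails if the list is not empty, so the move can be done before the upgrade.

```shell
#!/bin/sh
# warn_if_images: read an image list on stdin (one name per line, as
# printed by `rbd ls`); print each name and return 1 if any exist.
warn_if_images() {
  awk 'NF { print "move before upgrading: " $0; found = 1 } END { if (found) exit 1 }'
}

# Usage:
#   rbd ls -p device_health_metrics | warn_if_images
```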