Ceph Luminous to Nautilus
Note: Proxmox VE 6.x with Ceph Nautilus is still in beta status.
Introduction
This article explains how to upgrade from Ceph Luminous to Nautilus (14.2.0 or higher) on Proxmox VE 6.x.
For more information, see the Release Notes.
Assumption
We assume that all nodes are on the latest Proxmox VE 6.x version and Ceph is on version Luminous (12.2.12-pve1).
The cluster must be healthy and working.
Note
- After upgrading to Proxmox VE 6.x and before upgrading to Ceph Nautilus, do not use the Proxmox VE 6.x tools for Ceph (pveceph), as they are not intended to work with Ceph Luminous.
- If it's absolutely necessary to change the Ceph cluster before upgrading to Nautilus, use the Ceph native tools instead.
- During the upgrade from Luminous to Nautilus it will not be possible to create a new OSD using a Luminous ceph-osd daemon after the monitors have been upgraded to Nautilus. Avoid adding or replacing any OSDs while the upgrade is in progress.
- Avoid creating any RADOS pools while the upgrade is in progress.
- You can monitor the progress of your upgrade anytime with the ceph versions command. This will tell you which Ceph version(s) are running for each type of daemon.
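The ceph versions check can be scripted. The following is a minimal sketch, not part of any Proxmox or Ceph tool: the function name all_daemons_nautilus is hypothetical, and it assumes the command's output contains quoted keys of the form "ceph version ... nautilus (stable)", as in the JSON that ceph versions prints.

```shell
# Hypothetical helper: succeed only if every version string in the output
# of `ceph versions` (read on stdin) mentions "nautilus".
all_daemons_nautilus() {
    # Extract each quoted "ceph version ..." key, one per line
    versions=$(grep -o '"ceph version [^"]*"' || true)
    # No version strings at all: treat as failure
    [ -n "$versions" ] || return 1
    # Fail if any reported version string does not contain "nautilus"
    ! printf '%s\n' "$versions" | grep -qv nautilus
}

# Usage on a node with an admin keyring:
# ceph versions | all_daemons_nautilus && echo "all daemons run Nautilus"
```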
Cluster Preparation
If your cluster was originally installed with a version prior to Luminous, ensure that it has completed at least one full scrub of all PGs while running Luminous. Failure to do so will cause your monitor daemons to refuse to join the quorum on start, leaving them non-functional.
If you are unsure whether or not your Luminous cluster has completed a full scrub of all PGs, check the state of your cluster by running:
ceph osd dump | grep ^flags
In order to be able to proceed to Nautilus, your OSD map must include both of the following flags:
- recovery_deletes flag
- purged_snapdirs flag
If your OSD map does not contain both of these flags, you can simply wait for approximately 24-48 hours. In a standard cluster configuration, this should be ample time for all your placement groups to be scrubbed at least once. Then run the above command again to recheck.
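The flag check can also be done by a script, for example when rechecking periodically. This is a sketch under the assumption that the flags line of ceph osd dump looks like "flags sortbitwise,recovery_deletes,purged_snapdirs"; the function name has_nautilus_prereq_flags is hypothetical.

```shell
# Hypothetical helper: succeed only if the OSD map flags include both
# recovery_deletes and purged_snapdirs. Reads `ceph osd dump` on stdin.
has_nautilus_prereq_flags() {
    flags=$(grep '^flags' | head -n 1)
    for f in recovery_deletes purged_snapdirs; do
        # Put each comma- or space-separated flag on its own line, match whole lines
        printf '%s\n' "$flags" | tr ', ' '\n\n' | grep -qx "$f" || return 1
    done
}

# Usage:
# ceph osd dump | has_nautilus_prereq_flags && echo "ready for Nautilus"
```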
If you have just completed an upgrade to Luminous and want to proceed to Nautilus shortly afterwards, you can force a scrub of all placement groups with a one-line shell command such as:
ceph pg dump pgs_brief | cut -d " " -f 1 | xargs -n1 ceph pg scrub
Note that this forced scrub may have a negative impact on the performance of your Ceph clients.
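Because cut takes the first field of every line, the one-liner above may also hand header or status lines (for example a "PG_STAT" column header) to ceph pg scrub, which then fail harmlessly. A sketch of a stricter filter, assuming PG IDs have the usual form <pool-id>.<hex> (such as 1.2f); the function name pg_ids is hypothetical:

```shell
# Hypothetical helper: print only placement-group IDs from the output of
# `ceph pg dump pgs_brief`, skipping headers and other non-PG lines.
# awk splits on any run of spaces or tabs, unlike `cut -d " "`.
pg_ids() {
    awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ { print $1 }'
}

# Usage (queues a scrub for every PG):
# ceph pg dump pgs_brief 2>/dev/null | pg_ids | xargs -n1 ceph pg scrub
```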
Adapt /etc/pve/ceph.conf
Since Nautilus, all daemons use the 'keyring' option for their keyring, so you have to adapt this setting. The easiest way is to move the 'keyring' option into the 'client' section and remove it everywhere else. Create the 'client' section if you don't have one.
For example:
From:
[global]
    ...
    keyring = /etc/pve/priv/$cluster.$name.keyring

[osd]
    keyring = /var/lib/ceph/osd/ceph-$id/keyring
To:
[global]
    ...

[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring
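To double-check the edited file, you can list which sections still set a keyring. This is a minimal sketch with a hypothetical function name; it only recognizes plain "key = value" lines and ignores commented-out entries. After the change described above, it should print only "client".

```shell
# Hypothetical helper: print the name of every section in a ceph.conf-style
# file (argument, or stdin if none) that sets a "keyring" option.
keyring_sections() {
    awk '
        /^[ \t]*\[/ {                  # section header such as [osd]
            section = $0
            sub(/^[ \t]*\[/, "", section)
            sub(/].*$/, "", section)
        }
        /^[ \t]*keyring[ \t]*=/ { print section }
    ' "${1:--}"
}

# Usage:
# keyring_sections /etc/pve/ceph.conf
```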
Preparation on each Ceph cluster node
Change the current Ceph repositories from Luminous to Nautilus.
sed -i 's/luminous/nautilus/' /etc/apt/sources.list.d/ceph.list
Your /etc/apt/sources.list.d/ceph.list should then look like this:
deb http://download.proxmox.com/debian/ceph-nautilus buster main
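Before running the upgrade on many nodes, a quick per-node check of the repository file can catch a missed sed. A sketch with a hypothetical function name, assuming the file should contain the ceph-nautilus line shown above and no remaining mention of luminous:

```shell
# Hypothetical helper: succeed if the APT source file (argument, "-" for
# stdin, default /etc/apt/sources.list.d/ceph.list) points at ceph-nautilus
# and no longer mentions luminous.
repo_is_nautilus() {
    content=$(cat "${1:-/etc/apt/sources.list.d/ceph.list}")
    printf '%s\n' "$content" | grep -q '^deb .*/ceph-nautilus ' &&
        ! printf '%s\n' "$content" | grep -q 'luminous'
}

# Usage:
# repo_is_nautilus && echo "repository switched to Nautilus"
```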
Set the 'noout' flag
Set the noout flag for the duration of the upgrade (optional, but recommended):
ceph osd set noout
Or via the GUI in the OSD tab.
Upgrade on each Ceph cluster node
Upgrade all your nodes with the following commands. This upgrades the Ceph packages on each node to Nautilus.
apt update && apt dist-upgrade
After the update, the running daemons still use the old Luminous binaries until they are restarted.
Restart the monitor daemon
After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.
systemctl restart ceph-mon.target
Once all monitors are up, verify that the monitor upgrade is complete. Look for the nautilus string in the mon map. The command
ceph mon dump | grep min_mon_release
should report
min_mon_release 14 (nautilus)
If it does not, this implies that one or more monitors haven’t been upgraded and restarted, and/or that the quorum doesn't include all monitors.
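The same check as a reusable test, for example inside an upgrade script. This is a sketch with a hypothetical function name; it assumes the mon dump contains a line "min_mon_release <number> (<name>)" and accepts 14 or higher.

```shell
# Hypothetical helper: succeed only if `ceph mon dump` (on stdin) reports
# min_mon_release 14 (nautilus) or newer.
mons_on_nautilus() {
    awk '$1 == "min_mon_release" { found = 1; ok = ($2 >= 14) }
         END { exit !(found && ok) }'
}

# Usage:
# ceph mon dump | mons_on_nautilus || echo "not all monitors upgraded/in quorum"
```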
Restart the manager daemons on all nodes
Then restart all managers on all nodes
systemctl restart ceph-mgr.target
Verify that the ceph-mgr daemons are running by checking ceph -s
ceph -s
...
  services:
    mon: 3 daemons, quorum foo,bar,baz
    mgr: foo(active), standbys: bar, baz
...
Restart the OSD daemon on all nodes
Important Steps before restarting OSD
If you have an IPv6-only cluster, you need to set the following options in the global section of the Ceph config:
[global]
    ms_bind_ipv4 = false
    ms_bind_ipv6 = true
Otherwise, each OSD tries to bind to an IPv4 address in addition to the IPv6 address, and fails if it cannot find an IPv4 address in the given public/cluster networks.
Next, restart all OSDs on all nodes
systemctl restart ceph-osd.target
On each host, tell ceph-volume to adapt the OSDs created with ceph-disk using the following two commands:
ceph-volume simple scan
ceph-volume simple activate --all
If either command fails, your OSDs will not be recognized after a reboot.
To verify that the OSDs start up automatically, it's recommended that each OSD host is rebooted following the step above.
Note that ceph-volume does not have the same hot-plug capability as ceph-disk had, where a newly attached disk was automatically detected via udev events.
You will need to scan the main data partition for each ceph-disk OSD explicitly, if
- the OSD isn’t currently running when the above scan command is run,
- a ceph-disk-based OSD is moved to a new host,
- the OSD host is reinstalled,
- or the /etc/ceph/osd directory is lost.
For example:
ceph-volume simple scan /dev/sdb1
The output will include the appropriate ceph-volume simple activate command to enable the OSD.
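To see which OSDs on a host have already been scanned, you can list the metadata files that ceph-volume simple scan writes. This sketch assumes the default location /etc/ceph/osd and file names of the form <osd-id>-<osd-fsid>.json; the function name is hypothetical. Comparing its output with the OSDs that ceph osd tree shows for the host reveals OSDs that still need an explicit scan.

```shell
# Hypothetical helper: print the OSD IDs that have scan metadata in the
# given directory (default /etc/ceph/osd), sorted numerically.
scanned_osd_ids() {
    dir=${1:-/etc/ceph/osd}
    for f in "$dir"/*.json; do
        [ -e "$f" ] || continue            # glob did not match: nothing scanned
        name=$(basename "$f" .json)
        printf '%s\n' "${name%%-*}"        # part before the first "-" is the OSD ID
    done | sort -n
}
```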
Upgrade all CephFS MDS daemons
For each CephFS file system,
- Reduce the number of ranks to 1 (if you plan to restore it later, first note the original value of max_mds):
ceph status
ceph fs set <fs_name> max_mds 1
- Wait for the cluster to deactivate any non-zero ranks by periodically checking the status:
ceph status
- Take all standby MDS daemons offline on the appropriate hosts with:
systemctl stop ceph-mds.target
- Confirm that only one MDS is online and is on rank 0 for your FS:
ceph status
- Upgrade the last remaining MDS daemon by restarting the daemon:
systemctl restart ceph-mds.target
- Restart all standby MDS daemons that were taken offline:
systemctl start ceph-mds.target
- Restore the original value of max_mds for the volume:
ceph fs set <fs_name> max_mds <original_max_mds>
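Noting the original max_mds can be scripted so the value is not lost between the steps above. A sketch, assuming the plain-text output of ceph fs get <fs_name> contains a line "max_mds <n>"; the function name and the file system name "cephfs" in the usage lines are only examples.

```shell
# Hypothetical helper: print the max_mds value from `ceph fs get <fs_name>`
# output read on stdin (first matching line only).
current_max_mds() {
    awk '$1 == "max_mds" { print $2; exit }'
}

# Usage (file system name "cephfs" is an example):
# orig=$(ceph fs get cephfs | current_max_mds)
# ceph fs set cephfs max_mds 1
# ... remaining MDS upgrade steps ...
# ceph fs set cephfs max_mds "$orig"
```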
Disallow pre-Nautilus OSDs and enable all new Nautilus-only functionality
ceph osd require-osd-release nautilus
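You can confirm that the setting took effect by looking at the require_osd_release line of the OSD map. A minimal sketch with a hypothetical function name, assuming ceph osd dump prints a line "require_osd_release <release>":

```shell
# Hypothetical helper: succeed once `ceph osd dump` (on stdin) records
# require_osd_release nautilus.
osd_release_is_nautilus() {
    awk '$1 == "require_osd_release" { found = ($2 == "nautilus") }
         END { exit !found }'
}

# Usage:
# ceph osd dump | osd_release_is_nautilus && echo "pre-Nautilus OSDs disallowed"
```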
Unset 'noout' and check cluster status
Unset the 'noout' flag, either in the GUI or with this command:
ceph osd unset noout
Now check if your Ceph cluster is healthy.
ceph -s
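Recovery after unsetting noout can take a while. The loop below is a sketch for waiting until the cluster reports HEALTH_OK; the function name, the retry count (30) and the interval (10 seconds) are arbitrary assumptions, and the health command is a parameter so the loop can be tried without a cluster.

```shell
# Hypothetical helper: poll a health command (default "ceph health") until
# its first word is HEALTH_OK, or give up after the given number of tries.
wait_for_health_ok() {
    cmd=${1:-ceph health}
    tries=${2:-30}
    interval=${3:-10}
    while [ "$tries" -gt 0 ]; do
        # e.g. "HEALTH_OK" or "HEALTH_WARN noout flag(s) set"
        status=$($cmd 2>/dev/null | awk '{ print $1; exit }')
        [ "$status" = "HEALTH_OK" ] && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$interval"
    done
    return 1
}

# Usage:
# wait_for_health_ok && ceph -s
```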
Upgrade Tunables
If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:
ceph config set mon mon_crush_min_required_version firefly
If Ceph does not complain, however, we recommend that you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any 'straw' buckets, this will result in a modest amount of data movement, but generally nothing too severe:
ceph osd getcrushmap -o backup-crushmap
ceph osd crush set-all-straw-buckets-to-straw2
If there are problems, you can easily revert with:
ceph osd setcrushmap -i backup-crushmap
Moving to ‘straw2’ buckets will unlock a few recent features, like the crush-compat balancer mode added back in Luminous.
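To estimate beforehand whether the straw2 conversion will move data at all, you can count the buckets that still use "alg straw" in a decompiled copy of the CRUSH map. This is a sketch with a hypothetical function name; it assumes the text format produced by crushtool -d, where each bucket has a line such as "alg straw2".

```shell
# Hypothetical helper: count buckets using the legacy "straw" algorithm in a
# decompiled CRUSH map (argument, or stdin if none). Prints 0 if there are none.
count_straw_buckets() {
    awk '$1 == "alg" && $2 == "straw" { n++ } END { print n + 0 }' "${1:--}"
}

# Usage, with the backup taken by `ceph osd getcrushmap -o backup-crushmap`:
# crushtool -d backup-crushmap -o backup-crushmap.txt
# count_straw_buckets backup-crushmap.txt
```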
Enable msgr2 protocol and update Ceph configuration
To enable the new v2 network protocol, issue the following command:
ceph mon enable-msgr2
This instructs all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new v2 protocol port 3300. To see whether all monitors have been updated, run:
ceph mon dump
and verify that each monitor has both a v2: and v1: address listed.
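With many monitors, this verification can be automated. A sketch with a hypothetical function name, assuming monitor lines in ceph mon dump look like "0: [v2:10.0.0.100:3300/0,v1:10.0.0.100:6789/0] mon.foo"; it prints the monitors that are still missing one of the two addresses, so empty output means all monitors speak both protocols.

```shell
# Hypothetical helper: read `ceph mon dump` on stdin and print the name of
# every monitor whose address entry lacks a v2: or a v1: address.
mons_missing_msgr2() {
    awk '/^[0-9]+: / && !(/v2:/ && /v1:/) { print $NF }'
}

# Usage:
# ceph mon dump | mons_missing_msgr2
```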
Updating /etc/pve/ceph.conf
For each host that has been upgraded, you should update your /etc/pve/ceph.conf file so that it either specifies no monitor port (if you are running the monitors on the default ports) or references both the v2 and v1 addresses and ports explicitly. Things will still work if only the v1 IP and port are listed, but each CLI instantiation or daemon will need to reconnect after learning the monitors also speak the v2 protocol, slowing things down a bit and preventing a full transition to the v2 protocol.
It is recommended to add all monitor IPs (without port) to 'mon_host' in the global section, like this:
[global]
    ...
    mon_host = 10.0.0.100 10.0.0.101 10.0.0.102
    ...
For details see: Messenger V2