Ceph Luminous to Nautilus

From Proxmox VE
Revision as of 18:34, 4 July 2019 by Martin (talk | contribs) (Proxmox VE 6.0 beta1 release)
Note: Proxmox VE 6.x with Ceph Nautilus is still in beta status.

Introduction

This article explains how to upgrade from Ceph Luminous to Nautilus (14.2.0 or higher) on Proxmox VE 6.x.

For more information see Release Notes

Assumption

We assume that all nodes are on the latest Proxmox VE 6.x version and Ceph is on version Luminous (12.2.12-pve1).

The cluster must be healthy and working.

Note

  • After upgrading to Proxmox VE 6.x and before upgrading to Ceph Nautilus, do not use the Proxmox VE 6.x tools for Ceph (pveceph), as they are not intended to work with Ceph Luminous.
  • If it's absolutely necessary to change the Ceph cluster before upgrading to Nautilus, use the Ceph native tools instead.
  • During the upgrade from Luminous to Nautilus it will not be possible to create a new OSD using a Luminous ceph-osd daemon after the monitors have been upgraded to Nautilus. Avoid adding or replacing any OSDs while the upgrade is in progress.
  • Avoid creating any RADOS pools while the upgrade is in progress.
  • You can monitor the progress of your upgrade anytime with the ceph versions command. This will tell you which Ceph version(s) are running for each type of daemon.
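
As a rough illustration of that progress check, the sketch below reads the output of ceph versions on standard input and reports success only when Nautilus daemons are listed and no Luminous daemon remains. The function name upgrade_complete is hypothetical, and the check assumes that every version string names its release (for example "luminous" or "nautilus").

```shell
# Sketch: decide from `ceph versions` output (read on stdin) whether the
# upgrade is finished. upgrade_complete is a hypothetical helper name.
# Assumption: every version string contains its release name.
# Succeeds only if at least one Nautilus daemon is reported and no
# Luminous daemon is left; empty input counts as "not finished".
upgrade_complete() {
    versions=$(cat)
    printf '%s\n' "$versions" | grep -q 'nautilus' &&
    ! printf '%s\n' "$versions" | grep -q 'luminous'
}
```

It could be used as: ceph versions | upgrade_complete && echo "all daemons run Nautilus"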

Cluster Preparation

If your cluster was originally installed with a version prior to Luminous, ensure that it has completed at least one full scrub of all PGs while running Luminous. Failure to do so will cause your monitor daemons to refuse to join the quorum on start, leaving them non-functional.

If you are unsure whether or not your Luminous cluster has completed a full scrub of all PGs, check the state of your cluster by running:

ceph osd dump | grep ^flags

In order to proceed to Nautilus, your OSD map must include both of the following flags:

  • recovery_deletes
  • purged_snapdirs

If your OSD map does not contain both of these flags, you can simply wait for approximately 24-48 hours. In a standard cluster configuration this should be ample time for all your placement groups to be scrubbed at least once. Then rerun the command above to recheck.
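
The flag check can also be scripted. The following is a minimal sketch, assuming the flags line of ceph osd dump lists the flag names as plain text. The function name has_scrub_flags is hypothetical.

```shell
# Sketch: read `ceph osd dump` output on stdin and succeed only if the
# "flags" line contains both flags required for the Nautilus upgrade.
# has_scrub_flags is a hypothetical helper name.
has_scrub_flags() {
    # "|| true" keeps the function usable under `set -e` when no flags
    # line is present; the checks below then fail as intended.
    flags=$(grep '^flags' || true)
    printf '%s\n' "$flags" | grep -q 'recovery_deletes' &&
    printf '%s\n' "$flags" | grep -q 'purged_snapdirs'
}
```

Example: ceph osd dump | has_scrub_flags || echo "full scrub not yet complete"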

If you have just completed an upgrade to Luminous and want to proceed to Nautilus in short order, you can force a scrub on all placement groups with a one-line shell command, like:

ceph pg dump pgs_brief | cut -d " " -f 1 | xargs -n1 ceph pg scrub

Keep in mind that this forced scrub may have a negative impact on the performance of your Ceph clients.

Adapt /etc/pve/ceph.conf

Since Nautilus, all daemons use the 'keyring' option for their keyring, so you have to adapt this. The easiest way is to move the 'keyring' option into the 'client' section, and remove it everywhere else. Create the 'client' section if you don't have one.

For example:

From:

[global]
    ...
    keyring = /etc/pve/priv/$cluster.$name.keyring
[osd]
    keyring = /var/lib/ceph/osd/ceph-$id/keyring

To:

[global]
    ...
[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring
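
To see which sections currently set a keyring, a small helper like the one below may be useful. It is only a sketch: the function name list_keyring_options is hypothetical, and it assumes the usual layout of one "[section]" header or one "option = value" pair per line.

```shell
# Sketch: read a ceph.conf on stdin and print every keyring option
# together with the section it belongs to.
# list_keyring_options is a hypothetical helper name.
list_keyring_options() {
    section=""
    while IFS= read -r line; do
        # Strip leading whitespace from the line.
        trimmed=${line#"${line%%[![:space:]]*}"}
        case $trimmed in
            \[*\]*) section=$trimmed ;;
            keyring*=*) printf '%s %s\n' "$section" "$trimmed" ;;
        esac
    done
}
```

Run as list_keyring_options < /etc/pve/ceph.conf. After the change described above, the only line printed should belong to the [client] section.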

Preparation on each Ceph cluster node

Change the current Ceph repositories from Luminous to Nautilus.

sed -i 's/luminous/nautilus/' /etc/apt/sources.list.d/ceph.list

Your /etc/apt/sources.list.d/ceph.list should now look like this:

deb http://download.proxmox.com/debian/ceph-nautilus buster main
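
Since this step is repeated on every node, it can help to verify the result mechanically. The sketch below reads the repository file on standard input, ignores commented-out lines, and succeeds only if a ceph-nautilus entry is active and no ceph-luminous entry remains. The function name repo_is_nautilus is hypothetical.

```shell
# Sketch: check a ceph.list file (read on stdin) after the sed edit.
# repo_is_nautilus is a hypothetical helper name.
repo_is_nautilus() {
    # Keep only lines that are not commented out.
    active=$(grep -v '^[[:space:]]*#' || true)
    printf '%s\n' "$active" | grep -q 'ceph-nautilus' &&
    ! printf '%s\n' "$active" | grep -q 'ceph-luminous'
}
```

Example: repo_is_nautilus < /etc/apt/sources.list.d/ceph.list || echo "repository not switched"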

Set the 'noout' flag

Set the noout flag for the duration of the upgrade (optional, but recommended):

ceph osd set noout

Alternatively, set the flag in the GUI on the OSD tab.

Upgrade on each Ceph cluster node

Upgrade every node with the following command. This upgrades Ceph on the node to Nautilus.

apt update && apt dist-upgrade

After the update, the running daemons still use the old Luminous binaries until they are restarted.

Restart the monitor daemon

After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.

systemctl restart ceph-mon.target

Once all monitors are up, verify that the monitor upgrade is complete. Look for the nautilus string in the mon map. The command

ceph mon dump | grep min_mon_release

should report

min_mon_release 14 (nautilus)

If it does not, this implies that one or more monitors haven’t been upgraded and restarted, and/or that the quorum doesn't include all monitors.

Restart the manager daemons on all nodes

Then restart all managers on all nodes:

systemctl restart ceph-mgr.target

Verify that the ceph-mgr daemons are running by checking ceph -s:

ceph -s
...
 services:
  mon: 3 daemons, quorum foo,bar,baz
  mgr: foo(active), standbys: bar, baz
...

Restart the OSD daemon on all nodes

Important Steps before restarting OSD

If you have an IPv6-only cluster, you need to set the following options in the global section of the Ceph configuration:

ms_bind_ipv4 = false
ms_bind_ipv6 = true

Otherwise, each OSD tries to bind to an IPv4 address in addition to the IPv6 address and fails if it cannot find an IPv4 address in the given public/cluster networks.

Next, restart all OSDs on all nodes:

systemctl restart ceph-osd.target

On each host, tell ceph-volume to adapt the OSDs created with ceph-disk using the following two commands:

ceph-volume simple scan
ceph-volume simple activate --all

If you get a failure, your OSDs will not be recognized after a reboot.

To verify that the OSDs start up automatically, it's recommended that each OSD host is rebooted following the step above.

Note that ceph-volume does not have the hot-plug capability that ceph-disk had, where a newly attached disk is automatically detected via udev events.

You will need to scan the main data partition for each ceph-disk OSD explicitly, if

  • the OSD isn’t currently running when the above scan command is run,
  • a ceph-disk-based OSD is moved to a new host,
  • the OSD host is reinstalled,
  • or the /etc/ceph/osd directory is lost.

For example:

ceph-volume simple scan /dev/sdb1

The output will include the appropriate ceph-volume simple activate command to enable the OSD.

Upgrade all CephFS MDS daemons

For each CephFS file system,

  1. Reduce the number of ranks to 1 (if you plan to restore it later, first take note of the original max_mds value):
    ceph status
    ceph fs set <fs_name> max_mds 1
  2. Wait for the cluster to deactivate any non-zero ranks by periodically checking the status:
    ceph status
  3. Take all standby MDS daemons offline on the appropriate hosts with:
    systemctl stop ceph-mds.target
  4. Confirm that only one MDS is online and is on rank 0 for your FS:
    ceph status
  5. Upgrade the last remaining MDS daemon by restarting the daemon:
    systemctl restart ceph-mds.target
  6. Restart all standby MDS daemons that were taken offline:
    systemctl start ceph-mds.target
  7. Restore the original value of max_mds for the volume:
    ceph fs set <fs_name> max_mds <original_max_mds>
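
Step 1 asks you to note the original setting so that step 7 can restore it. One way to capture it is sketched below. This assumes that the output of ceph fs get <fs_name> contains a line whose first field is max_mds and whose second field is the value. The function name read_max_mds is hypothetical.

```shell
# Sketch: extract the max_mds value from `ceph fs get <fs_name>` output
# read on stdin. read_max_mds is a hypothetical helper name.
# Assumption: the output has a line of the form "max_mds <value>".
read_max_mds() {
    awk '$1 == "max_mds" { print $2; exit }'
}
```

For example, original_max_mds=$(ceph fs get <fs_name> | read_max_mds) stores the value in a shell variable before max_mds is reduced to 1.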

Disallow pre-Nautilus OSDs and enable all new Nautilus-only functionality

ceph osd require-osd-release nautilus

Unset 'noout' and check cluster status

Unset the 'noout' flag. You can do this in the GUI or with this command:

ceph osd unset noout

Now check if your Ceph cluster is healthy.

ceph -s

Upgrade Tunables

If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:

ceph config set mon mon_crush_min_required_version firefly

If Ceph does not complain, however, we recommend that you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any ‘straw’ buckets, this will result in a modest amount of data movement, but generally nothing too severe:

ceph osd getcrushmap -o backup-crushmap
ceph osd crush set-all-straw-buckets-to-straw2

If there are problems, you can easily revert with:

ceph osd setcrushmap -i backup-crushmap

Moving to ‘straw2’ buckets will unlock a few recent features, like the crush-compat balancer mode added back in Luminous.

Enable msgr2 protocol and update Ceph configuration

To enable the new v2 network protocol, issue the following command:

ceph mon enable-msgr2

This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new 3300 v2 protocol port. To see if all monitors have been updated run

ceph mon dump

and verify that each monitor has both a v2: and v1: address listed.
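
This verification can be sketched as a small filter over the ceph mon dump output. It assumes that each monitor is listed on a line that starts with its rank followed by ": " (for example "0: [v2:...,v1:...] mon.foo"). The function name all_mons_speak_v2 is hypothetical.

```shell
# Sketch: read `ceph mon dump` output on stdin and succeed only if every
# monitor line lists both a v2: and a v1: address.
# all_mons_speak_v2 is a hypothetical helper name.
# Assumption: monitor lines start with "<rank>: ".
all_mons_speak_v2() {
    seen=0
    while IFS= read -r line; do
        case $line in
            [0-9]*:\ *)
                seen=1
                case $line in *v2:*) ;; *) return 1 ;; esac
                case $line in *v1:*) ;; *) return 1 ;; esac
                ;;
        esac
    done
    # Fail if no monitor line was found at all.
    [ "$seen" -eq 1 ]
}
```

Example: ceph mon dump | all_mons_speak_v2 && echo "all monitors listen on v1 and v2"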

Updating /etc/pve/ceph.conf

For each host that has been upgraded, you should update your /etc/pve/ceph.conf file so that it either specifies no monitor port (if you are running the monitors on the default ports) or references both the v2 and v1 addresses and ports explicitly. Things will still work if only the v1 IP and port are listed, but each CLI instantiation or daemon will need to reconnect after learning the monitors also speak the v2 protocol, slowing things down a bit and preventing a full transition to the v2 protocol.

It is recommended to add all monitor IPs (without port) to 'mon_host' in the global section like this:

[global]
    ...
    mon_host = 10.0.0.100 10.0.0.101 10.0.0.102
    ...

For details see: Messenger V2