Ceph Nautilus to Octopus

From Proxmox VE
Revision as of 09:22, 27 August 2020 by A.antreich
Note: This is still a work in progress. Ceph Octopus packages are currently only available through the test repository.

Introduction

This article explains how to upgrade Ceph from Nautilus to Octopus (15.2.3 or higher) on Proxmox VE 6.x.

For more information, see the Ceph Octopus release notes.

Assumption

We assume that all nodes are on the latest Proxmox VE 6.x version and that Ceph is on version Nautilus (14.2.9-pve1 or higher). If not, see the Ceph Luminous to Nautilus upgrade guide.

Note: It is not possible to upgrade from Ceph Luminous directly to Octopus.

The cluster must be healthy and working!

Note

  • During the upgrade from Nautilus to Octopus, each OSD performs a format conversion the first time it starts, to improve the accounting for “omap” data. This can take anywhere from a few minutes to a few hours (for example, on an HDD with lots of omap data).

Preparation on each Ceph cluster node

NOTE: Currently (July 2020) the packages are not yet available in the "main" repository, only in the "test" repository.

Change the current Ceph repositories from Nautilus to Octopus.

sed -i 's/nautilus/octopus/' /etc/apt/sources.list.d/ceph.list

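Afterwards, your /etc/apt/sources.list.d/ceph.list should look like this (with test instead of main as long as the packages are only available in the test repository):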

deb http://download.proxmox.com/debian/ceph-octopus buster main
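The repository switch can also be scripted so that it fails loudly if the file was not changed as expected. This is a minimal sketch, assuming the file layout shown above; the function name switch_ceph_repo_to_octopus is hypothetical. It applies the same sed substitution and then checks that no nautilus entry remains and that a ceph-octopus entry is present.

```shell
#!/bin/sh
# Hypothetical helper: switch the Ceph APT source from Nautilus to Octopus
# and verify the result. Defaults to the path used in this guide.
switch_ceph_repo_to_octopus() {
    list_file="${1:-/etc/apt/sources.list.d/ceph.list}"
    # Same substitution as in the guide (GNU sed in-place edit)
    sed -i 's/nautilus/octopus/' "$list_file" || return 1
    # Fail if any Nautilus reference survived the substitution
    if grep -q 'nautilus' "$list_file"; then
        echo "still references nautilus: $list_file" >&2
        return 1
    fi
    # Succeed only if an Octopus repository entry is now present
    grep -q 'ceph-octopus' "$list_file"
}
```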

Set the 'noout' flag

Set the noout flag for the duration of the upgrade (optional, but recommended):

ceph osd set noout

Or via the GUI in the OSD tab (Manage Global Flags).
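To confirm that the flag is actually set, you can inspect the flags line of ceph osd dump. This is a minimal sketch, assuming the plain-text output format in which that line starts with flags followed by a comma-separated list; has_noout_flag is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph osd dump` output on stdin and succeed
# only if the "flags" line contains the noout flag.
has_noout_flag() {
    awk '$1 == "flags" {
             n = split($2, f, ",")
             for (i = 1; i <= n; i++) if (f[i] == "noout") found = 1
         }
         END { exit (found ? 0 : 1) }'
}

# Usage on a cluster node:
#   ceph osd dump | has_noout_flag && echo "noout is set"
```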

Upgrade on each Ceph cluster node

Upgrade all your nodes with the following commands. This upgrades the Ceph packages on each node to Octopus.

apt update
apt dist-upgrade

After the update, the running daemons still use the old Nautilus binaries until they are restarted.
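The ceph versions command reports which release each running daemon uses, which makes it easy to see what still needs a restart. This is a minimal sketch, assuming its pretty-printed JSON output with one object per daemon type (mon, mgr, osd, ...) and version strings that contain the release name; nautilus_daemon_types is a hypothetical name, and the summary "overall" object is skipped.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph versions` output on stdin and print each
# daemon type that still has at least one daemon running Nautilus.
nautilus_daemon_types() {
    awk '
        # A line like `    "mon": {` opens the section for one daemon type
        /^ *"[a-z_]+": *[{]/ { t = $1; gsub(/[":]/, "", t); next }
        # Version strings mention the release name; ignore the summary block
        /nautilus/ && t != "overall" { print t }
    ' | sort -u
}

# Usage on a cluster node:
#   ceph versions | nautilus_daemon_types
```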

Restart the monitor daemon

Note: You can use the web interface or the command line to restart Ceph services.

After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.

systemctl restart ceph-mon.target

Once all monitors are up, verify that the monitor upgrade is complete. Look for the Octopus string in the mon map. The command

ceph mon dump | grep min_mon_release

should report

min_mon_release 15 (octopus)

If it does not, this implies that one or more monitors haven’t been upgraded and restarted, and/or that the quorum doesn't include all monitors.
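The check above can be wrapped in a small helper that reports the value it found when the monitor upgrade is not yet complete. This is a minimal sketch; check_min_mon_release is a hypothetical name, and it reads ceph mon dump output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph mon dump` output on stdin and succeed
# only if min_mon_release reports Octopus.
check_min_mon_release() {
    # Extract e.g. "15 (octopus)" from the line "min_mon_release 15 (octopus)"
    release=$(awk '$1 == "min_mon_release" { print $2, $3 }')
    if [ "$release" = "15 (octopus)" ]; then
        echo "monitor upgrade complete"
    else
        echo "min_mon_release is '${release:-unset}', expected '15 (octopus)'" >&2
        return 1
    fi
}

# Usage on a cluster node:
#   ceph mon dump | check_min_mon_release
```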

Restart the manager daemons on all nodes

Then restart the managers on all nodes:

systemctl restart ceph-mgr.target

Verify that the ceph-mgr daemons are running by checking ceph -s:

ceph -s
...
 services:
  mon: 3 daemons, quorum foo,bar,baz
  mgr: foo(active), standbys: bar, baz
...
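A scripted variant of this check is sketched below, assuming the mgr: line format of the sample above (in Octopus the active daemon may also be shown as foo(active, since ...)); check_mgr_active is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph -s` output on stdin, print the mgr line
# and succeed only if it names an active manager.
check_mgr_active() {
    awk '$1 == "mgr:" { line = $0 }
         END {
             if (line ~ /\(active/) { print line; exit 0 }
             exit 1
         }'
}

# Usage on a cluster node:
#   ceph -s | check_mgr_active
```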

Restart the OSD daemon on all nodes

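Important: After the upgrade, the first time each OSD starts, it will do a format conversion to improve the accounting for “omap” data. This may take from a few minutes up to a few hours (e.g. on an HDD with lots of omap data).

You can disable this automatic conversion with:

 ceph config set osd bluestore_fsck_quick_fix_on_mount false

However, the conversion should still be done as soon as possible.

Restart the OSDs on only one node at a time, to avoid losing data redundancy. To restart all OSDs on a node, run:

systemctl restart ceph-osd.target

After each node, wait until ceph status reports HEALTH_OK, or HEALTH_WARN with only the noout flag warning, before continuing with the next node.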
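While restarting the OSDs node by node (systemctl restart ceph-osd.target), a helper like the following can decide whether it is safe to continue with the next node. This is a minimal sketch, assuming that ceph health prints either HEALTH_OK or, when only the noout flag is set, exactly HEALTH_WARN noout flag(s) set; osd_restart_can_continue is a hypothetical name. Any additional warning (for example, OSDs still down or degraded placement groups) makes it fail.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph health` output on stdin and succeed if the
# cluster is healthy apart from the noout flag set for the upgrade.
osd_restart_can_continue() {
    read -r status || return 1
    case "$status" in
        HEALTH_OK*) return 0 ;;
        # Only the noout flag is reported; several warnings are joined by "; "
        "HEALTH_WARN noout flag(s) set") return 0 ;;
        *) return 1 ;;
    esac
}

# Usage, after restarting the OSDs on one node:
#   until ceph health | osd_restart_can_continue; do sleep 10; done
```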

Disallow pre-Octopus OSDs and enable all new Octopus-only functionality

After all OSDs have been restarted and are running Octopus, run:

ceph osd require-osd-release octopus
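To verify that the setting took effect, look for the require_osd_release line in ceph osd dump. This is a minimal sketch; require_osd_release_is_octopus is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph osd dump` output on stdin and succeed
# only if require_osd_release is set to octopus.
require_osd_release_is_octopus() {
    awk '$1 == "require_osd_release" { r = $2 }
         END { exit (r == "octopus" ? 0 : 1) }'
}

# Usage on a cluster node:
#   ceph osd dump | require_osd_release_is_octopus && echo "ok"
```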

Upgrade all CephFS MDS daemons

For each CephFS file system,

  1. Reduce the number of ranks to 1 (if you plan to restore it later, first note the original number of MDS daemons):
    ceph status
    ceph fs set <fs_name> max_mds 1
  2. Wait for the cluster to deactivate any non-zero ranks by periodically checking the status:
    ceph status
  3. Take all standby MDS daemons offline on the appropriate hosts with:
    systemctl stop ceph-mds.target
  4. Confirm that only one MDS is online and is on rank 0 for your FS:
    ceph status
  5. Upgrade the last remaining MDS daemon by restarting the daemon:
    systemctl restart ceph-mds.target
  6. Restart all standby MDS daemons that were taken offline:
    systemctl start ceph-mds.target
  7. Restore the original value of max_mds for the volume:
    ceph fs set <fs_name> max_mds <original_max_mds>
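Step 1 asks you to note the original number of MDS ranks before reducing it. A minimal sketch for recording it, assuming ceph fs get <fs_name> prints a max_mds line in its plain-text output; parse_max_mds is a hypothetical name and <fs_name> is a placeholder, as in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph fs get <fs_name>` output on stdin and
# print the current max_mds value; fail if no such line is found.
parse_max_mds() {
    awk '$1 == "max_mds" { print $2; found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Usage (replace <fs_name> with your file system name):
#   orig=$(ceph fs get <fs_name> | parse_max_mds)
#   ceph fs set <fs_name> max_mds 1
#   ... steps 2 to 6 ...
#   ceph fs set <fs_name> max_mds "$orig"
```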

Unset the 'noout' flag

Once the upgrade process is finished, don't forget to unset the noout flag.

ceph osd unset noout

Or via the GUI in the OSD tab (Manage Global Flags).

Upgrade Tunables

If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:

ceph config set mon mon_crush_min_required_version firefly

If Ceph does not complain, however, then we recommend you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any ‘straw’ buckets, this will result in a modest amount of data movement, but generally nothing too severe:

# create a backup first
ceph osd getcrushmap -o backup-crushmap
ceph osd crush set-all-straw-buckets-to-straw2

If there are problems, you can easily revert with:

ceph osd setcrushmap -i backup-crushmap

Moving to ‘straw2’ buckets will unlock a few recent features, like the crush-compat balancer mode added back in Luminous.
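To estimate whether the straw2 conversion will move data, you can count the buckets that still use straw in a decompiled copy of the CRUSH map (crushtool -d). This is a minimal sketch, assuming each bucket in the decompiled text has an alg line; count_straw_buckets is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read a decompiled CRUSH map on stdin and print how
# many buckets still use the legacy 'straw' algorithm (not 'straw2').
count_straw_buckets() {
    awk '$1 == "alg" && $2 == "straw" { n++ } END { print n + 0 }'
}

# Usage on a cluster node:
#   ceph osd getcrushmap -o backup-crushmap
#   crushtool -d backup-crushmap -o backup-crushmap.txt
#   count_straw_buckets < backup-crushmap.txt
```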

Enable msgrv2 protocol and update Ceph configuration

If you did not already do so when you upgraded to Nautilus, we recommend enabling the new v2 network protocol. Issue the following command:

ceph mon enable-msgr2

This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new 3300 v2 protocol port. To check whether all monitors have been updated, run

ceph mon dump

and verify that each monitor has both a v2: and v1: address listed.
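The same check can be scripted. This is a minimal sketch, assuming monitor lines in ceph mon dump look like 0: [v2:IP:3300/0,v1:IP:6789/0] mon.NAME; all_mons_have_v1_and_v2 is a hypothetical name. It prints every monitor that lacks one of the two addresses and fails if there is any.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph mon dump` output on stdin, print each
# monitor missing a v2: or v1: address, and fail if any were found.
all_mons_have_v1_and_v2() {
    awk '/ mon\./ {
             if ($0 !~ /v2:/ || $0 !~ /v1:/) { print $NF; bad = 1 }
         }
         END { exit (bad ? 1 : 0) }'
}

# Usage on a cluster node:
#   ceph mon dump | all_mons_have_v1_and_v2
```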

PG count warning for pools

The PG autoscaler feature introduced in Nautilus is enabled for new pools by default, allowing new clusters to autotune pg_num without any user intervention.

Additionally, you may need to enable the PG autoscaler for upgraded pools:

 ceph osd pool set POOLNAME pg_autoscale_mode on
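With many pools, the command can be applied to every pool by looping over ceph osd pool ls. This is a minimal sketch; enable_autoscaler_on_pools and the DRY_RUN variable are hypothetical, and ceph osd pool autoscale-status can be used afterwards to review the result.

```shell
#!/bin/sh
# Hypothetical helper: enable the PG autoscaler on every pool named on
# stdin (one per line). With DRY_RUN=1 it only prints the commands.
enable_autoscaler_on_pools() {
    while IFS= read -r pool; do
        [ -n "$pool" ] || continue    # skip empty lines
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ceph osd pool set $pool pg_autoscale_mode on"
        else
            ceph osd pool set "$pool" pg_autoscale_mode on || return 1
        fi
    done
}

# Usage on a cluster node:
#   ceph osd pool ls | DRY_RUN=1 enable_autoscaler_on_pools   # review first
#   ceph osd pool ls | enable_autoscaler_on_pools
#   ceph osd pool autoscale-status
```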