Ceph Octopus to Pacific
Introduction
This article explains how to upgrade Ceph from Octopus to Pacific (16.2.4 or higher) on Proxmox VE 7.x.
This how-to must be read completely to the end and followed exactly, in the described order!
For more information, see the Release Notes
Assumption
We assume that all nodes are on the latest Proxmox VE 7.0 (or higher) version and Ceph is on version Octopus (15.2.13-pve1 or higher).
- Ceph version is 15.2.x Octopus
- If not, please see the Ceph Nautilus to Octopus upgrade guide.
- Note: While in theory one could upgrade from Ceph Nautilus to Pacific directly, Proxmox VE only supports the upgrade from Octopus to Pacific.
- Already upgraded to Proxmox VE 7.x
- If not, please see the Upgrade from 6.x to 7.0 guide.
- The cluster must be healthy and working!
- Read the Known Issues section to avoid encountering them, for example when performing steps not described in this guide.
Enable msgrv2 protocol and update Ceph configuration
If you did not already do so when you upgraded to Nautilus or Octopus, you must enable the new v2 network protocol. Issue the following command:
ceph mon enable-msgr2
This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new 3300 v2 protocol port. To see if all monitors have been updated, run
ceph mon dump
and verify that each monitor has both a v2: and v1: address listed.
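For reference, a monitor entry with both protocols enabled looks roughly like the following (the address and monitor name here are placeholders):
0: [v2:192.168.1.10:3300/0,v1:192.168.1.10:6789/0] mon.foo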
Check if bluestore_fsck_quick_fix_on_mount is disabled
If you are upgrading to (the now outdated) v. 16.2.6, be aware that there is a bug in Ceph Pacific's conversion of the on-disk OMAP data that corrupts the data. Therefore, make sure the conversion is disabled for the time being. Run the following command to get the current setting:
ceph config get osd bluestore_fsck_quick_fix_on_mount
If it returns false, everything is good and the OMAP data won't be converted. Should it return true and you plan to upgrade to the outdated v. 16.2.6, not the current v. 16.2.7, set it to false with the following command and verify again:
ceph config set osd bluestore_fsck_quick_fix_on_mount false
If you are upgrading to a later version, for example 16.2.7 or higher, this issue is fixed and you should enable this option. The first start of the OSDs can take some time, though, as they convert the on-disk format.
ceph config set osd bluestore_fsck_quick_fix_on_mount true
Preparation on each Ceph cluster node
Change the current Ceph repositories from Octopus to Pacific.
sed -i 's/octopus/pacific/' /etc/apt/sources.list.d/ceph.list
Your /etc/apt/sources.list.d/ceph.list should now look like this:
deb http://download.proxmox.com/debian/ceph-pacific bullseye main
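If you want to double-check the result before continuing, you can simply print the file:
cat /etc/apt/sources.list.d/ceph.list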
Set the 'noout' flag
Set the noout flag for the duration of the upgrade (optional, but recommended):
ceph osd set noout
Or via the GUI in the OSD tab (Manage Global Flags).
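To double-check on the command line that the flag is active, one option is to look at the OSD map flags; the output should list noout:
ceph osd dump | grep flags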
Upgrade on each Ceph cluster node
Upgrade all your nodes with the following commands. This will upgrade Ceph on the node to Pacific.
apt update
apt full-upgrade
After the update, your setup will still be running the old Octopus binaries.
Restart the monitor daemon
Note: You can use the web interface or the command line to restart Ceph services.
After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.
systemctl restart ceph-mon.target
Once all monitors are up, verify that the monitor upgrade is complete. Look for the Pacific string in the mon map. The command
ceph mon dump | grep min_mon_release
should report
min_mon_release 16 (pacific)
If it does not, this implies that one or more monitors haven’t been upgraded and restarted, and/or that the quorum doesn't include all monitors.
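To see at a glance which daemon types are still running the old Octopus binaries at this point, you can also check the versions reported by the cluster:
ceph versions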
Restart the manager daemons on all nodes
Then restart all managers on all nodes
systemctl restart ceph-mgr.target
Verify that the ceph-mgr daemons are running by checking ceph -s
ceph -s
...
services:
  mon: 3 daemons, quorum foo,bar,baz
  mgr: foo(active), standbys: bar, baz
...
Restart the OSD daemon on all nodes
Important: After the upgrade, the first time each OSD starts, it will do a format conversion to improve the accounting for omap data. It may take a few minutes or up to a few hours (e.g. on an HDD with lots of omap data).
It is best to restart the OSDs on one node at a time:
systemctl restart ceph-osd.target
Wait after each restart and periodically check the status of the cluster:
ceph status
It should be in HEALTH_OK or
HEALTH_WARN
noout flag(s) set
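If you prefer to follow the status continuously instead of re-running the command manually, one option (assuming the watch utility from procps is available, as it is on a standard Proxmox VE node) is:
watch ceph status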
Disallow pre-Pacific OSDs and enable all new Pacific-only functionality
ceph osd require-osd-release pacific
Note: Missing this step can prevent OSDs from starting if the cluster's required OSD release is still set to Ceph Luminous or older (for example, if you upgraded from Luminous -> Nautilus -> Octopus).
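You can confirm afterwards that the setting was applied; the following should report pacific:
ceph osd dump | grep require_osd_release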
Upgrade all CephFS MDS daemons
For each CephFS file system,
- Make sure only one MDS is running
- The default installation uses one active MDS. To check if this is the case on your cluster, check the output of ceph status and verify that there is only one active MDS.
- Reduce the number of ranks to 1 (if you plan to restore it later, first take note of the original number of MDS daemons):
ceph status
ceph fs get <fs_name> | grep max_mds
ceph fs set <fs_name> max_mds 1
- Wait for the cluster to deactivate any non-zero ranks by periodically checking the status:
ceph status
- Take all standby MDS daemons offline on the appropriate hosts with:
systemctl stop ceph-mds.target
- Confirm that only one MDS is online and is on rank 0 for your FS:
ceph status
- Upgrade the last remaining MDS daemon by restarting the daemon:
systemctl restart ceph-mds.target
- Restart all standby MDS daemons that were taken offline:
systemctl start ceph-mds.target
- Restore the original value of max_mds for the volume:
ceph fs set <fs_name> max_mds <original_max_mds>
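After restoring max_mds, you can verify that the expected number of active and standby MDS daemons came back, for example with:
ceph fs status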
Unset the 'noout' flag
Once the upgrade process is finished, don't forget to unset the noout flag.
ceph osd unset noout
Or via the GUI in the OSD tab (Manage Global Flags).
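Once the flag is cleared and all daemons run Pacific, the cluster should report HEALTH_OK again, which you can confirm with:
ceph health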
Upgrade Tunables
Note: These steps are not needed if they were already done with the Nautilus to Octopus upgrade.
If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:
# move to older minimum required version, only required if ceph complains
ceph config set mon mon_crush_min_required_version firefly
If Ceph does not complain about old CRUSH tunables, then we recommend that you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any straw buckets, this will result in a modest amount of data movement, but generally nothing too severe:
# create a backup first
ceph osd getcrushmap -o backup-crushmap
ceph osd crush set-all-straw-buckets-to-straw2
If there are problems, you can easily revert with:
ceph osd setcrushmap -i backup-crushmap
Moving to straw2 buckets will unlock a few recent features, like the crush-compat balancer mode added back in Nautilus.
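If you are unsure whether any old straw buckets remain, one way to check is to look at the bucket algorithms in the decoded CRUSH map, for example:
ceph osd crush dump | grep '"alg"'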
Known Issues
RocksDB resharding broken before 16.2.6
Setting up RocksDB resharding after upgrading was broken until the 16.2.6 release. Please ensure that you are running Ceph 16.2.6 or newer before triggering any reshard.
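To confirm that all OSD daemons are already running 16.2.6 or newer before triggering a reshard, you can check, for example:
ceph osd versions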
Monitor crashes after upgrade
For old clusters (pre-Jewel) that did not use CephFS, the monitor may crash once updated to Pacific because some old data structures it no longer understands are still present. Either follow the workaround in the Ceph Bug tracker or wait until Ceph Octopus v15.2.14 can be installed before you upgrade to Pacific.
Monitor crashes after minor 16.2.6 to 16.2.7 upgrade
For minor upgrades from 16.2.6 to 16.2.7 it is possible that monitors will not start anymore (always restart one at a time). In this case, try adding the following to /etc/pve/ceph.conf:
[mon]
mon_mds_skip_sanity = true
Attempt another restart of the failed monitor. You can remove that setting once all monitors are up and running with the latest version.
Data corruption on OMAP conversion
In version 16.2.6 there is a bug in the OMAP conversion code that can cause data corruption. Therefore, before you upgrade, or before you restart the OSDs after the upgrade, make sure that the automatic conversion is disabled:
ceph config set osd bluestore_fsck_quick_fix_on_mount false
With version 16.2.7, this has been fixed. If you did disable it earlier, enable the option now and restart your OSDs one at a time. It is possible that they need a bit longer to come back up if they need to convert the on disk data.
ceph config set osd bluestore_fsck_quick_fix_on_mount true