Ceph Nautilus to Octopus
Latest revision as of 14:22, 31 May 2023
Introduction
This article explains how to upgrade Ceph from Nautilus to Octopus (15.2.8 or higher) on Proxmox VE 6.x.
For more information see Release Notes
Assumption
We assume that all nodes are on the latest Proxmox VE 6.3 (or higher) version and Ceph is on version Nautilus (14.2.9-pve1 or higher). If not, see the Ceph Luminous to Nautilus upgrade guide.
Note: It is not possible to upgrade from Ceph Luminous to Octopus directly.
The cluster must be healthy and working!
Note
- During the upgrade from Nautilus to Octopus the first time each OSD starts, it will do a format conversion to improve the accounting for “omap” data. This may take a few minutes to as much as a few hours (for an HDD with lots of omap data).
Preparation on each Ceph cluster node
Change the current Ceph repositories from Nautilus to Octopus.
sed -i 's/nautilus/octopus/' /etc/apt/sources.list.d/ceph.list
Your /etc/apt/sources.list.d/ceph.list should look like this:
deb http://download.proxmox.com/debian/ceph-octopus buster main
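Before editing the file in place, it can help to preview the result. The following is a minimal sketch: the helper name `preview_ceph_repo` is hypothetical, and it only prints what the `sed` expression above would produce, without touching the file.

```shell
# Hypothetical helper: print the repository file as it would look after
# the switch, without modifying it (same sed expression, minus -i).
preview_ceph_repo() {
    # $1: path to the apt source list; defaults to the path used above
    sed 's/nautilus/octopus/' "${1:-/etc/apt/sources.list.d/ceph.list}"
}
```

Calling `preview_ceph_repo` without arguments on a node prints the would-be content of /etc/apt/sources.list.d/ceph.list, which can be compared with the expected line above.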
Set the 'noout' flag
Set the noout flag for the duration of the upgrade (optional, but recommended):
ceph osd set noout
Or via the GUI in the OSD tab (Manage Global Flags).
Upgrade on each Ceph cluster node
Upgrade all your nodes with the following commands. This upgrades Ceph on each node to Octopus.
apt update
apt full-upgrade
After the update, the daemons still run the old Nautilus binaries until they are restarted.
Restart the monitor daemon
Note: You can use the web-interface or the command-line to restart ceph services.
After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.
systemctl restart ceph-mon.target
Once all monitors are up, verify that the monitor upgrade is complete. Look for the Octopus string in the mon map. The command
ceph mon dump | grep min_mon_release
should report
min_mon_release 15 (octopus)
If it does not, this implies that one or more monitors haven’t been upgraded and restarted, and/or that the quorum doesn't include all monitors.
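The check above can be wrapped in a small helper for use in scripts. This is a sketch only: the function name `mon_release_is_octopus` is hypothetical, and it relies on the mon map containing the exact string shown above.

```shell
# Hypothetical helper: succeed only if the mon map read from stdin
# reports Octopus as the minimum monitor release.
mon_release_is_octopus() {
    grep -q 'min_mon_release 15 (octopus)'
}

# Usage on a cluster node (not executed here):
#   ceph mon dump | mon_release_is_octopus && echo "all monitors upgraded"
```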
Restart the manager daemons on all nodes
Then restart all managers on all nodes:
systemctl restart ceph-mgr.target
Verify that the ceph-mgr daemons are running by checking ceph -s
ceph -s
...
services:
  mon: 3 daemons, quorum foo,bar,baz
  mgr: foo(active), standbys: bar, baz
...
Restart the OSD daemon on all nodes
Important: After the upgrade, the first time each OSD starts, it will do a format conversion to improve the accounting for “omap” data. It may take a few minutes or up to a few hours (e.g. on HDDs with lots of omap data).
It is best to restart the OSDs on one node at a time:
systemctl restart ceph-osd.target
Wait after each restart and periodically check the status of the cluster:
ceph status
It should be in HEALTH_OK or
HEALTH_WARN
    noout flag(s) set
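The waiting step can be scripted. The sketch below is one possible approach, not part of the official procedure: the helper name `ok_to_continue` is hypothetical, and it assumes that `ceph health` prints the status and its summary on a single line, as in the two acceptable states above.

```shell
# Hypothetical helper: decide whether it is reasonable to move on to the
# next node, based on a one-line health summary passed as $1.
ok_to_continue() {
    case "$1" in
        HEALTH_OK|"HEALTH_WARN noout flag(s) set") return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a cluster node (not executed here):
#   until ok_to_continue "$(ceph health)"; do sleep 10; done
```

Any other summary, for example one that also lists down OSDs or degraded PGs, makes the helper fail, so the loop keeps waiting.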
You can disable this automatic conversion with:
ceph config set osd bluestore_fsck_quick_fix_on_mount false
But the conversion should be made as soon as possible.
Disallow pre-Octopus OSDs and enable all new Octopus-only functionality
ceph osd require-osd-release octopus
NOTE: Skipping this step breaks starting OSDs on clusters whose required OSD release is still set to Ceph Luminous or older (for example, if you upgraded from Luminous -> Nautilus -> Octopus).
Upgrade all CephFS MDS daemons
For each CephFS file system,
- Reduce the number of ranks to 1 (if you plan to restore it later, first note the original number of MDS daemons):
ceph status
ceph fs set <fs_name> max_mds 1
The default installation uses one active MDS. To check if this is the case on your cluster, check the output of ceph status and verify that there is only one active MDS.
- Wait for the cluster to deactivate any non-zero ranks by periodically checking the status:
ceph status
- Take all standby MDS daemons offline on the appropriate hosts with:
systemctl stop ceph-mds.target
- Confirm that only one MDS is online and is on rank 0 for your FS:
ceph status
- Upgrade the last remaining MDS daemon by restarting the daemon:
systemctl restart ceph-mds.target
- Restart all standby MDS daemons that were taken offline:
systemctl start ceph-mds.target
- Restore the original value of max_mds for the volume:
ceph fs set <fs_name> max_mds <original_max_mds>
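To have the original value at hand for the last step, it can be recorded before reducing the ranks. The following sketch makes two assumptions: the helper name `parse_max_mds` is hypothetical, and the output of `ceph fs get <fs_name>` is assumed to contain a line consisting of `max_mds` followed by its value, separated by whitespace.

```shell
# Hypothetical helper: extract the max_mds value from file system
# details read on stdin (assumed format: "max_mds<whitespace>N").
parse_max_mds() {
    awk '$1 == "max_mds" { print $2 }'
}

# Usage on a cluster node (not executed here):
#   original_max_mds=$(ceph fs get <fs_name> | parse_max_mds)
#   ...perform the MDS upgrade steps...
#   ceph fs set <fs_name> max_mds "$original_max_mds"
```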
Unset the 'noout' flag
Once the upgrade process is finished, don't forget to unset the noout flag.
ceph osd unset noout
Or via the GUI in the OSD tab (Manage Global Flags).
Upgrade Tunables
If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:
ceph config set mon mon_crush_min_required_version firefly
If Ceph does not complain, however, then we recommend you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any ‘straw’ buckets, this will result in a modest amount of data movement, but generally nothing too severe:
# create a backup first
ceph osd getcrushmap -o backup-crushmap
ceph osd crush set-all-straw-buckets-to-straw2
If there are problems, you can easily revert with:
ceph osd setcrushmap -i backup-crushmap
Moving to ‘straw2’ buckets will unlock a few recent features, like the crush-compat balancer mode added back in Nautilus.
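To estimate beforehand whether the conversion changes anything, the number of legacy ‘straw’ buckets can be counted. This is a rough sketch under assumptions: the helper name `count_straw_buckets` is hypothetical, and it expects the pretty-printed JSON of `ceph osd crush dump`, where each bucket is assumed to carry a line of the form `"alg": "straw"`.

```shell
# Hypothetical helper: count buckets that still use the legacy "straw"
# algorithm in a CRUSH dump read from stdin. The closing quote in the
# pattern keeps "straw2" buckets from being counted.
count_straw_buckets() {
    grep -c '"alg": "straw"' || true
}

# Usage on a cluster node (not executed here):
#   ceph osd crush dump | count_straw_buckets
```

A result of 0 suggests that no buckets need converting, so no data movement is expected from the straw2 switch.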
Enable msgrv2 protocol and update Ceph configuration
If you did not already do so when you upgraded to Nautilus, we recommend enabling the new v2 network protocol. Issue the following command:
ceph mon enable-msgr2
This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new v2 protocol port 3300. To see if all monitors have been updated, run:
ceph mon dump
and verify that each monitor has both a v2: and v1: address listed.
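With many monitors, the address check can be automated. The sketch below is illustrative only: the helper name `mons_missing_msgr2` is hypothetical, and it assumes that each monitor appears in the dump on a line starting with its rank and a colon (for example `0: [v2:...,v1:...] mon.foo`).

```shell
# Hypothetical helper: print the monitor lines of a mon dump (read from
# stdin) that do not list both a v2: and a v1: address.
mons_missing_msgr2() {
    awk '/^[0-9]+: / && !(/v2:/ && /v1:/)'
}

# Usage on a cluster node (not executed here):
#   ceph mon dump | mons_missing_msgr2
```

Empty output means every monitor line lists both address types.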
Placement Group (PG) count warning for pools
The PG autoscaler feature, introduced in Nautilus, is enabled for new pools by default, allowing new clusters to auto-tune pg_num without any user intervention.
You may want to enable the PG autoscaler for upgraded pools. It is advisable to check the recommendation of the autoscaler before activation, and to read the forum discussion Ceph Octopus upgrade notes - Think twice before enabling auto scale.
ceph osd pool set POOLNAME pg_autoscale_mode on
Be aware that enabling the PG autoscaler will trigger recovery when PGs are merged or split. This results in higher load and resource usage, so you should enable it one pool at a time.
If the autoscaler does not provide the expected results, you can adjust its calculation by specifying the expected pool size and/or a lower bound on the number of PGs.
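The autoscaler's current recommendations can be listed with `ceph osd pool autoscale-status`, and besides `on` the mode can also be set to `warn`, which only raises a health warning instead of changing pg_num. To go through the pools one at a time, the commands can be printed first for review. This is a sketch with a hypothetical helper name (`autoscale_plan`) and example pool names; it only prints commands and runs nothing.

```shell
# Hypothetical helper: print (not run) one command per pool to switch
# the autoscaler mode, so each pool can be handled separately.
# $1: mode (for example warn or on); remaining arguments: pool names
autoscale_plan() {
    mode=$1
    shift
    for pool in "$@"; do
        printf 'ceph osd pool set %s pg_autoscale_mode %s\n' "$pool" "$mode"
    done
}

# Usage (pool names are examples):
#   autoscale_plan warn rbd cephfs_data
```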
Resolving the `insecure global_id reclaim` Warning
With Ceph Octopus version 15.2.11 we released an update to fix a security issue (CVE-2021-20288) where Ceph was not ensuring that reconnecting/renewing clients were presenting an existing ticket when reclaiming their global_id value. An attacker that was able to authenticate could claim a global_id in use by a different client and potentially disrupt other cluster services.
Affected Versions:
- for server: all previous versions
- for clients:
- kernel: none
- user-space: all since (and including) Luminous 12.2.0
Attacker Requirements/Impact: Don't panic. The risk on a default Proxmox VE managed Ceph setup is rather low, but we still recommend upgrading in a timely manner. An attacker would need to meet all of the following requirements:
- have a valid authentication key for the cluster
- know or guess the global_id of another client
- run a modified version of the Ceph client code to reclaim another client’s global_id
- construct appropriate client messages or requests to disrupt service or exploit Ceph daemon assumptions about global_id uniqueness
Addressing the Health Warnings
After installing the update, you will still see two HEALTH warnings:
client is using insecure global_id reclaim
mons are allowing insecure global_id reclaim
To address those, you first need to ensure that all VMs using Ceph on a storage without KRBD run the newer client library. For that, either fully restart the VMs (reboot over the API, or stop and start), or migrate them to another node in the cluster that already has the Ceph update installed. You also need to restart the pvestatd and pvedaemon Proxmox VE daemons, which access the Ceph cluster periodically to gather status data or to execute API calls. Either use the web-interface (Node -> System) or the command-line:
systemctl try-reload-or-restart pvestatd.service pvedaemon.service
Next you can resolve the monitor warning by enforcing the stricter behavior that is possible now. Execute the following command on one of the nodes in the Proxmox VE Ceph cluster:
ceph config set mon auth_allow_insecure_global_id_reclaim false
Note: This will cut off any old client once its ticket validity times out (72h), so only execute it once the client warning has been resolved and has disappeared.
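The ordering described in the note can be guarded in a script. The following is a sketch, not an official procedure: the helper name `clients_still_insecure` is hypothetical, and it assumes the client warning text contains the phrase `using insecure global_id reclaim`, as in the first warning listed above.

```shell
# Hypothetical helper: succeed if the health output read from stdin
# still reports clients using insecure global_id reclaim. The monitor
# warning ("allowing insecure ...") does not match this pattern.
clients_still_insecure() {
    grep -q 'using insecure global_id reclaim'
}

# Usage on a cluster node (not executed here):
#   if ceph health detail | clients_still_insecure; then
#       echo "old clients still connected, not enforcing yet"
#   else
#       ceph config set mon auth_allow_insecure_global_id_reclaim false
#   fi
```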
See the following forum post for details and discussion: https://forum.proxmox.com/threads/ceph-nautilus-and-octopus-security-update-for-insecure-global_id-reclaim-cve-2021-20288.88038/#post-385756