Ceph Reef to Squid
Latest revision as of 11:17, 21 November 2024
Introduction
This article explains how to upgrade Ceph from Reef (18.2+) to Squid (19.2+) on Proxmox VE 8.
Important Release Notes
Please read the upstream release notes closely.
Assumption
We assume that all nodes are on the latest Proxmox VE 8.2 (or higher) version and Ceph is on version Reef (18.2.4-pve3 or higher).
If not, see the Ceph Quincy to Reef upgrade guide.
Note: While it is possible to upgrade directly from the older Ceph Quincy (17.2+) to Squid (19.2+), we primarily test and recommend upgrading to Ceph Reef first before upgrading to Ceph Squid. If you want to skip one upgrade, we recommend testing this first on a non-production setup. The upgrade steps are the same as for upgrading from Reef to Squid, but you must ensure that no FileStore-based OSDs are left, as FileStore support was removed with Ceph 18.2 Reef.
The cluster must be healthy and working!
Note: All commands starting with ceph need to be run only once, on any one node in the Ceph cluster.
Preparation on each Ceph cluster node
Change the current Ceph repositories from Reef to Squid.
sed -i 's/reef/squid/' /etc/apt/sources.list.d/ceph.list
Your /etc/apt/sources.list.d/ceph.list should now look like this:
# TODO: switch this from the "no-subscription" repo to the "enterprise" one
# once Ceph Squid graduates from tech-preview to fully supported by Proxmox.
deb http://download.proxmox.com/debian/ceph-squid bookworm no-subscription
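As a quick sanity check after the sed call, you could confirm that no active deb line still points at the Reef repository. The following is only a minimal sketch: the function name ceph_repo_is_squid is hypothetical, and it assumes the repository is defined in the one-line format shown above.

```shell
# Hypothetical helper: succeeds if the given sources file has at least one
# active (uncommented) "deb" line for ceph-squid and none for ceph-reef.
ceph_repo_is_squid() {
    # An active Reef entry means the replacement did not take effect.
    if grep -Eq '^[[:space:]]*deb[[:space:]].*ceph-reef' "$1"; then
        return 1
    fi
    grep -Eq '^[[:space:]]*deb[[:space:]].*ceph-squid' "$1"
}
```

For example, `ceph_repo_is_squid /etc/apt/sources.list.d/ceph.list && echo "repository points to Squid"` prints the message only when the file matches.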
Set the 'noout' flag
Set the noout flag for the duration of the upgrade (optional, but recommended):
ceph osd set noout
Or via the web UI in the OSD tab (Manage Global Flags).
Upgrade on each Ceph cluster node
Upgrade all your nodes with the following commands or by installing the latest updates via the GUI. This upgrades Ceph on each node to Squid.
apt update
apt full-upgrade
After the update, your setup will still be running the old Reef binaries.
Restart the monitor daemon
Note: You can use the web interface or the command line to restart Ceph services.
After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.
systemctl restart ceph-mon.target
Do so one node at a time. Wait after each restart and periodically check the status of the cluster:
ceph -s
It should be in HEALTH_OK or
HEALTH_WARN noout flag(s) set
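If you prefer to script this check, the two acceptable states can be tested for directly. The sketch below is not part of Ceph: health_is_acceptable is a hypothetical helper, and it assumes the one-line summary printed by ceph health matches the strings shown above exactly.

```shell
# Hypothetical helper: reads one line of `ceph health` output on stdin and
# succeeds only for the two states this guide treats as acceptable.
health_is_acceptable() {
    # Tolerate input without a trailing newline; empty input is rejected below.
    read -r status || true
    case "$status" in
        "HEALTH_OK"|"HEALTH_WARN noout flag(s) set") return 0 ;;
        *) return 1 ;;
    esac
}
```

It could be used as `ceph health | health_is_acceptable || echo "check the cluster before continuing"`. Any additional warning in the summary line makes the check fail, which is the conservative choice here.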
Once all monitors are up, verify that the monitor upgrade is complete. Look for the Squid string in the mon map. The command
ceph mon dump | grep min_mon_release
should report
min_mon_release 19 (squid)
If it does not, this implies that one or more monitors haven’t been upgraded and restarted, and/or that the quorum doesn't include all monitors.
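The same verification can be wrapped in a small function so that a script stops when the monitors are not yet on Squid. This is a sketch only; mons_are_squid is a hypothetical name, and the pattern assumes the output line looks exactly like the one shown above.

```shell
# Hypothetical helper: reads `ceph mon dump` output on stdin and succeeds
# only if it contains the line "min_mon_release 19 (squid)".
mons_are_squid() {
    grep -Eq '^min_mon_release[[:space:]]+19[[:space:]]+\(squid\)'
}
```

For example: `ceph mon dump | mons_are_squid || echo "monitor upgrade is not complete"`.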
Restart the manager daemons on all nodes
If the managers did not automatically restart with the monitors, restart them now on all nodes:
systemctl restart ceph-mgr.target
Verify that the ceph-mgr daemons are running by checking ceph -s:
ceph -s
...
  services:
    mon: 3 daemons, quorum foo,bar,baz
    mgr: foo(active), standbys: bar, baz
...
Restart the OSD daemon on all nodes
Restart all OSDs. Only restart OSDs on one node at a time to avoid loss of data redundancy. To restart all OSDs on a node, run the following command:
systemctl restart ceph-osd.target
Wait after each restart and periodically check the status of the cluster:
ceph status
It should be in HEALTH_OK or
HEALTH_WARN noout flag(s) set
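Waiting between node restarts can be automated with a simple polling loop. The helper below is a generic sketch, not a Ceph or Proxmox tool: wait_until, its argument order, and the retry values in the usage example are all assumptions chosen for illustration.

```shell
# Hypothetical helper: run a command every INTERVAL seconds until it
# succeeds, giving up after TRIES attempts.
# Usage: wait_until TRIES INTERVAL COMMAND [ARGS...]
wait_until() {
    tries="$1"
    interval="$2"
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries - 1))
        # Sleep only if another attempt will follow.
        if [ "$tries" -gt 0 ]; then
            sleep "$interval"
        fi
    done
    return 1
}
```

For example, `wait_until 60 10 sh -c 'ceph health | grep -Eq "^(HEALTH_OK|HEALTH_WARN noout flag\(s\) set)$"'` would poll for up to roughly ten minutes and fail if the cluster has not settled by then. Only move on to the next node once it returns successfully.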
Once all OSDs are running with the latest versions, the following warning can appear:
all OSDs are running squid or later but require_osd_release < squid
Disallow pre-Squid OSDs and enable all new Squid-only functionality
Note: Before raising the minimum required OSD version, you should ensure all OSDs got upgraded successfully and report running a Ceph 19.2 version.
ceph osd require-osd-release squid
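The check called for in the note above can be scripted and run before issuing the require-osd-release command. The sketch below parses the output of ceph versions; the helper name osds_all_squid is hypothetical, and it assumes the pretty-printed JSON layout with one version entry per line inside the "osd" object.

```shell
# Hypothetical helper: reads pretty-printed `ceph versions` JSON on stdin
# and succeeds only if every entry in the "osd" section is a 19.2 build.
# Assumption: the "osd" object lists one "ceph version X.Y.Z ...": N
# entry per line.
osds_all_squid() {
    awk '
        # Enter the "osd" object (unless it opens and closes on one line).
        /"osd": *[{]/            { in_osd = ($0 !~ /[}]/); next }
        # A closing brace ends the "osd" object.
        in_osd && /[}]/          { in_osd = 0 }
        # Count OSD version entries and those that are not 19.2.x.
        in_osd && /ceph version/ {
            total++
            if ($0 !~ /ceph version 19[.]2[.]/) old++
        }
        END { if (total > 0 && old == 0) exit 0; exit 1 }
    '
}
```

For example: `ceph versions | osds_all_squid && ceph osd require-osd-release squid`. Other daemon types, such as MDS daemons that are still to be upgraded, are deliberately ignored by this check.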
Upgrade all CephFS MDS daemons
For each CephFS file system (you can list the file systems with ceph fs ls):
- Disable standby_replay
ceph fs set <fs_name> allow_standby_replay false
- Reduce the number of ranks to 1 (if you plan to restore it later, first take notes of the original number of MDS daemons). This is only necessary if you use more than one MDS per CephFS:
ceph status
ceph fs get <fs_name> | grep max_mds
ceph fs set <fs_name> max_mds 1
- With a rank higher than 1 you will see more than one MDS active for that Ceph FS.
- Wait for the cluster to deactivate any non-zero ranks by periodically checking the status of Ceph:
ceph status
- The number of active MDS should go down to the number of file systems you have
- Alternatively, check in the CephFS panel in the GUI that each Ceph filesystem has only one active MDS
- Take all standby MDS daemons offline on the appropriate hosts with:
systemctl stop ceph-mds.target
- Confirm that only one MDS is online and is on rank 0 for your FS:
ceph status
- Upgrade the last remaining MDS daemon by restarting the daemon:
systemctl restart ceph-mds.target
- Restart all standby MDS daemons that were taken offline:
systemctl start ceph-mds.target
- Restore the original value of max_mds for the volume:
ceph fs set <fs_name> max_mds <original_max_mds>
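The steps above ask you to note the original max_mds value before reducing it. One way to keep it at hand is to capture it in a shell variable. This is a minimal sketch: parse_max_mds is a hypothetical helper, and it assumes that ceph fs get prints the field on its own line as the name followed by the number.

```shell
# Hypothetical helper: reads `ceph fs get <fs_name>` output on stdin and
# prints the value of the max_mds field.
# Assumption: the field appears on its own line as "max_mds <number>".
parse_max_mds() {
    awk '$1 == "max_mds" { print $2; exit }'
}
```

For example, run `original_max_mds=$(ceph fs get <fs_name> | parse_max_mds)` before setting max_mds to 1, and `ceph fs set <fs_name> max_mds "$original_max_mds"` in the same shell session once all MDS daemons are upgraded.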
Unset the 'noout' flag
Once the upgrade process is finished, don't forget to unset the noout flag.
ceph osd unset noout
Or via the GUI in the OSD tab (Manage Global Flags).
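To confirm that the flag is really gone, you could inspect the cluster-wide flags reported by ceph osd dump. The helper below is a sketch with a hypothetical name, osd_flag_is_set, and it assumes the dump contains a line of the form "flags noout,sortbitwise,..." listing the active flags separated by commas.

```shell
# Hypothetical helper: reads `ceph osd dump` output on stdin and succeeds
# if the named flag appears in the cluster-wide "flags" line.
# Assumption: that line looks like "flags noout,sortbitwise,...".
osd_flag_is_set() {
    awk -v flag="$1" '
        $1 == "flags" {
            n = split($2, parts, ",")
            for (i = 1; i <= n; i++) if (parts[i] == flag) found = 1
        }
        END { if (found) exit 0; exit 1 }
    '
}
```

For example: `ceph osd dump | osd_flag_is_set noout && echo "noout is still set"`.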
Consider Enabling Telemetry
Enabling the telemetry module will send anonymized usage statistics and crash information to the Ceph upstream developers. To see what would be reported (without sending anything), you can run:
ceph telemetry preview-all
If you are comfortable with the data that is being sent, you can opt-in to automatically report the high-level cluster metadata with:
ceph telemetry on --license sharing-1-0
You will most likely get a notification that not all telemetry channels are enabled. To enable the perf channel run:
ceph telemetry enable channel perf
The public dashboard that aggregates Ceph telemetry can be found at https://telemetry-public.ceph.com/.
Known Issues
- iSCSI users are advised that the upstream developers of Ceph encountered a bug during an upgrade from Ceph 19.1.1 to Ceph 19.2.0. Read Tracker Issue 68215 before attempting an upgrade to 19.2.0.