== Introduction ==
This article explains how to upgrade Ceph from Nautilus to Octopus (15.2.3 or higher) on Proxmox VE 6.x.

For more information, see the
[https://docs.ceph.com/docs/master/releases/octopus/#upgrading-from-mimic-or-nautilus Release Notes]

== Assumption ==
We assume that all nodes are on the latest Proxmox VE 6.3 (or higher) version and that Ceph is on version Nautilus (14.2.9-pve1 or higher).
If not, see the [https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus Ceph Luminous to Nautilus] upgrade guide.
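
Before you start, you can record the current versions on each node. This is just a quick sanity check; pveversion reports the Proxmox VE version and ceph versions lists the Ceph release that every running daemon reports:

 pveversion
 ceph versions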

{{Note|It is '''not''' possible to upgrade from Ceph Luminous to Octopus directly.|warn}}

'''The cluster must be healthy and working!'''

== Note ==
* During the upgrade from Nautilus to Octopus, the first time each OSD starts it will do a format conversion to improve the accounting for “omap” data. This may take a few minutes to as much as a few hours (for an HDD with lots of omap data).

== Preparation on each Ceph cluster node ==

'''NOTE''': Currently (July 2020) the packages are '''not yet available''' on the "main" repository, '''only''' on the '''"test"''' one.

Change the current Ceph repositories from Nautilus to Octopus:

 sed -i 's/nautilus/octopus/' /etc/apt/sources.list.d/ceph.list

Your /etc/apt/sources.list.d/ceph.list should look like this:

 deb http://download.proxmox.com/debian/ceph-octopus buster main
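
As a quick check that apt now resolves the Ceph packages from the Octopus repository, you can refresh the package lists and inspect the candidate version of one of the packages (ceph-common is used here only as an example):

 apt update
 apt-cache policy ceph-common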

== Set the 'noout' flag ==
Set the noout flag for the duration of the upgrade (optional, but recommended):

 ceph osd set noout

Or via the GUI in the OSD tab (Manage Global Flags).
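
To verify that the flag is set, you can check the flags line of the OSD map (the output simply lists all currently set flags):

 ceph osd dump | grep flags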

== Upgrade on each Ceph cluster node ==
Upgrade all your nodes with the following commands. This will upgrade Ceph on the node to Octopus.
 apt update
 apt full-upgrade

After the update, you are still running the old Nautilus binaries.
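
Throughout the following restart steps you can track which daemons are still running the old binaries; ceph versions shows, per daemon type, how many daemons report Nautilus (14.2.x) and how many already report Octopus (15.2.x):

 ceph versions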

== Restart the monitor daemon ==

{{Note|You can use the web-interface or the command-line to restart ceph services.|info}}

After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.

 systemctl restart ceph-mon.target

Once all monitors are up, verify that the monitor upgrade is complete. Look for the Octopus string in the mon map. The command

 ceph mon dump | grep min_mon_release

should report

 min_mon_release 15 (octopus)

If it does not, this implies that one or more monitors haven’t been upgraded and restarted, and/or that the quorum doesn't include all monitors.

== Restart the manager daemons on all nodes ==

Then restart all managers on all nodes:

 systemctl restart ceph-mgr.target

Verify that the ceph-mgr daemons are running by checking ceph -s:

 ceph -s

 ...
  services:
   mon: 3 daemons, quorum foo,bar,baz
   mgr: foo(active), standbys: bar, baz
 ...
== Restart the OSD daemon on all nodes ==<br />
<br />
Important: After the upgrade, the first time each OSD starts, it will do a format conversion to improve the accounting for “omap” data. It may take a few minutes or up to a few hours (eg. on HDD with lots of omap data).<br />
<br />
Best to restart the OSDs on one node at a time after <br />
systemctl restart ceph-osd.target<br />
<br />
Wait after each restart and periodically checking the status of the cluster:<br />
ceph status<br />
<br />
It should be in '''HEALTH_OK''' or<br />
HEALTH_WARN<br />
noout flag(s) set<br />
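
If you prefer to script this node-by-node procedure, a minimal sketch could look like the following. It assumes passwordless SSH between the nodes, the node names are placeholders, and the health check deliberately only continues once the status is HEALTH_OK or the noout flag is the sole remaining warning:

 for node in node1 node2 node3; do
     ssh "$node" systemctl restart ceph-osd.target
     # poll until the cluster has settled before touching the next node
     until ceph health | grep -Eq '^HEALTH_OK|^HEALTH_WARN noout flag\(s\) set$'; do
         sleep 30
     done
 done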

You can disable this automatic conversion with:

 ceph config set osd bluestore_fsck_quick_fix_on_mount false

But the conversion should be done as soon as possible.

== Disallow pre-Octopus OSDs and enable all new Octopus-only functionality ==

 ceph osd require-osd-release octopus
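
You can confirm afterwards that the flag is recorded in the OSD map; the matching line should read "require_osd_release octopus":

 ceph osd dump | grep require_osd_release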

== Upgrade all CephFS MDS daemons ==

For each CephFS file system:

# Reduce the number of ranks to 1 (if you plan to restore it later, first take note of the original max_mds value; see the example after this list):
#:<pre>ceph status&#10;ceph fs set <fs_name> max_mds 1</pre>The default installation uses one active MDS. To check if this is the case on your cluster, check the output of '''ceph status''' and verify that there is only one active MDS.
# Wait for the cluster to deactivate any non-zero ranks by periodically checking the status:
#:<pre>ceph status</pre>
# Take all standby MDS daemons offline on the appropriate hosts with:
#:<pre>systemctl stop ceph-mds.target</pre>
# Confirm that only one MDS is online and is on rank 0 for your FS:
#:<pre>ceph status</pre>
# Upgrade the last remaining MDS daemon by restarting it:
#:<pre>systemctl restart ceph-mds.target</pre>
# Restart all standby MDS daemons that were taken offline:
#:<pre>systemctl start ceph-mds.target</pre>
# Restore the original value of max_mds for the file system:
#:<pre>ceph fs set <fs_name> max_mds <original_max_mds></pre>
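
To look up the original max_mds value before reducing it (step 1 above), you can query the file system settings; <fs_name> is the name of your CephFS file system:

 ceph fs get <fs_name> | grep max_mds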

== Unset the 'noout' flag ==
Once the upgrade process is finished, don't forget to unset the noout flag:

 ceph osd unset noout

Or via the GUI in the OSD tab (Manage Global Flags).

== Upgrade Tunables ==
If your CRUSH tunables are '''older than Hammer''', Ceph will now issue a health warning. If you see a health alert to that effect, you can silence it by reverting the minimum required version with:

 ceph config set mon mon_crush_min_required_version firefly
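
To inspect the tunables your cluster currently uses (a read-only check; the command prints the full set of CRUSH tunables as JSON):

 ceph osd crush show-tunables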

If Ceph does not complain, however, then we recommend you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any ‘straw’ buckets, this will result in a modest amount of data movement, but generally nothing too severe:

 # create a backup first
 ceph osd getcrushmap -o backup-crushmap
 ceph osd crush set-all-straw-buckets-to-straw2
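
Whether any buckets still use the old ‘straw’ algorithm can be checked before (or after) the conversion by looking at the bucket algorithms in the CRUSH map (read-only):

 ceph osd crush dump | grep '"alg"'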

If there are problems, you can easily revert with:

 ceph osd setcrushmap -i backup-crushmap

Moving to ‘straw2’ buckets will unlock a few recent features, like the crush-compat balancer mode added back in Nautilus.

== Enable msgrv2 protocol and update Ceph configuration ==

If you did not already do so when you upgraded to Nautilus, we recommend enabling the new v2 network protocol. Issue the following command:

 ceph mon enable-msgr2

This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new 3300 v2 protocol port. To see if all monitors have been updated, run

 ceph mon dump

and verify that each monitor has both a v2: and a v1: address listed.
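
A fully updated entry in the mon dump output looks roughly like this (address and monitor name are placeholders):

 0: [v2:192.168.0.10:3300/0,v1:192.168.0.10:6789/0] mon.foo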

== Placement Group (PG) count warning for pools ==
The [https://docs.ceph.com/docs/octopus/rados/operations/placement-groups/#autoscaling-placement-groups PG autoscaler] feature, introduced in Nautilus, is enabled for new pools by default, allowing new clusters to auto-tune pg_num without any user intervention.

You may want to enable the PG autoscaler for upgraded pools as well. It is advisable to check the recommendation of the autoscaler before activating it (see the example below). Additionally, see the forum discussion [https://forum.proxmox.com/threads/ceph-octopus-upgrade-notes-think-twice-before-enabling-auto-scale.80105 Ceph Octopus upgrade notes - Think twice before enabling auto scale].
 ceph osd pool set POOLNAME pg_autoscale_mode on
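
The current recommendations can be listed per pool, provided the pg_autoscaler manager module is active; the output shows the present and the proposed number of PGs for each pool:

 ceph osd pool autoscale-status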

Be aware that enabling the PG autoscaler will trigger '''recovery''' on merging/splitting PGs. This will result in '''higher load and resource usage''', so you should enable it one pool at a time.

If the results are not as expected, you can adjust the calculation of the autoscaler by [https://docs.ceph.com/en/octopus/rados/operations/placement-groups/#specifying-expected-pool-size specifying the expected pool size] and/or a [https://docs.ceph.com/en/octopus/rados/operations/placement-groups/#specifying-bounds-on-a-pool-s-pgs lower bound] on the number of PGs.

== See Also ==

* [[Ceph Luminous to Nautilus]]

[[Category: HOWTO]][[Category: Installation]]
systemctl restart ceph-osd.target<br />
<br />
Wait after each restart for the cluster to be in '''HEALTH_OK''' by periodically checking the status:<br />
ceph status<br />
<br />
You can disable this automatic conversion with:<br />
<br />
ceph config set osd bluestore_fsck_quick_fix_on_mount false<br />
<br />
But the conversion should be made as soon as possible.<br />
<br />
== Disallow pre-Octopus OSDs and enable all new Octopus-only functionality ==<br />
<br />
ceph osd require-osd-release octopus<br />
<br />
== Upgrade all CephFS MDS daemons ==<br />
<br />
For each CephFS file system,<br />
<br />
# Reduce the number of ranks to 1 (if you plan to restore it later, first take notes of the original number of MDS daemons).:<br />
#:<pre>ceph status&#10;ceph fs set <fs_name> max_mds 1</pre><br />
# Wait for the cluster to deactivate any non-zero ranks by periodically checking the status:<br />
#:<pre>ceph status</pre><br />
# Take all standby MDS daemons offline on the appropriate hosts with:<br />
#:<pre>systemctl stop ceph-mds.target</pre><br />
# Confirm that only one MDS is online and is on rank 0 for your FS:<br />
#:<pre>ceph status</pre><br />
# Upgrade the last remaining MDS daemon by restarting the daemon:<br />
#:<pre>systemctl restart ceph-mds.target</pre><br />
# Restart all standby MDS daemons that were taken offline:<br />
#:<pre>systemctl start ceph-mds.target</pre><br />
# Restore the original value of max_mds for the volume:<br />
#:<pre>ceph fs set <fs_name> max_mds <original_max_mds></pre><br />
<br />
== Unset the 'noout' flag ==<br />
Once the upgrade process is finished, don't forget to unset the noout flag.<br />
<br />
ceph osd unset noout<br />
<br />
Or via the GUI in the OSD tab (Manage Global Flags).<br />
<br />
== Upgrade Tunables ==<br />
If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:<br />
<br />
ceph config set mon mon_crush_min_required_version firefly<br />
<br />
If Ceph does not complain, however, then we recommend you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any ‘straw’ buckets, this will result in a modest amount of data movement, but generally nothing too severe:<br />
<br />
# create a backup first<br />
ceph osd getcrushmap -o backup-crushmap<br />
ceph osd crush set-all-straw-buckets-to-straw2<br />
<br />
If there are problems, you can easily revert with:<br />
<br />
ceph osd setcrushmap -i backup-crushmap<br />
<br />
Moving to ‘straw2’ buckets will unlock a few recent features, like the crush-compat balancer mode added back in Nautilus.<br />
<br />
== Enable msgrv2 protocol and update Ceph configuration ==<br />
<br />
If you did not already do so when you upgraded to Nautilus, we recommend enabling the new v2 network protocol. Issue the following command:<br />
<br />
ceph mon enable-msgr2<br />
<br />
This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new 3300 v2 protocol port. To see if all monitors have been updated run<br />
<br />
ceph mon dump<br />
<br />
and verify that each monitor has both a v2: and v1: address listed.<br />
<br />
== PG count warning for pools ==<br />
The [https://docs.ceph.com/docs/octopus/rados/operations/placement-groups/#autoscaling-placement-groups PG autoscaler] feature introduced in Nautilus is enabled for new pools by default, allowing new clusters to autotune pg num without any user intervention.<br />
<br />
Additionally, you may need to enable the PG autoscaler for upgraded pools:<br />
ceph osd pool set POOLNAME pg_autoscale_mode on<br />
<br />
[[Category: HOWTO]][[Category: Installation]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_Nautilus_to_Octopus&diff=10893Ceph Nautilus to Octopus2020-11-25T08:06:39Z<p>A.antreich: /* Restart the OSD daemon on all nodes */ add health check for service restart</p>
<hr />
<div>{{Note|This is still work in progress, Ceph Octopus packages are only available through the test repository.|warn}}<br />
<br />
== Introduction ==<br />
This article explains how to upgrade Ceph from Nautilus to Octopus (15.2.3 or higher) on Proxmox VE 6.x.<br />
<br />
For more information see<br />
[https://docs.ceph.com/docs/master/releases/octopus/#upgrading-from-mimic-or-nautilus Release Notes]<br />
<br />
== Assumption ==<br />
We assume that all nodes are on the latest Proxmox VE 6.x version and Ceph is on version Nautilus (14.2.9-pve1 or higher).<br />
If not see the [https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus Ceph Luminous to Nautilus] upgrade guide.<br />
<br />
{{Note|It is '''not''' possible to upgrade from Ceph Luminous to Octopus directly.|warn}}<br />
<br />
'''The cluster must be healthy and working!'''<br />
<br />
== Note ==<br />
* During the upgrade from Nautilus to Octopus the first time each OSD starts, it will do a format conversion to improve the accounting for “omap” data. This may take a few minutes to as much as a few hours (for an HDD with lots of omap data).<br />
<br />
== Preparation on each Ceph cluster node ==<br />
<br />
'''NOTE''': Currently (July 2020) the packages are '''not yet available''' on the "main" repository, '''only''' on the '''"test"''' one.<br />
<br />
Change the current Ceph repositories from Nautilus to Octopus.<br />
<br />
sed -i 's/nautilus/octopus/' /etc/apt/sources.list.d/ceph.list<br />
<br />
Your /etc/apt/sources.list.d/ceph.list should look like this<br />
<br />
deb http://download.proxmox.com/debian/ceph-octopus buster main<br />
<br />
== Set the 'noout' flag ==<br />
Set the noout flag for the duration of the upgrade (optional, but recommended):<br />
<br />
ceph osd set noout<br />
<br />
Or via the GUI in the OSD tab (Manage Global Flags).<br />
<br />
== Upgrade on each Ceph cluster node ==<br />
Upgrade all your nodes with the following commands. It will upgrade the Ceph on your node to Octopus.<br />
apt update<br />
apt dist-upgrade<br />
<br />
After the update you still run the old Nautilus binaries.<br />
<br />
== Restart the monitor daemon ==<br />
<br />
{{Note|You can use the web-interface or the command-line to restart ceph services.|info}}<br />
<br />
After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.<br />
<br />
systemctl restart ceph-mon.target<br />
<br />
Once all monitors are up, verify that the monitor upgrade is complete. Look for the Octopus string in the mon map. The command<br />
<br />
ceph mon dump | grep min_mon_release<br />
<br />
should report<br />
<br />
min_mon_release 15 (octopus)<br />
<br />
If it does not, this implies that one or more monitors haven’t been upgraded and restarted, and/or that the quorum doesn't include all monitors.<br />
<br />
== Restart the manager daemons on all nodes ==<br />
<br />
Then restart all managers on all nodes<br />
<br />
systemctl restart ceph-mgr.target<br />
<br />
Verify that the ceph-mgr daemons are running by checking ceph -s<br />
<br />
ceph -s<br />
<br />
...<br />
services:<br />
mon: 3 daemons, quorum foo,bar,baz<br />
mgr: foo(active), standbys: bar, baz<br />
...<br />
<br />
== Restart the OSD daemon on all nodes ==<br />
<br />
Important: After the upgrade, the first time each OSD starts, it will do a format conversion to improve the accounting for “omap” data. It may take a few minutes or up to a few hours (eg. on HDD with lots of omap data).<br />
<br />
Best to restart the OSDs on one node at a time after <br />
systemctl restart ceph-osd.target<br />
<br />
Wait after each restart for the cluster to be in '''HEALTH_OK''' by periodically checking the status:<br />
ceph status<br />
<br />
You can disable this automatic conversion with:<br />
<br />
ceph config set osd bluestore_fsck_quick_fix_on_mount false<br />
<br />
But the conversion should be made as soon as possible.<br />
<br />
== Disallow pre-Octopus OSDs and enable all new Octopus-only functionality ==<br />
<br />
ceph osd require-osd-release octopus<br />
<br />
== Upgrade all CephFS MDS daemons ==<br />
<br />
For each CephFS file system,<br />
<br />
# Reduce the number of ranks to 1 (if you plan to restore it later, first take notes of the original number of MDS daemons).:<br />
#:<pre>ceph status&#10;ceph fs set <fs_name> max_mds 1</pre><br />
# Wait for the cluster to deactivate any non-zero ranks by periodically checking the status:<br />
#:<pre>ceph status</pre><br />
# Take all standby MDS daemons offline on the appropriate hosts with:<br />
#:<pre>systemctl stop ceph-mds.target</pre><br />
# Confirm that only one MDS is online and is on rank 0 for your FS:<br />
#:<pre>ceph status</pre><br />
# Upgrade the last remaining MDS daemon by restarting the daemon:<br />
#:<pre>systemctl restart ceph-mds.target</pre><br />
# Restart all standby MDS daemons that were taken offline:<br />
#:<pre>systemctl start ceph-mds.target</pre><br />
# Restore the original value of max_mds for the volume:<br />
#:<pre>ceph fs set <fs_name> max_mds <original_max_mds></pre><br />
<br />
== Unset the 'noout' flag ==<br />
Once the upgrade process is finished, don't forget to unset the noout flag.<br />
<br />
ceph osd unset noout<br />
<br />
Or via the GUI in the OSD tab (Manage Global Flags).<br />
<br />
== Upgrade Tunables ==<br />
If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:<br />
<br />
ceph config set mon mon_crush_min_required_version firefly<br />
<br />
If Ceph does not complain, however, then we recommend you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any ‘straw’ buckets, this will result in a modest amount of data movement, but generally nothing too severe:<br />
<br />
# create a backup first<br />
ceph osd getcrushmap -o backup-crushmap<br />
ceph osd crush set-all-straw-buckets-to-straw2<br />
<br />
If there are problems, you can easily revert with:<br />
<br />
ceph osd setcrushmap -i backup-crushmap<br />
<br />
Moving to ‘straw2’ buckets will unlock a few recent features, like the crush-compat balancer mode added back in Nautilus.<br />
<br />
== Enable msgrv2 protocol and update Ceph configuration ==<br />
<br />
If you did not already do so when you upgraded to Nautilus, we recommend enabling the new v2 network protocol. Issue the following command:<br />
<br />
ceph mon enable-msgr2<br />
<br />
This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new 3300 v2 protocol port. To see if all monitors have been updated run<br />
<br />
ceph mon dump<br />
<br />
and verify that each monitor has both a v2: and v1: address listed.<br />
<br />
== PG count warning for pools ==<br />
The [https://docs.ceph.com/docs/octopus/rados/operations/placement-groups/#autoscaling-placement-groups PG autoscaler] feature introduced in Nautilus is enabled for new pools by default, allowing new clusters to autotune pg num without any user intervention.<br />
<br />
Additionally, you may need to enable the PG autoscaler for upgraded pools:<br />
ceph osd pool set POOLNAME pg_autoscale_mode on<br />
<br />
[[Category: HOWTO]][[Category: Installation]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_Nautilus_to_Octopus&diff=10892Ceph Nautilus to Octopus2020-11-25T08:02:44Z<p>A.antreich: /* Restart the OSD daemon on all nodes */ add osd target for restart</p>
<hr />
<div>{{Note|This is still work in progress, Ceph Octopus packages are only available through the test repository.|warn}}<br />
<br />
== Introduction ==<br />
This article explains how to upgrade Ceph from Nautilus to Octopus (15.2.3 or higher) on Proxmox VE 6.x.<br />
<br />
For more information see<br />
[https://docs.ceph.com/docs/master/releases/octopus/#upgrading-from-mimic-or-nautilus Release Notes]<br />
<br />
== Assumption ==<br />
We assume that all nodes are on the latest Proxmox VE 6.x version and Ceph is on version Nautilus (14.2.9-pve1 or higher).<br />
If not see the [https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus Ceph Luminous to Nautilus] upgrade guide.<br />
<br />
{{Note|It is '''not''' possible to upgrade from Ceph Luminous to Octopus directly.|warn}}<br />
<br />
'''The cluster must be healthy and working!'''<br />
<br />
== Note ==<br />
* During the upgrade from Nautilus to Octopus the first time each OSD starts, it will do a format conversion to improve the accounting for “omap” data. This may take a few minutes to as much as a few hours (for an HDD with lots of omap data).<br />
<br />
== Preparation on each Ceph cluster node ==<br />
<br />
'''NOTE''': Currently (July 2020) the packages are '''not yet available''' on the "main" repository, '''only''' on the '''"test"''' one.<br />
<br />
Change the current Ceph repositories from Nautilus to Octopus.<br />
<br />
sed -i 's/nautilus/octopus/' /etc/apt/sources.list.d/ceph.list<br />
<br />
Your /etc/apt/sources.list.d/ceph.list should look like this<br />
<br />
deb http://download.proxmox.com/debian/ceph-octopus buster main<br />
<br />
== Set the 'noout' flag ==<br />
Set the noout flag for the duration of the upgrade (optional, but recommended):<br />
<br />
ceph osd set noout<br />
<br />
Or via the GUI in the OSD tab (Manage Global Flags).<br />
<br />
== Upgrade on each Ceph cluster node ==<br />
Upgrade all your nodes with the following commands. It will upgrade the Ceph on your node to Octopus.<br />
apt update<br />
apt dist-upgrade<br />
<br />
After the update you still run the old Nautilus binaries.<br />
<br />
== Restart the monitor daemon ==<br />
<br />
{{Note|You can use the web-interface or the command-line to restart ceph services.|info}}<br />
<br />
After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.<br />
<br />
systemctl restart ceph-mon.target<br />
<br />
Once all monitors are up, verify that the monitor upgrade is complete. Look for the Octopus string in the mon map. The command<br />
<br />
ceph mon dump | grep min_mon_release<br />
<br />
should report<br />
<br />
min_mon_release 15 (octopus)<br />
<br />
If it does not, this implies that one or more monitors haven’t been upgraded and restarted, and/or that the quorum doesn't include all monitors.<br />
<br />
== Restart the manager daemons on all nodes ==<br />
<br />
Then restart all managers on all nodes<br />
<br />
systemctl restart ceph-mgr.target<br />
<br />
Verify that the ceph-mgr daemons are running by checking ceph -s<br />
<br />
ceph -s<br />
<br />
...<br />
services:<br />
mon: 3 daemons, quorum foo,bar,baz<br />
mgr: foo(active), standbys: bar, baz<br />
...<br />
<br />
== Restart the OSD daemon on all nodes ==<br />
<br />
Important: After the upgrade, the first time each OSD starts, it will do a format conversion to improve the accounting for “omap” data. It may take a few minutes or up to a few hours (eg. on HDD with lots of omap data).<br />
<br />
Best to restart the OSDs on one node at a time.<br />
systemctl restart ceph-osd.target<br />
You can disable this automatic conversion with:<br />
<br />
ceph config set osd bluestore_fsck_quick_fix_on_mount false<br />
<br />
But the conversion should be made as soon as possible.<br />
<br />
== Disallow pre-Octopus OSDs and enable all new Octopus-only functionality ==<br />
<br />
ceph osd require-osd-release octopus<br />
<br />
== Upgrade all CephFS MDS daemons ==<br />
<br />
For each CephFS file system,<br />
<br />
# Reduce the number of ranks to 1 (if you plan to restore it later, first take notes of the original number of MDS daemons).:<br />
#:<pre>ceph status&#10;ceph fs set <fs_name> max_mds 1</pre><br />
# Wait for the cluster to deactivate any non-zero ranks by periodically checking the status:<br />
#:<pre>ceph status</pre><br />
# Take all standby MDS daemons offline on the appropriate hosts with:<br />
#:<pre>systemctl stop ceph-mds.target</pre><br />
# Confirm that only one MDS is online and is on rank 0 for your FS:<br />
#:<pre>ceph status</pre><br />
# Upgrade the last remaining MDS daemon by restarting the daemon:<br />
#:<pre>systemctl restart ceph-mds.target</pre><br />
# Restart all standby MDS daemons that were taken offline:<br />
#:<pre>systemctl start ceph-mds.target</pre><br />
# Restore the original value of max_mds for the volume:<br />
#:<pre>ceph fs set <fs_name> max_mds <original_max_mds></pre><br />
<br />
== Unset the 'noout' flag ==<br />
Once the upgrade process is finished, don't forget to unset the noout flag.<br />
<br />
ceph osd unset noout<br />
<br />
Or via the GUI in the OSD tab (Manage Global Flags).<br />
<br />
== Upgrade Tunables ==<br />
If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:<br />
<br />
ceph config set mon mon_crush_min_required_version firefly<br />
<br />
If Ceph does not complain, however, then we recommend you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any ‘straw’ buckets, this will result in a modest amount of data movement, but generally nothing too severe:<br />
<br />
# create a backup first<br />
ceph osd getcrushmap -o backup-crushmap<br />
ceph osd crush set-all-straw-buckets-to-straw2<br />
<br />
If there are problems, you can easily revert with:<br />
<br />
ceph osd setcrushmap -i backup-crushmap<br />
<br />
Moving to ‘straw2’ buckets will unlock a few recent features, like the crush-compat balancer mode added back in Nautilus.<br />
<br />
== Enable msgrv2 protocol and update Ceph configuration ==<br />
<br />
If you did not already do so when you upgraded to Nautilus, we recommend enabling the new v2 network protocol. Issue the following command:<br />
<br />
ceph mon enable-msgr2<br />
<br />
This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new 3300 v2 protocol port. To see if all monitors have been updated run<br />
<br />
ceph mon dump<br />
<br />
and verify that each monitor has both a v2: and v1: address listed.<br />
<br />
== PG count warning for pools ==<br />
The [https://docs.ceph.com/docs/octopus/rados/operations/placement-groups/#autoscaling-placement-groups PG autoscaler] feature introduced in Nautilus is enabled for new pools by default, allowing new clusters to autotune pg num without any user intervention.<br />
<br />
Additionally, you may need to enable the PG autoscaler for upgraded pools:<br />
ceph osd pool set POOLNAME pg_autoscale_mode on<br />
<br />
[[Category: HOWTO]][[Category: Installation]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_RBD_Mirroring&diff=10843Ceph RBD Mirroring2020-09-24T07:53:32Z<p>A.antreich: docs.ceph.com migrated to readthedocs, minor uri change</p>
<hr />
<div>This page describes how to use ''rbd-mirror'' to mirror Ceph images to another Ceph cluster in a one-way-mirror.<br />
<br />
For more details on the used commands check the official [https://docs.ceph.com/en/nautilus/rbd/rbd-mirroring/ Ceph Documentation].<br />
<br />
== Requirements ==<br />
* Two Ceph clusters<br />
* Nodes on both clusters can connect to the nodes in the other cluster<br />
* At least one pool in both clusters. Pools that should be mirrored need the same name.<br />
* '''rbd-mirror''' installed on the backup cluster ONLY ('''apt install rbd-mirror''').<br />
''rbd-mirror'' can be installed and used on multiple nodes on the backup cluster. For simplicity, this guide uses only one backup node with ''rbd-mirror'' installed.<br />
<br />
== Introduction ==<br />
<br />
This guide assumes that you have two clusters, one called '''master''' which contains images that are used in production, and a '''backup cluster''' to which the images are mirrored for disaster recovery. The general idea is that one or more ''rbd-mirror daemons'' on the backup cluster are pulling changes from the master cluster.<br />
<br />
This approach should be appropriate to maintain a crash-consistent copy of the original image. It will not allow you to fail back to the master cluster; you need two-way mirroring for that, which you can set up at the time you want to fail back.<br />
<br />
== Prepare pool ==<br />
<br />
=== Image features ===<br />
<br />
Only images with the ''exclusive-lock'' and ''journaling'' feature will be mirrored. Because ''journaling'' depends on ''exclusive-lock'' you need to enable both features.<br />
<br />
To check whether or not these features are already enabled on an image run the following command on the master cluster:<br />
<br />
<pre><br />
# rbd info <your_pool_name>/<your_vm_disk_image><br />
</pre><br />
<br />
e.g.:<br />
<br />
<pre><br />
# rbd info data/vm-100-disk-0<br />
</pre><br />
<br />
To enable a feature:<br />
<br />
<pre><br />
# rbd feature enable data/vm-100-disk-0 journaling<br />
</pre><br />
<br />
You need to do this for every image you want to mirror.<br />
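<br />
If a pool contains many images, enabling the features one by one gets tedious. The following is only a rough sketch that enables both features on every image of a pool, assuming the pool is called ''data''; the exclusive-lock line may report that the feature is already enabled, which is harmless:<br />
<br />
<pre><br />
for img in $(rbd ls data); do<br />
    rbd feature enable "data/${img}" exclusive-lock 2>/dev/null<br />
    rbd feature enable "data/${img}" journaling<br />
done<br />
</pre><br />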
<br />
=== Mirror mode ===<br />
<br />
The next step is to set the mirroring mode on each pool you want to mirror.<br />
You can choose between ''pool'' mode and ''image'' mode. This has to be done on '''both clusters''' on the corresponding pools.<br />
<pre><br />
# rbd mirror pool enable <your_pool_name> <mode><br />
</pre><br />
<br />
e.g.:<br />
<br />
<pre><br />
# rbd mirror pool enable data pool<br />
</pre><br />
<br />
== User creation ==<br />
<br />
On one of the monitor hosts of the master cluster create a user:<br />
<pre><br />
# ceph auth get-or-create client.rbd-mirror.master mon 'profile rbd' osd 'profile rbd' -o /etc/pve/priv/master.client.rbd-mirror.master.keyring<br />
</pre><br />
<br />
'''Note:'''<br />
You can restrict this to a specific pool if you write 'profile rbd pool=data'<br />
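<br />
For example, a keyring restricted to the ''data'' pool only could be created like this (sketch following the note above):<br />
<br />
<pre><br />
# ceph auth get-or-create client.rbd-mirror.master mon 'profile rbd' osd 'profile rbd pool=data' -o /etc/pve/priv/master.client.rbd-mirror.master.keyring<br />
</pre><br />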
<br />
== Copy configs and keys ==<br />
<br />
Copy the ''ceph.conf'' file from the master cluster to the backup cluster's ''/etc/ceph/'' directory under the name of '''master.conf''' (be careful to not overwrite your backup cluster's ceph.conf file!).<br />
<br />
<pre><br />
# scp /etc/ceph/ceph.conf root@<rbd-mirror-node>:/etc/ceph/master.conf<br />
</pre><br />
<br />
Copy the previously generated keyring-file (''master.client.rbd-mirror.master.keyring'') to the backup cluster's ''/etc/pve/priv/'' directory.<br />
<pre><br />
# scp /etc/pve/priv/master.client.rbd-mirror.master.keyring root@<rbd-mirror-node>:/etc/pve/priv/<br />
</pre><br />
While each cluster sees itself as ''ceph'', the backup cluster sees the master cluster as ''master''. This is set by the name of the config and keyring file.<br />
<br />
== Create client ID ==<br />
<br />
On a node of the backup cluster create a unique client id to be used for each rbd-mirror-daemon instance:<br />
<pre><br />
# ceph auth get-or-create client.rbd-mirror.backup mon 'profile rbd' osd 'profile rbd' -o /etc/pve/priv/ceph.client.rbd-mirror.backup.keyring<br />
</pre><br />
<br />
== Start rbd-mirror daemon ==<br />
<br />
You should now be able to start the daemon (as root).<br />
<br />
Run the following on the ''rbd-mirror'' node in the backup cluster:<br />
<br />
<pre><br />
# systemctl enable ceph-rbd-mirror.target<br />
# cp /lib/systemd/system/ceph-rbd-mirror@.service /etc/systemd/system/ceph-rbd-mirror@.service<br />
# sed -i -e 's/setuser ceph.*/setuser root --setgroup root/' /etc/systemd/system/ceph-rbd-mirror@.service<br />
# systemctl enable ceph-rbd-mirror@rbd-mirror.backup.service<br />
# systemctl start ceph-rbd-mirror@rbd-mirror.backup.service<br />
</pre><br />
The replacement of the ceph user in the unit file is only necessary if you put the keyring file under ''/etc/pve/priv/'' (to have the file available cluster-wide), as the user ceph can't access that directory. Ceph searches by default in ''/etc/ceph/''.<br />
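<br />
Whether the daemon came up correctly can then be checked via systemd and the journal, using the instance name from above:<br />
<br />
<pre><br />
# systemctl status ceph-rbd-mirror@rbd-mirror.backup.service<br />
# journalctl -u ceph-rbd-mirror@rbd-mirror.backup.service<br />
</pre><br />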
<br />
== Add peer ==<br />
<br />
In the backup cluster add the master pool as peer:<br />
<br />
<pre><br />
# rbd mirror pool peer add <pool_name> <master_client_id>@<name_of_master_cluster><br />
</pre><br />
e.g.<br />
<pre><br />
# rbd mirror pool peer add data client.rbd-mirror.master@master<br />
</pre><br />
<br />
== Verify ==<br />
<br />
Verify that the peering succeeded by the following command:<br />
<br />
<pre><br />
# rbd mirror pool info <pool_name><br />
</pre><br />
e.g<br />
<pre><br />
# rbd mirror pool info data<br />
</pre><br />
<br />
This should print the peer and the mirror mode if all went well. The UUID is needed if you want to remove the peer in the future.<br />
<br />
You should now see, in your backup cluster, each image that is marked with the journaling feature in the master cluster. You can verify the current mirror state with the following command: <br />
<pre><br />
# rbd mirror pool status data --verbose<br />
</pre><br />
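<br />
In a healthy one-way setup, the (non-verbose) summary looks roughly like this (illustrative output; image counts will differ):<br />
<br />
<pre><br />
health: OK<br />
images: 1 total<br />
    1 replaying<br />
</pre><br />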
<br />
If you want to switch to the backup cluster, you need to promote the backup images to primary images. This should only be done when your master cluster has crashed, or after you have taken the necessary steps on the master cluster before switching, e.g. demoting the images there.<br />
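<br />
A rough sketch of such a planned failover for a single, hypothetical image; only use the --force option of the promote command if the master cluster is really gone:<br />
<br />
<pre><br />
# on the master cluster (orderly failover only)<br />
rbd mirror image demote data/vm-100-disk-0<br />
<br />
# on the backup cluster afterwards<br />
rbd mirror image promote data/vm-100-disk-0<br />
</pre><br />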
<br />
Please also check out Ceph's rbd-mirror documentation.<br />
http://docs.ceph.com/en/nautilus/rbd/rbd-mirroring/<br />
<br />
[[Category:HOWTO]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_Nautilus_to_Octopus&diff=10822Ceph Nautilus to Octopus2020-08-27T09:22:37Z<p>A.antreich: /* PG count warning for pools */ add link to ceph documentation for further explanation of the autoscaler</p>
<hr />
<div>{{Note|This is still work in progress, Ceph Octopus packages are only available through the test repository.|warn}}<br />
<br />
== Introduction ==<br />
This article explains how to upgrade Ceph from Nautilus to Octopus (15.2.3 or higher) on Proxmox VE 6.x.<br />
<br />
For more information see<br />
[https://docs.ceph.com/docs/master/releases/octopus/#upgrading-from-mimic-or-nautilus Release Notes]<br />
<br />
== Assumption ==<br />
We assume that all nodes are on the latest Proxmox VE 6.x version and Ceph is on version Nautilus (14.2.9-pve1 or higher).<br />
If not see the [https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus Ceph Luminous to Nautilus] upgrade guide.<br />
<br />
{{Note|It is '''not''' possible to upgrade from Ceph Luminous to Octopus directly.|warn}}<br />
<br />
'''The cluster must be healthy and working!'''<br />
<br />
== Note ==<br />
* During the upgrade from Nautilus to Octopus the first time each OSD starts, it will do a format conversion to improve the accounting for “omap” data. This may take a few minutes to as much as a few hours (for an HDD with lots of omap data).<br />
<br />
== Preparation on each Ceph cluster node ==<br />
<br />
'''NOTE''': Currently (July 2020) the packages are '''not yet available''' on the "main" repository, '''only''' on the '''"test"''' one.<br />
<br />
Change the current Ceph repositories from Nautilus to Octopus.<br />
<br />
sed -i 's/nautilus/octopus/' /etc/apt/sources.list.d/ceph.list<br />
<br />
Your /etc/apt/sources.list.d/ceph.list should look like this<br />
<br />
deb http://download.proxmox.com/debian/ceph-octopus buster main<br />
<br />
== Set the 'noout' flag ==<br />
Set the noout flag for the duration of the upgrade (optional, but recommended):<br />
<br />
ceph osd set noout<br />
<br />
Or via the GUI in the OSD tab (Manage Global Flags).<br />
<br />
== Upgrade on each Ceph cluster node ==<br />
Upgrade all your nodes with the following commands. It will upgrade the Ceph on your node to Octopus.<br />
apt update<br />
apt dist-upgrade<br />
<br />
After the update you still run the old Nautilus binaries.<br />
<br />
== Restart the monitor daemon ==<br />
<br />
{{Note|You can use the web-interface or the command-line to restart ceph services.|info}}<br />
<br />
After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.<br />
<br />
systemctl restart ceph-mon.target<br />
<br />
Once all monitors are up, verify that the monitor upgrade is complete. Look for the Octopus string in the mon map. The command<br />
<br />
ceph mon dump | grep min_mon_release<br />
<br />
should report<br />
<br />
min_mon_release 15 (octopus)<br />
<br />
If it does not, this implies that one or more monitors haven’t been upgraded and restarted, and/or that the quorum doesn't include all monitors.<br />
<br />
== Restart the manager daemons on all nodes ==<br />
<br />
Then restart all managers on all nodes<br />
<br />
systemctl restart ceph-mgr.target<br />
<br />
Verify that the ceph-mgr daemons are running by checking ceph -s<br />
<br />
ceph -s<br />
<br />
...<br />
services:<br />
mon: 3 daemons, quorum foo,bar,baz<br />
mgr: foo(active), standbys: bar, baz<br />
...<br />
<br />
== Restart the OSD daemon on all nodes ==<br />
<br />
Important: After the upgrade, the first time each OSD starts, it will do a format conversion to improve the accounting for “omap” data. It may take a few minutes or up to a few hours (eg. on HDD with lots of omap data).<br />
<br />
You can disable this automatic conversion with:<br />
<br />
ceph config set osd bluestore_fsck_quick_fix_on_mount false<br />
<br />
But the conversion should be made as soon as possible.<br />
<br />
== Disallow pre-Octopus OSDs and enable all new Octopus-only functionality ==<br />
<br />
ceph osd require-osd-release octopus<br />
<br />
== Upgrade all CephFS MDS daemons ==<br />
<br />
For each CephFS file system,<br />
<br />
# Reduce the number of ranks to 1 (if you plan to restore it later, first take notes of the original number of MDS daemons).:<br />
#:<pre>ceph status&#10;ceph fs set <fs_name> max_mds 1</pre><br />
# Wait for the cluster to deactivate any non-zero ranks by periodically checking the status:<br />
#:<pre>ceph status</pre><br />
# Take all standby MDS daemons offline on the appropriate hosts with:<br />
#:<pre>systemctl stop ceph-mds.target</pre><br />
# Confirm that only one MDS is online and is on rank 0 for your FS:<br />
#:<pre>ceph status</pre><br />
# Upgrade the last remaining MDS daemon by restarting the daemon:<br />
#:<pre>systemctl restart ceph-mds.target</pre><br />
# Restart all standby MDS daemons that were taken offline:<br />
#:<pre>systemctl start ceph-mds.target</pre><br />
# Restore the original value of max_mds for the volume:<br />
#:<pre>ceph fs set <fs_name> max_mds <original_max_mds></pre><br />
<br />
== Unset the 'noout' flag ==<br />
Once the upgrade process is finished, don't forget to unset the noout flag.<br />
<br />
ceph osd unset noout<br />
<br />
Or via the GUI in the OSD tab (Manage Global Flags).<br />
<br />
== Upgrade Tunables ==<br />
If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:<br />
<br />
ceph config set mon mon_crush_min_required_version firefly<br />
<br />
If Ceph does not complain, however, then we recommend you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any ‘straw’ buckets, this will result in a modest amount of data movement, but generally nothing too severe:<br />
<br />
# create a backup first<br />
ceph osd getcrushmap -o backup-crushmap<br />
ceph osd crush set-all-straw-buckets-to-straw2<br />
<br />
If there are problems, you can easily revert with:<br />
<br />
ceph osd setcrushmap -i backup-crushmap<br />
<br />
Moving to ‘straw2’ buckets will unlock a few recent features, like the crush-compat balancer mode added back in Nautilus.<br />
<br />
== Enable msgrv2 protocol and update Ceph configuration ==<br />
<br />
If you did not already do so when you upgraded to Nautilus, we recommend enabling the new v2 network protocol. Issue the following command:<br />
<br />
ceph mon enable-msgr2<br />
<br />
This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new 3300 v2 protocol port. To see if all monitors have been updated run<br />
<br />
ceph mon dump<br />
<br />
and verify that each monitor has both a v2: and v1: address listed.<br />
<br />
== PG count warning for pools ==<br />
The [https://docs.ceph.com/docs/octopus/rados/operations/placement-groups/#autoscaling-placement-groups PG autoscaler] feature introduced in Nautilus is enabled for new pools by default, allowing new clusters to autotune pg num without any user intervention.<br />
<br />
Additionally, you may need to enable the PG autoscaler for upgraded pools:<br />
ceph osd pool set POOLNAME pg_autoscale_mode on<br />
<br />
[[Category: HOWTO]][[Category: Installation]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_Luminous_to_Nautilus&diff=10705Ceph Luminous to Nautilus2020-05-12T08:25:03Z<p>A.antreich: /* Cluster Preparation */ change of scrub command and emphasis</p>
<hr />
<div>== Introduction ==<br />
This article explains how to upgrade from Ceph Luminous to Nautilus (14.2.0 or higher) on Proxmox VE 6.x.<br />
<br />
For more information see<br />
[http://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-mimic-or-luminous Release Notes]<br />
<br />
== Assumption ==<br />
We assume that all nodes are on the latest Proxmox VE 6.x version and Ceph is on version Luminous (12.2.12-pve1).<br />
<br />
The cluster must be healthy and working.<br />
<br />
== Note ==<br />
<br />
* After upgrading to Proxmox VE 6.x and before upgrading to Ceph Nautilus,<br />
:*Do not use the Proxmox VE 6.x tools for Ceph (pveceph), as they are not intended to work with Ceph Luminous.<br />
:*If it's absolutely necessary to change the Ceph cluster before upgrading to Nautilus, use the Ceph native tools instead.<br />
<br />
* During the upgrade from Luminous to Nautilus it will not be possible to create a new OSD using a Luminous ceph-osd daemon after the monitors have been upgraded to Nautilus. Avoid adding or replacing any OSDs while the upgrade is in progress.<br />
<br />
* Avoid creating any RADOS pools while the upgrade is in progress.<br />
* You can monitor the progress of your upgrade anytime with the ceph versions command. This will tell you which Ceph version(s) are running for each type of daemon.<br />
<br />
== Cluster Preparation ==<br />
If your cluster was originally installed with a version prior to Luminous, '''ensure that it has completed at least one full scrub of all PGs while running Luminous'''. Failure to do so will cause your monitor daemons to refuse to join the quorum on start, leaving them non-functional.<br />
<br />
If you are unsure whether or not your Luminous cluster has completed a full scrub of all PGs, check the state of your cluster by running:<br />
<br />
ceph osd dump | grep ^flags<br />
<br />
In order to be able to proceed to Nautilus, your OSD map must include the flags<br />
* recovery_deletes flag<br />
* purged_snapdirs flag<br />
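<br />
A flags line that already contains both of them could, for example, look like this (illustrative output of the command above; the exact set of flags varies per cluster):<br />
<br />
 flags sortbitwise,recovery_deletes,purged_snapdirs<br />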
<br />
If your OSD map does not contain both these flags, you can simply '''wait for approximately 24-48 hours'''. In a standard cluster configuration this should be ample time for all your placement groups to be scrubbed at least once. Then repeat the above process to recheck.<br />
<br />
If you have just completed an upgrade to Luminous and want to proceed to Nautilus in short order, you can force a scrub on all placement groups with the following command:<br />
<br />
ceph osd scrub all<br />
<br />
Note that this forced scrub may have a negative impact on the performance of your Ceph clients. Verify afterwards that the above-mentioned flags are set once the scrub has finished.<br />
<br />
=== Adapt /etc/pve/ceph.conf ===<br />
Since Nautilus, all daemons use the 'keyring' option for their keyring, so you have to adapt this.<br />
The easiest way is to move the global 'keyring' option into the 'client' section, and remove it everywhere else.<br />
Create the 'client' section if you don't have one.<br />
<br />
For example:<br />
<br />
From:<br />
[global]<br />
...<br />
keyring = /etc/pve/priv/$cluster.$name.keyring<br />
[osd]<br />
keyring = /var/lib/ceph/osd/ceph-$id/keyring<br />
<br />
To:<br />
[global]<br />
...<br />
[client]<br />
keyring = /etc/pve/priv/$cluster.$name.keyring<br />
<br />
== Preparation on each Ceph cluster node ==<br />
Change the current Ceph repositories from Luminous to Nautilus.<br />
<br />
sed -i 's/luminous/nautilus/' /etc/apt/sources.list.d/ceph.list<br />
<br />
Your /etc/apt/sources.list.d/ceph.list should look like this<br />
<br />
deb http://download.proxmox.com/debian/ceph-nautilus buster main<br />
<br />
== Set the 'noout' flag ==<br />
Set the noout flag for the duration of the upgrade (optional, but recommended):<br />
<br />
ceph osd set noout<br />
<br />
Or via the GUI in the OSD tab.<br />
<br />
== Upgrade on each Ceph cluster node ==<br />
Upgrade all your nodes with the following commands. It will upgrade the Ceph on your node to Nautilus.<br />
apt update<br />
apt dist-upgrade<br />
<br />
After the update you still run the old Luminous binaries.<br />
<br />
== Restart the monitor daemon ==<br />
<br />
After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.<br />
<br />
systemctl restart ceph-mon.target<br />
<br />
Once all monitors are up, verify that the monitor upgrade is complete. Look for the nautilus string in the mon map. The command<br />
<br />
ceph mon dump | grep min_mon_release<br />
<br />
should report<br />
<br />
min_mon_release 14 (nautilus)<br />
<br />
If it does not, this implies that one or more monitors haven’t been upgraded and restarted, and/or that the quorum doesn't include all monitors.<br />
<br />
== Restart the manager daemons on all nodes ==<br />
<br />
Then restart all managers on all nodes<br />
<br />
systemctl restart ceph-mgr.target<br />
<br />
Verify that the ceph-mgr daemons are running by checking ceph -s<br />
<br />
ceph -s<br />
<br />
...<br />
services:<br />
mon: 3 daemons, quorum foo,bar,baz<br />
mgr: foo(active), standbys: bar, baz<br />
...<br />
<br />
== Restart the OSD daemon on all nodes ==<br />
<br />
'''Important Steps before restarting OSD'''<br />
<br />
If you have a cluster with IPv6 only, you need to set the following options in the global section of the Ceph config:<br />
<br />
ms_bind_ipv4 = false<br />
ms_bind_ipv6 = true<br />
<br />
Otherwise, each OSD tries to bind to an IPv4 address in addition to the IPv6 address and fails if it cannot find an IPv4 address in the given public/cluster networks.<br />
<br />
Next, restart all OSDs on all nodes<br />
<br />
systemctl restart ceph-osd.target<br />
<br />
On each host, tell ceph-volume to adapt the OSDs created with ceph-disk using the following two commands:<br />
<br />
ceph-volume simple scan<br />
ceph-volume simple activate --all<br />
<br />
If you get a failure, your OSDs will not be recognized after a reboot.<br />
<br />
* One such failure can be '''Required devices (block and data) not present for bluestore'''. This may happen if you have filestore OSDs, due to a bug in the Ceph tooling (see http://wordpress.hawkless.id.au/index.php/2019/05/10/ceph-nautilus-required-devices-block-and-data-not-present-for-bluestore/).<br />
** To fix it, edit the ''/etc/ceph/osd/{OSDID}-GUID.json'' file created for each filestore OSD and add the following line (make sure the result stays valid JSON: every attribute except the last one has to end in a ''','''; see the sketch below):<br />
 "type": "filestore"<br />
* Run again: ceph-volume simple activate --all<br />
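<br />
A minimal sketch of how the added attribute fits into the existing JSON structure; all other attributes of the file stay untouched, this only illustrates the placement and the comma rule mentioned above:<br />
<br />
 {<br />
     ... existing attributes stay unchanged, each ending in a comma ...<br />
     "type": "filestore"<br />
 }<br />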
<br />
To verify that the OSDs start up automatically, it's recommended that each OSD host is rebooted following the step above.<br />
<br />
Note that ceph-volume does not have the same hot-plug capability that ceph-disk had, where a newly attached disk is automatically detected via udev events.<br />
<br />
You will need to scan the main data partition for each ceph-disk OSD explicitly, if<br />
*the OSD isn’t currently running when the above scan command is run, <br />
*a ceph-disk-based OSD is moved to a new host, <br />
*the host OSD is reinstalled, <br />
*or the /etc/ceph/osd directory is lost. <br />
<br />
For example:<br />
<br />
ceph-volume simple scan /dev/sdb1<br />
The output will include the appropriate ceph-volume simple activate command to enable the OSD.<br />
<br />
== Upgrade all CephFS MDS daemons ==<br />
<br />
For each CephFS file system,<br />
<br />
# Reduce the number of ranks to 1 (if you plan to restore it later, first take notes of the original number of MDS daemons).:<br />
#:<pre>ceph status&#10;ceph fs set <fs_name> max_mds 1</pre><br />
# Wait for the cluster to deactivate any non-zero ranks by periodically checking the status:<br />
#:<pre>ceph status</pre><br />
# Take all standby MDS daemons offline on the appropriate hosts with:<br />
#:<pre>systemctl stop ceph-mds.target</pre><br />
# Confirm that only one MDS is online and is on rank 0 for your FS:<br />
#:<pre>ceph status</pre><br />
# Upgrade the last remaining MDS daemon by restarting the daemon:<br />
#:<pre>systemctl restart ceph-mds.target</pre><br />
# Restart all standby MDS daemons that were taken offline:<br />
#:<pre>systemctl start ceph-mds.target</pre><br />
# Restore the original value of max_mds for the volume:<br />
#:<pre>ceph fs set <fs_name> max_mds <original_max_mds></pre><br />
<br />
== Disallow pre-Nautilus OSDs and enable all new Nautilus-only functionality ==<br />
<br />
ceph osd require-osd-release nautilus<br />
<br />
== Unset 'noout' and check cluster status ==<br />
<br />
Unset the 'noout' flag.<br />
You can do this in the GUI or with this command.<br />
<br />
ceph osd unset noout<br />
<br />
Now check if your Ceph cluster is healthy.<br />
ceph -s<br />
<br />
== Upgrade Tunables ==<br />
If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:<br />
<br />
ceph config set mon mon_crush_min_required_version firefly<br />
<br />
If Ceph does not complain, however, then we recommend you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any ‘straw’ buckets, this will result in a modest amount of data movement, but generally nothing too severe:<br />
<br />
ceph osd getcrushmap -o backup-crushmap<br />
ceph osd crush set-all-straw-buckets-to-straw2<br />
<br />
If there are problems, you can easily revert with:<br />
<br />
ceph osd setcrushmap -i backup-crushmap<br />
<br />
Moving to ‘straw2’ buckets will unlock a few recent features, like the crush-compat balancer mode added back in Luminous.<br />
<br />
== Enable msgrv2 protocol and update Ceph configuration ==<br />
<br />
To enable the new v2 network protocol, issue the following command:<br />
<br />
ceph mon enable-msgr2<br />
<br />
This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new 3300 v2 protocol port. To see if all monitors have been updated run<br />
<br />
ceph mon dump<br />
<br />
and verify that each monitor has both a v2: and v1: address listed.<br />
<br />
'''Updating /etc/pve/ceph.conf'''<br />
<br />
For each host that has been upgraded, you should update your /etc/pve/ceph.conf file so that it either specifies no monitor port (if you are running the monitors on the default ports) or references both the v2 and v1 addresses and ports explicitly. Things will still work if only the v1 IP and port are listed, but each CLI instantiation or daemon will need to reconnect after learning the monitors also speak the v2 protocol, slowing things down a bit and preventing a full transition to the v2 protocol.<br />
<br />
It is recommended to add all monitor ips (without port) to 'mon_host' in the global section like this:<br />
<br />
[global]<br />
...<br />
mon_host = 10.0.0.100 10.0.0.101 10.0.0.102<br />
...<br />
<br />
For details see: [http://docs.ceph.com/docs/nautilus/rados/configuration/msgr2/#msgr2-ceph-conf Messenger V2]<br />
<br />
== Legacy BlueStore stats reporting ==<br />
After the upgrade, '''ceph -s''' may show the below message.<br />
HEALTH_WARN Legacy BlueStore stats reporting detected on 6 OSD(s)<br />
In Ceph Nautilus 14.2.0 the pool utilization stats reported (ceph df) changed. This change needs an on-disk format change on the Bluestore OSDs.<br />
<br />
To get the new stats format, the OSDs need to be manually "repaired". This will change the on-disk format. Alternatively, the OSDs can be destroyed and recreated, but this will create more recovery traffic.<br />
systemctl stop ceph-osd@<N>.service <br />
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<N>/<br />
systemctl start ceph-osd@<N>.service <br />
Once all OSDs are "repaired" the health warning will disappear.<br />
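<br />
If a node hosts several OSDs, the repair can be scripted. A rough sketch that walks over all OSD directories on the local node (OSD IDs are taken from the directory names; run it on one node at a time and wait for '''HEALTH_OK''' before moving on to the next node):<br />
<br />
 for id in $(ls /var/lib/ceph/osd/ | sed 's/^ceph-//'); do<br />
     systemctl stop ceph-osd@"${id}".service<br />
     ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-"${id}"/<br />
     systemctl start ceph-osd@"${id}".service<br />
 done<br />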
<br />
== Command-line Interface ==<br />
see https://ceph.com/rbd/new-in-nautilus-rbd-performance-monitoring/<br />
<br />
Enable the module:<br />
<pre><br />
ceph mgr module enable rbd_support<br />
</pre><br />
then these commands are available:<br />
rbd perf image iotop<br />
<br />
rbd perf image iostat<br />
<br />
[[Category: HOWTO]][[Category: Installation]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Nested_Virtualization&diff=10695Nested Virtualization2020-04-08T12:37:35Z<p>A.antreich: /* PVE as nested Hypervisor */ reword section and add note for hyper-v</p>
<hr />
<div>== What is ==<br />
Nested virtualization is when you run a hypervisor, like PVE or others, inside a virtual machine (which is of course itself running on another hypervisor) instead of on real hardware. In other words, you have a host hypervisor hosting a guest hypervisor (as a vm), which can host its own vms. <br />
<br />
This obviously adds an overhead to the nested environment, but it could be useful in some cases: <br />
* it could let you test (or learn) how to manage hypervisors before actual implementation, or test some dangerous/tricky procedure involving hypervisors before actually doing it on the real thing. <br />
* it could enable businesses to deploy their own virtualization environment, e.g. on public services (cloud), see also http://www.ibm.com/developerworks/cloud/library/cl-nestedvirtualization/<br />
<br />
== Requirements ==<br />
In order to have the fastest possible performance, near to native, any hypervisor should have access to some (real) hardware features that are generally useful for virtualization, the so called 'hardware-assisted virtualization extensions' (see http://en.wikipedia.org/wiki/Hardware-assisted_virtualization).<br />
<br />
In nested virtualization, the guest hypervisor should also have access to hardware-assisted virtualization extensions, and that implies that the host hypervisor has to expose those extensions to its virtual machines. In principle it also works without those extensions, but with poor performance; that is not an option for a production environment (though it may be sufficient for some test cases). In the case of Intel CPUs, exposing those extensions requires kernel 3 or higher, i.e. it is available in Proxmox VE 4.x/5.x, but not as default in older versions.<br />
<br />
You will need to allocate plenty of CPU, RAM and disk to those guest hypervisors.<br />
<br />
== Proxmox VE and nesting ==<br />
Proxmox VE can:<br />
* '''host a nested (guest) hypervisor'''<br />
By default, it does not expose hardware-assisted virtualization extensions to its VMs. Do not expect optimal performance for virtual machines on the guest hypervisor, unless you configure the VM's CPU as "host" and have nested hardware-assisted virtualization extensions enabled on the physical PVE host.<br />
<br />
{{Note|Microsoft Hyper-V as a nested Hypervisor works only on Intel CPUs.<br />
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/nested-virtualization}}<br />
<br />
* '''be hosted as a nested (guest) hypervisor'''<br />
The host hypervisor needs to expose the hardware-assisted virtualization extensions. Proxmox VE can use them to provide better performance to its guests. Otherwise, as in the PVE-inside-PVE case, any VM (KVM) needs to turn off the KVM hardware virtualization (see VM options).<br />
<br />
== Enable Nested Hardware-assisted Virtualization ==<br />
<br />
{{Note|VMs with nesting active (vmx/svm flag) cannot be live-migrated!}}<br />
<br />
To be done on the physical PVE host (or any other hypervisor).<br />
<br />
To have nested hardware-assisted virtualization, you have to:<br />
<br />
* use an AMD CPU or a very recent Intel one<br />
* use kernel >= 3.10 (is always the case in Proxmox VE 4.x)<br />
* enable nested support<br />
To check if it is enabled, run ("kvm_intel" for Intel CPUs, "kvm_amd" for AMD):<br />
root@proxmox:~# cat /sys/module/kvm_intel/parameters/nested <br />
N<br />
<br />
<br />
N means it is not enabled. To enable it ("kvm-intel" for Intel):<br />
# echo "options kvm-intel nested=Y" > /etc/modprobe.d/kvm-intel.conf<br />
(or "kvm-amd" for AMD, note the 1 instead of Y):<br />
# echo "options kvm-amd nested=1" > /etc/modprobe.d/kvm-amd.conf<br />
and reboot or reload the kernel module:<br />
modprobe -r kvm_intel<br />
modprobe kvm_intel<br />
<br />
check again<br />
root@proxmox:~# cat /sys/module/kvm_intel/parameters/nested <br />
Y<br />
<br />
<br />
(pay attention to where the dash "-" is used, and where the underscore "_" is used instead)<br />
<br />
Then create a guest in which you install e.g. Proxmox VE as a nested virtualization environment.<br />
<br />
* set the CPU type to "host"<br />
<br />
* in case of an AMD CPU: also add the following to the VM configuration file:<br />
<br />
args: -cpu host,+svm<br />
<br />
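The CPU type can also be set from the command line on the physical host instead of via the GUI; a minimal example for a hypothetical VM with ID 100:<br />
<br />
 qm set 100 --cpu host<br />
<br />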
Once the guest OS is installed, if it is GNU/Linux you can log in and verify that hardware virtualization support is enabled by running<br />
root@guest1# egrep '(vmx|svm)' --color=always /proc/cpuinfo<br />
<br />
== Example: PVE hosts a PVE guest hypervisor ==<br />
<br />
=== Set a cluster of self nested PVE ===<br />
On the physical Proxmox host you create 2 VMs, and in each one you install a new instance of Proxmox, so you can experiment with cluster concepts without the need for multiple physical servers.<br />
* log into (web gui) your host pve (running on real hardware)<br />
=> PVE<br />
* create two or more vm guests (kvm) in your host PVE, each with enough ram/disk, and install PVE from the iso on each guest vm (same network)<br />
=> PVE => VMPVE1 (guest PVE)<br />
=> PVE => VMPVE2 (guest PVE)<br />
...<br />
* log into (ssh/console) the first guest vm & create cluster CLUSTERNAME<br />
=> PVE => VMPVE1 (guest PVE) => #pvecm create CLUSTERNAME<br />
* log into each other guest vm & join cluster <CLUSTERNAME><br />
=> PVE => VMPVE2 (guest PVE) => #pvecm add <IP address of VM1><br />
* log into (web gui) any guest vm (guest pve) and manage the new (guest) cluster<br />
=> PVE => VMPVE1/2 (guest PVE) => #pvecm n<br />
* create vm or ct inside the guest pve (nodes of CLUSTERNAME)<br />
** if you didn't enable hardware-assisted nested virtualization, you have to turn off KVM hardware virtualization (see vm options)<br />
** install only small, CLI-based cts or vms for those guests (do not try anything with a GUI, don't even think of running Windows...)<br />
<br />
=> PVE => VMPVE1/2 (guest PVE) => VM/CT<br />
<br />
* install something on (eg) a vm (eg: a basic ubuntu server) from iso<br />
=> PVE => VMPVE2 (guest PVE) => VM (basic ubuntu server)<br />
<br />
=== vm/ct performance without hardware-assisted virtualization extensions ===<br />
if you can't set up hardware-assisted virtualization extensions for the guest, performance is far from optimal! Use this only to practice or test!<br />
* ct (lxc) will be faster, of course, quite usable<br />
* vm (kvm) will be really slow, nearly unusable (you can expect 10x slower or more), since (as said above) they're running without KVM hardware virtualization<br />
<br />
but at least you can try or test "guest pve" features or setups:<br />
* you could create a small test cluster to practice with cluster concepts and operations<br />
* you could test a new pve version before upgrading<br />
* you could test setups conflicting with your production setup</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Web_Interface_Via_Nginx_Proxy&diff=10674Web Interface Via Nginx Proxy2020-03-16T15:04:39Z<p>A.antreich: /* Configuration */</p>
<hr />
<div>= Introduction =<br />
This allows you to access Proxmox VE via the port 443<br />
<br />
''Tested from Proxmox 3.4 - 6.1''<br />
<br />
'''Why do I need this?'''<br />
<br />
Sometimes there is a firewall restriction that blocks port 8006. Since we shouldn't touch the port configuration in Proxmox, we'll just use nginx as a proxy to make the web interface available on the default HTTPS port 443. Now let's begin...<br />
<br />
= Configuration =<br />
* '''install nginx'''<br />
<pre>apt install nginx</pre><br />
<br />
* '''remove the default config file – not needed from PVE 4 (Jessie) onward'''<br />
<pre>rm /etc/nginx/conf.d/default</pre><br />
<br />
or, from PVE 4 (Jessie) onward: <br />
<br />
<pre>rm /etc/nginx/sites-enabled/default</pre><br />
<br />
* '''create a new config file'''<br />
<pre>nano /etc/nginx/conf.d/proxmox.conf</pre><br />
<br />
'''Note:''' You can choose the configuration filename freely, but it must have a ''.conf'' ending.<br />
<br />
The following is an example config that works for the web interface and also the noVNC console:<br />
<br />
<pre><br />
upstream proxmox {<br />
server "FQDN HOSTNAME";<br />
}<br />
<br />
server {<br />
listen 80 default_server;<br />
rewrite ^(.*) https://$host$1 permanent;<br />
}<br />
<br />
server {<br />
listen 443;<br />
server_name _;<br />
ssl on;<br />
ssl_certificate /etc/pve/local/pve-ssl.pem;<br />
ssl_certificate_key /etc/pve/local/pve-ssl.key;<br />
proxy_redirect off;<br />
location / {<br />
proxy_http_version 1.1;<br />
proxy_set_header Upgrade $http_upgrade;<br />
proxy_set_header Connection "upgrade"; <br />
proxy_pass https://localhost:8006;<br />
proxy_buffering off;<br />
client_max_body_size 0;<br />
proxy_connect_timeout 3600s;<br />
proxy_read_timeout 3600s;<br />
proxy_send_timeout 3600s;<br />
send_timeout 3600s;<br />
}<br />
}<br />
</pre><br />
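<br />
'''Note:''' On newer nginx versions (1.15 and later) the standalone ''ssl on;'' directive is deprecated. As a sketch for such versions, drop the ''ssl on;'' line and enable SSL on the listen directive instead:<br />
<pre><br />
    listen 443 ssl;<br />
</pre><br />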
<br />
* '''Test and Apply new config'''<br />
<br />
<pre><br />
nginx -t # checks config syntax<br />
systemctl restart nginx<br />
</pre><br />
<br />
* '''ensure that nginx only gets started after the certificates are available'''<br />
<br />
As the certificates reside on /etc/pve, which is provided by pve-cluster.service,<br />
we need to tell nginx.service to only start after that unit.<br />
The easiest and cleanest way to do that is to add a Requires and an After entry as a systemd override snippet.<br />
<br />
This can be done with ''systemctl edit UNIT'', which opens your $EDITOR:<br />
 # systemctl edit nginx.service<br />
There, add:<br />
<pre><br />
[Unit]<br />
Requires=pve-cluster.service<br />
After=pve-cluster.service<br />
</pre><br />
<br />
and save + exit.<br />
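<br />
To verify that the override is picked up, you can (for example) display the resulting unit, which should list the drop-in snippet at the top:<br />
 systemctl cat nginx.service<br />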
<br />
Enjoy the web interface on HTTPS port 443!<br />
<br />
= See Also =<br />
<br />
NoVNC reverse Proxy with Apache https://forum.proxmox.com/threads/working-novnc-with-reverse-proxy-on-5-1.43644/<br />
<br />
[[Category:HOWTO]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_Luminous_to_Nautilus&diff=10562Ceph Luminous to Nautilus2019-12-02T13:48:22Z<p>A.antreich: </p>
<hr />
<div>== Introduction ==<br />
This article explains how to upgrade from Ceph Luminous to Nautilus (14.2.0 or higher) on Proxmox VE 6.x.<br />
<br />
For more information see<br />
[http://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-mimic-or-luminous Release Notes]<br />
<br />
== Assumption ==<br />
We assume that all nodes are on the latest Proxmox VE 6.x version and Ceph is on version Luminous (12.2.12-pve1).<br />
<br />
The cluster must be healthy and working.<br />
<br />
== Note ==<br />
<br />
* After upgrading to Proxmox VE 6.x and before upgrading to Ceph Nautilus,<br />
:*Do not use the Proxmox VE 6.x tools for Ceph (pveceph), as they are not intended to work with Ceph Luminous.<br />
:*If it's absolutely necessary to change the Ceph cluster before upgrading to Nautilus, use the Ceph native tools instead.<br />
<br />
* During the upgrade from Luminous to Nautilus it will not be possible to create a new OSD using a Luminous ceph-osd daemon after the monitors have been upgraded to Nautilus. Avoid adding or replacing any OSDs while the upgrade is in progress.<br />
<br />
* Avoid creating any RADOS pools while the upgrade is in progress.<br />
* You can monitor the progress of your upgrade anytime with the ceph versions command. This will tell you which Ceph version(s) are running for each type of daemon.<br />
<br />
== Cluster Preparation ==<br />
If your cluster was originally installed with a version prior to Luminous, ensure that it has completed at least one full scrub of all PGs while running Luminous. Failure to do so will cause your monitor daemons to refuse to join the quorum on start, leaving them non-functional.<br />
<br />
If you are unsure whether or not your Luminous cluster has completed a full scrub of all PGs, check the state of your cluster by running:<br />
<br />
ceph osd dump | grep ^flags<br />
<br />
In order to be able to proceed to Nautilus, your OSD map must include the flags<br />
* recovery_deletes flag<br />
* purged_snapdirs flag<br />
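<br />
For example, with both flags present the output of the command above looks similar to this (other flags may be listed as well):<br />
<br />
 flags sortbitwise,recovery_deletes,purged_snapdirs<br />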
<br />
If your OSD map does not contain both these flags, you can simply wait for approximately 24-48 hours. In a standard cluster configuration this should be ample time for all your placement groups to be scrubbed at least once. Then repeat the above process to recheck.<br />
<br />
If you have just completed an upgrade to Luminous and want to proceed to Nautilus in short order, you can force a scrub on all placement groups with a one-line shell command, like:<br />
<br />
ceph pg dump pgs_brief | cut -d " " -f 1 | xargs -n1 ceph pg scrub<br />
<br />
Consider that this forced scrub may possibly have a negative impact on the performance of your Ceph clients.<br />
<br />
=== Adapt /etc/pve/ceph.conf ===<br />
Since Nautilus, all daemons use the 'keyring' option for their keyring, so you have to adapt this.<br />
The easiest way is to move the global 'keyring' option into the 'client' section, and remove it everywhere else.<br />
Create the 'client' section if you don't have one.<br />
<br />
For example:<br />
<br />
From:<br />
[global]<br />
...<br />
keyring = /etc/pve/priv/$cluster.$name.keyring<br />
[osd]<br />
keyring = /var/lib/ceph/osd/ceph-$id/keyring<br />
<br />
To:<br />
[global]<br />
...<br />
[client]<br />
keyring = /etc/pve/priv/$cluster.$name.keyring<br />
<br />
== Preparation on each Ceph cluster node ==<br />
Change the current Ceph repositories from Luminous to Nautilus.<br />
<br />
sed -i 's/luminous/nautilus/' /etc/apt/sources.list.d/ceph.list<br />
<br />
Your /etc/apt/sources.list.d/ceph.list should look like this<br />
<br />
deb http://download.proxmox.com/debian/ceph-nautilus buster main<br />
<br />
== Set the 'noout' flag ==<br />
Set the noout flag for the duration of the upgrade (optional, but recommended):<br />
<br />
ceph osd set noout<br />
<br />
Or via the GUI in the OSD tab.<br />
<br />
== Upgrade on each Ceph cluster node ==<br />
Upgrade all your nodes with the following commands. It will upgrade the Ceph on your node to Nautilus.<br />
apt update<br />
apt dist-upgrade<br />
<br />
After the update you still run the old Luminous binaries.<br />
<br />
== Restart the monitor daemon ==<br />
<br />
After upgrading all cluster nodes, you have to restart the monitor on each node where a monitor runs.<br />
<br />
systemctl restart ceph-mon.target<br />
<br />
Once all monitors are up, verify that the monitor upgrade is complete. Look for the nautilus string in the mon map. The command<br />
<br />
ceph mon dump | grep min_mon_release<br />
<br />
should report<br />
<br />
min_mon_release 14 (nautilus)<br />
<br />
If it does not, this implies that one or more monitors haven’t been upgraded and restarted, and/or that the quorum doesn't include all monitors.<br />
<br />
== Restart the manager daemons on all nodes ==<br />
<br />
Then restart all managers on all nodes<br />
<br />
systemctl restart ceph-mgr.target<br />
<br />
Verify that the ceph-mgr daemons are running by checking ceph -s<br />
<br />
ceph -s<br />
<br />
...<br />
services:<br />
mon: 3 daemons, quorum foo,bar,baz<br />
mgr: foo(active), standbys: bar, baz<br />
...<br />
<br />
== Restart the OSD daemon on all nodes ==<br />
<br />
'''Important Steps before restarting OSD'''<br />
<br />
If you have a cluster with IPv6 only, you need to set the following options in the global section of the ceph config:<br />
<br />
ms_bind_ipv4 = false<br />
ms_bind_ipv6 = true<br />
<br />
Otherwise, each OSD tries to bind to an IPv4 address in addition to the IPv6 one and fails if it cannot find an IPv4 address in the given public/cluster networks.<br />
<br />
Next, restart all OSDs on all nodes<br />
<br />
systemctl restart ceph-osd.target<br />
<br />
On each host, tell ceph-volume to adapt the OSDs created with ceph-disk using the following two commands:<br />
<br />
ceph-volume simple scan<br />
ceph-volume simple activate --all<br />
<br />
If you get a failure, your OSDs will not be recognized after a reboot.<br />
<br />
* One such failure can be '''Required devices (block and data) not present for bluestore'''. This may happen if you have filestore OSDs, due to a bug in the Ceph tooling (see http://wordpress.hawkless.id.au/index.php/2019/05/10/ceph-nautilus-required-devices-block-and-data-not-present-for-bluestore/).<br />
** To fix it, edit the ''/etc/ceph/osd/{OSDID}-GUID.json'' files created for each filestore OSD and add the following line (check that the result is still valid JSON; each attribute has to end in ''',''' except the last one), as sketched below this list.<br />
 "type": "filestore"<br />
* Run again: ceph-volume simple activate --all<br />
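<br />
A minimal sketch of such an adapted file (all other attributes are placeholders for whatever your file already contains; only the ''type'' line is added):<br />
<pre><br />
{<br />
    "...": "existing attributes, unchanged",<br />
    "type": "filestore"<br />
}<br />
</pre><br />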
<br />
To verify that the OSDs start up automatically, it's recommended that each OSD host is rebooted following the step above.<br />
<br />
Note that ceph-volume does not have the same hot-plug capability as ceph-disk had, where a newly attached disk is automatically detected via udev events.<br />
<br />
You will need to scan the main data partition for each ceph-disk OSD explicitly, if<br />
*the OSD isn’t currently running when the above scan command is run, <br />
*a ceph-disk-based OSD is moved to a new host, <br />
*the host OSD is reinstalled, <br />
*or the /etc/ceph/osd directory is lost. <br />
<br />
For example:<br />
<br />
ceph-volume simple scan /dev/sdb1<br />
The output will include the appropriate ceph-volume simple activate command to enable the OSD.<br />
<br />
== Upgrade all CephFS MDS daemons ==<br />
<br />
For each CephFS file system,<br />
<br />
# Reduce the number of ranks to 1 (if you plan to restore it later, first take note of the original value of max_mds):<br />
#:<pre>ceph status&#10;ceph fs set <fs_name> max_mds 1</pre><br />
# Wait for the cluster to deactivate any non-zero ranks by periodically checking the status:<br />
#:<pre>ceph status</pre><br />
# Take all standby MDS daemons offline on the appropriate hosts with:<br />
#:<pre>systemctl stop ceph-mds.target</pre><br />
# Confirm that only one MDS is online and is on rank 0 for your FS:<br />
#:<pre>ceph status</pre><br />
# Upgrade the last remaining MDS daemon by restarting the daemon:<br />
#:<pre>systemctl restart ceph-mds.target</pre><br />
# Restart all standby MDS daemons that were taken offline:<br />
#:<pre>systemctl start ceph-mds.target</pre><br />
# Restore the original value of max_mds for the volume:<br />
#:<pre>ceph fs set <fs_name> max_mds <original_max_mds></pre><br />
<br />
== Disallow pre-Nautilus OSDs and enable all new Nautilus-only functionality ==<br />
<br />
ceph osd require-osd-release nautilus<br />
<br />
== Unset 'noout' and check cluster status ==<br />
<br />
Unset the 'noout' flag.<br />
You can do this in the GUI or with this command.<br />
<br />
ceph osd unset noout<br />
<br />
Now check if your Ceph cluster is healthy.<br />
ceph -s<br />
<br />
== Upgrade Tunables ==<br />
If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:<br />
<br />
ceph config set mon mon_crush_min_required_version firefly<br />
<br />
If Ceph does not complain, however, then we recommend you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any ‘straw’ buckets, this will result in a modest amount of data movement, but generally nothing too severe:<br />
<br />
ceph osd getcrushmap -o backup-crushmap<br />
ceph osd crush set-all-straw-buckets-to-straw2<br />
<br />
If there are problems, you can easily revert with:<br />
<br />
ceph osd setcrushmap -i backup-crushmap<br />
<br />
Moving to ‘straw2’ buckets will unlock a few recent features, like the crush-compat balancer mode added back in Luminous.<br />
<br />
== Enable msgrv2 protocol and update Ceph configuration ==<br />
<br />
To enable the new v2 network protocol, issue the following command:<br />
<br />
ceph mon enable-msgr2<br />
<br />
This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new 3300 v2 protocol port. To see if all monitors have been updated run<br />
<br />
ceph mon dump<br />
<br />
and verify that each monitor has both a v2: and v1: address listed.<br />
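<br />
For example, an entry for an upgraded monitor looks similar to this (the address is a placeholder):<br />
<br />
 0: [v2:10.0.0.100:3300/0,v1:10.0.0.100:6789/0] mon.foo<br />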
<br />
'''Updating /etc/pve/ceph.conf'''<br />
<br />
For each host that has been upgraded, you should update your /etc/pve/ceph.conf file so that it either specifies no monitor port (if you are running the monitors on the default ports) or references both the v2 and v1 addresses and ports explicitly. Things will still work if only the v1 IP and port are listed, but each CLI instantiation or daemon will need to reconnect after learning the monitors also speak the v2 protocol, slowing things down a bit and preventing a full transition to the v2 protocol.<br />
<br />
It is recommended to add all monitor IPs (without port) to 'mon_host' in the global section like this:<br />
<br />
[global]<br />
...<br />
mon_host = 10.0.0.100 10.0.0.101 10.0.0.102<br />
...<br />
<br />
For details see: [http://docs.ceph.com/docs/nautilus/rados/configuration/msgr2/#msgr2-ceph-conf Messenger V2]<br />
<br />
== Legacy BlueStore stats reporting ==<br />
After the upgrade, '''ceph -s''' may show the message below.<br />
 HEALTH_WARN Legacy BlueStore stats reporting detected on 6 OSD(s)<br />
In Ceph Nautilus 14.2.0 the pool utilization stats reported (ceph df) changed. This change needs an on-disk format change on the BlueStore OSDs.<br />
<br />
To get the new stats format, the OSDs need to be manually "repaired". This will change the on-disk format. Alternatively, the OSDs can be destroyed and recreated, but this will create more recovery traffic.<br />
systemctl stop ceph-osd@<N>.service <br />
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<N>/<br />
systemctl start ceph-osd@<N>.service <br />
Once all OSDs are "repaired" the health warning will disappear.<br />
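<br />
A sketch for repairing several OSDs on one host in a row (assuming the OSD IDs 0, 1 and 2 are located on this node; adjust the list to your setup):<br />
<pre><br />
for N in 0 1 2; do<br />
    systemctl stop ceph-osd@$N.service<br />
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-$N/<br />
    systemctl start ceph-osd@$N.service<br />
done<br />
</pre><br />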
<br />
== Command-line Interface ==<br />
see https://ceph.com/rbd/new-in-nautilus-rbd-performance-monitoring/<br />
<br />
Enable the rbd_support manager module:<br />
<pre><br />
ceph mgr module enable rbd_support<br />
</pre><br />
Then these commands are available:<br />
rbd perf image iotop<br />
<br />
rbd perf image iostat<br />
<br />
[[Category: HOWTO]][[Category: Installation]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_RBD_Mirroring&diff=10360Ceph RBD Mirroring2019-05-24T13:28:00Z<p>A.antreich: Add a note why the ceph user is replaced with root in the unit file</p>
<hr />
<div>Configuring rbd-mirror for Off-Site-Backup (one-way-mirroring)<br />
<br />
== Requirements ==<br />
* Two Ceph clusters<br />
* One or more pools of the same name in both clusters<br />
* Installed rbd-mirror on the backup cluster ONLY (apt install rbd-mirror)<br />
<br />
This guide assumes you have two clusters: one called master, where your images are used in production, and a backup cluster, where you want to create your disaster recovery backup. The general idea is that one or more rbd-mirror daemons on the backup cluster pull changes from the master cluster. This should be appropriate to maintain a crash-consistent copy of the original images. This approach will not help you when you want to fail back to the master cluster; for this you will need two-way mirroring, or you at least set it up at the time you want to fail back.<br />
<br />
First of all, only images with the "exclusive-lock" and "journaling" features will be mirrored. Because "journaling" depends on "exclusive-lock", you will need to enable both features. To check whether or not these features are already enabled on an image, do the following:<br />
<br />
<pre><br />
# rbd info <your_pool_name>/<your_vm_disk_image><br />
</pre><br />
<br />
e.g.<br />
<br />
<pre><br />
# rbd info data/vm-100-disk-0<br />
</pre><br />
<br />
To enable a feature:<br />
<br />
<pre><br />
# rbd feature enable data/vm-100-disk-0 journaling<br />
</pre><br />
<br />
You need to do this on every image you want to mirror.<br />
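<br />
A sketch for enabling journaling on every image of a pool at once (assuming the pool is called ''data''; adjust to your setup):<br />
<pre><br />
# note: exclusive-lock must already be enabled on the images (see above)<br />
for IMG in $(rbd ls data); do<br />
    rbd feature enable data/$IMG journaling<br />
done<br />
</pre><br />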
<br />
<br />
The next step is to set the mirroring mode on each pool you want to mirror.<br />
You can choose between pool mode and image mode; this has to be done on both clusters on the corresponding pools, e.g. data/data.<br />
<pre><br />
# rbd mirror pool enable <your_pool_name> <mode><br />
</pre><br />
<br />
e.g<br />
<br />
<pre><br />
# rbd mirror pool enable data pool<br />
</pre><br />
<br />
<br />
On one of the monitor hosts of the master cluster create a user:<br />
<pre><br />
# ceph auth get-or-create client.rbd-mirror.master mon 'profile rbd' osd 'profile rbd' -o /etc/pve/priv/master.client.rbd-mirror.master.keyring<br />
</pre><br />
<br />
'''Note:'''<br />
You can restrict this to a specific pool if you write 'profile rbd pool=data'<br />
<br />
<br />
Copy your ceph.conf file from your master cluster to your backup cluster "/etc/ceph/" directory under the name of master.conf (be careful to not overwrite your backup cluster's ceph.conf file).<br />
Copy the previously generated keyring-file (master.client.rbd-mirror.master.keyring) to your backup cluster "/etc/pve/priv/" directory.<br />
This step is necessary because it is not possible to mirror two clusters with the same name; therefore we use a different name (master), which is only represented by the different config filename and the corresponding keyring file.<br />
<br />
On a node of the backup cluster create a unique client id to be used for each rbd-mirror-daemon instance:<br />
<pre><br />
# ceph auth get-or-create client.rbd-mirror.backup mon 'profile rbd' osd 'profile rbd' -o /etc/pve/priv/ceph.client.rbd-mirror.backup.keyring<br />
</pre><br />
<br />
You should now be able to start the daemon (as root):<br />
<br />
<pre><br />
# systemctl enable ceph-rbd-mirror.target<br />
# cp /lib/systemd/system/ceph-rbd-mirror@.service /etc/systemd/system/ceph-rbd-mirror@.service<br />
# sed -i -e 's/setuser ceph.*/setuser root --setgroup root/' /etc/systemd/system/ceph-rbd-mirror@.service<br />
# systemctl enable ceph-rbd-mirror@rbd-mirror.backup.service<br />
# systemctl start ceph-rbd-mirror@rbd-mirror.backup.service<br />
</pre><br />
The replacement of the ceph user in the unit file is only necessary if you put the keyring file under ''/etc/pve/priv/'' (to have the file available cluster-wide), as the user ceph can't access that directory. Ceph tools by default search in ''/etc/ceph/'' for files.<br />
<br />
<br />
Add the master cluster as a peer to the backup cluster to start:<br />
<br />
<pre><br />
# rbd mirror pool peer add <pool_name> <master_client_id>@<name_of_master_cluster><br />
</pre><br />
e.g.<br />
<pre><br />
# rbd mirror pool peer add data client.rbd-mirror.master@master<br />
</pre><br />
<br />
Verify that the peering succeeded by the following command:<br />
<br />
<pre><br />
# rbd mirror pool info <pool_name><br />
</pre><br />
e.g<br />
<pre><br />
# rbd mirror pool info data<br />
</pre><br />
<br />
If all went well, this should print the peer and the mirror mode. The UUID which is printed is needed if you want to remove the peer at any time in the future.<br />
<br />
You should now see in your backup cluster each image that is marked with the journaling feature in the master cluster. You can verify the current mirror state with the following command: <br />
<pre><br />
# rbd mirror pool status data --verbose<br />
</pre><br />
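<br />
If mirroring is working, each non-primary image on the backup cluster should eventually be reported in a state similar to ''up+replaying'' (the exact output format depends on the Ceph version), for example:<br />
<br />
 vm-100-disk-0:<br />
   state:       up+replaying<br />
   description: replaying, entries_behind_master=0<br />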
<br />
If you want to switch to the backup cluster, you need to promote the backup images to primary images. This should only be done when your master cluster has crashed, or after you have taken the necessary steps on the master cluster before switching, e.g. demoting the images there.<br />
<br />
Please also check out Ceph's rbd-mirror documentation.<br />
http://docs.ceph.com/docs/luminous/rbd/rbd-mirroring/<br />
<br />
[[Category:HOWTO]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Upgrade_from_4.x_to_5.0&diff=10265Upgrade from 4.x to 5.02019-02-14T07:23:48Z<p>A.antreich: added color for better visual</p>
<hr />
<div><br />
== Introduction ==<br />
<br />
Proxmox VE 5.x introduces major new features, therefore the upgrade must be carefully planned and tested. Depending on your existing configuration, several manual steps are required, including some downtime. NEVER start the upgrade process without a valid backup and without testing the same in a test lab setup.<br />
<br />
If you run a customized installation and/or you installed additional packages, for example for sheepdog, or any other third party packages, you need to make sure that you also upgrade these packages to Debian Stretch. <br />
<br />
Generally speaking there are two possibilities to move from 4.x to 5.x<br />
<br />
*New installation on new hardware (and restore VMs from backup)<br />
*In-place upgrade via apt, step by step <br />
<br />
In both cases you should empty the browser cache after the upgrade and reload the GUI page, otherwise you may see a lot of glitches.<br />
<br />
If you run a PVE 4 cluster it's [https://forum.proxmox.com/threads/4-4-and-5-x-version-in-the-same-cluster.37170/#post-182668 tested and supported] to add a PVE 5 node and migrate your guests to the new host.<br />
<br />
=== Caveats to know before you start ===<br />
<br />
* <span style="color: red">''' if using ceph'''</span>, upgrade your ceph cluster to the Luminous release <span style="color: red">'''before you upgrade'''</span>, following the article [[Ceph Jewel to Luminous]].<br />
<br />
== New installation ==<br />
<br />
* Backup all VMs and containers to external media (see [[Backup and Restore]])<br />
* Backup all files in /etc. You will need various files in /etc/pve, as well as /etc/passwd, /etc/network/interfaces, /etc/resolv.conf and others depending on what has been configured from the defaults.<br />
* Install Proxmox VE from ISO (this will wipe all data on the existing host)<br />
* Rebuild the cluster if you had any<br />
* Restore the file /etc/pve/storage.cfg (this will re-map and make available any external media you used for backup) <br />
* Restore firewall configs /etc/pve/firewall/ and /etc/pve/nodes/<node>/host.fw (if relevant)<br />
* Restore full VMs from Backups (see [[Backup and Restore]])<br />
<br />
If you feel comfortable with the command line, and all your VMs/CTs are on shared storage, you can also follow<br />
the procedure [[Bypassing backup and restore when upgrading]]<br />
<br />
<br />
== In-place upgrade ==<br />
<br />
In-place upgrades are done with apt-get, so make sure that you are familiar with apt before you start here.<br />
<br />
'''Tip''': ''You can perform a test upgrade on a standalone server first. Install the Proxmox VE 4.4 ISO on testing hardware, then upgrade this installation to the latest minor version of Proxmox VE 4.4 (see [[Package repositories]]), and copy/create relevant configurations on the test machine to replicate your production setup as closely as possible. Then you can start the upgrade.<br />
You can even install Proxmox VE 4.4 in a VM and test the upgrade in this environment.''<br />
<br />
=== Preconditions ===<br />
<br />
* upgraded to latest V 4.4<br />
* reliable access to all configured storages<br />
* healthy cluster<br />
* no VM or CT running<br />
* valid backup of all VM (needed if something goes wrong)<br />
* Correct repository configuration<br />
* at least 1GB free disk space at root mount point<br />
* ensure your /boot partition, if any, has enough space for a new kernel (min 60MB) - e.g., by removing old unused kernels (see pveversion -v)<br />
* if using Ceph, you should be already running the Ceph Luminous version, but see the caveat above<br />
<br />
=== Actions Step by Step ===<br />
<br />
Everything has to be done on each Proxmox VE node's command line (via console or ssh; preferably via console in order to exclude interrupted ssh connections). Again, make sure that you have a valid backup of all CTs and VMs before you start.<br />
<br />
==== Add the PVE repositories to your installation ====<br />
<br />
First make sure that your actual installation has the latest package of the Proxmox VE 4.4 release:<br />
<br />
apt-get update && apt-get dist-upgrade<br />
<br />
Update the Debian repository entry to stretch.<br />
<br />
sed -i 's/jessie/stretch/g' /etc/apt/sources.list<br />
<br />
Update the Proxmox VE repository entry to stretch.<br />
<br />
sed -i 's/jessie/stretch/g' /etc/apt/sources.list.d/pve-enterprise.list<br />
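<br />
Afterwards /etc/apt/sources.list.d/pve-enterprise.list should contain a line similar to this (assuming the default enterprise repository entry was present; only the suite name changes):<br />
<br />
 deb https://enterprise.proxmox.com/debian stretch pve-enterprise<br />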
<br />
More information about [[Package_Repositories]]<br />
<br />
''' Replace ceph.com repositories with proxmox.com ceph repositories '''<br />
This step is only necessary if you have a ceph cluster on your PVE installation.<br />
<br />
echo "deb http://download.proxmox.com/debian/ceph-luminous stretch main" > /etc/apt/sources.list.d/ceph.list<br />
<br />
'''If there is a backports line then remove it.'''<br />
Currently the upgrade has not been tested when packages from the backports repository are installed.<br />
<br />
Update the repositories data:<br />
<br />
apt-get update<br />
<br />
==== Upgrade the basic system to Debian Stretch and PVE 5.0 ====<br />
<br />
This action will consume some time. Depending on the system's performance, it can take up to 60 minutes or even more. If you run on an SSD, the dist-upgrade can be finished in 5 minutes.<br />
<br />
Start with this step to get the initial set of upgraded packages.<br />
<br />
apt-get dist-upgrade<br />
<br />
During either of the above, you may be asked to approve of some new packages replacing configuration files. Do with them as you see fit, but they are not relevant to the Proxmox upgrade.<br />
<br />
Reboot the system in order to use the new PVE kernel<br />
<br />
=== Troubleshooting ===<br />
<br />
* Failing upgrade to "stretch"<br />
<br />
Make sure that the repository configuration for stretch is correct.<br />
<br />
If there was a network failure and the upgrade was only partially completed, try to repair the situation with <br />
<br />
apt-get -fy install<br />
<br />
* Unable to boot due to grub failure<br />
<br />
See [[Recover_From_Grub_Failure]]<br />
<br />
<br />
<br />
=== Breaking Changes in 5.0 ===<br />
==== Configuration defaults ====<br />
===== Default display switched from 'cirrus' to 'std' =====<br />
<br />
The default display is now 'std' (Standard VGA card with Bochs VBE extensions), changed from the 'cirrus' type.<br />
Cirrus has security bugs and 'std' is the default since qemu 2.2<br />
<br />
To still be able to live migrate VMs to another PVE 4 node, or to an already upgraded PVE 5 host, without downtime, ensure that your Proxmox VE 4 node is '''up to date''', i.e. that you ran an:<br />
apt update<br />
apt full-upgrade<br />
<br />
cycle, with valid Debian Jessie and Proxmox VE 4 repositories configured!<br />
<br />
If you are using older package versions, for example qemu-server older than version 4.0-111, you will run into problems!<br />
<br />
=== External links ===<br />
<br />
*[https://www.debian.org/releases/stretch/amd64/release-notes/ Release Notes for Debian 9.0 (stretch), 64-bit PC]<br />
<br />
[[Category: HOWTO]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=IO_Scheduler&diff=10245IO Scheduler2019-01-11T07:38:33Z<p>A.antreich: changed default scheduling sentence</p>
<hr />
<div>== Introduction ==<br />
<br />
The Linux kernel, the core of the operating system, is responsible for controlling disk access by using kernel IO scheduling. <br />
<br />
This article explains how-to change the IO scheduler without recompiling the kernel and without restart.<br />
<br />
== Check the currently used IO scheduler ==<br />
<br />
cat /sys/block/sda/queue/scheduler<br />
<br />
noop anticipatory [deadline] cfq<br />
<br />
For example, the '''deadline''' scheduler delivers the best performance in hardware RAID and SAN environments, while '''noop''' delivers better performance for SSDs.<br />
<br />
== Switching IO Schedulers on runtime ==<br />
<br />
Set the scheduler for /dev/sda to Deadline: <br />
<br />
echo deadline > /sys/block/sda/queue/scheduler<br />
<br />
Set the scheduler for /dev/sda to CFQ: <br />
<br />
echo cfq > /sys/block/sda/queue/scheduler<br />
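<br />
A sketch for switching all sd* devices at once (run as root; pick the scheduler and the device pattern that fit your setup):<br />
<br />
 for DEV in /sys/block/sd*/queue/scheduler; do<br />
     echo deadline > "$DEV"<br />
 done<br />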
<br />
== Set IO Schedulers permanently ==<br />
<br />
In order to choose a new default scheduler you need to add the following into your /etc/default/grub: <br />
<br />
nano /etc/default/grub<br />
<br />
GRUB_CMDLINE_LINUX_DEFAULT="... elevator=deadline"<br />
<br />
or: <br />
<br />
GRUB_CMDLINE_LINUX_DEFAULT="... elevator=cfq"<br />
<br />
After you change /etc/default/grub you need to run update-grub to apply changes:<br />
<br />
update-grub<br />
<br />
== Links ==<br />
*http://en.wikipedia.org/wiki/Deadline_scheduler<br />
*http://en.wikipedia.org/wiki/CFQ<br />
*https://cromwell-intl.com/open-source/performance-tuning/disks.html<br />
<br />
[[Category: HOWTO]] [[Category:System Administration]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Raspberry_Pi_as_third_node&diff=10228Raspberry Pi as third node2018-11-30T08:45:10Z<p>A.antreich: </p>
<hr />
<div>[[Category:HOWTO]]<br />
<span style="color: red;">'''This is only suited for testing or homelab use. Never use it in a production environment!'''</span><br />
<br />
<br />
<br />
'''NOTE''': Ideally you install the Stretch version of Raspbian now, then you won't need to enable any backports repository. Furthermore, please take a look at corosync-qdevice: https://www.mankier.com/8/corosync-qdevice. It is much less resource-hungry and less error-prone, as with it the Raspberry Pi acts only as a quorum tie-breaker, without running (parts of) the Proxmox VE cluster stack.<br />
<br />
This short wiki article documents how to prepare and configure a Raspberry Pi for use as a third node (witness) in a Proxmox cluster. This howto has been tested on a Raspberry Pi 3 but should also work on any other Raspberry Pi version for which Raspbian is available. The Raspbian version used here is Jessie.<br />
<br />
#Login as root on your Pi<br />
#Install Debian Jessie (Standard system utilities and SSH server)<br />
#echo "deb http://ftp.debian.org/debian jessie-backports main contrib" > /etc/apt/sources.list.d/jessie-backports.list<br />
#gpg --keyserver pgpkeys.mit.edu --recv-key 7638D0442B90D010 && gpg -a --export 7638D0442B90D010 | apt-key add -<br />
#gpg --keyserver pgpkeys.mit.edu --recv-key 8B48AD6246925553 && gpg -a --export 8B48AD6246925553 | apt-key add -<br />
#apt-get update<br />
#apt-get -t jessie-backports install corosync<br />
#sed -i 's/without-password/yes/' /etc/ssh/sshd_config && systemctl restart ssh<br />
#scp <ip of pve node>:/etc/corosync/* /etc/corosync<br />
#add the new node under nodelist in /etc/corosync/corosync.conf (copy one of the current entries and adjust it; see the example below this list)<br />
#:<pre><br />
#::for NODE in <ip of pve node 1> <ip of pve node 2>; do<br />
#:::scp /etc/corosync/corosync.conf $NODE:/etc/corosync<br />
#::done</pre><br />
#ssh <ip of pve node 1> systemctl restart corosync<br />
#ssh <ip of pve node 2> systemctl restart corosync<br />
#systemctl start corosync<br />
#run corosync-quorumtool to check that all three nodes are registered as online and that there is quorum: <br />
#:<pre><br />
#::corosync-quorumtool | grep Quorate:<br />
#::Quorate: Yes</pre><br />
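<br />
A sketch of the node entry to add under ''nodelist'' (name, nodeid and ring0_addr are placeholders; copy the format of the existing entries and adjust):<br />
<pre><br />
  node {<br />
    name: raspi<br />
    nodeid: 3<br />
    quorum_votes: 1<br />
    ring0_addr: 192.168.1.30<br />
  }<br />
</pre><br />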
<br />
'''WARNING'''<br />
*pve-manager is missing on the Pi, so don't add it to /etc/pve/corosync.conf.<br />
*If a new node is added, pve-manager will overwrite /etc/corosync/corosync.conf<br />
*I give no guarantee, so everything above is at your own risk.</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_Server&diff=10200Ceph Server2018-09-28T15:58:21Z<p>A.antreich: Removed sections that either are old or in our reference documentation</p>
<hr />
<div>The contents of this article can be found in our documentation.<br />
https://pve.proxmox.com/pve-docs/chapter-pveceph.html<br />
<br />
== Recommended hardware ==<br />
Take a look at our Proxmox VE Ceph Benchmark 2018/02 for possible hardware decisions.<br />
https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark<br />
<br />
== Further readings about Ceph ==<br />
<br />
Ceph comes with plenty of documentation [http://ceph.com/docs/master/ here]. Even better, the dissertation from the creator of Ceph, Sage A. Weil, is also [http://ceph.com/papers/weil-thesis.pdf available]. By reading it you can get a deep insight into how it works. <br />
<br />
*http://ceph.com/ <br />
<br />
*https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/, Journal SSD Recommendations<br />
<br />
== Video Tutorials ==<br />
*[https://www.proxmox.com/en/training/video-tutorials/item/install-ceph-server-on-proxmox-ve Install Ceph Server on Proxmox VE]<br />
<br />
===Proxmox YouTube channel===<br />
You can subscribe to our [http://www.youtube.com/ProxmoxVE Proxmox VE Channel] on YouTube to get updates about new videos.<br />
<br />
== Ceph Misc ==<br />
<br />
=== Upgrading existing Ceph Server from Hammer to Jewel ===<br />
See [[Ceph Hammer to Jewel]]<br />
<br />
=== Upgrading existing Ceph Server from Jewel to Luminous ===<br />
See [[Ceph Jewel to Luminous]]<br />
<br />
===using a disk that was part of a zfs pool ===<br />
As of now,<br />
 ceph-disk zap /dev/sdX<br />
is needed.<br />
<br />
Otherwise the disk does not show up under PVE > Ceph > OSD > Create OSD.<br />
<br />
=== restore lxc from zfs to ceph ===<br />
If the LXC container is on ZFS with compression, the actual disk usage can be far greater than expected. <br />
see https://forum.proxmox.com/threads/lxc-restore-fail-to-ceph.32419/#post-161287<br />
<br />
One way to determine the actual disk usage:<br />
: restore the backup to an ext4 directory and run du -sh, then do the restore manually, specifying the target disk size. <br />
=== scsi setting ===<br />
Make sure that you use the virtio-scsi controller (not LSI), see VM options. I remember some kernel panics when using LSI recently, but I did not debug it further, as a modern OS should use virtio-scsi anyway. https://forum.proxmox.com/threads/restarted-a-node-some-kvms-on-other-nodes-panic.32806<br />
<br />
[[Category:HOWTO]] [[Category:Installation]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_Jewel_to_Luminous&diff=10195Ceph Jewel to Luminous2018-09-21T10:26:28Z<p>A.antreich: /* Cluster Preparation */ bold the important flag</p>
<hr />
<div>== Introduction ==<br />
This HOWTO explains the upgrade from Ceph Jewel to Luminous (12.2.0 or higher) on Proxmox VE 4.x in preparation for upgrading to PVE 5.x.<br />
<br />
The latest Ceph version supported in pveceph in PVE 4.x is Ceph Jewel (10.2.x). An upgrade to Ceph Luminous (12.2.x) is only possible temporarily as first step of upgrading to PVE 5.x.<br />
<br />
More information see<br />
[http://docs.ceph.com/docs/master/release-notes/#v12-2-0-luminous Release Notes]<br />
<br />
== Assumption ==<br />
In this HOWTO we assume that all nodes are on the very latest Proxmox VE 4.4 version and Ceph is on version Jewel (10.2.9 or higher).<br />
<br />
The cluster must be healthy and working.<br />
<br />
== Cluster Preparation ==<br />
On a cluster member you have to set sortbitwise.<br />
'''This is very important: if this flag is not set, you can lose all your data.'''<br />
<br />
ceph osd set sortbitwise<br />
<br />
To avoid re-balance during the upgrade process set noout.<br />
<br />
ceph osd set noout<br />
<br />
Since Luminous you have to explicitly allow pool deletion.<br />
Edit /etc/pve/ceph.conf with your preferred editor and add this line to the [global] section: <br />
mon allow pool delete = true<br />
<br />
== Preparation on each ceph cluster node ==<br />
Change the current Ceph repositories from Jewel to Luminous.<br />
<br />
sed -i 's/jewel/luminous/' /etc/apt/sources.list.d/ceph.list<br />
<br />
More information see<br />
[http://docs.ceph.com/docs/master/install/get-packages/ Ceph Packages]<br />
<br />
<br />
'''Note: The Ceph Luminous version to install must not be newer than the version released on the Ceph repository from Proxmox.'''<br />
<br />
In some cases the released packages on the Ceph upstream repository are newer than those on the Ceph repository from Proxmox. This would prevent the use of the Ceph repository from Proxmox when upgrading to PVE 5.x. A downgrade of the packages, or waiting until the versions match, would be necessary.<br />
<br />
You can find the ceph luminous packages from Proxmox here: http://download.proxmox.com/debian/ceph-luminous/dists/stretch/main/binary-amd64/<br />
(upstream) download.ceph.com -> 12.2.7<br />
(Proxmox) download.proxmox.com -> 12.2.5<br />
In this example the upstream version is newer than the version from Proxmox, so the lower version (e.g. 12.2.5) needs to be installed explicitly.<br />
apt install ceph=12.2.5<br />
To install a specific version with apt, specify the version directly after the package name, as shown above.<br />
<br />
== Upgrade on each ceph cluster node ==<br />
Upgrade all your nodes with the following commands. <br />
apt-get update && apt-get dist-upgrade<br />
<br />
It will upgrade Ceph on your node to Luminous.<br />
After the update all services are still running using the old jewel binaries.<br />
<br />
== Restart the Monitor daemons ==<br />
<br />
After all cluster nodes are upgraded, you have to restart the monitor on each node where a monitor is configured.<br />
<br />
systemctl restart ceph-mon@<MON-ID>.service<br />
<br />
== Verify Monitor instance versions ==<br />
<br />
Print the binary versions of all currently running Monitor instances in your cluster. Verify that all monitors are running the same Ceph version, and that the version number starts with 12 (X, Y, and AAA are placeholders):<br />
<br />
ceph mon versions<br />
<br />
{<br />
"ceph version 12.X.Y (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA) luminous (rc)": 3<br />
}<br />
<br />
The last number shows the number of monitor instances.<br />
<br />
== Create Manager instances ==<br />
<br />
Create a manager instance on each node where a monitor is configured:<br />
<br />
pveceph createmgr<br />
<br />
== Restart the OSD daemon on all nodes ==<br />
<br />
Then restart all OSD instances on all nodes<br />
<br />
systemctl restart ceph-osd.target<br />
<br />
Check the currently running binary version of all running OSD instances in your cluster:<br />
<br />
ceph osd versions<br />
<br />
After restarting all OSD instances on all nodes, this should output one line with a Ceph Luminous version string followed by the total number of OSDs in your cluster.<br />
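The output has the same format as for the monitors; a hypothetical example for a cluster with 12 OSDs (X, Y, and AAA are again placeholders) could look like:<br />
 {<br />
     "ceph version 12.X.Y (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA) luminous (rc)": 12<br />
 }<br />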
<br />
== Check cluster status and adjust settings ==<br />
<br />
After the daemons have been successfully restarted on all nodes, unset the 'noout' flag.<br />
This can be done via the GUI or with this command.<br />
<br />
ceph osd unset noout<br />
<br />
Now check if your Ceph cluster is healthy.<br />
ceph -s<br />
<br />
You will get a warning like this: "require_osd_release < luminous".<br />
You can fix it with the following command.<br />
<br />
ceph osd require-osd-release luminous<br />
<br />
It is also recommended to set the tunables to optimal, but note that this will trigger a massive rebalance.<br />
<br />
ceph osd set-require-min-compat-client jewel <br />
<br />
ceph osd crush tunables optimal <br />
<br />
After you set all tunables, you might see the following message: "application not enabled on 2 pool(s)". Starting with Ceph Luminous, a pool needs to be associated with an application.<br />
ceph osd pool application enable rbd rbd<br />
<br />
<br />
[[Category:HOWTO]] [[Category:Installation]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_Jewel_to_Luminous&diff=10177Ceph Jewel to Luminous2018-07-23T13:40:26Z<p>A.antreich: /* Preparation on each ceph cluster node */</p>
<hr />
<div>== Introduction ==<br />
This HOWTO explains the upgrade from Ceph Jewel to Luminous (12.2.0 or higher) on Proxmox VE 4.x in preparation for upgrading to PVE 5.x.<br />
<br />
The latest Ceph version supported in pveceph in PVE 4.x is Ceph Jewel (10.2.x). An upgrade to Ceph Luminous (12.2.x) is only possible temporarily as first step of upgrading to PVE 5.x.<br />
<br />
For more information, see<br />
[http://docs.ceph.com/docs/master/release-notes/#v12-2-0-luminous Release Notes]<br />
<br />
== Assumption ==<br />
In this HOWTO we assume that all nodes are on the very latest Proxmox VE 4.4 version and Ceph is on Version Jewel (10.2.9 or higher).<br />
<br />
The Cluster must be healthy and working.<br />
<br />
== Cluster Preparation ==<br />
On one cluster member you have to set the sortbitwise flag.<br />
This is very important: if this flag is not set, you can lose all your data.<br />
<br />
ceph osd set sortbitwise<br />
<br />
To avoid rebalancing during the upgrade process, set noout.<br />
<br />
ceph osd set noout<br />
<br />
Since Luminous you have to explicitly allow deleting a pool.<br />
Edit /etc/pve/ceph.conf with your preferred editor and add this line to the [global] section:<br />
mon allow pool delete = true<br />
<br />
== Preparation on each ceph cluster node ==<br />
Change the current Ceph repositories from Jewel to Luminous.<br />
<br />
sed -i 's/jewel/luminous/' /etc/apt/sources.list.d/ceph.list<br />
<br />
For more information, see<br />
[http://docs.ceph.com/docs/master/install/get-packages/ Ceph Packages]<br />
<br />
<br />
'''Note: The Ceph Luminous version to install should not be newer than the version released on the Ceph repository from Proxmox.'''<br />
<br />
In some cases the packages released on the Ceph upstream repository are newer than those on the Ceph repository from Proxmox. This will prevent the use of the Proxmox Ceph repository when upgrading to PVE 5.x. In that case, a downgrade of the packages or waiting until the versions match would be necessary.<br />
<br />
You can find the ceph luminous packages from Proxmox here: http://download.proxmox.com/debian/ceph-luminous/dists/stretch/main/binary-amd64/<br />
(upstream) download.ceph.com -> 12.2.7<br />
(Proxmox) download.proxmox.com -> 12.2.5<br />
In this example the upstream version is newer than the version from Proxmox, so the lower version (e.g. 12.2.5) needs to be installed explicitly.<br />
apt install ceph=12.2.5<br />
To install a specific version with apt, specify the version directly after the package name, as shown above.<br />
<br />
== Upgrade on each ceph cluster node ==<br />
Upgrade all your nodes with the following commands. <br />
apt-get update && apt-get dist-upgrade<br />
<br />
It will upgrade Ceph on your node to Luminous.<br />
After the update all services are still running using the old jewel binaries.<br />
<br />
== Restart the Monitor daemons ==<br />
<br />
After all cluster nodes are upgraded, you have to restart the monitor on each node where a monitor is configured.<br />
<br />
systemctl restart ceph-mon@<MON-ID>.service<br />
<br />
== Verify Monitor instance versions ==<br />
<br />
Print the binary versions of all currently running Monitor instances in your cluster. Verify that all monitors are running the same Ceph version, and that the version number starts with 12 (X, Y, and AAA are placeholders):<br />
<br />
ceph mon versions<br />
<br />
{<br />
"ceph version 12.X.Y (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA) luminous (rc)": 3<br />
}<br />
<br />
The last number shows the number of monitor instances.<br />
<br />
== Create Manager instances ==<br />
<br />
Create a manager instance on each node where a monitor is configured:<br />
<br />
pveceph createmgr<br />
<br />
== Restart the OSD daemon on all nodes ==<br />
<br />
Then restart all OSD instances on all nodes<br />
<br />
systemctl restart ceph-osd.target<br />
<br />
Check the currently running binary version of all running OSD instances in your cluster:<br />
<br />
ceph osd versions<br />
<br />
After restarting all OSD instances on all nodes, this should output one line with a Ceph Luminous version string followed by the total number of OSDs in your cluster.<br />
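The output has the same format as for the monitors; a hypothetical example for a cluster with 12 OSDs (X, Y, and AAA are again placeholders) could look like:<br />
 {<br />
     "ceph version 12.X.Y (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA) luminous (rc)": 12<br />
 }<br />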
<br />
== Check cluster status and adjust settings ==<br />
<br />
After the daemons have been successfully restarted on all nodes, unset the 'noout' flag.<br />
This can be done via the GUI or with this command.<br />
<br />
ceph osd unset noout<br />
<br />
Now check if your Ceph cluster is healthy.<br />
ceph -s<br />
<br />
You will get a warning like this: "require_osd_release < luminous".<br />
You can fix it with the following command.<br />
<br />
ceph osd require-osd-release luminous<br />
<br />
It is also recommended to set the tunables to optimal, but note that this will trigger a massive rebalance.<br />
<br />
ceph osd set-require-min-compat-client jewel <br />
<br />
ceph osd crush tunables optimal <br />
<br />
After you set all tunables, you might see the following message: "application not enabled on 2 pool(s)". Starting with Ceph Luminous, a pool needs to be associated with an application.<br />
ceph osd pool application enable rbd rbd<br />
<br />
<br />
[[Category:HOWTO]] [[Category:Installation]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_Jewel_to_Luminous&diff=10176Ceph Jewel to Luminous2018-07-23T13:38:21Z<p>A.antreich: /* Preparation on each ceph cluster node */</p>
<hr />
<div>== Introduction ==<br />
This HOWTO explains the upgrade from Ceph Jewel to Luminous (12.2.0 or higher) on Proxmox VE 4.x in preparation for upgrading to PVE 5.x.<br />
<br />
The latest Ceph version supported in pveceph in PVE 4.x is Ceph Jewel (10.2.x). An upgrade to Ceph Luminous (12.2.x) is only possible temporarily as first step of upgrading to PVE 5.x.<br />
<br />
For more information, see<br />
[http://docs.ceph.com/docs/master/release-notes/#v12-2-0-luminous Release Notes]<br />
<br />
== Assumption ==<br />
In this HOWTO we assume that all nodes are on the very latest Proxmox VE 4.4 version and Ceph is on Version Jewel (10.2.9 or higher).<br />
<br />
The Cluster must be healthy and working.<br />
<br />
== Cluster Preparation ==<br />
On one cluster member you have to set the sortbitwise flag.<br />
This is very important: if this flag is not set, you can lose all your data.<br />
<br />
ceph osd set sortbitwise<br />
<br />
To avoid rebalancing during the upgrade process, set noout.<br />
<br />
ceph osd set noout<br />
<br />
Since Luminous you have to explicitly allow deleting a pool.<br />
Edit /etc/pve/ceph.conf with your preferred editor and add this line to the [global] section:<br />
mon allow pool delete = true<br />
<br />
== Preparation on each ceph cluster node ==<br />
Change the current Ceph repositories from Jewel to Luminous.<br />
<br />
sed -i 's/jewel/luminous/' /etc/apt/sources.list.d/ceph.list<br />
<br />
For more information, see<br />
[http://docs.ceph.com/docs/master/install/get-packages/ Ceph Packages]<br />
<br />
<br />
'''Note: The Ceph Luminous version to install should not be newer than the version released on the Ceph repository from Proxmox.'''<br />
<br />
In some cases the packages released on the Ceph upstream repository are newer than those on the Ceph repository from Proxmox. This will prevent the use of the Proxmox Ceph repository when upgrading to PVE 5.x.<br />
<br />
You can find the ceph luminous packages from Proxmox here: http://download.proxmox.com/debian/ceph-luminous/dists/stretch/main/binary-amd64/<br />
(upstream) download.ceph.com -> 12.2.7<br />
(Proxmox) download.proxmox.com -> 12.2.5<br />
In this example the upstream version is newer than the version from Proxmox, so the lower version (e.g. 12.2.5) needs to be installed explicitly.<br />
apt install ceph=12.2.5<br />
To install a specific version with apt, specify the version directly after the package name, as shown above.<br />
<br />
== Upgrade on each ceph cluster node ==<br />
Upgrade all your nodes with the following commands. <br />
apt-get update && apt-get dist-upgrade<br />
<br />
It will upgrade Ceph on your node to Luminous.<br />
After the update all services are still running using the old jewel binaries.<br />
<br />
== Restart the Monitor daemons ==<br />
<br />
After all cluster nodes are upgraded, you have to restart the monitor on each node where a monitor is configured.<br />
<br />
systemctl restart ceph-mon@<MON-ID>.service<br />
<br />
== Verify Monitor instance versions ==<br />
<br />
Print the binary versions of all currently running Monitor instances in your cluster. Verify that all monitors are running the same Ceph version, and that the version number starts with 12 (X, Y, and AAA are placeholders):<br />
<br />
ceph mon versions<br />
<br />
{<br />
"ceph version 12.X.Y (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA) luminous (rc)": 3<br />
}<br />
<br />
The last number shows the number of monitor instances.<br />
<br />
== Create Manager instances ==<br />
<br />
Create a manager instance on each node where a monitor is configured:<br />
<br />
pveceph createmgr<br />
<br />
== Restart the OSD daemon on all nodes ==<br />
<br />
Then restart all OSD instances on all nodes<br />
<br />
systemctl restart ceph-osd.target<br />
<br />
Check the currently running binary version of all running OSD instances in your cluster:<br />
<br />
ceph osd versions<br />
<br />
After restarting all OSD instances on all nodes, this should output one line with a Ceph Luminous version string followed by the total number of OSDs in your cluster.<br />
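The output has the same format as for the monitors; a hypothetical example for a cluster with 12 OSDs (X, Y, and AAA are again placeholders) could look like:<br />
 {<br />
     "ceph version 12.X.Y (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA) luminous (rc)": 12<br />
 }<br />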
<br />
== Check cluster status and adjust settings ==<br />
<br />
After the daemons have been successfully restarted on all nodes, unset the 'noout' flag.<br />
This can be done via the GUI or with this command.<br />
<br />
ceph osd unset noout<br />
<br />
Now check if your Ceph cluster is healthy.<br />
ceph -s<br />
<br />
You will get a warning like this: "require_osd_release < luminous".<br />
You can fix it with the following command.<br />
<br />
ceph osd require-osd-release luminous<br />
<br />
It is also recommended to set the tunables to optimal, but note that this will trigger a massive rebalance.<br />
<br />
ceph osd set-require-min-compat-client jewel <br />
<br />
ceph osd crush tunables optimal <br />
<br />
After you set all tunables, you might see the following message: "application not enabled on 2 pool(s)". Starting with Ceph Luminous, a pool needs to be associated with an application.<br />
ceph osd pool application enable rbd rbd<br />
<br />
<br />
[[Category:HOWTO]] [[Category:Installation]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Nested_Virtualization&diff=10153Nested Virtualization2018-06-11T07:49:09Z<p>A.antreich: /* Requirements */ Added note for cluster</p>
<hr />
<div>'''beware: this wiki article is a draft and may be not accurate or complete'''<br />
<br />
== What is ==<br />
Nested virtualization is when you run a hypervisor, like PVE or others, inside a virtual machine (which is of course running on another hypervisor) instead of on real hardware. In other words, you have a host hypervisor hosting a guest hypervisor (as a VM), which can host its own VMs. <br />
<br />
This obviously adds an overhead to the nested environment, but it could be useful in some cases: <br />
* it could let you test (or learn) how to manage hypervisors before actual implementation, or test some dangerous/tricky procedure involving hypervisors before actually doing it on the real thing. <br />
* it could enable businesses to deploy their own virtualization environment, e.g. on public services (cloud), see also http://www.ibm.com/developerworks/cloud/library/cl-nestedvirtualization/<br />
<br />
== Requirements ==<br />
In order to have the fastest possible performance, near to native, any hypervisor should have access to some (real) hardware features that are generally useful for virtualization, the so called 'hardware-assisted virtualization extensions' (see http://en.wikipedia.org/wiki/Hardware-assisted_virtualization).<br />
<br />
In nested virtualization, the guest hypervisor should also have access to hardware-assisted virtualization extensions, which implies that the host hypervisor has to expose those extensions to its virtual machines. In principle it works without those extensions too, but with poor performance, so it is not an option for production environments (though it may be sufficient for some test cases). Exposing those extensions requires, in the case of Intel CPUs, kernel 3.10 or higher, i.e. it is available in Proxmox VE 4.x/5.x, but not by default in older versions.<br />
<br />
You will need to allocate plenty of CPU, RAM and disk to those guest hypervisors. If you intend to migrate machines in a cluster, nested virtualization needs to be activated on all hosts in the cluster.<br />
<br />
== PVE as nested Hypervisor ==<br />
<br />
PVE can:<br />
* host a nested (guest) hypervisor, but by default it does not expose hardware-assisted virtualization extensions to its VMs, so you cannot expect to have optimal performance for virtual machines in the guest hypervisor unless you configure the VM's (virtual hypervisor's) CPU type as "host" and have nested hardware-assisted virtualization extensions enabled on the physical PVE host.<br />
<br />
* be hosted as a nested (guest) hypervisor. If the host hypervisor can expose hardware-assisted virtualization extensions to PVE, it will be able to use them and provide better performance to its guests; otherwise, as in the PVE-inside-PVE case, any VM (KVM) will only work after you turn off KVM hardware virtualization (see the VM options).<br />
<br />
== Enable Nested Hardware-assisted Virtualization ==<br />
<br />
To be done on the physical PVE host (or any other hypervisor).<br />
<br />
To have nested hardware-assisted virtualization, you have to:<br />
<br />
* use an AMD CPU or a very recent Intel one<br />
* use kernel >= 3.10 (which is always the case in Proxmox VE 4.x)<br />
* enable nested support<br />
To check if it is enabled, run ("kvm_intel" for Intel CPUs, "kvm_amd" for AMD)<br />
root@proxmox:~# cat /sys/module/kvm_intel/parameters/nested <br />
N<br />
<br />
<br />
N means it is not enabled. To enable it ("kvm-intel" for Intel):<br />
# echo "options kvm-intel nested=Y" > /etc/modprobe.d/kvm-intel.conf<br />
(or "kvm-amd" for AMD, note the 1 instead of Y):<br />
# echo "options kvm-amd nested=1" > /etc/modprobe.d/kvm-amd.conf<br />
and reboot or reload the kernel module<br />
modprobe -r kvm_intel<br />
modprobe kvm_intel<br />
<br />
check again<br />
root@proxmox:~# cat /sys/module/kvm_intel/parameters/nested <br />
Y<br />
<br />
<br />
(pay attention to where the dash "-" is used, and where it is an underscore "_" instead)<br />
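For an AMD CPU the check is analogous; as a sketch, the kvm_amd module parameter should report 1 (or Y on newer kernels) once nesting is enabled:<br />
 root@proxmox:~# cat /sys/module/kvm_amd/parameters/nested <br />
 1<br />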
<br />
Then create a guest in which you install e.g. Proxmox as a nested virtualization environment.<br />
<br />
* set the CPU type to "host"<br />
<br />
* in case of an AMD CPU: also add the following to the configuration file:<br />
<br />
args: -cpu host,+svm<br />
<br />
Once the guest OS is installed (if GNU/Linux), you can log in and verify that hardware virtualization support is enabled by running<br />
root@guest1# egrep '(vmx|svm)' --color=always /proc/cpuinfo<br />
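If you prefer the command line to the web GUI for setting the guest CPU type, a minimal sketch (100 is a placeholder VMID):<br />
 qm set 100 --cpu host<br />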
<br />
== Example: PVE hosts a PVE guest hypervisor ==<br />
<br />
=== Set a cluster of self nested PVE ===<br />
On the physical Proxmox host you create 2 VMs, and in each one you install a new instance of Proxmox, so you can experiment with cluster concepts without the need for multiple physical servers.<br />
* log into (web gui) your host pve (running on real hardware)<br />
=> PVE<br />
* create two or more VM guests (kvm) on your host PVE, each with enough RAM/disk, and install PVE from the ISO on each guest VM (same network)<br />
=> PVE => VMPVE1 (guest PVE)<br />
=> PVE => VMPVE2 (guest PVE)<br />
...<br />
* log into (ssh/console) the first guest vm & create cluster CLUSTERNAME<br />
=> PVE => VMPVE1 (guest PVE) => #pvecm create CLUSTERNAME<br />
* log into each other guest vm & join cluster <CLUSTERNAME><br />
=> PVE => VMPVE2 (guest PVE) => #pvecm add <IP address of VM1><br />
* log into (web gui) any guest vm (guest pve) and manage the new (guest) cluster<br />
=> PVE => VMPVE1/2 (guest PVE) => #pvecm n<br />
* create vm or ct inside the guest pve (nodes of CLUSTERNAME)<br />
** if you didn't enable hardware-assisted nested virtualization, you have to turn off KVM hardware virtualization (see vm options)<br />
** install only small, CLI-based CTs or VMs for those guests (do not try anything with a GUI, don't even think of running Windows...)<br />
<br />
=> PVE => VMPVE1/2 (guest PVE) => VM/CT<br />
<br />
* install something on (eg) a vm (eg: a basic ubuntu server) from iso<br />
=> PVE => VMPVE2 (guest PVE) => VM (basic ubuntu server)<br />
<br />
=== vm/ct performance without hardware-assisted virtualization extensions ===<br />
If you can't set up hardware-assisted virtualization extensions for the guest, performance is far from optimal! Use this only to practice or test!<br />
* ct (lxc) will be faster, of course, quite usable<br />
* vm (kvm) will be really slow, nearly unusable (you can expect 10x slower or more), since (as said above) they're running without KVM hardware virtualization<br />
<br />
but at least you can try or test "guest pve" features or setups:<br />
* you could create a small test cluster to practice with cluster concepts and operations<br />
* you could test a new pve version before upgrading<br />
* you could test setups conflicting with your production setup</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_Jewel_to_Luminous&diff=10069Ceph Jewel to Luminous2018-03-12T15:54:04Z<p>A.antreich: </p>
<hr />
<div>== Introduction ==<br />
This HOWTO explains the upgrade from Ceph Jewel to Luminous (12.2.0 or higher) on Proxmox VE 4.x in preparation for upgrading to PVE 5.x.<br />
<br />
The latest Ceph version supported in pveceph in PVE 4.x is Ceph Jewel (10.2.x). An upgrade to Ceph Luminous (12.2.x) is only possible temporarily as first step of upgrading to PVE 5.x.<br />
<br />
For more information, see<br />
[http://docs.ceph.com/docs/master/release-notes/#v12-2-0-luminous Release Notes]<br />
<br />
== Assumption ==<br />
In this HOWTO we assume that all nodes are on the very latest Proxmox VE 4.4 version and Ceph is on Version Jewel (10.2.9 or higher).<br />
<br />
The Cluster must be healthy and working.<br />
<br />
== Cluster Preparation ==<br />
On one cluster member you have to set the sortbitwise flag.<br />
This is very important: if this flag is not set, you can lose all your data.<br />
<br />
ceph osd set sortbitwise<br />
<br />
To avoid rebalancing during the upgrade process, set noout.<br />
<br />
ceph osd set noout<br />
<br />
Since Luminous you have to explicitly allow deleting a pool.<br />
Edit /etc/pve/ceph.conf with your preferred editor and add this line to the [global] section:<br />
mon allow pool delete = true<br />
<br />
== Preparation on each ceph cluster node ==<br />
Change the current Ceph repositories from Jewel to Luminous.<br />
<br />
sed -i 's/jewel/luminous/' /etc/apt/sources.list.d/ceph.list<br />
<br />
For more information, see<br />
[http://docs.ceph.com/docs/master/install/get-packages/ Ceph Packages]<br />
<br />
== Upgrade on each ceph cluster node ==<br />
Upgrade all your nodes with the following commands. <br />
apt-get update && apt-get dist-upgrade<br />
<br />
It will upgrade Ceph on your node to Luminous.<br />
After the update all services are still running using the old jewel binaries.<br />
<br />
== Restart the Monitor daemons ==<br />
<br />
After all cluster nodes are upgraded, you have to restart the monitor on each node where a monitor is configured.<br />
<br />
systemctl restart ceph-mon@<MON-ID>.service<br />
<br />
== Verify Monitor instance versions ==<br />
<br />
Print the binary versions of all currently running Monitor instances in your cluster. Verify that all monitors are running the same Ceph version, and that the version number starts with 12 (X, Y, and AAA are placeholders):<br />
<br />
ceph mon versions<br />
<br />
{<br />
"ceph version 12.X.Y (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA) luminous (rc)": 3<br />
}<br />
<br />
The last number shows the number of monitor instances.<br />
<br />
== Create Manager instances ==<br />
<br />
Create a manager instance on each node where a monitor is configured:<br />
<br />
pveceph createmgr<br />
<br />
== Restart the OSD daemon on all nodes ==<br />
<br />
Then restart all OSD instances on all nodes<br />
<br />
systemctl restart ceph-osd.target<br />
<br />
Check the currently running binary version of all running OSD instances in your cluster:<br />
<br />
ceph osd versions<br />
<br />
After restarting all OSD instances on all nodes, this should output one line with a Ceph Luminous version string followed by the total number of OSDs in your cluster.<br />
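The output has the same format as for the monitors; a hypothetical example for a cluster with 12 OSDs (X, Y, and AAA are again placeholders) could look like:<br />
 {<br />
     "ceph version 12.X.Y (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA) luminous (rc)": 12<br />
 }<br />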
<br />
== Check cluster status and adjust settings ==<br />
<br />
After the daemons have been successfully restarted on all nodes, unset the 'noout' flag.<br />
This can be done via the GUI or with this command.<br />
<br />
ceph osd unset noout<br />
<br />
Now check if your Ceph cluster is healthy.<br />
ceph -s<br />
<br />
You will get a warning like this: "require_osd_release < luminous".<br />
You can fix it with the following command.<br />
<br />
ceph osd require-osd-release luminous<br />
<br />
It is also recommended to set the tunables to optimal, but note that this will trigger a massive rebalance.<br />
<br />
ceph osd set-require-min-compat-client jewel <br />
<br />
ceph osd crush tunables optimal <br />
<br />
After you set all tunables, you might see the following message: "application not enabled on 2 pool(s)". Starting with Ceph Luminous, a pool needs to be associated with an application.<br />
ceph osd pool application enable rbd rbd<br />
<br />
<br />
[[Category:HOWTO]] [[Category:Installation]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=User:A.antreich&diff=9993User:A.antreich2017-11-03T14:03:41Z<p>A.antreich: Created page with "PVE"</p>
<hr />
<div>PVE</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_Server&diff=9992Ceph Server2017-11-03T14:02:44Z<p>A.antreich: </p>
<hr />
<div>== Introduction ==<br />
<br />
[[Image:Screen-Ceph-Status.png|thumb]] Ceph is a distributed object store and file system designed to provide excellent performance, reliability and scalability - See more at: http://ceph.com. <br />
<br />
Proxmox VE supports Ceph’s RADOS Block Device to be used for VM and container disks. The Ceph storage services are usually hosted on external, dedicated storage nodes. Such storage clusters can sum up to several hundreds of nodes, providing petabytes of storage capacity. <br />
<br />
For smaller deployments, it is also possible to run Ceph services directly on your Proxmox VE nodes. Recent hardware has plenty of CPU power and RAM, so running storage services and VMs/CTs on the same node is possible. <br />
<br />
This article describes how to set up and run Ceph storage services directly on Proxmox VE nodes. If you want to install and configure an external Ceph storage, read the [http://docs.ceph.com/docs/master/ Ceph documentation]. To configure external Ceph storage, proceed as described in the [[#Ceph Client | Ceph Client]] section.<br />
<br />
== Advantages ==<br />
<br />
*Easy setup and management with CLI and GUI support on Proxmox VE <br />
*Thin provisioning <br />
*Snapshots support <br />
*Self healing <br />
*No single point of failure <br />
*Scalable to the exabyte level <br />
*Setup pools with different performance and redundancy characteristics <br />
*Data is replicated, making it fault tolerant <br />
*Runs on economical commodity hardware <br />
*No need for hardware RAID controllers <br />
*Easy management <br />
*Open source<br />
<br />
== Why do we need a new command line tool (pveceph)? ==<br />
<br />
For the use in the specific Proxmox VE architecture we use pveceph. Proxmox VE provides a distributed file system ([http://pve.proxmox.com/wiki/Proxmox_Cluster_file_system_%28pmxcfs%29 pmxcfs]) to store configuration files. <br />
<br />
We use this to store the Ceph configuration. The advantage is that all nodes see the same file, and there is no need to copy configuration data around using ssh/scp. The tool can also use additional information from your Proxmox VE setup. <br />
<br />
Tools like ceph-deploy cannot take advantage of that architecture.<br />
<br />
== Recommended hardware ==<br />
<br />
You need at least three identical servers for the redundant setup. Here are the specifications of one of our test lab clusters with Proxmox VE and Ceph (three nodes): <br />
<br />
*Dual Xeon E5-2620v2, 64 GB RAM, Intel S2600CP mainboard, Intel RMM, Chenbro 2U chassis with eight 3.5” hot-swap drive bays, 2 fixed 2.5" SSD bays <br />
*10 GBit network for Ceph traffic (one Dual 10 Gbit Intel X540-T2 in each server, one 10Gb switch - Cisco SG350XG-2F1) <br />
*Single enterprise class SSD for the Proxmox VE installation (because we run Ceph monitors there and quite a lot of logs), we use one Samsung SM863 240 GB per host. <br />
*Use at least two SSDs as OSD drives. You need high quality, enterprise class SSDs here; never use consumer or "PRO" consumer SSDs. In our test setup, we have 4 Intel SSD DC S3520 1.2TB 2.5" SATA SSDs per host for storing the data (OSD, no extra journal) - this setup delivers about 14 TB of storage. By using a redundancy of 3, you can store up to 4.7 TB (100%). But to be prepared for failed disks and hosts, you should never fill up your storage to 100&nbsp;%. <br />
*As a general rule, the more OSDs the better; a fast CPU (high GHz) is also recommended. NVMe express cards are also possible, e.g. a mix of slow SATA disks with SSD/NVMe journal devices. <br />
<br />
Again, if you expect good performance, always use enterprise class SSDs only; we have had good results in our test labs with:<br />
*SATA SSDs:<br />
**Intel SSD DC S3520<br />
**Intel SSD DC S3610<br />
**Intel SSD DC S3700/S3710<br />
**Samsung SSD SM863<br />
*NVMe PCIe 3.0 x4 as journal:<br />
**Intel SSD DC P3700<br />
<br />
By adding more OSD SSD/disks into the free drive bays, the storage can be expanded. Of course, you can add more servers too as soon as your business is growing, without service interruption and with minimal configuration changes. <br />
<br />
If you do not want to run virtual machines and Ceph on the same host, you can just add more Proxmox VE nodes and use these for running the guests and the others just for the storage.<br />
<br />
== Installation of Proxmox VE ==<br />
<br />
Before you start with Ceph, you need a working Proxmox VE cluster with 3 nodes (or more). We install Proxmox VE on a fast and reliable enterprise class SSD, so we can use all bays for OSD (Object Storage Devices) data. Just follow the well known instructions on [[Installation]] and [[Cluster_Manager]]. <br />
<br />
'''Note:''' <br />
<br />
Use ext4 if you install on SSD (at the boot prompt of the installation ISO you can specify parameters, e.g. "linux ext4 swapsize=4").<br />
<br />
=== Ceph on Proxmox VE 5.1 ===<br />
In Proxmox VE 5.1 the only available Ceph version is Luminous, stable and production ready.<br />
<br />
== Network for Ceph ==<br />
<br />
All nodes need access to a separate 10Gb network interface, exclusively used for Ceph. We use network 10.10.10.0/24 for this tutorial. <br />
<br />
It is highly recommended to use 10Gb for that network to avoid performance problems. Bonding can be used to increase availability. <br />
<br />
If you do not have fast network switches, you can also setup a [[Full Mesh Network for Ceph Server]].<br />
<br />
=== First node ===<br />
<br />
The network setup (ceph private network) from our first node contains: <br />
<br />
# from /etc/network/interfaces<br />
auto eth2<br />
iface eth2 inet static<br />
address 10.10.10.1<br />
netmask 255.255.255.0<br />
<br />
=== Second node ===<br />
<br />
The network setup (ceph private network) from our second node contains: <br />
<br />
# from /etc/network/interfaces<br />
auto eth2<br />
iface eth2 inet static<br />
address 10.10.10.2<br />
netmask 255.255.255.0<br />
<br />
=== Third node ===<br />
<br />
The network setup (ceph private network) from our third node contains: <br />
<br />
# from /etc/network/interfaces<br />
auto eth2<br />
iface eth2 inet static<br />
address 10.10.10.3<br />
netmask 255.255.255.0<br />
<br />
== Installation of Ceph packages ==<br />
<br />
You now need to select 3 nodes and install the Ceph software packages there. We wrote a small command line utility called 'pveceph' which helps you perform these tasks; you can also choose the version of Ceph. Log in to all your nodes and execute the following on all of them: <br />
<br />
node1# pveceph install --version jewel<br />
<br />
node2# pveceph install --version jewel<br />
<br />
node3# pveceph install --version jewel<br />
<br />
This sets up an 'apt' package repository in /etc/apt/sources.list.d/ceph.list and installs the required software.<br />
<br />
== Create initial Ceph configuration ==<br />
<br />
[[Image:Screen-Ceph-Config.png|thumb]] After installation of packages, you need to create an initial Ceph configuration on just one node, based on your private network: <br />
<br />
node1# pveceph init --network 10.10.10.0/24<br />
<br />
This creates an initial config at /etc/pve/ceph.conf. That file is automatically distributed to all Proxmox VE nodes by using [http://pve.proxmox.com/wiki/Proxmox_Cluster_file_system_%28pmxcfs%29 pmxcfs]. The command also creates a symbolic link from /etc/ceph/ceph.conf pointing to that file. So you can simply run Ceph commands without the need to specify a configuration file. <br />
<br />
== Creating Ceph Monitors ==<br />
<br />
[[Image:Screen-Ceph-Monitor.png|thumb]] After that you can create the first Ceph monitor service using: <br />
<br />
node1# pveceph createmon<br />
<br />
== Continue with CLI or GUI ==<br />
<br />
As soon as you have created the first monitor, you can start using the Proxmox GUI (see the video tutorial on [http://youtu.be/ImyRUyMBrwo Managing Ceph Server]) to manage and view your Ceph configuration. <br />
<br />
Of course, you can continue to use the command line tools (CLI). &nbsp;We continue with the CLI in this wiki article, but you should achieve the same results no matter which way you finish the remaining steps. <br />
<br />
== Creating more Ceph Monitors ==<br />
<br />
You should run 3 monitors, one on each node. Create them via GUI or via CLI. So please login to the next node and run: <br />
<br />
node2# pveceph createmon<br />
<br />
And execute the same steps on the third node: <br />
<br />
node3# pveceph createmon<br />
<br />
'''Note:''' <br />
<br />
If you add a node where you do not want to run a Ceph monitor, e.g. another node for OSDs, you need to install the Ceph packages with 'pveceph install'.<br />
<br />
== Creating Ceph OSDs ==<br />
<br />
[[Image:Screen-Disks.png|thumb]] [[Image:Screen-Ceph-OSD-Status.png|thumb]] First, please be careful when you initialize your OSD disks, because it basically removes all existing data from those disks. So it is important to select the correct device names. The Proxmox VE GUI displays a list of all disks, together with device names, usage information and serial numbers. <br />
<br />
Creating OSDs can be done via the GUI - self-explanatory - or via the CLI, as explained here: <br />
<br />
That said, initializing an OSD can be done with: <br />
<br />
# pveceph createosd /dev/sd[X]<br />
<br />
If you want to use a dedicated SSD journal disk: <br />
<br />
# pveceph createosd /dev/sd[X] -journal_dev /dev/sd[X]<br />
<br />
Example: /dev/sdf as data disk (4TB) and /dev/sdb is the dedicated SSD journal disk <br />
<br />
# pveceph createosd /dev/sdf -journal_dev /dev/sdb<br />
<br />
This partitions the disk (data and journal partition), creates the filesystems, starts the OSD and adds it to the existing crush map. Afterwards the OSD is running and fully functional. Please create at least 12 OSDs, distributed among your nodes (4 on each node). <br />
<br />
You can create OSDs containing both journal and data partitions, or you can place the journal on a dedicated SSD. Using an SSD journal disk is highly recommended if you expect good performance. <br />
<br />
'''Note:''' <br />
<br />
In order to use a dedicated journal disk (SSD), the disk needs to have a GPT partition table. You can create this with 'gdisk /dev/sd(x)'. If there is no GPT, you cannot select the disk as journal. Currently the journal size is fixed to 5 GB.<br />
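A non-interactive way to create an empty GPT label on a new journal SSD, as a sketch (/dev/sdX is a placeholder; this destroys any existing partition table on that device):<br />
 parted -s /dev/sdX mklabel gpt<br />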
<br />
== Ceph Pools ==<br />
<br />
[[Image:Screen-Ceph-Pools.png|thumb]] [[Image:Screen-Ceph-Log.png|thumb]] The standard installation creates some default pools, so you can either use the standard 'rbd' pool, or create your own pools using the GUI. <br />
<br />
In order to calculate the number of placement groups for your pools, you can use:<br />
<br />
'''Ceph PGs per Pool Calculator'''<br />
<br />
http://ceph.com/pgcalc/<br />
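As a rough rule of thumb behind that calculator, you aim for roughly 100 placement groups per OSD, divide by the pool's replica count and round up to the next power of two. A sketch for a hypothetical cluster with 12 OSDs and a replicated pool of size 3:<br />
 (12 OSDs x 100) / 3 replicas = 400  ->  next power of two = 512 PGs<br />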
=== pool size notes ===<br />
The recommended pool setting is size 3 and min_size 2.<br />
Any lower size settings are dangerous and you can lose your pool data.<br />
<br />
== Ceph Client ==<br />
You also need to copy the keyring to a predefined location.<br />
<br />
'''Note that the file name needs to be <storage id> + .keyring; the storage id is the expression after 'rbd:' in /etc/pve/storage.cfg, which is my-ceph-storage in the current example.'''<br />
<br />
# mkdir /etc/pve/priv/ceph<br />
# cp /etc/pve/priv/ceph.client.admin.keyring /etc/pve/priv/ceph/my-ceph-storage.keyring<br />
<br />
You can then configure Proxmox VE to use such pools to store VM images, just use the GUI ("Add Storage": RBD). A typical entry in the Proxmox VE storage configuration looks like: <br />
<br />
# from /etc/pve/storage.cfg<br />
rbd: my-ceph-storage<br />
monhost 10.10.10.1;10.10.10.2;10.10.10.3<br />
pool rbd<br />
content images<br />
username admin<br />
krbd 0<br />
<br />
If you want to store containers on Ceph, you need to create an extra pool using KRBD.<br />
<br />
# from /etc/pve/storage.cfg<br />
rbd: my-ceph-storage-for-lxc<br />
monhost 10.10.10.1;10.10.10.2;10.10.10.3<br />
pool rbd-lxc<br />
content images<br />
username admin<br />
krbd 1<br />
<br />
== Further readings about Ceph ==<br />
<br />
Ceph comes with plenty of documentation [http://ceph.com/docs/master/ here]. Even better, the dissertation from the creator of Ceph - Sage A. Weil - is also [http://ceph.com/papers/weil-thesis.pdf available]. By reading this you can get a deep insight how it works. <br />
<br />
*http://ceph.com/ <br />
*https://www.redhat.com/en/technologies/storage/ceph<br />
<br />
*https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/, Journal SSD Recommendations<br />
<br />
== Video Tutorials ==<br />
*[https://www.proxmox.com/en/training/video-tutorials/item/install-ceph-server-on-proxmox-ve Install Ceph Server on Proxmox VE]<br />
<br />
===Proxmox YouTube channel===<br />
You can subscribe to our [http://www.youtube.com/ProxmoxVE Proxmox VE Channel] on YouTube to get updates about new videos.<br />
<br />
== Ceph Misc ==<br />
<br />
=== Set the Ceph OSD tunables ===<br />
<br />
In Luminous the tunables are already set to optimal. On Jewel you set the tunables to hammer.<br />
ceph osd crush tunables optimal<br />
<br />
If you set the tunables while your Ceph cluster is in use, you should do this when the cluster has the least load.<br />
Also try to use the backfill option to slow this process down.<br />
<br />
[http://docs.ceph.com/docs/jewel/dev/osd_internals/backfill_reservation/ Ceph Backfill]<br />
<br />
'''Important'''<br />
On PVE 4, do not set the tunables to '''optimal''', because then krbd will not work.<br />
Set it to hammer.<br />
<br />
=== Prepare OSD Disks ===<br />
<br />
It should be noted that this command refuses to initialize a disk when it detects existing data. So if you want to overwrite a disk, you should remove the existing data first. You can do that using: <br />
<br />
# ceph-disk zap /dev/sd[X]<br />
*In some cases disks that used to be part of a 3ware raid need the following in addition to zap. <br />
<br />
try this:<br />
<pre><br />
#To remove partition table and boot sector the following should be sufficient:<br />
dd if=/dev/zero of=/dev/$DISK bs=1024 count=1<br />
</pre><br />
or<br />
<pre><br />
DISK=$1<br />
<br />
if [ "$1" = "" ]; then<br />
echo "Need to supply a dev name like sdg . exiting"<br />
exit 1<br />
fi<br />
echo " make sure this is the correct disk "<br />
echo $DISK<br />
echo " you will end up with NO partition table when this proceeds . example:<br />
Disk /dev/$1 doesn't contain a valid partition table<br />
Press enter to proceed , or ctrl-c to exit "<br />
<br />
read x<br />
dd if=/dev/zero of=/dev/$DISK bs=512 count=50000<br />
</pre><br />
<br />
=== Upgrading existing Ceph Server from Hammer to Jewel ===<br />
See [[Ceph Hammer to Jewel]]<br />
<br />
=== Upgrading existing Ceph Server from Jewel to Luminous ===<br />
See [[Ceph Jewel to Luminous]]<br />
<br />
===osd hardware===<br />
It is best to use the same suggested drive model for all OSDs.<br />
<br />
===make one change at a time===<br />
After Ceph is set up and running, make only one change at a time.<br />
<br />
Adding an OSD, changing pool settings - check the log and make sure health is normal before the next change. Too many changes at the same time can result in slow systems - bad for the CLI.<br />
<br />
To check that a change has completed, check the logs at '''pve>ceph>log''' <br />
:or from cli<br />
ceph -w<br />
or<br />
ceph -s<br />
<br />
=== Using a disk that was part of a ZFS pool ===<br />
As of now,<br />
ceph-disk zap /dev/sdX<br />
is needed.<br />
<br />
Otherwise the disk does not show up under PVE > Ceph > OSD > Create OSD.<br />
<br />
=== restore lxc from zfs to ceph ===<br />
If the LXC container is on ZFS with compression, the actual disk usage can be far greater than expected.<br />
See https://forum.proxmox.com/threads/lxc-restore-fail-to-ceph.32419/#post-161287<br />
<br />
One way to determine the actual disk usage:<br />
: restore the backup to an ext4 directory and run du -sh, then do the restore manually, specifying the target disk size (see the sketch below).<br />
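A rough sketch of this workflow, assuming a hypothetical container ID 121, a gzip-compressed vzdump archive and an RBD storage named my-ceph-storage-for-lxc, and assuming your pct version accepts a storage:size rootfs specification on restore; adapt names and sizes to your setup:<br />
<pre><br />
# extract the backup to a scratch ext4 directory and measure the real usage<br />
mkdir /tmp/ct121-check<br />
tar -xzpf /var/lib/vz/dump/vzdump-lxc-121-example.tar.gz -C /tmp/ct121-check<br />
du -sh /tmp/ct121-check<br />
# restore to Ceph, explicitly requesting a rootfs that is large enough (size in GB)<br />
pct restore 121 /var/lib/vz/dump/vzdump-lxc-121-example.tar.gz --storage my-ceph-storage-for-lxc --rootfs my-ceph-storage-for-lxc:16<br />
# clean up the scratch directory<br />
rm -rf /tmp/ct121-check<br />
</pre><br />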
=== scsi setting ===<br />
Make sure that you use the virtio-scsi controller (not LSI); see the VM options. I remember some panics when using LSI recently, but I did not debug it further, as modern operating systems should use virtio-scsi anyway. https://forum.proxmox.com/threads/restarted-a-node-some-kvms-on-other-nodes-panic.32806<br />
=== kvm hard disk cache===<br />
Use write-through [from forum, 2/2017].<br />
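On the CLI this corresponds to the cache option of the drive definition; a sketch with a hypothetical VMID and volume name (note that re-issuing the full drive definition like this overwrites any other options set on that drive):<br />
 qm set 100 --scsi0 my-ceph-storage:vm-100-disk-1,cache=writethrough<br />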
<br />
=== noout ===<br />
http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-osd/#stopping-w-out-rebalancing<br />
<br />
"Periodically, you may need to perform maintenance on a subset of your cluster, or resolve a problem that affects a failure domain (e.g., a rack). If you do not want CRUSH to automatically rebalance the cluster as you stop OSDs for maintenance, set the cluster to noout first:"<br />
ceph osd set noout<br />
<br />
'''that can be done from pve at ceph>osd . very important.'''<br />
<br />
=== Disabling Cephx ===<br />
Research this first. Check the forum, the Ceph mailing list and http://docs.ceph.com/docs/master/rados/configuration/auth-config-ref/ :''" The cephx protocol is enabled by default. Cryptographic authentication has some computational costs, though they should generally be quite low. If the network environment connecting your client and server hosts is very safe and you cannot afford authentication, you can turn it off. This is not generally recommended."''<br />
<br />
Our Ceph network is isolated, and I was looking to speed up Ceph performance, so I did this.<br />
<br />
<br />
1- turn off all VM's which use ceph<br />
<br />
2- /etc/pve/ceph.conf set:<br />
<pre><br />
auth cluster required = none<br />
auth service required = none<br />
auth client required = none<br />
</pre><br />
3- stop ceph daemons per http://docs.ceph.com/docs/master/rados/operations/operating/<br />
To stop all daemons on a Ceph Node (irrespective of type), execute the following: <br />
** Do on all mon nodes **<br />
systemctl stop ceph\*.service ceph\*.target<br />
<br />
4- you also need to remove the client keys in /etc/pve/priv/ceph/ <br />
<pre><br />
cd /etc/pve/priv<br />
# do not do this line, else this file will be recreated by /usr/share/perl5/PVE/API2/Ceph.pm and using old keys later will not work.<br />
#-#mv ceph.client.admin.keyring ceph.client.admin.keyring-old<br />
mkdir /etc/pve/priv/ceph/old<br />
mv /etc/pve/priv/ceph/*keyring /etc/pve/priv/ceph/old/<br />
</pre><br />
<br />
5- start ceph<br />
To start all daemons on a Ceph Node (irrespective of type), execute the following: <br />
** On all mon nodes **<br />
systemctl start ceph.target<br />
<br />
6- start vm's<br />
<br />
[[Category:HOWTO]] [[Category:Installation]]</div>A.antreichhttps://pve.proxmox.com/mediawiki/index.php?title=Ceph_Hammer_to_Jewel&diff=9952Ceph Hammer to Jewel2017-08-14T15:00:05Z<p>A.antreich: purge ceph not needed, logrotate error fixed. Metapackage needed for upgrade to luminous.</p>
<hr />
<div>== Introduction ==<br />
<br />
This HOWTO explains the upgrade from Ceph Hammer to Jewel (10.2.5 or higher).<br />
We strongly recommend that you update the cluster node by node.<br />
<br />
== Assumption ==<br />
In this HOWTO we assume that all nodes are on the very latest Proxmox VE 4.4 or higher version and Ceph is on Version Hammer.<br />
<br />
== Preparation ==<br />
Change the current Ceph repositories from Hammer to Jewel.<br />
<br />
sed -i 's/hammer/jewel/' /etc/apt/sources.list.d/ceph.list<br />
<br />
For more information, see<br />
[http://docs.ceph.com/docs/master/install/get-packages/ Ceph Packages]<br />
<br />
== Upgrade ==<br />
Upgrade the node with the following commands. <br />
apt-get update && apt-get dist-upgrade<br />
It will upgrade all packages from the configured repositories on your node.<br />
<br />
== Stop daemons ==<br />
To prevent a rebalance, set the 'noout' flag.<br />
This can be done in the GUI in the OSD tab, or with this command:<br />
<br />
ceph osd set noout<br />
<br />
Kill all OSDs on this node; this can only be done on the command line.<br />
<br />
killall ceph-osd<br />
<br />
Stop the Monitor on this node.<br />
To get the <UNIQUE ID> you can use tab completion.<br />
<br />
systemctl stop ceph-mon.<MON-ID>.<UNIQUE ID>.service<br />
<br />
== Set permission ==<br />
Since Infernalis, Ceph uses the 'ceph' user for its daemons instead of root.<br />
This increases security, but requires changing the permissions on some directories.<br />
<br />
chown ceph: -R /var/lib/ceph/<br />
chown :root -R /var/log/ceph/<br />
<br />
In the log directory, root must still have access to rotate the logs.<br />
<br />
The following commands must be executed for every OSD on this node.<br />
The OSD-ID is the number of the OSD on this node; you can get it from the GUI or from "ceph osd tree".<br />
<br />
readlink -f /var/lib/ceph/osd/ceph-<OSD-ID>/journal<br />
chown ceph: <output of the command before><br />
<br />
NOTE: If you have manually moved a journal in the past and did not set the partition type properly, you need to fix that before you reboot<br /><br />
=== Set partition type ===<br />
udev looks for the GUID type 45b0969e-9b03-4f30-b4c6-b4b80ceff106 to set the permissions on startup.<br /><br />
These commands will output the data needed to check if the type is set properly:<br />
readlink -f /var/lib/ceph/osd/ceph-<OSD-ID>/journal<br />
blkid -o udev -p <output of the command before><br />
<br />
If you see this line in the output then the type is set properly, if not you need to fix it:<br />
ID_PART_ENTRY_TYPE=45b0969e-9b03-4f30-b4c6-b4b80ceff106<br />
<br />
Open the block device containing the journal partition in gdisk<br />
gdisk /dev/<whatever your block device is><br />
<br />
Once in gdisk use the t command to change the partition type to 45b0969e-9b03-4f30-b4c6-b4b80ceff106 and save your changes.<br />
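As a non-interactive alternative, a sketch using sgdisk (part of the gdisk package; the partition number 1 is a placeholder for your journal partition):<br />
 sgdisk --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/<whatever your block device is><br />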
<br />
== Start the daemon ==<br />
To ensure that Ceph starts up in the correct order, you should do the following steps.<br />
<br />
cp /usr/share/doc/pve-manager/examples/ceph.service /etc/systemd/system/ceph.service<br />
systemctl daemon-reload<br />
systemctl enable ceph.service<br />
<br />
The first daemon which we start is the Monitor,<br />
but from now on we use systemd.<br />
<br />
systemctl start ceph-mon@<MON-ID>.service<br />
systemctl enable ceph-mon@<MON-ID>.service<br />
<br />
Then start all OSDs on this node<br />
<br />
systemctl start ceph-osd@<OSD-ID>.service<br />
<br />
After your node has successfully started the daemons, unset the 'noout' flag.<br />
This can be done via the GUI or with this command.<br />
<br />
ceph osd unset noout<br />
<br />
Now check if your Ceph cluster is healthy.<br />
ceph -s<br />
<br />
== Upgrade all nodes ==<br />
Now you must repeat these steps on the other nodes of your cluster, until all nodes are upgraded.<br />
Start over with [[Ceph_hammer_to_jewel#Preparation|Preparation]]<br />
<br />
== Set the tunables ==<br />
Now you will get a warning that your Ceph cluster has 'legacy tunables'.<br />
<br />
ceph osd crush tunables hammer<br />
<br />
ceph osd set require_jewel_osds<br />
<br />
To get rid of this warning, carefully read the following link, and if you decide to set the tunables, read the text below before you apply this change. <br />
<br />
[http://docs.ceph.com/docs/jewel/rados/operations/crush-map/#warning-when-tunables-are-non-optimal Ceph warning when tunables are non optimal] <br />
<br />
If you set the tunables while your Ceph cluster is in use, you should do this when the cluster has the least load.<br />
Also try to use the backfill option to slow this process down.<br />
<br />
[http://docs.ceph.com/docs/jewel/dev/osd_internals/backfill_reservation/ Ceph Backfill]<br />
<br />
'''Important'''<br />
Do not set the tunables to '''optimal''', because then krbd will not work.<br />
Set them to hammer if you want to use krbd (needed for containers).<br />
<br />
== Prepare for post-Jewel upgrades ==<br />
After the upgrade to Jewel has finished successfully, you should set the 'sortbitwise' flag on your OSDs. [http://docs.ceph.com/docs/master/release-notes/#upgrading-from-jewel This is mandatory for upgrading to later releases.]<br />
<br />
ceph osd set sortbitwise<br />
<br />
<br />
<br />
== New installation of Ceph Server using Jewel ==<br />
See [[Ceph Server]]<br />
<br />
[[Category:HOWTO]] [[Category:Installation]]</div>A.antreich