Difference between revisions of "Upgrade from 6.x to 7.0"

Revision as of 06:31, 9 September 2022

Introduction

Proxmox VE 7.x introduces several new major features. You should plan the upgrade carefully, make and verify backups before beginning, and test extensively. Depending on the existing configuration, several manual steps—including some downtime—may be required.

Note: A valid and tested backup is always required before starting the upgrade process. Test the backup beforehand in a test lab setup.

If the system is customized and/or uses additional packages or any other third-party repositories/packages, ensure those packages are also upgraded to, and compatible with, Debian Bullseye.

In general, there are two ways to upgrade a Proxmox VE 6.x system to Proxmox VE 7.x:

  • A new installation on new hardware (restoring VMs from the backup)
  • An in-place upgrade via apt (step-by-step)

In both cases, emptying the browser cache and reloading the GUI are required after the upgrade.

New installation

  • Backup all VMs and containers to an external storage (see Backup and Restore).
  • Backup all files in /etc (required: files in /etc/pve, as well as /etc/passwd, /etc/network/interfaces, /etc/resolv.conf, and anything that deviates from a default installation).
  • Install Proxmox VE 7.x from the ISO (this will delete all data on the existing host).
  • Rebuild your cluster, if applicable.
  • Restore the file /etc/pve/storage.cfg (this will make the external storage used for backup available).
  • Restore firewall configs /etc/pve/firewall/ and /etc/pve/nodes/<node>/host.fw (if applicable).
  • Restore all VMs from backups (see Backup and Restore).
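The configuration backup step above can be sketched as a small shell helper. This is only a minimal sketch: `backup_config` is a hypothetical function name, not a Proxmox VE tool, and the target path in the usage comment is an assumption that you need to adapt to your external storage.

```shell
# Sketch: archive a configuration directory before reinstalling.
# backup_config is a hypothetical helper, not part of Proxmox VE.
backup_config() {
    src="$1"     # directory to archive, for example /etc
    dest="$2"    # archive file to create, ideally on external storage
    # -C switches to the parent directory so the archive stores relative paths.
    tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")"
}

# Example (run as root on the node; the target path is an assumption):
#   backup_config /etc /mnt/external/etc-backup.tar.gz
# List the archive afterwards to verify its contents:
#   tar -tzf /mnt/external/etc-backup.tar.gz | head
```

Listing the archive after creating it is a cheap way to confirm that files such as /etc/pve/storage.cfg and /etc/network/interfaces were really captured.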

Administrators comfortable with the command line can follow the procedure Bypassing backup and restore when upgrading, if all VMs/CTs are on a single shared storage.

In-place upgrade

In-place upgrades are carried out via apt. Familiarity with apt is required to proceed with this upgrade method.

Preconditions

  • Upgraded to the latest version of Proxmox VE 6.4 (check correct package repository configuration)
  • Hyper-converged Ceph: upgrade the Ceph Nautilus cluster to Ceph 15.2 Octopus before you start the Proxmox VE upgrade to 7.0. Follow the guide Ceph Nautilus to Octopus
  • Co-installed Proxmox Backup Server: see the Proxmox Backup Server 1.1 to 2.x upgrade how-to
  • Reliable access to the node. It's recommended to have access over a host-independent channel such as iKVM/IPMI, or physical access.
    If only SSH is available, we recommend testing the upgrade on an identical, but non-production machine first.
  • A healthy cluster
  • Valid and tested backup of all VMs and CTs (in case something goes wrong)
  • At least 4 GiB free disk space on the root mount point.
  • Check known upgrade issues
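The free disk space precondition can be checked from the command line. The following is a minimal sketch; `has_free_space` is a hypothetical helper name, and it relies only on the POSIX output format of `df -Pk`.

```shell
# Sketch: check that a mount point has at least N GiB available.
# has_free_space is a hypothetical helper, not part of Proxmox VE.
has_free_space() {
    mount_point="$1"
    min_gib="$2"
    # df -Pk reports available space in 1 KiB blocks in the fourth column.
    avail_kib="$(df -Pk "$mount_point" | awk 'NR == 2 { print $4 }')"
    [ "$avail_kib" -ge $((min_gib * 1024 * 1024)) ]
}

# Example:
#   has_free_space / 4 && echo "enough space" || echo "free up space first"
```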

Testing the Upgrade

An upgrade test can be easily performed using a standalone server. Install the Proxmox VE 6.4 ISO on some test hardware, then upgrade this installation to the latest minor version of Proxmox VE 6.4 (see Package repositories). To replicate the production setup as closely as possible, copy or create all relevant configurations to the test machine, then start the upgrade. It is also possible to install Proxmox VE 6.4 in a VM and test the upgrade in this environment.

Actions step-by-step

The following actions need to be carried out from the command line of each Proxmox VE node in your cluster.

Perform the actions via console or SSH, preferably via console to avoid interrupted SSH connections. Do not carry out the upgrade while connected via the virtual console offered by the GUI, as that connection will be interrupted during the upgrade.

Remember to ensure that a valid backup of all VMs and CTs has been created before proceeding.

Continuously use the pve6to7 checklist script

A small checklist program named pve6to7 is included in the latest Proxmox VE 6.4 packages. The program will provide hints and warnings about potential issues before, during and after the upgrade process. You can call it by executing:

 pve6to7

To run it with all checks enabled, execute:

 pve6to7 --full

Make sure to run the full checks at least once before the upgrade.

This script only checks and reports things. By default, no changes to the system are made and thus, none of the issues will be automatically fixed. You should keep in mind that Proxmox VE can be heavily customized, so the script may not recognize all the possible problems with a particular setup!

It is recommended to re-run the script after each attempt to fix an issue. This ensures that the actions taken actually fixed the respective warning.
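When re-running the checklist repeatedly, it can help to reduce the report to a count of open issues. The sketch below assumes that warnings and failures appear in the output as lines containing `WARN: ` and `FAIL: `; this output format is an assumption and may differ between versions. `count_issues` is a hypothetical helper name.

```shell
# Sketch: summarize a checklist report read from standard input.
# Assumption: issues are reported on lines containing "WARN: " or "FAIL: ".
count_issues() {
    report="$(cat)"
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    warnings="$(printf '%s\n' "$report" | grep -c 'WARN: ' || true)"
    failures="$(printf '%s\n' "$report" | grep -c 'FAIL: ' || true)"
    echo "warnings=$warnings failures=$failures"
}

# Example (on a Proxmox VE 6.4 node):
#   pve6to7 --full | count_issues
```

Always read the full report as well; the counts only show whether the last fix changed anything.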

Move important Virtual Machines and Containers

If any VMs and CTs need to keep running for the duration of the upgrade, migrate them away from the node that is being upgraded. A migration of a VM or CT from an older version of Proxmox VE to a newer version will always work. A migration from a newer Proxmox VE version to an older version may work, but is generally not supported. Keep this in mind when planning your cluster upgrade.

Check Linux Network Bridge MAC

With Proxmox VE 7, the MAC address of the Linux bridge itself may change, as noted in Upgrade from 6.x to 7.0#Linux Bridge MAC-Address Change.

In hosted setups, the MAC address of a host is often restricted, to avoid spoofing by other hosts.

Solution A: Use ifupdown2

The ifupdown2 package, which Proxmox ships in the Proxmox VE 7.x repository, was adapted with a new policy configuration, so that it always derives the MAC address from the bridge port.

If you're already using ifupdown2 with Proxmox VE 6.4, and you upgrade to Proxmox VE 7.x, the ifupdown2 version 3.1.0-1+pmx1 (or newer) will ensure that you do not need to adapt anything else.

Solution B: Hardcode MAC Address

Either tell your hosting provider the new (additional) bridge MAC address of your Proxmox VE host, or explicitly configure the bridge to keep using the old MAC address.

You can list the MAC addresses of all network devices with the command ip -c link. Then edit your network configuration at /etc/network/interfaces and add a hwaddress MAC line to the respective bridge section.

For example, by default, the main bridge is called vmbr0, so the change would look like:

auto vmbr0
iface vmbr0 inet static
    address 192.168.X.Y/24
    hwaddress aa:bb:cc:12:34:56
    # ... remaining options

If ifupdown2 is installed, you can use ifreload -a to apply this change. With the legacy ifupdown, ifreload is not available, so you need to either reboot or run ifdown vmbr0; ifup vmbr0 (enter both semicolon-separated commands on a single line!).

Note that hard-coding the MAC address requires manual adaptation if you ever change your physical NIC.
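Before writing a hwaddress line, it is worth checking that the value is a complete MAC address, that is, six colon-separated hexadecimal octets. The following is a minimal sketch; `is_valid_mac` is a hypothetical helper name, and vmbr0 is only the default bridge name used in the usage comment.

```shell
# Sketch: validate that a string is a MAC address with six hex octets.
# is_valid_mac is a hypothetical helper, not part of ifupdown or ifupdown2.
is_valid_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Example: read the current bridge MAC from sysfs before the upgrade and
# print the line to add to the bridge section.
#   mac="$(cat /sys/class/net/vmbr0/address)"
#   is_valid_mac "$mac" && echo "hwaddress $mac"
```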

Update the configured APT repositories

First, make sure that the system is using the latest Proxmox VE 6.4 packages:

apt update
apt dist-upgrade

Update all Debian repository entries to Bullseye.

sed -i 's/buster\/updates/bullseye-security/g;s/buster/bullseye/g' /etc/apt/sources.list

Note that Debian changed its security update repo from deb http://security.debian.org buster/updates main to deb http://security.debian.org bullseye-security main for the sake of consistency. The above command accounts for that change already.
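To see what the substitution would do before editing the file in place, the same sed expression can be run without the -i option. This is a minimal sketch; `preview_bullseye_sources` is a hypothetical helper name.

```shell
# Sketch: print the rewritten repository list without modifying the file.
# preview_bullseye_sources is a hypothetical helper; it uses the same sed
# expression as the in-place command, just without -i.
preview_bullseye_sources() {
    sed 's/buster\/updates/bullseye-security/g;s/buster/bullseye/g' "$1"
}

# Example: compare the current file with the would-be result.
#   preview_bullseye_sources /etc/apt/sources.list | diff /etc/apt/sources.list -
```

The order of the two substitutions matters: the security suite is renamed first, so the second, more general replacement no longer matches it.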

You must also disable all Proxmox VE 6.x repositories, including the pve-enterprise repository, the pve-no-subscription repository and the pvetest repository. Use the # symbol to comment out these repositories in the /etc/apt/sources.list.d/pve-enterprise.list and /etc/apt/sources.list files. See Package Repositories.

Add the Proxmox VE 7 Package Repository

echo "deb https://enterprise.proxmox.com/debian/pve bullseye pve-enterprise" > /etc/apt/sources.list.d/pve-enterprise.list

For the no-subscription repository, see Package Repositories. Rather than commenting out/removing the PVE 6.x repositories, as was previously mentioned, you could also run the following command to update to the Proxmox VE 7 repositories:

sed -i -e 's/buster/bullseye/g' /etc/apt/sources.list.d/pve-install-repo.list 

(Ceph only) Replace ceph.com repositories with proxmox.com ceph repositories

echo "deb http://download.proxmox.com/debian/ceph-octopus bullseye main" > /etc/apt/sources.list.d/ceph.list

If there is a backports line, remove it. The upgrade has not been tested with packages from the backports repository installed.
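Leftover entries are easy to miss when several list files exist. The sketch below prints every active (not commented out) repository line that still mentions buster or backports; `find_stale_entries` is a hypothetical helper name, and the file paths in the usage comment are the usual Debian locations.

```shell
# Sketch: list active APT entries that still reference buster or backports.
# find_stale_entries is a hypothetical helper, not an APT tool.
find_stale_entries() {
    # Only lines starting with "deb" or "deb-src" are active; comments are skipped.
    # grep exits non-zero when nothing matches, hence the "|| true".
    grep -HnE '^[[:space:]]*deb.*(buster|backports)' "$@" || true
}

# Example:
#   find_stale_entries /etc/apt/sources.list /etc/apt/sources.list.d/*.list
```

An empty result means no active entry matches; any printed line shows the file name and line number to fix.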

Update the repositories' data:

apt update

Upgrade the system to Debian Bullseye and Proxmox VE 7.0

Note that the time required for finishing this step heavily depends on the system's performance, especially the root filesystem's IOPS and bandwidth. A slow spinning disk can take 60 minutes or more, while on a high-performance server with SSD storage the dist-upgrade can finish in 5 minutes.

Start with this step, to get the initial set of upgraded packages:

apt dist-upgrade

During the above step, you may be asked to approve some new packages that want to replace certain configuration files. These are not relevant to the Proxmox VE upgrade, so you can choose what's most appropriate for your setup.

If the command exits successfully, you can reboot the system in order to use the new PVE kernel.

After the Proxmox VE upgrade

For Clusters

  • Check that all nodes are up and running on the latest package versions.

For Hyper-converged Ceph

Now you can upgrade the Ceph cluster to the Pacific release, following the article Ceph Octopus to Pacific. Note that while an upgrade is recommended, it's not strictly necessary. Ceph Octopus will be supported in Proxmox VE 7.x until its end-of-life (circa end of 2022/Q2).

Checklist issues

proxmox-ve package is too old

Check the configured package repository entries; they still need to be for Proxmox VE 6.x and buster at this step (see Package_Repositories). Then run

apt update

followed by

apt dist-upgrade

to get the latest PVE 6.x packages before upgrading to PVE 7.x.

Known upgrade issues

General

As a Debian-based distribution, Proxmox VE is affected by most issues and changes affecting Debian. Thus, ensure that you read the upgrade-specific issues for bullseye.

Please also check the known issue list from the Proxmox VE 7.0 changelog: https://pve.proxmox.com/wiki/Roadmap#7.0-known-issues

Upgrade wants to remove package 'proxmox-ve'

If you have installed Proxmox VE on top of Debian Buster, you may have installed the package 'linux-image-amd64', which conflicts with current 6.x setups. To solve this, you have to remove this package with

apt remove linux-image-amd64

before the dist-upgrade.

No 'root' password set

The root account must have a password set (that you remember). If not, the sudo package will be uninstalled during the upgrade, and so you will not be able to log in again as root.

If you used the official Proxmox VE or Debian installer, and you didn't remove the password after the installation, you are safe.
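The password state of the root account can be inspected with `passwd -S`, whose second output field is P for a usable password, L for a locked password, and NP for no password. The following is a minimal sketch; `password_state` is a hypothetical helper name.

```shell
# Sketch: extract the password state from a `passwd -S` status line.
# password_state is a hypothetical helper; the second field is
# P (usable password), L (locked password) or NP (no password).
password_state() {
    printf '%s\n' "$1" | awk '{ print $2 }'
}

# Example (run as root):
#   state="$(password_state "$(passwd -S root)")"
#   [ "$state" = "P" ] || echo "root has no usable password (state: $state)"
```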


Third-party Storage Plugins

The external, third-party storage plugin mechanism had an ABI-version bump that reset the ABI-age. This means there was an incompatible, breaking change that external plugins must adapt to before they can be loaded again.

If you use any external storage plugin, you need to wait until the plugin author has adapted it for Proxmox VE 7.0.


Older Hardware and New 5.15 Kernel

KVM: entry failed, hardware error 0x80000021

Recommended Fix

Update to the 5.15-based kernel package pve-kernel-5.15.39-3-pve with version 5.15.39-3, or newer, which contains fixes for the underlying issue. The package is available in all repositories.
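Whether the running kernel is new enough can be checked with a version comparison. The sketch below uses GNU `sort -V` for that; `version_ge` is a hypothetical helper name, and stripping a trailing `-pve` suffix from `uname -r` is an assumption about how the running kernel release is named.

```shell
# Sketch: succeed if version $1 is greater than or equal to version $2.
# version_ge is a hypothetical helper built on GNU sort's version ordering.
version_ge() {
    # With version sort, the smaller version is printed first; $1 >= $2
    # holds exactly when that first line equals $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (assumes the running kernel release ends in "-pve"):
#   running="$(uname -r | sed 's/-pve$//')"
#   version_ge "$running" "5.15.39-3" && echo "kernel contains the fix"
```

On a Debian-based host, `dpkg --compare-versions` is an alternative that follows the package version rules exactly.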

Background

Note: This issue was resolved; see above for the kernel version with the recommended fix that you should upgrade to.

With the 5.15 kernel, the two-dimensional paging (TDP) memory management unit (MMU) implementation got activated by default. The new implementation reduces the complexity of mapping the guest OS virtual memory address to the host's physical memory address and improves performance, especially during live migrations for VMs with a lot of memory and many CPU cores. However, the new TDP MMU feature has been shown to cause regressions on some (mostly) older hardware, likely due to assumptions about when the fallback is required not being met by that HW.

The problem manifests as crashes of the machine with a kernel (dmesg) or journalctl log entry with, among others, a line like this:

KVM: entry failed, hardware error 0x80000021

Normally there's also an assert error message logged from the QEMU process around the same time. Windows VMs are the most commonly affected in the user reports.

The affected models could not get pinpointed exactly, but it seems CPUs launched over 8 years ago are most likely triggering the issue. Note that there are known cases where updating to the latest available firmware (BIOS/EFI) and CPU microcode fixed the regression. Thus, before trying the workaround below, we recommend ensuring that you have the latest firmware and CPU microcode installed.

Old Workaround: Disable tdp_mmu

Note: This should not be necessary anymore; see above for the kernel version with the recommended fix that you should upgrade to.

The tdp_mmu kvm module option can be used to force disabling the usage of the two-dimensional paging (TDP) MMU.

  • You can either add that parameter to the PVE host's kernel command line as kvm.tdp_mmu=N, see this reference documentation section.
  • Alternatively, set the module option using a modprobe config, for example:
echo "options kvm tdp_mmu=N" >/etc/modprobe.d/kvm-disable-tdp-mmu.conf

To finish applying the workaround, always run update-initramfs -k all -u to update the initramfs for all kernels and then reboot the Proxmox VE host.

You can confirm that the change is active by checking that the output of cat /sys/module/kvm/parameters/tdp_mmu is N.


Network

Linux Bridge MAC-Address Change

With Proxmox VE 7 / Debian Bullseye, a new systemd version is used that changes how the MAC addresses of Linux network bridge devices are calculated:

MACAddressPolicy=persistent was extended to set MAC addresses based on the device name. Previously addresses were only based on the ID_NET_NAME_* attributes, which meant that interface names would never be generated for virtual devices. Now a persistent address will be generated for most devices, including in particular bridges.

-- https://www.freedesktop.org/software/systemd/man/systemd.net-naming-scheme.html#v241

A unique and persistent MAC address is now calculated using the bridge name and the unique machine-id (/etc/machine-id), which is generated at install time.

Please either ensure that any ebtables or similar rules that use the previous bridge MAC address are updated, or configure the desired bridge MAC address explicitly by switching to ifupdown2 and adding hwaddress to the respective entry in /etc/network/interfaces.

Older Virtual Machines with Windows and Static Network

Since QEMU 5.2, first introduced in Proxmox VE 6.4, the way QEMU sets the ACPI-ID for PCI devices changed to conform to standards. This led to some Windows guests losing their device configuration, as they detect the re-ordered devices as new ones.

Because of this, Proxmox VE now pins the machine version for Windows-based guests to the newest available version on guest creation, or to the minimum of (5.2, latest-available) for existing ones. You can also easily change the machine version through the web interface now. See this forum thread for further information.

Note that if you have already upgraded to Proxmox VE 6.4, your system has implemented this change already, so you can ignore it.

Network Interface Name Change

Because the new kernel recognizes more features of some hardware, for example virtual functions, and interface naming often derives from the PCI(e) address, some NICs may change their name. In that case, the network configuration needs to be adapted.

NIC name changes have been observed at least on the following hardware:

  • High-speed Mellanox models.
    For example, due to newly supported functions, a change from enp33s0f0 to enp33s0f0np0 could occur.
  • Broadcom BCM57412 NetXtreme-E 10Gb RDMA Ethernet.
    For example, ens2f0np0 could change to enp101s0f0np0.

In general, it's recommended to have either an independent remote connection to the Proxmox VE host's console, for example through IPMI or iKVM, or physical access, so that you can manage the server even if its own network doesn't come up after a major upgrade or network change.
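To spot a renamed NIC quickly, the interface names can be recorded before the upgrade and compared after the reboot. The following is a minimal sketch; `list_nics` is a hypothetical helper name, and the file name under /root in the usage comment is an assumption.

```shell
# Sketch: print the names of all network interfaces, one per line, sorted.
# list_nics is a hypothetical helper; the directory argument is optional
# and defaults to the kernel's sysfs view of network devices.
list_nics() {
    net_dir="${1:-/sys/class/net}"
    ls -1 "$net_dir" | sort
}

# Example:
#   list_nics > /root/nics-before-upgrade.txt           # before the upgrade
#   list_nics | diff /root/nics-before-upgrade.txt -    # after the reboot
```

Any name reported by diff also needs to be updated in /etc/network/interfaces, including bridge port entries that reference it.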


CGroupV2

Old Container and CGroupv2

Since Proxmox VE 7.0, the default is a pure cgroupv2 environment. Previously a "hybrid" setup was used, where resource control was mainly done in cgroupv1 with an additional cgroupv2 controller which could take over some subsystems via the cgroup_no_v1 kernel command line parameter. (See the kernel parameter documentation for details.)

cgroupv2 support by the container’s OS is needed to run in a pure cgroupv2 environment. Containers running systemd version 231 (released in 2016) or newer support cgroupv2, as do containers that do not use systemd as init system in the first place (e.g., Alpine Linux or Devuan).

CentOS 7 and Ubuntu 16.10 are two prominent Linux distribution releases with a systemd version that is too old to run in a cgroupv2 environment. For details and possible fixes, see: https://pve.proxmox.com/pve-docs/chapter-pct.html#pct_cgroup_compat

Container HW Pass-Through & CGroupv2

Proxmox VE 7.0 defaults to the pure cgroupv2 environment, as v1 will be slowly sunset in systemd and other tooling. As a result, some LXC config keys need a slightly different syntax. For example, for hardware pass-through you need to use lxc.cgroup2.devices.allow instead of the old lxc.cgroup.devices.allow (note the added 2).
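The key rename can be previewed with a small sed filter before touching a container configuration. This is a minimal sketch; `convert_cgroup_keys` is a hypothetical helper name, the container ID 100 in the usage comment is a placeholder, and the device numbers in a real config line depend on your hardware.

```shell
# Sketch: rewrite cgroupv1 device keys to their cgroupv2 form.
# convert_cgroup_keys is a hypothetical helper; it reads a container
# config on standard input and prints the converted text.
convert_cgroup_keys() {
    sed 's/^lxc\.cgroup\.devices\./lxc.cgroup2.devices./'
}

# Example (container ID 100 is a placeholder):
#   convert_cgroup_keys < /etc/pve/lxc/100.conf
```

Lines that already use the lxc.cgroup2 prefix are left unchanged, so the filter is safe to run more than once.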

Troubleshooting

Failing upgrade to "bullseye"

Make sure that the repository configuration for Bullseye is correct.

If there was a network failure and the upgrade was only partially completed, try to repair the situation with

apt -f install

If you see the following message:

W: (pve-apt-hook) You are attempting to remove the meta-package 'proxmox-ve'!

then one or more of the currently existing packages cannot be upgraded since the proper Bullseye repository is not configured.

Check which of the previously used repositories (i.e. for Buster) do not exist for Bullseye or have not been upgraded to Bullseye ones.

If a corresponding Bullseye repository exists, upgrade the configuration (see also special remark for Ceph).

If an upgrade is not possible, configure all repositories as they were before the upgrade attempt, then run:

apt update

again. Then remove all packages which are currently installed from that repository. Following this, start the upgrade procedure again.

Unable to boot due to grub failure

See Recover From Grub Failure

If your system was installed on ZFS using legacy BIOS boot before the Proxmox VE 6.4 ISO, incompatibilities between the ZFS implementation in grub and newer ZFS versions can lead to a broken boot. Check the article on switching to proxmox-boot-tool ZFS: Switch Legacy-Boot to Proxmox Boot Tool for more details.

External links

Release Notes for Debian 11.0 (bullseye), 64-bit PC