Upgrade from 6.x to 7.0

Revision as of 13:48, 5 July 2022 by Thomas Lamprecht (talk | contribs) (→‎Known Issues: Add workaround for KVM: entry failed, hardware error 0x80000021)

Introduction

Proxmox VE 7.x introduces several new major features. You should plan the upgrade carefully, make and verify backups before beginning, and test extensively. Depending on the existing configuration, several manual steps—including some downtime—may be required.

Note: A valid and tested backup is always required before starting the upgrade process. Test the backup beforehand in a test lab setup.

In case the system is customized and/or uses additional packages or any other third party repositories/packages, ensure those packages are also upgraded to and compatible with Debian Bullseye.

In general, there are two ways to upgrade a Proxmox VE 6.x system to Proxmox VE 7.x:

  • A new installation on new hardware (restoring VMs from the backup)
  • An in-place upgrade via apt (step-by-step)

In both cases, emptying the browser cache and reloading the GUI are required after the upgrade.

New installation

  • Backup all VMs and containers to an external storage (see Backup and Restore).
  • Backup all files in /etc (required: files in /etc/pve, as well as /etc/passwd, /etc/network/interfaces, /etc/resolv.conf, and anything that deviates from a default installation).
  • Install Proxmox VE 7.x from the ISO (this will delete all data on the existing host).
  • Rebuild your cluster, if applicable.
  • Restore the file /etc/pve/storage.cfg (this will make the external storage used for backup available).
  • Restore firewall configs /etc/pve/firewall/ and /etc/pve/nodes/<node>/host.fw (if applicable).
  • Restore all VMs from backups (see Backup and Restore).

Administrators comfortable with the command line can follow the procedure Bypassing backup and restore when upgrading, if all VMs/CTs are on a single shared storage.

In-place upgrade

In-place upgrades are carried out via apt. Familiarity with apt is required to proceed with this upgrade method.

Preconditions

  • Upgraded to the latest version of Proxmox VE 6.4 (check correct package repository configuration)
  • Hyper-converged Ceph: upgrade the Ceph Nautilus cluster to Ceph 15.2 Octopus before you start the Proxmox VE upgrade to 7.0. Follow the guide Ceph Nautilus to Octopus
  • Co-installed Proxmox Backup Server: see the Proxmox Backup Server 1.1 to 2.x upgrade how-to
  • Reliable access to the node (through ssh, iKVM/IPMI or physical access)
  • A healthy cluster
  • Valid and tested backup of all VMs and CTs (in case something goes wrong)
  • At least 4 GiB free disk space on the root mount point.
  • Check known upgrade issues
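The free-space precondition can be checked from the shell. The following is a minimal sketch, not part of the official procedure; has_free_gib is a hypothetical helper name, and the 4 GiB threshold is taken from the list above.

```shell
#!/bin/sh
# Minimal sketch: check that a mount point has at least N GiB available.
# has_free_gib is a hypothetical helper, not a Proxmox VE tool.
has_free_gib() {
    mount_point=$1
    required_gib=$2
    # df -P prints one POSIX-format line per file system; -k reports 1 KiB blocks.
    avail_kib=$(df -Pk "$mount_point" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge $((required_gib * 1024 * 1024)) ]
}

if has_free_gib / 4; then
    echo "OK: at least 4 GiB available on /"
else
    echo "WARNING: less than 4 GiB available on /"
fi
```

The check only reports; freeing space (for example with apt clean) remains a manual decision.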

Testing the Upgrade

An upgrade test can be easily performed using a standalone server. Install the Proxmox VE 6.4 ISO on some test hardware, then upgrade this installation to the latest minor version of Proxmox VE 6.4 (see Package repositories). To replicate the production setup as closely as possible, copy or create all relevant configurations to the test machine, then start the upgrade. It is also possible to install Proxmox VE 6.4 in a VM and test the upgrade in this environment.

Actions step-by-step

The following actions need to be carried out from the command line of each Proxmox VE node in your cluster.

Perform the actions via console or ssh, preferably via console to avoid interrupted ssh connections. Do not carry out the upgrade when connected via the virtual console offered by the GUI, as this connection will be interrupted during the upgrade.

Remember to ensure that a valid backup of all VMs and CTs has been created before proceeding.

Continuously use the pve6to7 checklist script

A small checklist program named pve6to7 is included in the latest Proxmox VE 6.4 packages. The program will provide hints and warnings about potential issues before, during and after the upgrade process. You can call it by executing:

 pve6to7

To run it with all checks enabled, execute:

 pve6to7 --full

Make sure to run the full checks at least once before the upgrade.

This script only checks and reports things. By default, no changes to the system are made and thus, none of the issues will be automatically fixed. You should keep in mind that Proxmox VE can be heavily customized, so the script may not recognize all the possible problems with a particular setup!

It is recommended to re-run the script after each attempt to fix an issue. This ensures that the actions taken actually fixed the respective warning.

Move important Virtual Machines and Containers

If any VMs and CTs need to keep running for the duration of the upgrade, migrate them away from the node that is being upgraded. A migration of a VM or CT from an older version of Proxmox VE to a newer version will always work. A migration from a newer Proxmox VE version to an older version may work, but is generally not supported. Keep this in mind when planning your cluster upgrade.

Check Linux Network Bridge MAC

With Proxmox VE 7, the MAC address of the Linux bridge itself may change, as noted in the section Linux Bridge MAC-Address Change under Known upgrade issues.

In hosted setups, the MAC address of a host is often restricted, to avoid spoofing by other hosts.

Solution A: Use ifupdown2

The ifupdown2 package, which Proxmox ships in the Proxmox VE 7.x repository, was adapted with a new policy configuration, so that it always derives the MAC address from the bridge port.

If you're already using ifupdown2 with Proxmox VE 6.4, and you upgrade to Proxmox VE 7.x, the ifupdown2 version 3.1.0-1+pmx1 (or newer) will ensure that you do not need to adapt anything else.

Solution B: Hardcode MAC Address

You can either tell your hosting provider the new (additional) bridge MAC address of your Proxmox VE host, or you need to explicitly configure the bridge to keep using the old MAC address.

You can get the MAC address of all network devices, using the command ip -c link. Then, edit your network configuration at /etc/network/interfaces, adding a hwaddress MAC line to the respective bridge section.

For example, by default, the main bridge is called vmbr0, so the change would look like:

auto vmbr0
iface vmbr0 inet static
    address 192.168.X.Y/24
    hwaddress aa:bb:cc:12:34:56
    # ... remaining options

If ifupdown2 is installed, you can use ifreload -a to apply this change. For the legacy ifupdown, ifreload is not available, so you either need to reboot or use ifdown vmbr0; ifup vmbr0 (enter both semi-colon separated commands in one go!).

Note that hard-coding the MAC requires manual adaptation if you ever change your physical NIC.
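To avoid typing the address by hand, the current bridge MAC can be read from sysfs before the upgrade. This is a minimal sketch, assuming the bridge is named vmbr0; hwaddress_line is a hypothetical helper name. It only prints the line and does not edit /etc/network/interfaces.

```shell
#!/bin/sh
# Print an "hwaddress" line for a bridge, based on its current MAC address.
# The kernel exposes the MAC of each interface in /sys/class/net/<name>/address.
# hwaddress_line is a hypothetical helper name.
hwaddress_line() {
    address_file=$1
    mac=$(cat "$address_file") || return 1
    # Accept only six colon-separated hexadecimal octets.
    if printf '%s\n' "$mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
        printf '    hwaddress %s\n' "$mac"
    else
        echo "unexpected MAC format: $mac" >&2
        return 1
    fi
}

# vmbr0 is the default bridge name; adjust it for your setup.
if [ -r /sys/class/net/vmbr0/address ]; then
    hwaddress_line /sys/class/net/vmbr0/address
fi
```

Run it while the node is still on Proxmox VE 6.4, so that the printed address is the one in use before the upgrade.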

Update the configured APT repositories

First, make sure that the system is using the latest Proxmox VE 6.4 packages:

apt update
apt dist-upgrade

Update all Debian repository entries to Bullseye.

sed -i 's/buster\/updates/bullseye-security/g;s/buster/bullseye/g' /etc/apt/sources.list

Note that Debian changed its security update repo from deb http://security.debian.org buster/updates main to deb http://security.debian.org bullseye-security main for the sake of consistency. The above command accounts for that change already.
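If you prefer to review the result before changing the file, the same substitution can be run without -i. This is a minimal sketch; preview_bullseye is a hypothetical helper name, and nothing is written to disk.

```shell
#!/bin/sh
# Show what the substitution would produce, without modifying the file.
# preview_bullseye is a hypothetical helper name.
preview_bullseye() {
    sed 's/buster\/updates/bullseye-security/g;s/buster/bullseye/g' "$1"
}

# Compare the current file with the preview.
# diff exits non-zero when the inputs differ, which is expected here.
if [ -r /etc/apt/sources.list ]; then
    preview_bullseye /etc/apt/sources.list | diff -u /etc/apt/sources.list - || true
fi
```

Lines prefixed with - in the output are the current entries, lines prefixed with + are what the sed command would write.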

You must also disable all Proxmox VE 6.x repositories, including the pve-enterprise repository, the pve-no-subscription repository and the pvetest repository. Use the # symbol to comment out these repositories in the /etc/apt/sources.list.d/pve-enterprise.list and /etc/apt/sources.list files. See Package Repositories.

Add the Proxmox VE 7 Package Repository

echo "deb https://enterprise.proxmox.com/debian/pve bullseye pve-enterprise" > /etc/apt/sources.list.d/pve-enterprise.list

For the no-subscription repository, see Package Repositories. Rather than commenting out/removing the PVE 6.x repositories, as was previously mentioned, you could also run the following command to update to the Proxmox VE 7 repositories:

sed -i -e 's/buster/bullseye/g' /etc/apt/sources.list.d/pve-install-repo.list 

(Ceph only) Replace ceph.com repositories with proxmox.com ceph repositories

echo "deb http://download.proxmox.com/debian/ceph-octopus bullseye main" > /etc/apt/sources.list.d/ceph.list

If there is a backports line, remove it - the upgrade has not been tested with packages from the backports repository installed.
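Before refreshing the package lists, it can help to confirm that no active entry still refers to buster or backports. This is a minimal sketch using the standard APT source locations; find_stale_entries is a hypothetical helper name, and commented-out lines are ignored on purpose.

```shell
#!/bin/sh
# List active (non-commented) APT source lines that still mention buster or backports.
# find_stale_entries is a hypothetical helper name.
find_stale_entries() {
    # -r: search directories recursively, -H: always print the file name,
    # -n: print line numbers, -E: extended regular expressions.
    grep -rHnE '^[[:space:]]*deb.*(buster|backports)' "$@" 2>/dev/null
}

if find_stale_entries /etc/apt/sources.list /etc/apt/sources.list.d/; then
    echo "Review the entries listed above before running apt update."
else
    echo "No active buster or backports entries found."
fi
```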

Update the repositories' data:

apt update

Upgrade the system to Debian Bullseye and Proxmox VE 7.0

Note that the time required for finishing this step heavily depends on the system's performance, especially the root filesystem's IOPS and bandwidth. A slow spinning disk can take 60 minutes or more, while for a high-performance server with SSD storage, the dist-upgrade can be finished in 5 minutes.

Start with this step, to get the initial set of upgraded packages:

apt dist-upgrade

During the above step, you may be asked to approve new package versions that want to replace certain configuration files. These are not relevant to the Proxmox VE upgrade, so you can choose what's most appropriate for your setup.

If the command exits successfully, you can reboot the system in order to use the new PVE kernel.
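After the reboot, a quick check can confirm that the node is on the expected base system. This is a minimal sketch; codename_of is a hypothetical helper name, and it relies on the VERSION_CODENAME field that Debian records in /etc/os-release.

```shell
#!/bin/sh
# Print the distribution codename recorded in an os-release style file.
# codename_of is a hypothetical helper name.
codename_of() {
    # Source the file in a subshell so its variables do not leak into this shell.
    ( . "$1" && printf '%s\n' "${VERSION_CODENAME:-unknown}" )
}

codename=$(codename_of /etc/os-release)
if [ "$codename" = "bullseye" ]; then
    echo "Base system: Debian Bullseye"
else
    echo "Base system codename is '$codename', expected 'bullseye'"
fi

# pveversion prints the installed pve-manager version and the running kernel.
if command -v pveversion >/dev/null 2>&1; then
    pveversion
fi
```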

After the Proxmox VE upgrade

For Clusters

  • Check that all nodes are up and running on the latest package versions.

For Hyper-converged Ceph

Now you can upgrade the Ceph cluster to the Pacific release, following the article Ceph Octopus to Pacific. Note that while an upgrade is recommended, it's not strictly necessary. Ceph Octopus will be supported until its end-of-life (circa end of 2022/Q2) in Proxmox VE 7.x.

Checklist issues

proxmox-ve package is too old

Check the configured package repository entries; they still need to be for Proxmox VE 6.x and buster at this step (see Package Repositories). Then run

apt update

followed by

apt dist-upgrade

to get the latest PVE 6.x packages before upgrading to PVE 7.x.

Known upgrade issues

General

As a Debian-based distribution, Proxmox VE is affected by most issues and changes affecting Debian. Thus, ensure that you read the upgrade-specific issues for bullseye.

Please also check the known issue list from the Proxmox VE 7.0 changelog: https://pve.proxmox.com/wiki/Roadmap#7.0-known-issues

Upgrade wants to remove package 'proxmox-ve'

If you have installed Proxmox VE on top of Debian Buster, you may have installed the package 'linux-image-amd64', which conflicts with current 6.x setups. To solve this, you have to remove this package with

apt remove linux-image-amd64

before the dist-upgrade.

No 'root' password set

The root account must have a password set (that you remember). If not, the sudo package will be uninstalled during the upgrade, and so you will not be able to log in again as root.

If you used the official Proxmox VE or Debian installer, and you didn't remove the password after the installation, you are safe.
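If you are unsure, the password status can be read with passwd -S, whose second output field is P (usable password), L (locked) or NP (no password). This is a minimal sketch; has_usable_password is a hypothetical helper name, and the check must be run as root.

```shell
#!/bin/sh
# Decide from `passwd -S` output on stdin whether the account has a usable password.
# The second field is P (usable password), L (locked) or NP (no password).
# has_usable_password is a hypothetical helper name.
has_usable_password() {
    status=$(awk 'NR == 1 { print $2 }')
    [ "$status" = "P" ]
}

# Reading the status of an account requires root privileges.
if passwd -S root 2>/dev/null | has_usable_password; then
    echo "root has a usable password"
else
    echo "root has no usable password, or the status could not be read; set one with: passwd root"
fi
```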

Third-party Storage Plugins

The external, third-party storage plugin mechanism had an ABI-version bump that reset the ABI-age. This means there was an incompatible, breaking change that external plugins must adapt to before they can be loaded again.

If you use any external storage plugin, you need to wait until the plugin author has adapted it for Proxmox VE 7.0.

Older Hardware and New 5.15 Kernel

KVM: entry failed, hardware error 0x80000021

Background

With the 5.15 kernel, the two-dimensional paging (TDP) memory management unit (MMU) implementation got activated by default. This new implementation reduces the complexity of mapping the guest OS's virtual memory addresses to the host's physical memory addresses, and thus improves performance, especially for big VMs (large amounts of memory and high core counts) and during live migration. However, it seems that it can regress on older hardware.

The problem shows up as a crash of the machine, with a kernel (dmesg) or journalctl log entry containing, among others, a line like:

KVM: entry failed, hardware error 0x80000021

Normally there's also an assert error message logged from the QEMU process around the same time.
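To check whether a node has already hit this error, the journal can be searched for the message. This is a minimal sketch; find_kvm_entry_failures is a hypothetical helper name, and the search is limited to the current boot.

```shell
#!/bin/sh
# Filter log text on stdin for the KVM entry failure described above.
# find_kvm_entry_failures is a hypothetical helper name.
find_kvm_entry_failures() {
    # -F treats the pattern as a fixed string, so surrounding timestamps do not matter.
    grep -F 'KVM: entry failed, hardware error 0x80000021'
}

# -b limits journalctl to the current boot; --no-pager writes plain output.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -b --no-pager 2>/dev/null | find_kvm_entry_failures \
        || echo "No matching entries in the journal for the current boot."
fi
```

The same filter can be applied to kernel messages with dmesg | find_kvm_entry_failures.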

We could not pinpoint the affected models exactly, but CPUs launched over 8 years ago seem most likely to trigger the issue, even more so if they do not have the latest available firmware (BIOS/EFI) and CPU microcode updates applied.

Thus, before trying the workaround below, we recommend ensuring that the latest firmware and CPU microcode updates are applied.

Workaround: Disable tdp_mmu

The tdp_mmu kvm module option can be used to force disabling the usage of the two-dimensional paging (TDP) MMU.

  • You can either add that parameter to the PVE host's kernel command line as kvm.tdp_mmu=N, see this reference documentation section.
  • Alternatively, set the module option using a modprobe config, for example:
echo "options kvm tdp_mmu=N" >/etc/modprobe.d/kvm-disable-tdp-mmu.conf

To finish applying the workaround, always run update-initramfs -k all -u to apply any changes to the initramfs and then reboot the Proxmox VE host.

You can confirm that the change is active by checking that the output of cat /sys/module/kvm/parameters/tdp_mmu is N.
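The confirmation can also be scripted, for example to check several nodes in a row. This is a minimal sketch; tdp_mmu_disabled is a hypothetical helper name, and its optional argument only exists so that the helper can be tried against a test file.

```shell
#!/bin/sh
# Report whether the tdp_mmu parameter of the loaded kvm module reads N.
# tdp_mmu_disabled is a hypothetical helper name.
tdp_mmu_disabled() {
    param_file=${1:-/sys/module/kvm/parameters/tdp_mmu}
    [ -r "$param_file" ] && [ "$(cat "$param_file")" = "N" ]
}

if tdp_mmu_disabled; then
    echo "tdp_mmu is disabled"
else
    echo "tdp_mmu is still enabled, or the kvm module is not loaded"
fi
```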

Network

Linux Bridge MAC-Address Change

With Proxmox VE 7 / Debian Bullseye, a new systemd version is used, that changes how the MAC addresses of Linux network bridge devices are calculated:

MACAddressPolicy=persistent was extended to set MAC addresses based on the device name. Previously addresses were only based on the ID_NET_NAME_* attributes, which meant that interface names would never be generated for virtual devices. Now a persistent address will be generated for most devices, including in particular bridges.

-- https://www.freedesktop.org/software/systemd/man/systemd.net-naming-scheme.html#v241

A unique and persistent MAC address is now calculated using the bridge name and the unique machine-id (/etc/machine-id), which is generated at install time.

Please either ensure that any ebtable or similar rules that use the previous bridge MAC-Address are updated or configure the desired bridge MAC-Address explicitly, by switching to ifupdown2 and adding hwaddress to the respective entry in /etc/network/interfaces.

Older Virtual Machines with Windows and Static Network

Since QEMU 5.2, first introduced in Proxmox VE 6.4, the way QEMU sets the ACPI-ID for PCI devices changed to conform to standards. This led to some Windows guests losing their device configuration, as they detect the re-ordered devices as new ones.

Due to this, Proxmox VE will now pin the machine-version for Windows-based guests to the newest available on guest creation, or to the minimum of (5.2, latest-available) for existing ones. You can also easily change the machine-version through the web-interface now. See this forum thread for further information.

Note that if you have already upgraded to Proxmox VE 6.4, your system has implemented this change already, so you can ignore it.

CGroupV2

Old Container and CGroupv2

Since Proxmox VE 7.0, the default is a pure cgroupv2 environment. Previously a "hybrid" setup was used, where resource control was mainly done in cgroupv1 with an additional cgroupv2 controller which could take over some subsystems via the cgroup_no_v1 kernel command line parameter. (See the kernel parameter documentation for details.)

cgroupv2 support by the container’s OS is needed to run in a pure cgroupv2 environment. Containers running systemd version 231 (released in 2016) or newer support cgroupv2, as do containers that do not use systemd as init system in the first place (e.g., Alpine Linux or Devuan).

CentOS 7 and Ubuntu 16.10 are two prominent Linux distribution releases that have a systemd version too old to run in a cgroupv2 environment. For details and possible fixes, see: https://pve.proxmox.com/pve-docs/chapter-pct.html#pct_cgroup_compat
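To find out which systemd version a running container uses, the first line of systemctl --version can be compared against the 231 threshold mentioned above. This is a minimal sketch; supports_cgroupv2 is a hypothetical helper name, and the container ID 101 in the comment is only an example.

```shell
#!/bin/sh
# Read `systemctl --version` output on stdin and succeed if systemd is 231 or newer.
# The first line of that output looks like "systemd 247 (247.3-7)".
# supports_cgroupv2 is a hypothetical helper name.
supports_cgroupv2() {
    version=$(awk 'NR == 1 { print $2 }')
    # A missing or non-numeric version counts as "not supported".
    [ "$version" -ge 231 ] 2>/dev/null
}

# Example for a container with the hypothetical ID 101, run on the Proxmox VE host:
#   pct exec 101 -- systemctl --version | supports_cgroupv2 && echo "CT 101: ok"
```

Containers that do not use systemd at all have no systemctl binary, so the helper reports them as unsupported even though, as noted above, they can run in a cgroupv2 environment.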

Container HW Pass-Through & CGroupv2

Proxmox VE 7.0 defaults to the pure cgroupv2 environment, as v1 will be slowly sunset in systemd and other tooling. As a result, some LXC config keys need a slightly different syntax. For example, for hardware pass-through you need to use lxc.cgroup2.devices.allow instead of the old lxc.cgroup.devices.allow (note the missing 2 in the old key).
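Affected container configs can be located and the new key previewed from the shell. This is a minimal sketch, assuming container configs live in /etc/pve/lxc/; to_cgroup2_devices is a hypothetical helper name. It writes to standard output only and does not modify any config.

```shell
#!/bin/sh
# Rewrite cgroup v1 device keys to their cgroup v2 form; reads stdin, writes stdout.
# to_cgroup2_devices is a hypothetical helper name.
to_cgroup2_devices() {
    sed 's/^lxc\.cgroup\.devices\./lxc.cgroup2.devices./'
}

# List container configs that still use the old key, without changing them.
grep -l '^lxc\.cgroup\.devices\.' /etc/pve/lxc/*.conf 2>/dev/null \
    || echo "No container config uses lxc.cgroup.devices.* keys."
```

For a listed file, to_cgroup2_devices < /etc/pve/lxc/101.conf (101 being an example ID) prints the converted config for review before you edit the file by hand.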

Troubleshooting

Failing upgrade to "bullseye"

Make sure that the repository configuration for Bullseye is correct.

If there was a network failure and the upgrade was only partially completed, try to repair the situation with

apt -f install

If you see the following message:

W: (pve-apt-hook) You are attempting to remove the meta-package 'proxmox-ve'!

then one or more of the currently existing packages cannot be upgraded since the proper Bullseye repository is not configured.

Check which of the previously used repositories (i.e. for Buster) do not exist for Bullseye or have not been upgraded to Bullseye ones.

If a corresponding Bullseye repository exists, upgrade the configuration (see also special remark for Ceph).

If an upgrade is not possible, configure all repositories as they were before the upgrade attempt, then run:

apt update

again. Then remove all packages which are currently installed from that repository. Following this, start the upgrade procedure again.

Unable to boot due to grub failure

See Recover From Grub Failure.

If your system was installed on ZFS using legacy BIOS boot before the Proxmox VE 6.4 ISO, incompatibilities between the ZFS implementation in grub and newer ZFS versions can lead to a broken boot. For more details, check the article on switching to proxmox-boot-tool, "ZFS: Switch Legacy-Boot to Proxmox Boot Tool".

External links

Release Notes for Debian 11.0 (bullseye), 64-bit PC