Recover From Grub Failure
== General advice ==

The following article provides pointers on how to prepare a <code>chroot</code>
environment for Proxmox VE systems when repairing boot-loader issues.
One example is finding oneself confronted with:
* <code>grub rescue ></code>

You can use the current Proxmox VE installation ISO and select debug mode.
At the second prompt you will have the full set of Linux tools, including LVM and ZFS,
available for mounting your filesystems and entering a <code>chroot</code> for repair.
After you exit that prompt (using Ctrl+D or <code>exit</code>), you will reach the
installation screens; simply abort there and reset the system.

Alternatively, you can use a 64-bit Ubuntu or Debian rescue CD, provided you do not use
ZFS as the root filesystem (ZFS support is usually not available in rescue CDs).

The following commands need to be run as <code>root</code>, or using <code>sudo</code> or similar.
In the examples, we will use <code>/media/RESCUE</code> as the mountpoint for the root
filesystem and <code>/dev/sdX</code> as the device on which Proxmox VE is installed.

Create the mountpoint:
 mkdir /media/RESCUE
=== LVM (Ext4/XFS) based systems ===

Enable the volume group and all LVs within it:
 vgscan
 vgchange -ay

Mount the relevant filesystems. Your paths will vary depending on your drive configuration.
 mount /dev/pve/root /media/RESCUE/
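If you are unsure which logical volume holds the root filesystem, you can list the available volume groups and LVs first; a minimal sketch (on a default Proxmox VE install the root LV is <code>pve/root</code>):
 # list volume groups and logical volumes to locate the root LV
 vgs
 lvs -o lv_name,vg_name,lv_size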
=== ZFS based systems ===

Import the pool with an alternative root:
 zpool import -f -R /media/RESCUE rpool

As the <code>hostid</code> in the installer is different, you will need
to run <code>zpool import -f rpool</code> in the initramfs once after
booting back into your system.
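For reference, that one-time interaction at the initramfs emergency shell typically looks like the following sketch (the exact prompt text varies between initramfs versions):
 # force-import the pool once, then continue booting
 zpool import -f rpool
 exit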
=== Mount relevant filesystems and hostpaths ===

 mount -o rbind /proc /media/RESCUE/proc
 mount -o rbind /sys /media/RESCUE/sys
 mount -o rbind /dev /media/RESCUE/dev
 mount -o rbind /run /media/RESCUE/run
=== Chroot and repair ===

Chroot into your install:
 chroot /media/RESCUE

Inside the <code>chroot</code>, first check if your system is using
<code>proxmox-boot-tool</code>:
 proxmox-boot-tool status

If it is not used, it will print:
 E: /etc/kernel/proxmox-boot-uuids does not exist.

* If <code>proxmox-boot-tool</code> is used, then run:
 proxmox-boot-tool reinit
* If not, then mount the ESP and reinstall grub (a sketch for identifying the ESP follows below):
 mount /dev/sdX2 /boot/efi
 grub-install /dev/sdX
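The <code>/dev/sdX2</code> above is only an example; if you are unsure which partition is the ESP, you can list the partitions and look for the vfat one (on a default Proxmox VE install this is usually the second partition):
 # list partitions with size and filesystem type; the ESP is the vfat partition
 lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT /dev/sdX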
If there are no error messages, you should be able to reboot now.
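Before rebooting, leave the <code>chroot</code> and clean up the mounts; a minimal sketch, assuming util-linux's recursive unmount is available:
 # leave the chroot (Ctrl+D also works), then recursively unmount the rescue mountpoint
 exit
 umount -R /media/RESCUE
 # on ZFS based systems, also export the pool before rebooting
 zpool export rpool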
Credit: https://www.nerdoncoffee.com/operating-systems/re-install-grub-on-proxmox/
== Recovering from grub "disk not found" error when booting from LVM ==

This section applies to the following setups:
* PVE 7.4 (or earlier) hosts with their boot disk on LVM
* PVE 8 hosts that have their boot disk on LVM, boot in UEFI mode and were upgraded from PVE 7

In these setups, the host might end up in a state in which grub fails to boot and prints an error <code>disk `lvmid/<vg uuid>/<lv uuid>` not found</code>. An example (of course, the UUIDs vary):
<nowiki>
Welcome to GRUB!
error: disk `lvmid/p3y5O2-jync-R2Ao-Gtlj-It3j-FZXE-ipEDYG/bApewq-qSRB-zYqT-mzvP-pGiV-VQaf-di4Rcz` not found.
grub rescue> </nowiki>

This "disk `...` not found" error is [https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=987008 caused by a grub bug]. LVM metadata is stored on-disk in a ring buffer, so occasionally the current metadata will wrap around the end of the ring buffer. If there is such a wraparound in the ring buffer, grub fails to parse the metadata and fails to boot with the above error.

The recommended steps differ between PVE 7.4 and PVE 8.
=== PVE 7.x ===

This subsection applies to PVE 7.4 (or earlier) hosts with their boot disk on LVM.

PVE 7.4 ships <code>grub 2.06-3~deb11u5</code>, which is affected by the [https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=987008 bug] (though earlier versions may also be affected). This has also been reported multiple times in the forum; see [https://forum.proxmox.com/threads/98761/ here] and [https://forum.proxmox.com/threads/123512/ here].
==== Temporary Workaround ====

In order to '''temporarily''' work around this bug and get the host back to a bootable state, it is sufficient to trigger an LVM metadata update. The updated metadata will reside in one contiguous section of the metadata ring buffer, so no wraparound occurs anymore. grub will then be able to parse the metadata correctly and boot again.

One simple way to trigger an LVM metadata update is to create a small logical volume:
* Boot from a live USB/CD/DVD with LVM support, e.g. [https://grml.org/ grml]
* Run <code>vgscan</code>
* Create a 4MB logical volume named <code>grubtemp</code> in the <code>pve</code> volume group: <code>lvcreate -L 4M pve -n grubtemp</code>
* Reboot. PVE should boot normally again.
* You can now remove the <code>grubtemp</code> volume: <code>lvremove pve/grubtemp</code>

Note that there are many other options for triggering a metadata update, e.g. using <code>lvchange</code> to extend an existing logical volume or to add a tag to an existing logical volume; a sketch of the tag approach follows below.
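For instance, adding (and later removing) a tag rewrites the LVM metadata as well; a minimal sketch, assuming the default <code>pve/root</code> logical volume (the tag name is arbitrary):
 # add a tag to force a metadata update, then remove it again
 lvchange --addtag grubfix pve/root
 lvchange --deltag grubfix pve/root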
The workaround is only temporary: If the host is (re)booted at a time when there is again a wraparound in the metadata ring buffer, grub will fail to boot again. | |||
On a running PVE system, you can check whether there is a wraparound in the metadata ring buffer using the following command:
<nowiki>
vgscan -vvv 2>&1 | grep "Reading metadata" </nowiki>

If the output lines end with <code>(+0)</code>, there is no wraparound. If they end with <code>(+N)</code> for any other number <code>N</code>, there is a wraparound and grub will most likely fail to boot after a reboot.
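If you prefer a single yes/no check, the command above can be wrapped in a small shell test; a sketch derived from it:
 # prints WRAPAROUND if any metadata area does not end with (+0)
 vgscan -vvv 2>&1 | grep "Reading metadata" | grep -qv '(+0)' && echo WRAPAROUND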
==== Permanent Fix ====

The only '''permanent''' fix for PVE 7.x is:
* Apply the temporary workaround to be able to boot PVE again
* Upgrade to PVE 8 by following the [[Upgrade_from_7_to_8|upgrade guide]].
=== PVE 8 ===

This subsection applies to PVE 8 hosts that have their boot disk on LVM, boot in UEFI mode and were upgraded from PVE 7.

PVE 8 ships <code>grub 2.06-13</code>, in which the [https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=987008 grub bug] is fixed. However, on hosts that boot in UEFI mode and were upgraded from PVE 7, it can happen that the updated grub 2.06-13 EFI binary is not installed to the EFI system partition (ESP) at <code>/boot/efi/EFI/proxmox/grubx64.efi</code>. As a result, when booting in UEFI mode, the host still runs the older <code>grub 2.06-3~deb11u5</code> binary that is affected by the grub bug. To find out whether this is the case, check the binary's mtime using <code>ls -l /boot/efi/EFI/proxmox/grubx64.efi</code>. If it is older than the time of the upgrade from PVE 7 to 8, the host still runs the older grub binary when booting in UEFI mode.
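As an alternative to <code>ls -l</code>, <code>stat</code> prints the full modification timestamp, which makes the comparison easier; a minimal sketch:
 # show the modification time of the grub EFI binary on the ESP
 stat -c '%y' /boot/efi/EFI/proxmox/grubx64.efi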
==== Temporary Workaround ====

The temporary workaround for PVE 8 to get the host in a bootable state [[#Temporary_Workaround|is the same as for PVE 7.x (see above)]].
==== Permanent Fix ====

The issue can be fixed permanently on PVE 8 by installing the correct grub metapackage for UEFI and choosing the correct UEFI boot entry.

First, apply the [[#Temporary_Workaround|temporary workaround]] to be able to boot into PVE 8 again. When booted into PVE 8, run the following command. It checks whether the host is indeed booted in UEFI mode and, if so, installs the correct grub metapackage for UEFI:
<nowiki>
[ -d /sys/firmware/efi ] && apt install grub-efi-amd64 </nowiki>

This will remove the <code>grub-pc</code> package and update the binary on the ESP. You can verify that the mtime of <code>/boot/efi/EFI/proxmox/grubx64.efi</code> was updated.

Note that this will not update the default EFI binary at <code>/boot/efi/EFI/BOOT/BOOTx64.EFI</code>, which might still be the grub binary that is affected by the bug. Consequently, make sure that you select the <code>proxmox</code> boot entry when booting in UEFI mode. If needed, you can adjust the boot order directly in the UEFI firmware or using the <code>efibootmgr</code> tool (see [https://manpages.debian.org/stable/efibootmgr/efibootmgr.8.en.html#Changing_the_boot_order its manpage]).
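For reference, the boot order can be inspected and changed with <code>efibootmgr</code> like this (a sketch; the entry numbers are hypothetical and will differ on your system):
 # list the current boot entries and boot order
 efibootmgr
 # put the entry labelled "proxmox" (here assumed to be Boot0002) first
 efibootmgr -o 0002,0001,0000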