Recover From Grub Failure: Difference between revisions

From Proxmox VE

Revision as of 11:32, 8 September 2023

General advice

During the upgrade from 3.x to 4.x, I found myself without a working grub and unable to boot. The monitor showed:

  • grub rescue >

You can use the Proxmox installation ISO in version 5.4 or newer and select debug mode. At the second prompt you will have the full set of Linux tools available, including LVM and ZFS. If you exit that prompt, you will reach the installation screens; simply hit abort there.

Alternatively, you can use a 64-bit Ubuntu or Debian rescue CD.

Boot the Proxmox VE ISO in debug mode, or boot Ubuntu/Debian from its ISO. We do not want to install Ubuntu/Debian, just run it live off the ISO/DVD.

First we need to activate LVM and mount the root partition that is inside the LVM container.

  • sudo vgscan
  • sudo vgchange -ay

Mount all the filesystems that are already there so we can upgrade/install grub. Your paths may vary depending on your drive configuration.

  • sudo mkdir /media/RESCUE
  • sudo mount /dev/pve/root /media/RESCUE/
  • sudo mount /dev/sda1 /media/RESCUE/boot
  • sudo mount -t proc proc /media/RESCUE/proc
  • sudo mount -t sysfs sys /media/RESCUE/sys
  • sudo mount -o bind /dev /media/RESCUE/dev
  • sudo mount -o bind /run /media/RESCUE/run

Chroot into your Proxmox install.

  • sudo chroot /media/RESCUE

Then update grub and install it.

  • update-grub
  • grub-install /dev/sda

If there are no error messages, you should be able to reboot now.
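The steps above can be collected into one script. The following is a minimal sketch, not a tested recovery tool: the variable names (`ROOT_LV`, `BOOT_PART`, `BOOT_DISK`, `TARGET`, `DRY_RUN`) and the `run` helper are made up for illustration, and the device paths are simply the ones used in this article, which may not match your system. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: paths below are the ones used in this article and are
# assumptions about your system. Adjust them before running for real.
ROOT_LV="${ROOT_LV:-/dev/pve/root}"   # root logical volume
BOOT_PART="${BOOT_PART:-/dev/sda1}"   # separate /boot partition
BOOT_DISK="${BOOT_DISK:-/dev/sda}"    # disk that receives the boot loader
TARGET="${TARGET:-/media/RESCUE}"     # mount point for the chroot
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print the commands

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rescue_grub() {
    # Activate LVM, mount the installed system, then reinstall grub.
    # The && chain stops at the first command that fails.
    run vgscan &&
    run vgchange -ay &&
    run mkdir -p "$TARGET" &&
    run mount "$ROOT_LV" "$TARGET" &&
    run mount "$BOOT_PART" "$TARGET/boot" &&
    run mount -t proc proc "$TARGET/proc" &&
    run mount -t sysfs sys "$TARGET/sys" &&
    run mount -o bind /dev "$TARGET/dev" &&
    run mount -o bind /run "$TARGET/run" &&
    run chroot "$TARGET" update-grub &&
    run chroot "$TARGET" grub-install "$BOOT_DISK"
}

rescue_grub
```

Run it once as is to review the printed commands. To execute them, run it as root with `DRY_RUN=0`. Unlike the interactive chroot above, this sketch passes `update-grub` and `grub-install` directly to `chroot` as commands.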

Credit: https://www.nerdoncoffee.com/operating-systems/re-install-grub-on-proxmox/

Recovering from grub "disk not found" error when booting from LVM

This section applies to hosts that have their boot disk on LVM and run PVE 7.4 (or earlier) with grub 2.06-3~deb11u5, though other versions of grub might be affected as well. There are reports that PVE 8 with grub 2.06-13 is also affected, even though the original bug is fixed in that version. The same workaround applies. If you are affected by this bug on PVE 8, please consider posting information that might help with debugging in this forum thread (https://forum.proxmox.com/threads/error-disk-lvmid-not-found-grub-rescue.123512/post-587764) before you apply the workaround below.

In this setup, the host might end up in a state in which grub fails to boot and prints an error disk `lvmid/<vg uuid>/<lv uuid>` not found.

An example (of course, the UUIDs vary):

Welcome to GRUB!

error: disk `lvmid/p3y5O2-jync-R2Ao-Gtlj-It3j-FZXE-ipEDYG/bApewq-qSRB-zYqT-mzvP-pGiV-VQaf-di4Rcz` not found.
grub rescue> 
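To check which volume group and logical volume the message refers to, the two UUIDs can be pulled out of the error line. The helper below is a small sketch; the function name `parse_lvmid` is made up for illustration, and it assumes the message has the form shown above.

```shell
#!/bin/sh
# Hypothetical helper: print "<vg uuid> <lv uuid>" from a grub error line.
parse_lvmid() {
    printf '%s\n' "$1" |
        sed -n 's|.*lvmid/\([A-Za-z0-9-]*\)/\([A-Za-z0-9-]*\).*|\1 \2|p'
}

# The example message from above (single quotes keep the backticks literal).
msg='error: disk `lvmid/p3y5O2-jync-R2Ao-Gtlj-It3j-FZXE-ipEDYG/bApewq-qSRB-zYqT-mzvP-pGiV-VQaf-di4Rcz` not found.'
parse_lvmid "$msg"

# On a live system with LVM tools, the UUIDs can then be compared with:
#   lvs -o vg_name,vg_uuid,lv_name,lv_uuid
```

The first value printed is the volume group UUID, the second is the logical volume UUID.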

This error seems to be caused by a grub bug: grub apparently fails to parse LVM metadata correctly if there is a wraparound in the metadata ring buffer. This was also reported multiple times in the forum already, see here and here.

In order to work around this bug and get the host to a bootable state again, it is sufficient to trigger an LVM metadata update. The updated metadata will reside in one contiguous section of the metadata ring buffer, so no wraparound occurs anymore. grub will then be able to parse the metadata correctly and boot again.
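To illustrate what "wraparound" means here, the following sketch checks whether a block of a given length, starting at a given offset, fits before the end of a ring buffer. The function name and all numbers are hypothetical and are not read from a real disk; it is not a description of the exact on-disk LVM layout.

```shell
#!/bin/sh
# Illustration only: sizes and offsets are made-up numbers.
# $1 = ring buffer size, $2 = start offset of the metadata, $3 = its length
wraps_around() {
    [ $(( $2 + $3 )) -gt "$1" ]
}

# Metadata starting near the end of the buffer continues at the beginning.
if wraps_around 1000 900 200; then
    echo "offset 900, length 200: wraps around"
fi

# Metadata that fits before the end of the buffer is contiguous.
if ! wraps_around 1000 100 200; then
    echo "offset 100, length 200: contiguous"
fi
```

In the first case the data is split into two pieces, which is the situation the affected grub versions apparently fail to parse.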

One simple way to trigger an LVM metadata update is to create a small logical volume:

  • Boot from a live USB/CD/DVD with LVM support, e.g. grml
  • Run vgscan
  • Create a 4MB logical volume named grubtemp in the pve volume group: lvcreate -L 4M pve -n grubtemp
  • Reboot. PVE should boot normally again.
  • You can now remove the grubtemp volume: lvremove pve/grubtemp

Note that there are many other options for triggering a metadata update, e.g. using lvextend to extend an existing logical volume or using lvchange --addtag to add a tag to an existing logical volume.
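The workaround can also be written down as a short script. This is a sketch under assumptions: the volume group `pve`, the logical volume `pve/root` and the name `grubtemp` come from this article, while the `run` helper, the function names and the tag name `grubfix` are made up for illustration. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: names below are assumptions, adjust them to your setup.
VG="${VG:-pve}"                  # volume group holding the boot disk
TEMP_LV="${TEMP_LV:-grubtemp}"   # temporary logical volume
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run from the live system: creating a small LV rewrites the metadata.
create_temp_lv() {
    run vgscan &&
    run lvcreate -L 4M "$VG" -n "$TEMP_LV"
}

# Run after PVE boots normally again (lvremove asks for confirmation).
remove_temp_lv() {
    run lvremove "$VG/$TEMP_LV"
}

# Alternative: add and remove a tag instead of creating a volume.
# "grubfix" is an arbitrary tag name chosen for this example.
toggle_tag() {
    run lvchange --addtag grubfix "$VG/root" &&
    run lvchange --deltag grubfix "$VG/root"
}

create_temp_lv
```

Set `DRY_RUN=0` and run as root to execute the commands. Only one of `create_temp_lv` and `toggle_tag` is needed, since each of them triggers a metadata update.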