Recover From Grub Failure: Difference between revisions

From Proxmox VE
mNo edit summary
(Add section on grub LVM parsing bug)
Line 1: Line 1:
== General advice ==
During the upgrade from 3.x to 4.x, I found myself without a working grub and unable to boot. The monitor shows:
*<code>grub rescue ></code>
Line 33: Line 35:


Credit: https://www.nerdoncoffee.com/operating-systems/re-install-grub-on-proxmox/
== Recovering from grub "disk not found" error when booting from LVM ==
This section applies to hosts that have their boot disk on LVM and run PVE 7.4 (or earlier) with grub <code>2.06-3~deb11u5</code> (other grub versions may also be affected). In this setup, the host can end up in a state in which grub fails to boot and prints the error <code>disk `lvmid/<vg uuid>/<lv uuid>` not found</code>.
An example (of course, the UUIDs vary):
<nowiki>
Welcome to GRUB!
error: disk `lvmid/p3y5O2-jync-R2Ao-Gtlj-It3j-FZXE-ipEDYG/bApewq-qSRB-zYqT-mzvP-pGiV-VQaf-di4Rcz` not found.
grub rescue> </nowiki>
This error seems to be [https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=987008 caused by a grub bug]: grub apparently fails to parse LVM metadata correctly if there is a wraparound in the metadata ring buffer. It has also been reported several times in the forum, see [https://forum.proxmox.com/threads/98761/ here] and [https://forum.proxmox.com/threads/123512/ here].
In order to work around this bug and get the host to a bootable state again, it is sufficient to trigger an LVM metadata update. The updated metadata will reside in one contiguous section of the metadata ring buffer, so no wraparound occurs anymore. grub will then be able to parse the metadata correctly and boot again.
One simple way to trigger an LVM metadata update is to create a small logical volume:
* Boot from a live USB/CD/DVD with LVM support, e.g. [https://grml.org/ grml]
* Run <code>vgscan</code> and <code>vgchange -ay</code>
* Create a 4MB logical volume named <code>grubtemp</code> in the <code>pve</code> volume group: <code>lvcreate -L 4M pve -n grubtemp</code>
* Reboot. PVE should boot normally again.
* You can now remove the <code>grubtemp</code> volume: <code>lvremove pve/grubtemp</code>
Note that there are many other options for triggering a metadata update, e.g. using <code>lvextend</code> to extend an existing logical volume or <code>lvchange --addtag</code> to add a tag to an existing logical volume.

Revision as of 13:01, 31 March 2023

General advice

During the upgrade from 3.x to 4.x, I found myself without a working grub and unable to boot. The monitor shows:

  • grub rescue >

You can use the Proxmox installation ISO, version 5.4 or newer, and select debug mode. At the second prompt you will have the full set of Linux tools available, including LVM and ZFS. If you exit that prompt, you will reach the installation screens; simply abort there.

Alternatively, you can use a 64-bit Ubuntu or Debian rescue CD.

Boot the Proxmox VE ISO in debug mode, or boot Ubuntu/Debian from its ISO. We do not want to install Ubuntu/Debian, just run it live from the ISO/DVD.

First, we need to activate LVM and mount the root partition that is inside the LVM container.

  • sudo vgscan
  • sudo vgchange -ay

Mount all the filesystems that are already there so we can upgrade/install grub. Your paths may vary depending on your drive configuration.

  • sudo mkdir /media/RESCUE
  • sudo mount /dev/pve/root /media/RESCUE/
  • sudo mount /dev/sda1 /media/RESCUE/boot
  • sudo mount -t proc proc /media/RESCUE/proc
  • sudo mount -t sysfs sys /media/RESCUE/sys
  • sudo mount -o bind /dev /media/RESCUE/dev
  • sudo mount -o bind /run /media/RESCUE/run

Chroot into your Proxmox VE installation.

  • sudo chroot /media/RESCUE

Then update grub and install it.

  • update-grub
  • grub-install /dev/sda

If there are no error messages, you should be able to reboot now.
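
The steps above can be collected into one small script. The following is only a sketch, not an official Proxmox tool: the variable names (RESCUE_DIR, ROOT_LV, BOOT_PART, BOOT_DISK, DRY_RUN) and the helper functions are made up for illustration, and the default devices are simply the example paths used above, which may differ on your system. By default it only prints the commands so that you can review them first.

```shell
#!/bin/sh
# Sketch: remount an LVM-based Proxmox VE root from a live system and
# reinstall grub. All variable and function names are illustrative.
RESCUE_DIR="${RESCUE_DIR:-/media/RESCUE}"   # mount point on the live system
ROOT_LV="${ROOT_LV:-/dev/pve/root}"         # root logical volume
BOOT_PART="${BOOT_PART:-/dev/sda1}"         # partition mounted at /boot (assumption)
BOOT_DISK="${BOOT_DISK:-/dev/sda}"          # disk that grub is installed to
DRY_RUN="${DRY_RUN:-1}"                     # 1 = only print commands, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Runs the documented steps in order and stops at the first failure.
reinstall_grub() {
    run vgscan &&
    run vgchange -ay &&
    run mkdir -p "$RESCUE_DIR" &&
    run mount "$ROOT_LV" "$RESCUE_DIR" &&
    run mount "$BOOT_PART" "$RESCUE_DIR/boot" &&
    run mount -t proc proc "$RESCUE_DIR/proc" &&
    run mount -t sysfs sys "$RESCUE_DIR/sys" &&
    run mount -o bind /dev "$RESCUE_DIR/dev" &&
    run mount -o bind /run "$RESCUE_DIR/run" &&
    run chroot "$RESCUE_DIR" update-grub &&
    run chroot "$RESCUE_DIR" grub-install "$BOOT_DISK"
}
```

Calling reinstall_grub after loading these definitions prints the command list. Once the devices match your setup, running it as root with DRY_RUN set to 0 executes the commands instead. Passing the command to chroot directly avoids an interactive shell inside the chroot.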

Credit: https://www.nerdoncoffee.com/operating-systems/re-install-grub-on-proxmox/

Recovering from grub "disk not found" error when booting from LVM

This section applies to hosts that have their boot disk on LVM and run PVE 7.4 (or earlier) with grub 2.06-3~deb11u5 (other grub versions may also be affected). In this setup, the host can end up in a state in which grub fails to boot and prints the error disk `lvmid/<vg uuid>/<lv uuid>` not found.

An example (of course, the UUIDs vary):

Welcome to GRUB!

error: disk `lvmid/p3y5O2-jync-R2Ao-Gtlj-It3j-FZXE-ipEDYG/bApewq-qSRB-zYqT-mzvP-pGiV-VQaf-di4Rcz` not found.
grub rescue> 

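This error seems to be caused by a grub bug (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=987008): grub apparently fails to parse LVM metadata correctly if there is a wraparound in the metadata ring buffer. It has also been reported several times in the forum, see https://forum.proxmox.com/threads/98761/ and https://forum.proxmox.com/threads/123512/.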

In order to work around this bug and get the host to a bootable state again, it is sufficient to trigger an LVM metadata update. The updated metadata will reside in one contiguous section of the metadata ring buffer, so no wraparound occurs anymore. grub will then be able to parse the metadata correctly and boot again.

One simple way to trigger an LVM metadata update is to create a small logical volume:

  • Boot from a live USB/CD/DVD with LVM support, e.g. grml (https://grml.org/)
  • Run vgscan and vgchange -ay
  • Create a 4MB logical volume named grubtemp in the pve volume group: lvcreate -L 4M pve -n grubtemp
  • Reboot. PVE should boot normally again.
  • You can now remove the grubtemp volume: lvremove pve/grubtemp
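
As a rough sketch, the two phases of this workaround (creating the volume from the live system, removing it after a successful boot) could be wrapped like this. The function and variable names are hypothetical; the commands themselves are the ones listed above. With the default DRY_RUN=1 nothing is changed and the commands are only printed.

```shell
#!/bin/sh
# Sketch: trigger an LVM metadata update by creating a tiny temporary LV.
# VG, TEMP_LV, DRY_RUN and the function names are illustrative.
VG="${VG:-pve}"                  # volume group holding the boot disk
TEMP_LV="${TEMP_LV:-grubtemp}"   # name of the throwaway logical volume
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print commands, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Phase 1: run from the live system, then reboot.
create_temp_lv() {
    run vgscan &&
    run vgchange -ay &&
    run lvcreate -L 4M "$VG" -n "$TEMP_LV"
}

# Phase 2: run on the PVE host once it boots normally again.
remove_temp_lv() {
    run lvremove "$VG/$TEMP_LV"
}
```

Note that lvremove asks for confirmation before it removes an active volume, so phase 2 is interactive when executed for real.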

Note that there are many other options for triggering a metadata update, e.g. using lvextend to extend an existing logical volume or lvchange --addtag to add a tag to an existing logical volume.
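
For the tag variant, a minimal sketch might look as follows. The tag name grubfix and the choice of pve/root as the target volume are assumptions made for illustration, as are the variable and function names; any existing logical volume in the affected volume group should serve the same purpose. As before, the default is a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: trigger an LVM metadata update by tagging an existing LV.
# LV, TAG, DRY_RUN and the function names are illustrative.
LV="${LV:-pve/root}"       # existing logical volume to tag (assumption)
TAG="${TAG:-grubfix}"      # hypothetical tag name
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print commands, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run from the live system after vgscan and vgchange -ay, then reboot.
add_temp_tag() {
    run lvchange --addtag "$TAG" "$LV"
}

# Optional cleanup once the host boots normally again.
remove_temp_tag() {
    run lvchange --deltag "$TAG" "$LV"
}
```

Compared with creating a temporary volume, this variant does not allocate any space in the volume group.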