Recover From Grub Failure
General advice
During the upgrade from 3.x to 4.x, I found myself without a working grub and unable to boot. The monitor shows:
grub rescue >
You can use a Proxmox installation ISO in version 5.4 or newer and select debug mode. At the second prompt you will have the full set of Linux tools available, including LVM, ZFS, etc. If you exit that prompt you will reach the installation screens; simply hit abort there.
Alternatively, you can use a 64-bit Ubuntu or Debian rescue CD.
Boot Proxmox VE in debug mode, or boot Ubuntu/Debian from the ISO. We do not want to install Ubuntu/Debian, just run it live from the ISO/DVD.
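Before changing anything, it can help to confirm that the rescue environment sees your disks and the LVM physical volume. This is a minimal optional check using standard tools (lsblk from util-linux and pvs from LVM2); your device names will differ:
sudo lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT
sudo pvs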
First we need to activate LVM and mount the root partition that is inside the LVM container.
sudo vgscan
sudo vgchange -ay
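To verify that the pve volume group was found and its logical volumes are now active, a quick optional check is:
sudo lvscan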
Mount all the filesystems that are already there so we can upgrade/install grub. Your paths may vary depending on your drive configuration.
sudo mkdir /media/RESCUE
sudo mount /dev/pve/root /media/RESCUE/
sudo mount /dev/sda1 /media/RESCUE/boot
sudo mount -t proc proc /media/RESCUE/proc
sudo mount -t sysfs sys /media/RESCUE/sys
sudo mount -o bind /dev /media/RESCUE/dev
sudo mount -o bind /run /media/RESCUE/run
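Optionally, double-check that everything ended up under the rescue mount point before chrooting (findmnt is part of util-linux and available on most live systems):
sudo findmnt -R /media/RESCUE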
Chroot into your Proxmox install.
chroot /media/RESCUE
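Inside the chroot you can quickly confirm that you are in the intended Proxmox installation rather than the live system, for example by querying the pve-manager package (assuming the standard Debian/Proxmox package setup):
dpkg -l pve-manager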
Then update the grub configuration and reinstall grub to the boot disk.
update-grub
grub-install /dev/sda
If there are no error messages, you should be able to reboot now.
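Before rebooting, it is cleaner to leave the chroot and unmount everything again in reverse order (a suggested tidy-up; if a filesystem reports busy, a reboot will discard the live system anyway):
exit
sudo umount /media/RESCUE/run /media/RESCUE/dev /media/RESCUE/sys /media/RESCUE/proc /media/RESCUE/boot
sudo umount /media/RESCUE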
Credit: https://www.nerdoncoffee.com/operating-systems/re-install-grub-on-proxmox/
Recovering from grub "disk not found" error when booting from LVM
This section applies to hosts that have their boot disk on LVM and run PVE 7.4 (or earlier) with grub 2.06-3~deb11u5 (though other grub versions might be affected as well). There are reports that PVE 8 with grub 2.06-13 is affected as well, even though the original bug is fixed in that version. The same workaround applies. If you are affected by this bug on PVE 8, please consider posting some information that might help with debugging in this forum thread before you apply the workaround below.
In this setup, the host might end up in a state in which grub fails to boot and prints an error: disk `lvmid/<vg uuid>/<lv uuid>` not found.
An example (of course, the UUIDs vary):
Welcome to GRUB!
error: disk `lvmid/p3y5O2-jync-R2Ao-Gtlj-It3j-FZXE-ipEDYG/bApewq-qSRB-zYqT-mzvP-pGiV-VQaf-di4Rcz` not found.
grub rescue>
This error seems to be caused by a grub bug: grub apparently fails to parse LVM metadata correctly if there is a wraparound in the metadata ring buffer. This was also reported multiple times in the forum already, see here and here.
In order to work around this bug and get the host to a bootable state again, it is sufficient to trigger an LVM metadata update. The updated metadata will reside in one contiguous section of the metadata ring buffer, so no wraparound occurs anymore. grub will then be able to parse the metadata correctly and boot again.
One simple way to trigger an LVM metadata update is to create a small logical volume:
- Boot from a live USB/CD/DVD with LVM support, e.g. grml
- Run vgscan
- Create a 4MB logical volume named grubtemp in the pve volume group: lvcreate -L 4M pve -n grubtemp
- Reboot. PVE should boot normally again.
- You can now remove the grubtemp volume: lvremove pve/grubtemp
Note that there are many other options for triggering a metadata update, e.g. using lvextend to extend an existing logical volume or lvchange to add a tag to an existing logical volume.
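As a sketch of the tagging approach, the following rewrites the metadata by adding and then removing a tag; the logical volume pve/swap and the tag name grub_workaround are only examples and assume a default Proxmox LVM layout:
lvchange --addtag grub_workaround pve/swap
lvchange --deltag grub_workaround pve/swap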