PCI Passthrough: Difference between revisions

From Proxmox VE
m (replace cmd for checking iommu group separation with the pvesh cmd, like in the pve docs)
 
(14 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== Introduction ==
== Introduction ==
{{Note|This is a collection of examples, workarounds, hacks, and specific issues for PCI(e) passthrough. For a step-by-step guide on how and what to do to pass through PCI(e) devices, see [https://pve.proxmox.com/pve-docs/pve-admin-guide.html#qm_pci_passthrough the docs] or  [[PCI(e)_Passthrough|the wiki page generated from the docs]]}}


PCI passthrough allows you to use a physical PCI device (graphics card, network card) inside a VM (KVM virtualization only).
PCI passthrough allows you to use a physical PCI device (graphics card, network card) inside a VM (KVM virtualization only).


If you "PCI passthrough" a device, the device is not available to the host anymore.
If you "PCI passthrough" a device, the device is not available to the host anymore. Note that VMs with passed-through devices cannot be migrated.
 
'''Note:'''
 
PCI passthrough is an experimental feature in Proxmox VE! '''VMs with passthroughed devices cannot be migrated.'''
 
== Enable the IOMMU ==
 
You need to enable the IOMMU, by [https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_edit_kernel_cmdline editing the kernel commandline].
 
First open your bootloader kernel command line config file.
 
For '''GRUB''':
nano /etc/default/grub


Find the line with "GRUB_CMDLINE_LINUX_DEFAULT"
== Requirements ==


For '''systemd-boot''':
This is a list of basic requirements adapted from [https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF#Prerequisites the Arch wiki]
nano /etc/kernel/cmdline


Its format is a single line with options. You can create the file for systemd-boot if not present.
; CPU requirements:
: Your CPU has to support hardware virtualization and IOMMU. Most new CPUs support this.
* AMD: CPUs from the Bulldozer generation and newer, CPUs from the K10 generation need a 890FX or 990FX motherboard.
* Intel: [https://ark.intel.com/content/www/us/en/ark/search/featurefilter.html?productType=873&0_VTD=True list of VT-d capable Intel CPUs]


=== Intel CPU ===
; Motherboard requirements:
: Your motherboard needs to support IOMMU. Lists can be found on [https://wiki.xenproject.org/wiki/VTd_HowTo the Xen wiki] and [https://en.wikipedia.org/wiki/List_of_IOMMU-supporting_hardware Wikipedia]. Note that, as of writing, both these lists are incomplete and very out-of-date and most newer motherboards support IOMMU.


For Intel CPUs add
; GPU requirements:
  intel_iommu=on
: The ROM of your GPU does not necessarily need to support UEFI, however, most modern GPUs do. If your GPU ROM supports UEFI, it is recommended to use OVMF (UEFI) instead of SeaBIOS. For a list of GPU ROMs, see [https://www.techpowerup.com/vgabios/?architecture=&manufacturer=&model=&version=&interface=&memType=&memSize=&since= Techpowerup's collection of GPU ROMs].
 
==== GRUB ====
 
If you are using GRUB:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
 
Then save the changes and update grub:
update-grub
 
==== systemd-boot ====
If you use systemd-boot, add the following at the end of the first line:
quiet intel_iommu=on
 
Then save the changes and update systemd-boot:
proxmox-boot-tool refresh
 
=== AMD CPU ===
 
For AMD CPUs add
  amd_iommu=on
 
==== GRUB ====
 
If you are using GRUB:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
 
Then save the changes and update grub:
update-grub
 
==== systemd-boot ====
If you are using systemd-boot, add the following at the end of the first line:
quiet amd_iommu=on
 
Then save the changes and update systemd-boot:
proxmox-boot-tool refresh


== Verifying IOMMU parameters ==
=== Verify IOMMU is enabled ===
=== Verify IOMMU is enabled ===


Line 72: Line 30:
There should be a line that looks like "DMAR: IOMMU enabled". If there is no output, something is wrong.
There should be a line that looks like "DMAR: IOMMU enabled". If there is no output, something is wrong.


=== PT Mode ===
=== Verify IOMMU interrupt remapping is enabled ===
 
Both Intel and AMD chips can use the additional parameter "iommu=pt", added in the same way as above to the kernel cmdline.
 
  iommu=pt
 
This enables the IOMMU translation only when necessary, the adapter does not need to use DMA translation to the memory, and can thus improve performance for '''hypervisor''' PCIe devices (which are not passthroughed to a VM)
 
== Required Modules ==
add to /etc/modules
 
<pre>
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
</pre>


Note that in the 5.4 based kernels some of those modules are already built into the kernel directly.
It is not possible to use PCI passthrough without interrupt remapping. Device assignment will fail with 'Failed to assign device "[device name]": Operation not permitted' or 'Interrupt Remapping hardware not found, passing devices to unprivileged domains is insecure.'.
 
== IOMMU Interrupt Remapping ==
 
It will not be possible to use PCI passthrough without interrupt remapping. Device assignment will fail with 'Failed to assign device "[device name]": Operation not permitted' or 'Interrupt Remapping hardware not found, passing devices to unprivileged domains is insecure.' error.


All systems using an Intel processor and chipset that have support for Intel Virtualization Technology for Directed I/O (VT-d), but do not have support for interrupt remapping will see such an error. Interrupt remapping support is provided in newer processors and chipsets (both AMD and Intel).
All systems using an Intel processor and chipset that have support for Intel Virtualization Technology for Directed I/O (VT-d), but do not have support for interrupt remapping will see such an error. Interrupt remapping support is provided in newer processors and chipsets (both AMD and Intel).
Line 106: Line 44:
If you see one of the following lines:
If you see one of the following lines:


* "AMD-Vi: Interrupt remapping enabled"
* <code>AMD-Vi: Interrupt remapping enabled</code>
* "DMAR-IR: Enabled IRQ remapping in x2apic mode" ('x2apic' can be different on old CPUs, but should still work)
* <code>DMAR-IR: Enabled IRQ remapping in x2apic mode</code> ('x2apic' can be different on old CPUs, but should still work)


then remapping is supported.
then remapping is supported.
Line 117: Line 55:
</pre>
</pre>


== Verify IOMMU Isolation==
=== Verify IOMMU isolation ===


For working PCI passthrough, you need a dedicated IOMMU group for all PCI devices you want to assign to a VM.
For working PCI passthrough, you need a dedicated IOMMU group for all PCI devices you want to assign to a VM.


You should have something like:
When executing


<pre>
# pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist ""
# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:16.0
/sys/kernel/iommu_groups/4/devices/0000:00:1a.0
/sys/kernel/iommu_groups/5/devices/0000:00:1b.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.5
/sys/kernel/iommu_groups/8/devices/0000:00:1c.6
/sys/kernel/iommu_groups/9/devices/0000:00:1c.7
/sys/kernel/iommu_groups/9/devices/0000:05:00.0
/sys/kernel/iommu_groups/10/devices/0000:00:1d.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.2
/sys/kernel/iommu_groups/11/devices/0000:00:1f.3
/sys/kernel/iommu_groups/12/devices/0000:02:00.0
/sys/kernel/iommu_groups/12/devices/0000:02:00.1
/sys/kernel/iommu_groups/13/devices/0000:03:00.0
/sys/kernel/iommu_groups/14/devices/0000:04:00.0
</pre>
 
To have separate IOMMU groups, your processor needs to have support for a feature called ACS (Access Control Services). Make sure you enable the corresponding setting in your BIOS for this.


All Xeon processor support them (E3,E5) excluding Xeon E3-1200.
replacing {nodename} with the name of your node.


For Intel Core it's different, only some processors support ACS. Anything newer than listed below should support ACS, as long as VT-d is supported. See https://ark.intel.com for more info.
You should get a list similar to:


<pre>
<pre>
Haswell-E (LGA2011-v3)
┌──────────┬────────┬──────────────┬────────────┬────────┬───────────────────────────────────────────────────────────────────┬...
i7-5960X (8-core, 3/3.5GHz)
│ class    │ device │ id          │ iommugroup │ vendor │ device_name                                                      │
i7-5930K (6-core, 3.2/3.8GHz)
╞══════════╪════════╪══════════════╪════════════╪════════╪═══════════════════════════════════════════════════════════════════╪
i7-5820K (6-core, 3.3/3.6GHz)
│ 0x010601 │ 0xa282 │ 0000:00:17.0 │          5 │ 0x8086 │ 200 Series PCH SATA controller [AHCI mode]                        │
 
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────────────────────────────┼
Ivy Bridge-E (LGA2011)
│ 0x010802 │ 0xa808 │ 0000:02:00.0 │        12 │ 0x144d │ NVMe SSD Controller SM981/PM981/PM983                            │
i7-4960X (6-core, 3.6/4GHz)
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────────────────────────────┼
i7-4930K (6-core, 3.4/3.6GHz)
│ 0x020000 │ 0x15b8 │ 0000:00:1f.6 │        11 │ 0x8086 │ Ethernet Connection (2) I219-V                                    │
i7-4820K (4-core, 3.7/3.9GHz)
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────────────────────────────┼
 
│ 0x030000 │ 0x5912 │ 0000:00:02.0 │          2 │ 0x8086 │ HD Graphics 630                                                  │
Sandy Bridge-E (LGA2011)
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────────────────────────────┼
i7-3960X (6-core, 3.3/3.9GHz)
│ 0x030000 │ 0x1d01 │ 0000:01:00.0 │          1 │ 0x10de │ GP108 [GeForce GT 1030]                                          │
i7-3970X (6-core, 3.5/4GHz)
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────────────────────────────┼
i7-3930K (6-core, 3.2/3.8GHz)
.
i7-3820 (4-core, 3.6/3.8GHz)
.
.
</pre>
</pre>


AMD chips from Ryzen 1st generation and newer are fine too.
To have separate IOMMU groups, your processor needs to have support for a feature called ACS (Access Control Services). Make sure you enable the corresponding setting in your BIOS for this.
 
If you don't have dedicated IOMMU groups, you can try:
 
1) moving the card to another pci slot
 
2) adding "pcie_acs_override=downstream" to kernel boot commandline (grub or systemd-boot) options, which can help on some setup with bad ACS implementation.
: Checkout the documentation [https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_edit_kernel_cmdline about Editing the kernel commandline]
 
More infos:
 
http://vfio.blogspot.be/2015/10/intel-processors-with-acs-support.html 
http://vfio.blogspot.be/2014/08/iommu-groups-inside-and-out.html
 
== Determine your PCI card address, and configure your VM ==
 
The easiest way is to use the GUI to add a device of type "Host PCI" in the VM's hardware tab.
 
Alternatively, you can use the command line:
 
Locate your card using "lspci". The address should be in the form of: 01:00.0
Edit the <vmid>.conf file. It can be located at: /etc/pve/qemu-server/vmid.conf.
 
Add this line to the end of the file:
<pre>
hostpci0: 01:00.0
</pre>
 
If you have a multi-function device  (like a vga card with embedded audio chipset), you can pass all functions manually with:
<pre>
hostpci0: 01:00.0;01:00.1
</pre>


or, to pass all functions automatically:
If you don't have dedicated IOMMU groups, you can try moving the card to another PCI slot.
<pre>
hostpci0: 01:00
</pre>


== PCI Express Passthrough ==
Should that not work, you can try using [https://lkml.org/lkml/2013/5/30/513 Alex Williamson's ACS override patch]. However, this should be seen as a last option
and is [http://vfio.blogspot.be/2014/08/iommu-groups-inside-and-out.html not without risks].


Check the "PCI-E" checkbox in the GUI when adding your device, or manually add the pcie=1 parameter to your VM config:
As of writing, the ACS patch is part of the Proxmox VE kernel and can be invoked via [https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_edit_kernel_cmdline Editing the kernel command line]. Add
<pre>
pcie_acs_override=downstream
machine: q35
to the kernel boot command line (grub or systemd-boot) options.
hostpci0: 01:00.0,pcie=1
</pre>


PCIe passthrough is only supported on Q35 machines.
More information can be found at [http://vfio.blogspot.com/ Alex Williamson's blog].


Note that this does not mean that devices assigned without this setting will only have PCI speeds, it just sets a flag for the guest to tell it that the device is a PCIe device instead of a "really-fast legacy PCI device". Some guest applications benefit from this.
== GPU passthrough ==


== GPU Passthrough ==
{{Note|See http://blog.quindorian.org/2018/03/building-a-2u-amd-ryzen-server-proxmox-gpu-passthrough.html/ if you like an article with a How-To approach. (NOTE: you usually do not need the ROM-file dumping mentioned at the end!)}}


{{Note|See http://blog.quindorian.org/2018/03/building-a-2u-amd-ryzen-server-proxmox-gpu-passthrough.html/ if you like an article with a HOWTO approach. (NOTE: you usually do not need the ROM-file dumping mentioned at the end!)}}
* AMD RADEON 5xxx, 6xxx, 7xxx, NVIDIA GeForce 7, 8, GTX 4xx, 5xx, 6xx, 7xx, 9xx, 10xx, 15xx, 16xx, and RTX 20xx have been reported working. Anything newer should work as well.
 
* AMD Navi (5xxx(XT)/6xxx(XT)) cards suffer from the reset bug (see https://github.com/gnif/vendor-reset). While dedicated users have managed to get them to run, they require a lot more effort and will probably not be entirely stable (see the [[PCI_Passthrough#AMD_specific_issues|AMD specific issues]] for workarounds).
* AMD RADEON 5xxx, 6xxx, 7xxx, NVIDIA GEFORCE 7, 8, GTX 4xx, 5xx, 6xx, 7xx, 9xx, 10xx and RTX 16xx/20xx have been reported working.
* AMD Navi (5xxx(XT)/6xxx(XT)) suffer from the reset bug (see https://github.com/gnif/vendor-reset), and while dedicated users have managed to get them to run, they require a lot more effort and will probably not work entirely stable
* You might need to load some specific options in grub.cfg or other tuning values to get your configuration specifically working/stable
* You might need to load some specific options in grub.cfg or other tuning values to get your configuration specifically working/stable
* Here's a good forum thread of archlinux: https://bbs.archlinux.org/viewtopic.php?id=162768
* Here's a good forum thread of Arch Linux: https://bbs.archlinux.org/viewtopic.php?id=162768


For starters, it's often helpful if the host doesn't try to use the GPU, which avoids issues with the host driver unbinding and re-binding to the device. Sometimes making sure the host BIOS POST messages are displayed on a different GPU is helpful too. This can sometimes be accomplished via BIOS settings, moving the card to a different slot or enabling/disabling legacy boot support.
For starters, it's often helpful if the host doesn't try to use the GPU, which avoids issues with the host driver unbinding and re-binding to the device. Sometimes making sure the host BIOS POST messages are displayed on a different GPU is helpful too. This can sometimes be accomplished via BIOS settings, moving the card to a different slot or enabling/disabling legacy boot support.


First, find the device and vendor id of your vga card:
=== Blacklisting drivers ===
 
<pre>
$ lspci -n -s 01:00
01:00.0 0300: 10de:1381 (rev a2)
01:00.1 0403: 10de:0fbc (rev a1)
</pre>


The Vendor:Device IDs for this GPU and it's audio functions are therefore 10de:1381, 10de:0fbc.
The following is a list of common drivers and how to blacklist them:


Then, create a file:
* AMD GPUs
<pre>
<pre>
echo "options vfio-pci ids=10de:1381,10de:0fbc" > /etc/modprobe.d/vfio.conf
echo "blacklist amdgpu" >> /etc/modprobe.d/blacklist.conf
echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
</pre>
</pre>
 
* NVIDIA GPUs
blacklist the drivers:
<pre>
<pre>
echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf  
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf  
echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf  
echo "blacklist nvidia*" >> /etc/modprobe.d/blacklist.conf  
</pre>
</pre>
 
* Intel GPUs
and reboot your machine.
 
For VM configuration, They are 4 configurations possible:
 
=== GPU OVMF PCI Passthrough  (recommended) ===
 
Select "OVMF" as "BIOS" for your VM instead of the default "SeaBIOS".
You need to install your guest OS with uefi support. (for Windows, try win >=8)
 
Using OVMF, you can also add disable_vga=1 to vfio-pci module, which try to to opt-out devices from vga arbitration if possible:
<pre>
<pre>
echo "options vfio-pci ids=10de:1381,10de:0fbc disable_vga=1" > /etc/modprobe.d/vfio.conf
echo "blacklist i915" >> /etc/modprobe.d/blacklist.conf
</pre>
</pre>
{{Note | If you are using an Intel iGPU and an Intel discrete GPU, blacklisting the Intel 'i915' drivers that the discrete GPU uses means the iGPU won't be able to use those drivers either.}}


and you need to make sure your graphics card has an UEFI bootable rom:
After blacklisting, you will need to reboot.
http://vfio.blogspot.fr/2014/08/does-my-graphics-card-rom-support-efi.html


<pre>
=== How to know if a graphics card is UEFI (OVMF) compatible ===
bios: ovmf
Have a look at [[PCI passthrough#Requirements|the requirements section]]. Chances are you are using the BIOS listed for your device on the Techpowerup GPU ROM list, which will say if it is UEFI compatible or not.
scsihw: virtio-scsi-pci
bootdisk: scsi0
scsi0: .....
hostpci0: 01:00,x-vga=on
</pre>


=== GPU OVMF PCI Express Passthrough ===
Alternatively, you can dump your ROM and use Alex Williamson's rom-parser tool:


Same as above, but set machine type to q35 and enable pcie=1:
{{ Note | You will want to run the following commands logged in as root user (by running <code>su -</code>) or by wrapping them with <code>sudo sh -c "<code-snippet>"</code>, otherwise the bash-redirects in the code-snippets below won't work}}
<pre>
bios: ovmf
scsihw: virtio-scsi-pci
bootdisk: scsi0
scsi0: .....
machine: q35
hostpci0: 01:00,pcie=1,x-vga=on
</pre>
 
=== GPU Seabios PCI Passthrough ===
<pre>
hostpci0: 01:00,x-vga=on
</pre>
 
=== GPU Seabios PCI Express Passthrough ===
<pre>
machine: q35
hostpci0: 01:00,pcie=1,x-vga=on
</pre>
 
=== How to know if a Graphics Card is UEFI (OVMF) compatible ===


Get and compile the software "rom-parser":
Get and compile the software "rom-parser":
Line 319: Line 153:
  ./rom-parser /tmp/image.rom
  ./rom-parser /tmp/image.rom


Output should look like this:
The output should look like this:


  Valid ROM signature found @0h, PCIR offset 190h
  Valid ROM signature found @0h, PCIR offset 190h
Line 332: Line 166:
To be UEFI compatible, you need a "type 3" in the result.
To be UEFI compatible, you need a "type 3" in the result.


=== NVIDIA Tips ===
=== The 'romfile' option ===
 
Some motherboards can't pass through GPUs on the first PCI(e) slot by default, because its vBIOS is shadowed during boot up. You need to capture its vBIOS when it is working "normally" (i.e. installed in a different slot), then you can move the card to slot 1 and start the vm using the dumped vBIOS.
 
To dump the bios:
<pre>
cd /sys/bus/pci/devices/0000:01:00.0/
echo 1 > rom
cat rom > /usr/share/kvm/vbios.bin
echo 0 > rom
</pre>


Some Windows applications like geforce experience, Passmark Performance Test and SiSoftware Sandra crash can crash the VM.
Then you can pass the vbios file (must be located in /usr/share/kvm/) with:
<pre>
hostpci0: 01:00,x-vga=on,romfile=vbios.bin
</pre>
 
=== Tips ===
 
Some Windows applications like GeForce Experience, Passmark Performance Test and SiSoftware Sandra can crash the VM.
You need to add:
You need to add:
<pre>
<pre>
Line 345: Line 196:
</pre>
</pre>


==== Nvidia Tips ====
Users have reported that NVIDIA Kepler K80 GPUs need this in vmid.conf:
Users have reported that NVIDIA Kepler K80 GPUs need this in vmid.conf:
<pre>
<pre>
args: -machine pc,max-ram-below-4g=1G
args: -machine pc,max-ram-below-4g=1G
</pre>
=== The 'romfile' Option ===
http://lime-technology.com/forum/index.php?topic=43644.msg482110#msg482110
Some motherboards can't passthrough GPUs on the first PCI(e) slot by default, because its vbios is shadowed during bootup. You need to capture its vBIOS when its working "normally" (i.e. installed in a different slot), then you can move the card to slot 1 and start the vm using the dumped vBIOS.
To dump the bios:
<pre>
cd /sys/bus/pci/devices/0000:01:00.0/
echo 1 > rom
cat rom > /usr/share/kvm/vbios.bin
echo 0 > rom
</pre>
Then you can pass the vbios file (must be located in /usr/share/kvm/) with:
<pre>
hostpci0: 01:00,x-vga=on,romfile=vbios.bin
</pre>
</pre>


==  Troubleshooting ==
==  Troubleshooting ==


=== BAR 3: can't reserve [mem] error ===
=== "BAR 3: can't reserve [mem]" error ===


If you have this error when you try to use the card for a VM:
If you have this error when you try to use the card for a VM:
Line 378: Line 211:
</pre>
</pre>


you can try to add the following kernel commandline option:
you can try to add the following kernel command line option:
<pre>
<pre>
video=efifb:off
video=efifb:off
</pre>
</pre>


Checkout the documentation [https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_edit_kernel_cmdline about Editing the kernel commandline]
Check out the documentation about [https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_edit_kernel_cmdline editing the kernel command line].
 
=== WSLg (Windows Subsystem for Linux GUI)===
If GUI apps don't open in WSLg, see [https://pve.proxmox.com/wiki/Windows_2022_guest_best_practices#Installing_WSL.28g.29 Windows 2022 guest best practices].
 
=== Black display in NoVNC/Spice ===
 
If you are passing through a GPU and are getting a black screen, you might need to change your display settings in the Guest OS. On Windows, this can be done by pressing the "Super/Windows" and "P" key. Alternatively, if you are using the GPU for hardware accelerated computing and need no graphical output from it, you can deselect the "primary GPU" option and physically disconnect your GPU.


=== SPICE ===
=== Spice ===


Spice may give trouble when passing through a GPU as it presents a "virtual" PCI graphic card to the guest and some drivers have problems with that, even when both cards show up.
Spice may give trouble when passing through a GPU as it presents a "virtual" PCI graphic card to the guest and some drivers have problems with that, even when both cards show up.
It's always worth a try to disable SPICE and check again if something fails.
It's always worth a try to disable SPICE and check again if something fails.


=== HDMI Audio crackling/broken ===
=== HDMI audio crackling/broken ===


Some digital audio devices (usually added via GPU functions) may require MSI (Message Signaled Interrupts) to be enabled to function correctly. If you experience any issues, try changing MSI settings in the guest and rebooting the guest.
Some digital audio devices (usually added via GPU functions) may require MSI (Message Signaled Interrupts) to be enabled to function correctly. If you experience any issues, try changing MSI settings in the guest and rebooting the guest.
A Windows-Tool to simplify this is available here: https://github.com/CHEF-KOCH/MSI-utility/releases/latest


Linux guests usually enable MSI by themselves. To force use of MSI for GPU audio devices, use the following command and reboot:
Linux guests usually enable MSI by themselves. To force use of MSI for GPU audio devices, use the following command and reboot:
Line 414: Line 252:
=== BIOS options ===
=== BIOS options ===


Make sure you are using the most recent BIOS version for your mainboard. Often IOMMU groupings or passthrough support in general is improved in later versions.
Make sure you are using the most recent BIOS version for your motherboard. Often IOMMU groupings or passthrough support in general is improved in later versions.


Some general BIOS options that might need changing to allow passthrough to work:
Some general BIOS options that might need changing to allow passthrough to work:
Line 422: Line 260:
* 'Resizable BAR'/'Smart Access Memory': Some AMD GPUs (Vega and up) experience 'Code 43' in Windows guests if this is enabled on the host. It's not supported in VMs either way (yet), so the recommended setting is 'off'.
* 'Resizable BAR'/'Smart Access Memory': Some AMD GPUs (Vega and up) experience 'Code 43' in Windows guests if this is enabled on the host. It's not supported in VMs either way (yet), so the recommended setting is 'off'.


== Verify Operation ==
=== Error 43 ===
[https://support.microsoft.com/en-us/windows/fix-graphics-device-problems-with-error-code-43-6f6ae1ec-0bbe-a848-142e-0c6190502842 Error code 43] is a generic Windows driver error and can occur for a wide number of reasons. Things you can try troubleshooting include:
 
==== Finding out if the PCI device has a hardware fault ====
* Try passing the PCI device to a Linux VM
* Try plugging the PCI device into a different PCI slot or into a different machine
 
==== Finding software issues ====
* Check the security event logs of your Windows VM
* Check the dmesg logs of your host machine
* [[PCI Passthrough#How_to_know_if_a_Graphics_Card_is_UEFI_.28OVMF.29_compatible|Dump your vBIOS]] and check if it is working correctly.
* Try a different vbios (see [[PCI_passthrough#Requirements| the GPU requirements section]])
* If your GPU supports resizable BAR/SAM and you have this option set in your BIOS, you might need to deactivate it or manually tweak your BAR using a udev rule (see [https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF#Code_43_while_Resizable_Bar_is_turned_on_in_the_bios Code 43 while Resizable Bar is turned on in the bios] in the Arch wiki)
* Sometimes the issue is very hardware-dependent. You might find someone else who found a solution who has the same hardware. Try searching the internet with keywords containing your hardware, together with keywords like "Proxmox", "KVM", or "Qemu".
 
==== Nvidia specific issues ====
 
When passing through mobile- or vGPUs, it might be necessary to spoof the Vendor ID and Hardware ID as if the passed-through GPU were the desktop variant. Changing the IDs might also be needed to remove manufacturer-specific vendor ID variants that are not recognized otherwise.


Start the VM and enter the qm monitor on the CLI: "qm monitor vmnumber"
The Vendor and Device ID can be added in the web interface under "Hardware" -> "PCI Device (hostpciX)" and then clicking on the "Advanced" checkbox.
Verify that your card is listed here: "info pci"
Then install drivers on your guest OS.


NOTE: Card support might be limited to 2 or 3 devices.
Some software will also refuse to run when it detects that it is running in a VM. This should no longer be an issue with Nvidia drivers 465 and newer.


NOTE: A PCI device can only ever be attached to a single VM.
To find the Vendor ID and Device ID of the card installed on your host, run:
lspci -nn
which will give you something similar to
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP108 [GeForce GT 1030] [10de:1d01] (rev a1)
Here, <code>0x10de</code> is the Vendor ID and <code>0x1d01</code> the Device ID.


NOTE: This process will remove the card from the proxmox host OS as long as the VM it's attached to is running.
==== AMD specific issues ====
Some AMD cards suffer from the "AMD reset bug" where the GPU does not correctly reset after power cycling. This can be remedied with the [https://github.com/gnif/vendor-reset/ vendor-reset patch]. See also [https://www.nicksherlock.com/2020/11/working-around-the-amd-gpu-reset-bug-on-proxmox/ Nick Sherlock's writeup] on the issue.


NOTE: Using PCI passthrough to present drives direct to a ZFS (FreeNAS, Openfiler, OmniOS) virtual machine is OK for testing, but '''not recommended''' for production use.  Specific FreeNAS warnings can be found here:  http://forums.freenas.org/threads/absolutely-must-virtualize-freenas-a-guide-to-not-completely-losing-your-data.12714/
== USB passthrough ==
If you need to pass through USB devices (keyboard, mouse), please follow the [[USB Physical Port Mapping]] wiki article.


== USB Passthrough ==
== vGPU ==
If you need to passthrough usb devices (keyboard, mouse), please follow this wiki article:
If you want to split up one GPU into multiple vGPUs, see:
* [https://pve.proxmox.com/wiki/MxGPU_with_AMD_S7150_under_Proxmox_VE_5.x MxGPU with AMD S7150]
* [https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE_7.x NVIDIA vGPU]


https://pve.proxmox.com/wiki/USB_physical_port_mapping
[[Category:Staging]]
[[Category:HOWTO]]

Latest revision as of 09:24, 6 July 2023

Introduction

Note: This is a collection of examples, workarounds, hacks, and specific issues for PCI(e) passthrough. For a step-by-step guide on how and what to do to pass through PCI(e) devices, see the docs or the wiki page generated from the docs.

PCI passthrough allows you to use a physical PCI device (graphics card, network card) inside a VM (KVM virtualization only).

If you "PCI passthrough" a device, the device is not available to the host anymore. Note that VMs with passed-through devices cannot be migrated.

Requirements

This is a list of basic requirements adapted from the Arch wiki

CPU requirements
Your CPU has to support hardware virtualization and IOMMU. Most new CPUs support this.
Motherboard requirements
Your motherboard needs to support IOMMU. Lists can be found on the Xen wiki and Wikipedia. Note that, as of writing, both of these lists are incomplete and very out of date; most newer motherboards support IOMMU even if they are not listed.
GPU requirements
The ROM of your GPU does not necessarily need to support UEFI, however, most modern GPUs do. If your GPU ROM supports UEFI, it is recommended to use OVMF (UEFI) instead of SeaBIOS. For a list of GPU ROMs, see Techpowerup's collection of GPU ROMs.

Verifying IOMMU parameters

Verify IOMMU is enabled

Reboot, then run:

dmesg | grep -e DMAR -e IOMMU

There should be a line that looks like "DMAR: IOMMU enabled". If there is no output, something is wrong.
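
The check above can be wrapped in a small helper for scripted use. This is only a sketch: the function name check_iommu_log and its messages are made up for illustration and are not part of Proxmox VE. It reads kernel log text on standard input and uses the same two patterns as the dmesg command above.

```shell
# Hypothetical helper (not a Proxmox VE tool): reads kernel log text on stdin
# and reports whether any line matches the patterns used in the check above.
check_iommu_log() {
    if grep -q -e DMAR -e IOMMU; then
        echo "IOMMU-related messages found"
    else
        echo "no IOMMU-related messages found"
        return 1
    fi
}

# Usage on a live host (reading the kernel log may require root):
#   dmesg | check_iommu_log
```

A non-zero exit status corresponds to the "no output" case described above, so the helper can be used directly in an if statement.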

Verify IOMMU interrupt remapping is enabled

It is not possible to use PCI passthrough without interrupt remapping. Device assignment will fail with 'Failed to assign device "[device name]": Operation not permitted' or 'Interrupt Remapping hardware not found, passing devices to unprivileged domains is insecure.'.

All systems using an Intel processor and chipset that have support for Intel Virtualization Technology for Directed I/O (VT-d), but do not have support for interrupt remapping will see such an error. Interrupt remapping support is provided in newer processors and chipsets (both AMD and Intel).

To identify if your system has support for interrupt remapping:

dmesg | grep 'remapping'

If you see one of the following lines:

  • AMD-Vi: Interrupt remapping enabled
  • DMAR-IR: Enabled IRQ remapping in x2apic mode ('x2apic' can be different on old CPUs, but should still work)

then remapping is supported.
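
As a rough sketch, the same test can be scripted. The function name remapping_supported is an assumption chosen for this example. It matches only the two messages listed above and leaves the mode after "in" open, since 'x2apic' can differ on older CPUs.

```shell
# Hypothetical helper: reads kernel log text on stdin and reports whether
# one of the two interrupt remapping messages listed above is present.
remapping_supported() {
    if grep -q -e 'AMD-Vi: Interrupt remapping enabled' \
               -e 'DMAR-IR: Enabled IRQ remapping in '; then
        echo "interrupt remapping supported"
    else
        echo "interrupt remapping not reported"
        return 1
    fi
}

# Usage on a live host (reading the kernel log may require root):
#   dmesg | remapping_supported
```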

If your system doesn't support interrupt remapping, you can allow unsafe interrupts with:

echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf

Verify IOMMU isolation

For working PCI passthrough, you need a dedicated IOMMU group for all PCI devices you want to assign to a VM.

Run the following command, replacing {nodename} with the name of your node:

# pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist ""

You should get a list similar to:

┌──────────┬────────┬──────────────┬────────────┬────────┬───────────────────────────────────────────────────────────────────┬...
│ class    │ device │ id           │ iommugroup │ vendor │ device_name                                                       │
╞══════════╪════════╪══════════════╪════════════╪════════╪═══════════════════════════════════════════════════════════════════╪
│ 0x010601 │ 0xa282 │ 0000:00:17.0 │          5 │ 0x8086 │ 200 Series PCH SATA controller [AHCI mode]                        │
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────────────────────────────┼
│ 0x010802 │ 0xa808 │ 0000:02:00.0 │         12 │ 0x144d │ NVMe SSD Controller SM981/PM981/PM983                             │
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────────────────────────────┼
│ 0x020000 │ 0x15b8 │ 0000:00:1f.6 │         11 │ 0x8086 │ Ethernet Connection (2) I219-V                                    │
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────────────────────────────┼
│ 0x030000 │ 0x5912 │ 0000:00:02.0 │          2 │ 0x8086 │ HD Graphics 630                                                   │
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────────────────────────────┼
│ 0x030000 │ 0x1d01 │ 0000:01:00.0 │          1 │ 0x10de │ GP108 [GeForce GT 1030]                                           │
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────────────────────────────┼
.
.
.
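
The grouping can also be read directly from sysfs. The following is a minimal sketch, not the method used by pvesh: the function name list_iommu_groups is made up, and the optional directory argument exists only so the function can be pointed at a test directory.

```shell
# Print "group address" pairs, sorted by group number.
# $1: IOMMU group directory (default: /sys/kernel/iommu_groups)
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # glob did not expand: no groups found
        group=${dev%/devices/*}        # strip "/devices/<address>"
        group=${group##*/}             # keep only the group number
        printf '%s %s\n' "$group" "${dev##*/}"
    done | sort -n
}
# Usage on the host:
#   list_iommu_groups
```

Devices that print the same group number share an IOMMU group.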

To have separate IOMMU groups, your processor needs to have support for a feature called ACS (Access Control Services). Make sure you enable the corresponding setting in your BIOS for this.

If you don't have dedicated IOMMU groups, you can try moving the card to another PCI slot.

Should that not work, you can try using Alex Williamson's ACS override patch. However, this should be seen as a last option and is not without risks.

As of writing, the ACS patch is part of the Proxmox VE kernel and can be enabled by editing the kernel command line. Add

pcie_acs_override=downstream

to the kernel boot command line (grub or systemd-boot) options.

More information can be found at Alex Williamson's blog.
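
For GRUB, adding the option can be sketched as a small idempotent helper. This is an illustration under assumptions: the function name add_grub_param is made up, and it expects a double-quoted GRUB_CMDLINE_LINUX_DEFAULT="..." line in the file it is given (normally /etc/default/grub). It does not handle systemd-boot, where /etc/kernel/cmdline is a single line of options.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT unless it is
# already present.
# $1: GRUB defaults file, $2: parameter to add
add_grub_param() {
    file=$1
    param=$2
    # Current value between the double quotes.
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/\1/p' "$file")
    case " $current " in
        *" $param "*) return 0 ;;    # already there, nothing to do
    esac
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"\$|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"|" "$file"
}
# Usage on the host (as root), followed by a reboot:
#   add_grub_param /etc/default/grub pcie_acs_override=downstream
#   update-grub
```

On systemd-boot systems, the change to /etc/kernel/cmdline is applied with proxmox-boot-tool refresh instead of update-grub.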

GPU passthrough

Note: See http://blog.quindorian.org/2018/03/building-a-2u-amd-ryzen-server-proxmox-gpu-passthrough.html/ if you prefer an article with a how-to approach. (You usually do not need the ROM-file dumping mentioned at the end!)
  • AMD RADEON 5xxx, 6xxx, 7xxx, NVIDIA GeForce 7, 8, GTX 4xx, 5xx, 6xx, 7xx, 9xx, 10xx, 15xx, 16xx, and RTX 20xx have been reported working. Anything newer should work as well.
  • AMD Navi (5xxx(XT)/6xxx(XT)) cards suffer from the reset bug (see https://github.com/gnif/vendor-reset). While dedicated users have managed to get them running, they require a lot more effort and will probably not be entirely stable (see the AMD specific issues for workarounds).
  • You might need to load some specific options in grub.cfg or other tuning values to get your configuration specifically working/stable
  • Here's a good forum thread of Arch Linux: https://bbs.archlinux.org/viewtopic.php?id=162768

For starters, it's often helpful if the host doesn't try to use the GPU, which avoids issues with the host driver unbinding and re-binding to the device. Sometimes making sure the host BIOS POST messages are displayed on a different GPU is helpful too. This can sometimes be accomplished via BIOS settings, by moving the card to a different slot, or by enabling/disabling legacy boot support.

Blacklisting drivers

The following is a list of common drivers and how to blacklist them:

  • AMD GPUs
echo "blacklist amdgpu" >> /etc/modprobe.d/blacklist.conf
echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
  • NVIDIA GPUs
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf 
echo "blacklist nvidia*" >> /etc/modprobe.d/blacklist.conf 
  • Intel GPUs
echo "blacklist i915" >> /etc/modprobe.d/blacklist.conf
Note: If you are using an Intel iGPU and an Intel discrete GPU, blacklisting the Intel 'i915' drivers that the discrete GPU uses means the iGPU won't be able to use those drivers either.

After blacklisting, you will need to reboot.
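
After the reboot, you can check which kernel driver (if any) has claimed the card by reading the driver symlink in sysfs. The helper below is a sketch; the function name pci_bound_driver and the address 0000:01:00.0 in the usage line are assumptions for illustration.

```shell
# Print the kernel driver currently bound to a PCI device, or "none".
# $1: full PCI address, for example 0000:01:00.0
# $2: sysfs device directory (default: /sys/bus/pci/devices),
#     overridable only to allow testing
pci_bound_driver() {
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
# Usage on the host:
#   pci_bound_driver 0000:01:00.0
```

If the output still names one of the blacklisted drivers, the blacklist entry did not take effect.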

How to know if a graphics card is UEFI (OVMF) compatible

Have a look at the requirements section. Chances are you are using the BIOS listed for your device on the Techpowerup GPU ROM list, which will say if it is UEFI compatible or not.

Alternatively, you can dump your ROM and use Alex Williamson's rom-parser tool:

Note: You will want to run the following commands logged in as root user (by running su -) or by wrapping them with sudo sh -c "<code-snippet>", otherwise the bash redirects in the code snippets below won't work.

Get and compile the software "rom-parser":

git clone https://github.com/awilliam/rom-parser
cd rom-parser
make

Then dump the ROM of your VGA card:

cd /sys/bus/pci/devices/0000:01:00.0/
echo 1 > rom
cat rom > /tmp/image.rom
echo 0 > rom

and test it with:

./rom-parser /tmp/image.rom

The output should look like this:

Valid ROM signature found @0h, PCIR offset 190h
 PCIR: type 0, vendor: 10de, device: 1280, class: 030000
 PCIR: revision 0, vendor revision: 1
Valid ROM signature found @f400h, PCIR offset 1ch
 PCIR: type 3, vendor: 10de, device: 1280, class: 030000
 PCIR: revision 3, vendor revision: 0
  EFI: Signature Valid
 Last image

To be UEFI compatible, you need a "type 3" in the result.
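
To script this check, the rom-parser output can be searched for a "type 3" image. This is a minimal sketch that assumes the output format shown above; the function name rom_is_uefi_capable is made up.

```shell
# Succeeds if rom-parser output on stdin lists a "type 3" (EFI) image.
rom_is_uefi_capable() {
    grep -q 'PCIR: type 3,'
}
# Usage, from the rom-parser directory:
#   ./rom-parser /tmp/image.rom | rom_is_uefi_capable && echo "UEFI capable"
```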

The 'romfile' option

Some motherboards can't pass through GPUs on the first PCI(e) slot by default, because the card's vBIOS is shadowed during boot up. You need to capture the vBIOS while the card is working "normally" (i.e. installed in a different slot); then you can move the card to slot 1 and start the VM using the dumped vBIOS.

To dump the bios:

cd /sys/bus/pci/devices/0000:01:00.0/
echo 1 > rom
cat rom > /usr/share/kvm/vbios.bin
echo 0 > rom

Then you can pass the vbios file (must be located in /usr/share/kvm/) with:

hostpci0: 01:00,x-vga=on,romfile=vbios.bin
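
Before referencing a dumped file with romfile, it can be worth checking that the dump is plausible. PCI option ROM images begin with the two bytes 0x55 0xAA, which is the "ROM signature" that rom-parser reports. The sketch below only checks those two bytes; the function name rom_has_signature is an assumption for illustration, and a passing check does not prove the vBIOS works.

```shell
# Succeeds if the file is non-empty and starts with the option ROM
# signature bytes 0x55 0xAA.
# $1: path to the dumped ROM file
rom_has_signature() {
    [ -s "$1" ] || return 1
    # Read the first two bytes as hex and strip whitespace.
    sig=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
# Usage on the host:
#   rom_has_signature /usr/share/kvm/vbios.bin || echo "dump looks invalid"
```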

Tips

Some Windows applications like GeForce Experience, Passmark Performance Test and SiSoftware Sandra can crash the VM. To work around this, run the following on the host:

echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf

If you see a lot of warning messages in your 'dmesg' system log, add the following instead:

echo "options kvm ignore_msrs=1 report_ignored_msrs=0" > /etc/modprobe.d/kvm.conf
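
Options written to /etc/modprobe.d/ only apply once the module is loaded again, for example after a reboot. To see the value a loaded module is actually using, its parameters can usually be read from sysfs. The helper below is a sketch; the function name module_param_value is made up, and it assumes the parameter is exported under /sys/module.

```shell
# Print the current value of a loaded module's parameter.
# $1: module name, $2: parameter name
# $3: sysfs module directory (default: /sys/module), overridable for testing
module_param_value() {
    file="${3:-/sys/module}/$1/parameters/$2"
    if [ -r "$file" ]; then
        cat "$file"
    else
        return 1    # module not loaded or parameter not exported
    fi
}
# Usage on the host (boolean parameters read back as Y or N):
#   module_param_value kvm ignore_msrs
```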

Nvidia Tips

Users have reported that NVIDIA Kepler K80 GPUs need this in vmid.conf:

args: -machine pc,max-ram-below-4g=1G

Troubleshooting

"BAR 3: can't reserve [mem]" error

If you have this error when you try to use the card for a VM:

vfio-pci 0000:04:00.0: BAR 3: can't reserve [mem 0xca000000-0xcbffffff 64bit]

you can try to add the following kernel command line option:

video=efifb:off

Check out the documentation about editing the kernel command line.
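
After rebooting, you can confirm that the option reached the running kernel by checking /proc/cmdline. A minimal sketch, with a made-up function name:

```shell
# Succeeds if the kernel command line on stdin contains the exact option.
# $1: option to look for, for example video=efifb:off
cmdline_has_param() {
    read -r cmdline || true
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
# Usage on the host:
#   cmdline_has_param video=efifb:off < /proc/cmdline && echo "option is active"
```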

WSLg (Windows Subsystem for Linux GUI)

If GUI apps don't open in WSLg, see Windows 2022 guest best practices.

Black display in NoVNC/Spice

If you are passing through a GPU and are getting a black screen, you might need to change your display settings in the guest OS. On Windows, this can be done by pressing the "Super/Windows" and "P" keys together. Alternatively, if you are using the GPU for hardware accelerated computing and need no graphical output from it, you can deselect the "primary GPU" option and physically disconnect your GPU.

Spice

Spice may give trouble when passing through a GPU as it presents a "virtual" PCI graphic card to the guest and some drivers have problems with that, even when both cards show up. It's always worth a try to disable SPICE and check again if something fails.

HDMI audio crackling/broken

Some digital audio devices (usually added via GPU functions) may require MSI (Message Signaled Interrupts) to be enabled to function correctly. If you experience any issues, try changing MSI settings in the guest and rebooting the guest.

Linux guests usually enable MSI by themselves. To force use of MSI for GPU audio devices, use the following command and reboot:

echo "options snd-hda-intel enable_msi=1" >> /etc/modprobe.d/snd-hda-intel.conf

Use 'lspci -vv' and check for the following line on your device to see if MSI is enabled:

Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+

If it says 'Enable+', MSI is working, 'Enable-' means it is supported but disabled, and if the line is missing, MSI is not supported by the PCIe hardware.
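
The three cases can be told apart with a small filter over the lspci output. This is a sketch that assumes the capability line format shown above; the function name msi_state and the address 01:00.1 in the usage line (the audio function of a GPU at 01:00) are assumptions for illustration.

```shell
# Classify MSI state from 'lspci -vv' output for ONE device on stdin.
# Prints: enabled, disabled, or unsupported.
msi_state() {
    # "MSI: Enable" does not match "MSI-X: Enable" lines.
    line=$(grep 'MSI: Enable[+-]' | head -n 1)
    case "$line" in
        *'MSI: Enable+'*) echo "enabled" ;;
        *'MSI: Enable-'*) echo "disabled" ;;
        *) echo "unsupported" ;;
    esac
}
# Usage inside a Linux guest or on the host:
#   lspci -vv -s 01:00.1 | msi_state
```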

This can potentially also improve performance for other passthrough devices, including GPUs, but that depends on the hardware being used.

BIOS options

Make sure you are using the most recent BIOS version for your motherboard. IOMMU groupings, or passthrough support in general, are often improved in later versions.

Some general BIOS options that might need changing to allow passthrough to work:

  • IOMMU or VT-d: Set to 'Enabled' or equivalent, often 'Auto' is not the same
  • 'Legacy boot' or CSM: For GPU passthrough it can help to disable this, but keep in mind that PVE has to be installed in UEFI mode, as it will not boot in BIOS mode without this enabled. The reason for disabling this is that it avoids legacy VGA initialization of installed GPUs, making them able to be re-initialized later, as required for passthrough. Most useful when trying to use passthrough in single GPU systems.
  • 'Resizable BAR'/'Smart Access Memory': Some AMD GPUs (Vega and up) experience 'Code 43' in Windows guests if this is enabled on the host. It's not supported in VMs either way (yet), so the recommended setting is 'off'.

Error 43

Error code 43 is a generic Windows driver error and can occur for a wide number of reasons. Things you can try troubleshooting include:

Finding out if the PCI device has a hardware fault

  • Try passing the PCI device to a Linux VM
  • Try plugging the PCI device into a different PCI slot or into a different machine

Finding software issues

  • Check the security event logs of your Windows VM
  • Check the dmesg logs of your host machine
  • Dump your vBIOS and check if it is working correctly.
  • Try a different vbios (see the GPU requirements section)
  • If your GPU supports resizable BAR/SAM and you have this option set in your BIOS, you might need to deactivate it or manually tweak your BAR using a udev rule (see Code 43 while Resizable Bar is turned on in the bios in the Arch wiki)
  • Sometimes the issue is very hardware-dependent. Someone with the same hardware may already have found a solution. Try searching the internet with keywords containing your hardware, together with keywords like "Proxmox", "KVM", or "Qemu".

Nvidia specific issues

When passing through mobile- or vGPUs, it might be necessary to spoof the Vendor ID and Hardware ID as if the passed-through GPU were the desktop variant. Changing the IDs might also be needed to remove manufacturer-specific vendor ID variants that are not recognized otherwise.

The Vendor and Device ID can be added in the web interface under "Hardware" -> "PCI Device (hostpciX)" and then clicking on the "Advanced" checkbox.

Some software will also refuse to run when it detects that it is running in a VM. This should no longer be an issue with Nvidia drivers 465 and newer.

To find the Vendor ID and Device ID of the card installed on your host, run:

lspci -nn

which will give you something similar to

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP108 [GeForce GT 1030] [10de:1d01] (rev a1)

Here, 0x10de is the Vendor ID and 0x1d01 the Device ID.
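
To pull both IDs out of such a line without reading them off by hand, the bracketed vendor:device pair can be extracted with sed. This is a sketch; the function name pci_ids_from_lspci is made up, and it prints the IDs with the 0x prefix used in the text above.

```shell
# Print "0x<vendor> 0x<device>" for each 'lspci -nn' line on stdin.
pci_ids_from_lspci() {
    # Match the last [xxxx:xxxx] pair on the line (four hex digits each).
    sed -n 's/.*\[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)\].*/0x\1 0x\2/p'
}
# Usage on the host:
#   lspci -nn -s 01:00.0 | pci_ids_from_lspci
```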

AMD specific issues

Some AMD cards suffer from the "AMD reset bug" where the GPU does not correctly reset after power cycling. This can be remedied with the vendor-reset patch. See also Nick Sherlock's writeup on the issue.

USB passthrough

If you need to pass through USB devices (keyboard, mouse), please follow the USB Physical Port Mapping wiki article.

vGPU

If you want to split up one GPU into multiple vGPUs, see: