PCI Passthrough

Introduction

PCI passthrough allows you to use a physical PCI device (graphics card, network card) inside a VM (KVM virtualization only). If you "PCI passthrough" a device, the device is not available to the host anymore.

Note: PCI passthrough is an experimental feature in Proxmox VE.


Enable the IOMMU

You need to enable the IOMMU by editing the kernel command line (https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_edit_kernel_cmdline).

First, open your bootloader's kernel command line config file. For GRUB:

nano /etc/default/grub

or for systemd-boot:

nano /etc/kernel/cmdline

Find the line with "GRUB_CMDLINE_LINUX_DEFAULT" (for GRUB), or create the file for systemd-boot (its format is a single line of options).

Intel CPU

For Intel CPUs add "intel_iommu=on", for example:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on" 
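
For systemd-boot, /etc/kernel/cmdline holds all options on a single line. A sketch of what it could look like; the root= part is just a placeholder, keep whatever your file already contains and only append the new parameter:

root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on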

Save the changes and update GRUB:

update-grub

or, for systemd-boot: pve-efiboot-tool refresh

Then reboot. Afterwards, run "dmesg | grep -e DMAR -e IOMMU" from the command line. If there is no output, something is wrong.
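
On a working setup the output may look roughly like this (illustrative only; exact messages vary by hardware and kernel version):

[    0.012000] DMAR: IOMMU enabled
[    0.056000] DMAR: Intel(R) Virtualization Technology for Directed I/O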

AMD CPU

For AMD CPUs add "amd_iommu=on", for example:

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on" 

Save the changes and update GRUB:

update-grub

or, for systemd-boot: pve-efiboot-tool refresh

Then reboot. Afterwards, run "dmesg | grep -e DMAR -e IOMMU" from the command line. If there is no output, something is wrong.

PT Mode

Both Intel and AMD chips can use the additional parameter "iommu=pt", added in the same way as above. This enables the IOMMU translation only when necessary, and can thus improve performance for PCIe devices not used in VMs.
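
For example, a GRUB command line combining both parameters could look like this (illustrative, keep your existing options):

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"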

Required Modules

Add the following to /etc/modules:

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
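
After editing /etc/modules, refresh the initramfs so the change takes effect on the next boot. On a running system you can also try loading the modules right away (assuming your kernel ships them as loadable modules):

update-initramfs -u -k all
modprobe -a vfio vfio_iommu_type1 vfio_pci vfio_virqfd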

Note that with the 5.4 based kernel (which will be used for Proxmox VE 6.2 in Q2/2020), some of those modules are already built directly into the kernel.

IOMMU Interrupt Remapping

It is not possible to use PCI passthrough without interrupt remapping. Device assignment will fail with a 'Failed to assign device "[device name]": Operation not permitted' or 'Interrupt Remapping hardware not found, passing devices to unprivileged domains is insecure.' error. All systems using an Intel processor and chipset that support Intel Virtualization Technology for Directed I/O (VT-d), but lack support for interrupt remapping, will see such an error. Interrupt remapping support is provided by newer processors and chipsets (both AMD and Intel). To identify whether your system supports interrupt remapping:

1) Run the "dmesg | grep ecap" command.
2) On the IOMMU lines, the hexadecimal value after "ecap" indicates whether interrupt remapping is supported. If the last character of this value is an 8, 9, a, b, c, d, e, or an f, interrupt remapping is supported. For example, "ecap 1000" indicates there is no interrupt remapping support. "ecap 10207f" indicates interrupt remapping support, as the last character is an "f".

Interrupt remapping will only be enabled if every IOMMU supports it.

Alternatively, run the following script to determine if your system has interrupt remapping support:

#!/bin/sh
if [ $(dmesg | grep ecap | wc -l) -eq 0 ]; then
  echo "No interrupt remapping support found"
  exit 1
fi

for i in $(dmesg | grep ecap | awk '{print $NF}'); do
  if [ $(( (0x$i & 0xf) >> 3 )) -ne 1 ]; then
    echo "Interrupt remapping not supported"
    exit 1
  fi
done

If your system doesn't support interrupt remapping, you can allow unsafe interrupts with:

echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf

Verify IOMMU Isolation

For working PCI passthrough, you need a dedicated IOMMU group for all PCI devices you want to assign to a VM.

You should have something like:

# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:16.0
/sys/kernel/iommu_groups/4/devices/0000:00:1a.0
/sys/kernel/iommu_groups/5/devices/0000:00:1b.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.5
/sys/kernel/iommu_groups/8/devices/0000:00:1c.6
/sys/kernel/iommu_groups/9/devices/0000:00:1c.7
/sys/kernel/iommu_groups/9/devices/0000:05:00.0
/sys/kernel/iommu_groups/10/devices/0000:00:1d.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.2
/sys/kernel/iommu_groups/11/devices/0000:00:1f.3
/sys/kernel/iommu_groups/12/devices/0000:02:00.0
/sys/kernel/iommu_groups/12/devices/0000:02:00.1
/sys/kernel/iommu_groups/13/devices/0000:03:00.0
/sys/kernel/iommu_groups/14/devices/0000:04:00.0
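
A small convenience loop, offered here as a sketch rather than part of the original instructions, prints each group number together with the lspci description of its devices:

#!/bin/sh
# walk every device symlink below /sys/kernel/iommu_groups
for d in /sys/kernel/iommu_groups/*/devices/*; do
  g=${d#/sys/kernel/iommu_groups/}; g=${g%%/*}  # extract the group number
  printf 'group %s: ' "$g"
  lspci -nns "${d##*/}"                         # address, IDs and description
done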


To have separate IOMMU groups, your processor needs to have support for a feature called ACS (Access Control Services).

All Xeon processors (E3, E5) support it, excluding the Xeon E3-1200.

Intel Core CPUs are different; only some of them support ACS:

Anything newer than listed below should support ACS, as long as VT-d is supported. See ark.intel.com for more info.

Haswell-E (LGA2011-v3)
i7-5960X (8-core, 3/3.5GHz)
i7-5930K (6-core, 3.2/3.8GHz)
i7-5820K (6-core, 3.3/3.6GHz)

Ivy Bridge-E (LGA2011)
i7-4960X (6-core, 3.6/4GHz)
i7-4930K (6-core, 3.4/3.6GHz)
i7-4820K (4-core, 3.7/3.9GHz)

Sandy Bridge-E (LGA2011)
i7-3960X (6-core, 3.3/3.9GHz)
i7-3970X (6-core, 3.5/4GHz)
i7-3930K (6-core, 3.2/3.8GHz)
i7-3820 (4-core, 3.6/3.8GHz)

AMD chips from Ryzen 1st generation and newer are fine too.

If you don't have dedicated IOMMU groups, you can try:

1) moving the card to another PCI slot
2) adding "pcie_acs_override=downstream" to the kernel boot command line (GRUB or systemd-boot) options, which can help on some setups with a bad ACS implementation.

Check out the documentation about editing the kernel command line: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_edit_kernel_cmdline
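
On a GRUB system, the resulting command line could then look like this (illustrative, keep your existing options):

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on pcie_acs_override=downstream"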

More info:

http://vfio.blogspot.be/2015/10/intel-processors-with-acs-support.html
http://vfio.blogspot.be/2014/08/iommu-groups-inside-and-out.html

Determine your PCI card address, and configure your VM

The easiest way is to use the GUI to add a device of type "Host PCI" in the VM's hardware tab.

Alternatively, you can use the command line:

Locate your card using "lspci". The address should be in the form of: 01:00.0
Edit the <vmid>.conf file. It can be located at: /etc/pve/qemu-server/<vmid>.conf.

Add this line to the end of the file:

hostpci0: 01:00.0
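
Alternatively, the qm CLI can write the same setting for you (100 is an example VMID):

qm set 100 -hostpci0 01:00.0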

If you have a multi-function device (like a vga card with embedded audio chipset), you can pass all functions manually with:

hostpci0: 01:00.0;01:00.1

or, to pass all functions automatically:

hostpci0: 01:00

PCI Express Passthrough

Check the "PCI-E" checkbox in the GUI when adding your device, or manually add the pcie=1 parameter to your VM config:

machine: q35
hostpci0: 01:00.0,pcie=1

PCIe passthrough is only supported on Q35 machines.

Note that this does not mean that devices assigned without this setting will only have PCI speeds, it just sets a flag for the guest to tell it that the device is a PCIe device instead of a "really-fast legacy PCI device". Some guest applications benefit from this.

GPU Passthrough

Note: See http://blog.quindorian.org/2018/03/building-a-2u-amd-ryzen-server-proxmox-gpu-passthrough.html/ if you like an article with a HOWTO approach.
  • AMD RADEON 5xxx, 6xxx, 7xxx, Navi 5XXX(XT), NVIDIA GEFORCE 7, 8, GTX 4xx, 5xx, 6xx, 7xx, 9xx, 10xx and RTX 16xx/20xx have been reported working.
  • You might need to set some specific options in grub.cfg, or other tuning values, to get your configuration working and stable.
  • Here's a good Arch Linux forum thread: https://bbs.archlinux.org/viewtopic.php?id=162768

For a GPU, it's often helpful if the host doesn't try to use the GPU, which avoids issues with the host driver unbinding and re-binding to the device. Sometimes making sure the host BIOS POST messages are displayed on a different GPU is helpful too. This can sometimes be accomplished via BIOS settings, moving the card to a different slot or enabling/disabling legacy boot support.

First, find the vendor and device IDs of your VGA card:

$ lspci -n -s 01:00
01:00.0 0300: 10de:1381 (rev a2)
01:00.1 0403: 10de:0fbc (rev a1)

The Vendor:Device IDs for this GPU and its audio function are therefore 10de:1381 and 10de:0fbc.

Then, create a file:

echo "options vfio-pci ids=10de:1381,10de:0fbc" > /etc/modprobe.d/vfio.conf

Then blacklist the drivers:

echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf 
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf 
echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf 

and reboot your machine.
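
After the reboot, you can verify that vfio-pci has claimed the card; each function should report "Kernel driver in use: vfio-pci" (addresses taken from the example above):

lspci -nnk -s 01:00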

For the VM configuration, there are 4 possibilities:

GPU OVMF PCI Passthrough (recommended)

Select "OVMF" as "BIOS" for your VM instead of the default "SeaBIOS". You need to install your guest OS with uefi support. (for Windows, try win >=8)

Using OVMF, you can also add disable_vga=1 to the vfio-pci module options, which tries to opt devices out of VGA arbitration if possible:

echo "options vfio-pci ids=10de:1381,10de:0fbc disable_vga=1" > /etc/modprobe.d/vfio.conf

You also need to make sure your graphics card has a UEFI-bootable ROM: http://vfio.blogspot.fr/2014/08/does-my-graphics-card-rom-support-efi.html

bios: ovmf
scsihw: virtio-scsi-pci
bootdisk: scsi0
scsi0: .....
hostpci0: 01:00,x-vga=on

GPU OVMF PCI Express Passthrough

Same as above, but set machine type to q35 and enable pcie=1:

bios: ovmf
scsihw: virtio-scsi-pci
bootdisk: scsi0
scsi0: .....
machine: q35
hostpci0: 01:00,pcie=1,x-vga=on

GPU Seabios PCI Passthrough

hostpci0: 01:00,x-vga=on

GPU Seabios PCI Express Passthrough

machine: q35
hostpci0: 01:00,pcie=1,x-vga=on

How to know if a Graphics Card is UEFI (OVMF) compatible

Get and compile the software "rom-parser":

$ git clone https://github.com/awilliam/rom-parser
$ cd rom-parser
$ make

Then dump the ROM of your VGA card:

# cd /sys/bus/pci/devices/0000:01:00.0/
# echo 1 > rom
# cat rom > /tmp/image.rom
# echo 0 > rom

and test it with:

./rom-parser /tmp/image.rom

Valid ROM signature found @0h, PCIR offset 190h
 PCIR: type 0, vendor: 10de, device: 1280, class: 030000
 PCIR: revision 0, vendor revision: 1
Valid ROM signature found @f400h, PCIR offset 1ch
 PCIR: type 3, vendor: 10de, device: 1280, class: 030000
 PCIR: revision 3, vendor revision: 0
  EFI: Signature Valid
 Last image

To be UEFI compatible, you need a "type 3" in the result.

NVIDIA Tips

Some Windows applications like GeForce Experience, Passmark Performance Test and SiSoftware Sandra can crash the VM. You need to add:

echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf

If you see a lot of warning messages in your 'dmesg' system log, add the following instead:

echo "options kvm ignore_msrs=1 report_ignored_msrs=0" > /etc/modprobe.d/kvm.conf

Users have reported that NVIDIA Kepler K80 GPUs need this in the <vmid>.conf:

args: -machine pc,max-ram-below-4g=1G

The 'romfile' Option

http://lime-technology.com/forum/index.php?topic=43644.msg482110#msg482110

Some motherboards can't pass through GPUs in the first PCI(e) slot by default, because the vBIOS is shadowed during boot-up. You need to capture the vBIOS while the card is working "normally" (i.e. installed in a different slot), then you can move the card to slot 1 and start the VM using the dumped vBIOS.

To dump the bios:

cd /sys/bus/pci/devices/0000:01:00.0/
echo 1 > rom
cat rom > /usr/share/kvm/vbios.bin
echo 0 > rom

Then you can pass the vbios file (must be located in /usr/share/kvm/) with:

hostpci0: 01:00,x-vga=on,romfile=vbios.bin

Troubleshooting

BAR 3: can't reserve [mem] error

If you have this error when you try to use the card for a VM:

vfio-pci 0000:04:00.0: BAR 3: can't reserve [mem 0xca000000-0xcbffffff 64bit]

you can try to add the following kernel commandline option:

video=efifb:off

Check out the documentation about editing the kernel command line: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_edit_kernel_cmdline

SPICE

Spice may give trouble when passing through a GPU as it presents a "virtual" PCI graphic card to the guest and some drivers have problems with that, even when both cards show up. It's always worth a try to disable SPICE and check again if something fails.

HDMI Audio crackling/broken

Some digital audio devices (usually added via GPU functions) may require MSI (Message Signaled Interrupts) to be enabled to function correctly. If you experience any issues, try changing MSI settings in the guest and rebooting the guest.

A Windows tool to simplify this is available here: https://github.com/CHEF-KOCH/MSI-utility/releases/latest

Linux guests usually enable MSI by themselves.
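
To check from within a Linux guest whether MSI is active for a device, inspect its capabilities as root (00:10.0 is a placeholder, use the audio function's address as shown by lspci in the guest):

lspci -vs 00:10.0 | grep MSI

A line like "MSI: Enable+" means MSI is in use; "Enable-" means the device fell back to legacy interrupts.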

This can potentially also improve performance for other passthrough devices, including GPUs, but that depends on the hardware being used.

Verify Operation

Start the VM and enter the qm monitor on the CLI: "qm monitor vmnumber"
Verify that your card is listed there: "info pci"
Then install the drivers in your guest OS.
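
For example (100 is an example VMID):

qm monitor 100
qm> info pci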

NOTE: Card support might be limited to 2 or 3 devices.
NOTE: A PCI device can only ever be attached to a single VM.
NOTE: This process will remove the card from the proxmox host OS as long as the VM it's attached to is running.
NOTE: Using PCI passthrough to present drives directly to a ZFS (FreeNAS, Openfiler, OmniOS) virtual machine is OK for testing, but not recommended for production use. Specific FreeNAS warnings can be found here: http://forums.freenas.org/threads/absolutely-must-virtualize-freenas-a-guide-to-not-completely-losing-your-data.12714/


USB Passthrough

If you need to pass through USB devices (keyboard, mouse), please follow this wiki article:

https://pve.proxmox.com/wiki/USB_physical_port_mapping