NVIDIA vGPU on Proxmox VE

== Introduction ==


NVIDIA vGPU Software enables multiple virtual machines to use a single supported<ref name="supported-gpus">NVIDIA GPUs supported by vGPU https://docs.nvidia.com/vgpu/gpus-supported-by-vgpu.html</ref> physical GPU.

This article describes how to use NVIDIA vGPU software with the Proxmox Virtual Environment (Proxmox VE). The instructions were tested using an RTX A5000.


More information can also be found in the NVIDIA documentation<ref name="nvidia-docs">Latest NVIDIA vGPU Documentation https://docs.nvidia.com/vgpu/latest/index.html</ref><ref>NVIDIA vGPU Linux with KVM Documentation https://docs.nvidia.com/vgpu/latest/grid-vgpu-release-notes-generic-linux-kvm/index.html</ref>.

== Support ==

Beginning with NVIDIA vGPU Software 18, Proxmox Virtual Environment is an officially supported platform.

To be eligible for support tickets, you must have an '''active and valid NVIDIA vGPU entitlement''' as well as an active and valid Proxmox VE subscription on your cluster, with level '''Basic, Standard or Premium'''. See the Proxmox VE Subscription Agreement<ref>Proxmox VE Subscription Agreement https://www.proxmox.com/en/downloads/proxmox-virtual-environment/agreements/proxmox-ve-subscription-agreement</ref> and the Proxmox Support Offerings<ref>Proxmox VE Subscriptions https://www.proxmox.com/en/products/proxmox-virtual-environment/pricing</ref> for more details.

For a list of supported hardware, see NVIDIA's Qualified System Catalog<ref name="system-catalog">NVIDIA Qualified System Catalog https://marketplace.nvidia.com/en-us/enterprise/qualified-system-catalog/</ref>.

To get support, open a ticket on the [https://my.proxmox.com/ Proxmox Server Solutions Support Portal].


== Hardware Setup ==


For optimal performance in production workloads, we recommend using appropriate enterprise-grade hardware.

All GPUs on [https://docs.nvidia.com/vgpu/latest/grid-vgpu-release-notes-generic-linux-kvm/index.html#hardware-configuration NVIDIA's list of supported GPUs] are supported on Proxmox VE.
Please refer to NVIDIA's support pages to verify server and version compatibility<ref>NVIDIA vGPU Support Matrix https://docs.nvidia.com/vgpu/latest/product-support-matrix/index.html</ref><ref name="system-catalog" /><ref name="supported-gpus" />.

Some workstation NVIDIA GPUs, such as the RTX A5000, support vGPU but do not have it enabled by default. To enable vGPU for these models, switch the display mode using the NVIDIA Display Mode Selector Tool<ref>NVIDIA Display Mode Selector Tool https://developer.nvidia.com/displaymodeselector</ref>.
This will disable the card's display ports.

For a list of GPUs where this is necessary, check the NVIDIA documentation<ref>Latest NVIDIA vGPU user guide: Switching the Mode of a GPU that Supports Multiple Display Modes https://docs.nvidia.com/vgpu/latest/grid-vgpu-user-guide/index.html#displaymodeselector</ref>.
Note that this should be the exception and should only be necessary for workstation GPUs.

== Software Versions ==

The installation is supported on the following combinations of Proxmox VE, Linux kernel, and NVIDIA driver versions:


{| class="wikitable"
|-
! pve-manager !! Kernel !! vGPU Software Branch !! NVIDIA Host drivers
|-
| 8.3.4 || 6.8.12-8-pve || 18.0 || 570.124.03
|-
| 8.3.4 || 6.11.11-1-pve || 18.0 || 570.124.03
|}

{| class="mw-collapsible mw-collapsed wikitable"
! colspan="4" | Older, now outdated, tested versions
|-
! pve-manager !! Kernel !! vGPU Software Branch !! NVIDIA Host drivers
|-
| 7.2-7 || 5.15.39-2-pve || 14.1 || 510.73.06
|-
| 8.1.4 || 6.5.11-8-pve || 16.3 || 535.154.02
|-
| 8.1.4 || 6.5.13-1-pve || 16.3 || 535.154.02
|-
| 8.2.8 || 6.8.12-4-pve || 17.3 || 550.90.05
|-
| 8.2.8 || 6.11.0-1-pve || 17.3 || 550.90.05
|-
| 8.2.8 || 6.8.12-4-pve || 17.4 || 550.127.06
|-
| 8.2.8 || 6.11.0-1-pve || 17.4 || 550.127.06
|}


{{Note| With 6.8+ based kernels / GRID version 17.3+, the lower level interface of the driver changed and requires <code>qemu-server &ge; 8.2.6</code> to be installed on the host. }}

It is recommended to use the latest stable and supported versions of Proxmox VE and the NVIDIA drivers.
However, newer versions within one vGPU Software Branch should also work for the same or older kernel versions.

A mapping of which NVIDIA vGPU software version corresponds to which driver version is available in the official documentation<ref name="driver-versions">NVIDIA vGPU Driver versions https://docs.nvidia.com/vgpu/#driver-versions</ref>.

Since version 16.0, certain cards are no longer supported by the NVIDIA vGPU driver, but are supported by NVIDIA AI Enterprise<ref name="supported-gpus" /><ref name="supported-gpus-ai">NVIDIA GPUs supported by NVIDIA AI Enterprise https://docs.nvidia.com/ai-enterprise/latest/product-support-matrix/index.html</ref>.
The NVIDIA AI Enterprise driver behaves similarly to the vGPU driver, so the following steps should also apply.

Note that vGPU and NVIDIA AI Enterprise are different products with different licenses, and '''NVIDIA AI Enterprise''' is currently '''not officially supported''' with Proxmox VE.
 
== Preparation ==
 
Before actually installing the host drivers, there are a few steps that need to be done on the Proxmox VE host.
 
{{Note| If you need to use a root shell, you can open one by connecting via SSH or using the node shell on the Proxmox VE web interface.}}
 
=== Enable PCIe Passthrough ===
 
Make sure that your system is compatible with PCIe passthrough. See the [https://pve.proxmox.com/wiki/PCI(e)_Passthrough PCI(e) Passthrough] documentation for details.
 
Additionally, confirm that the following features are enabled in your firmware settings (BIOS/UEFI):
 
* ''VT-d'' for Intel, or ''AMD-v'' for AMD (sometimes named IOMMU)
* ''SR-IOV'' (this may not be necessary for older pre-Ampere GPU generations)
* Above 4G decoding
* PCI AER (Advanced Error Reporting)
* Alternative Routing ID Interpretation (ARI) (not necessary for pre-Ampere GPUs)
* PCI ASPM (Active State Power Management)


The firmware of your host might use different naming. If you are unable to locate some of these options, refer to the documentation provided by your firmware or motherboard manufacturer.

{{Note| It is crucial to ensure that the IOMMU options are enabled both in your firmware and in the kernel. See [https://pve.proxmox.com/wiki/PCI(e)_Passthrough#_general_requirements General Requirements for PCI(e) Passthrough] for how to do that.}}
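After enabling these options and rebooting, you can do a quick sanity check from a root shell on the host. This is a minimal sketch; the exact log wording varies between platforms and kernel versions:

```shell
# Search the kernel log for IOMMU initialization messages (run as root on
# the Proxmox VE host). Typical positive matches are "DMAR: IOMMU enabled"
# (Intel) or "AMD-Vi: Interrupt remapping enabled" (AMD).
dmesg 2>/dev/null | grep -iE 'dmar|amd-vi' || echo "no IOMMU messages found - check firmware settings and kernel parameters"
```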


=== Setup Proxmox VE Repositories ===

Proxmox VE ships with the enterprise repository set up by default, as this repository provides better tested software and is recommended for production use.
The enterprise repository requires a valid subscription per node. For evaluation or non-production use cases, you can switch to the public <code>no-subscription</code> repository. Packages in the <code>no-subscription</code> repository are updated more frequently but are not as well tested. There is no difference in available features.

You can use the <code>Repositories</code> panel in the Proxmox VE web UI to manage package repositories, see the [https://pve.proxmox.com/wiki/Package_Repositories documentation] for details.
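As an example, on Proxmox VE 8 (based on Debian 12 ''Bookworm''), the <code>no-subscription</code> repository corresponds to the following APT source entry; the suite name <code>bookworm</code> is an assumption here and must match your Debian release:

```
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription
```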


=== Update to Latest Package Versions ===

Proxmox VE uses a rolling release model and should be updated frequently to ensure that your installation has the latest bug fixes, security fixes, and features available.

You can update your Proxmox VE node using the <code>Updates</code> panel on the web UI.

=== Prepare using <code>pve-nvidia-vgpu-helper</code> ===

Since ''pve-manager'' version <code>8.3.4</code>, the <code>pve-nvidia-vgpu-helper</code> tool is included. If you're on an older version, please upgrade to the latest version or install the tool manually with

 apt install pve-nvidia-vgpu-helper

The <code>pve-nvidia-vgpu-helper</code> tool takes care of the basic host setup, such as blacklisting the <code>nouveau</code> driver and installing the required header and DKMS packages.
You can start the setup with

 pve-nvidia-vgpu-helper setup

You will be asked if you want to install missing packages, answer 'y'.
Once all the required packages have been successfully installed, you should see this message:

 All done, you can continue with the NVIDIA vGPU driver installation.


If the <code>nouveau</code> driver was loaded previously, you have to reboot after this step so it isn't loaded afterward.


{{ Note| If you install an opt-in kernel later, you also have to install the corresponding <code>proxmox-headers-X.Y</code> package for DKMS to work.}}


== Host Driver Installation ==

{{ Note| The driver/file versions shown in this section are examples only; use the correct file names for the driver you're installing.}}
{{ Note| If you're using Secure Boot, please refer to the chapter [[NVIDIA vGPU on Proxmox VE#Secure Boot|Secure Boot]] before continuing. }}

To get started, you will need the appropriate host and guest drivers; see the NVIDIA Virtual GPU Software Quick Start Guide<ref>Getting your NVIDIA vGPU Software: https://docs.nvidia.com/vgpu/latest/grid-software-quick-start-guide/index.html#getting-your-nvidia-grid-software</ref> for instructions on how to obtain them.
Choose <code>Linux KVM</code> as the target hypervisor when downloading.

In our case, we got the following host driver file:

 NVIDIA-Linux-x86_64-525.105.14-vgpu-kvm.run

Copy this file over to your Proxmox VE node, for example with SCP or an SSH file copy tool. If you are on Windows, [https://winscp.net WinSCP] can be used for this step.

To start the installation, make the installer executable first, then pass the <code>--dkms</code> option when running it, to ensure that the module is rebuilt after a kernel upgrade:

 chmod +x NVIDIA-Linux-x86_64-525.105.14-vgpu-kvm.run
 ./NVIDIA-Linux-x86_64-525.105.14-vgpu-kvm.run --dkms

Follow the steps of the installer.
When you're asked if you want to register the kernel module sources with DKMS, answer 'yes'.

After the installer has finished successfully, reboot your system, either using the web interface or by executing <code>reboot</code>.
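Once the host is back up, you can do a quick check that the driver came up. <code>nvidia-smi</code> is installed together with the host driver; the exact output depends on your GPU and driver version:

```shell
# Check that the NVIDIA vGPU host driver modules are loaded and that the
# management tool works (both lines print a hint if the driver is missing).
lsmod 2>/dev/null | grep -i '^nvidia' || echo "nvidia kernel modules not loaded"
nvidia-smi 2>/dev/null || echo "nvidia-smi not available - check the driver installation"
```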


=== Enabling SR-IOV ===

On newer NVIDIA GPUs (based on the Ampere architecture and later), you must first enable SR-IOV before being able to use vGPU.
This can be done manually with the <code>sriov-manage</code> script from NVIDIA, but that setting is lost on reboot.

Alternatively, the <code>pve-nvidia-vgpu-helper</code> package comes with a systemd service template which calls the script automatically on every boot.

To enable it, use

 systemctl enable --now pve-nvidia-sriov@ALL.service

You can replace <code>ALL</code> with a specific PCI ID (like <code>0000:01:00.0</code>) if you only want to enable it for a specific card.

This service will run before the NVIDIA vGPU daemons and the Proxmox VE virtual guest auto start-up.
Due to the <code>--now</code> parameter, it is also started immediately.


Verify that there are multiple virtual functions for your device with:

  # lspci -d 10de:

In our case, there are now 24 virtual functions in addition to the physical card (01:00.0):

  01:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
  [...]
  01:03.2 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
  01:03.3 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
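Assuming a single physical GPU at <code>01:00.0</code>, the virtual functions can be counted by filtering the physical function out of the <code>lspci</code> output. The pipeline below runs against a shortened, hypothetical sample of the listing so the logic is easy to follow; on a real host, pipe <code>lspci -d 10de:</code> into the same <code>grep</code>:

```shell
# Count virtual functions by excluding the physical function (01:00.0).
# "sample" is a shortened, hypothetical stand-in for `lspci -d 10de:` output.
sample='01:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:00.4 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:00.5 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)'
printf '%s\n' "$sample" | grep -cv '^01:00\.0 '
```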
=== Create a PCI Resource Mapping ===

For convenience and privilege separation, you can now create a resource mapping for the PCI devices. A mapping can contain multiple PCI IDs, such as all virtual functions of a card. The first available ID is automatically selected when the guest is started.

Go to Datacenter → Resource Mappings to create a new one. For details, see [[QEMU/KVM_Virtual_Machines#resource_mapping|Resource Mapping]].

In the resource mapping, you need to enable 'Use with mediated devices'<ref>Mediated Devices https://pve.proxmox.com/wiki/PCI(e)_Passthrough#_mediated_devices_vgpu_gvt_g</ref> and select all relevant devices. For GPUs with SR-IOV (Ampere and later), this means the virtual functions which the card exposes. For example, using an RTX A5000 card, we want to select all virtual functions:

[[File:Pve-vgpu-mapping.png]]
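Once created, the mapping can also be inspected from the CLI. This sketch assumes the cluster-wide mapping API path available in Proxmox VE 8:

```shell
# List the configured PCI resource mappings cluster-wide (Proxmox VE 8 API path).
pvesh get /cluster/mapping/pci 2>/dev/null || echo "pvesh not available or no mappings configured"
```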


== Guest Configuration ==

=== General Setup ===


First, set up a VM as you normally would, without adding a vGPU. This can be done either with the Virtual Machine wizard in the web UI or via the CLI tool <code>qm</code>. For guest-specific notes, see for example [https://pve.proxmox.com/wiki/Windows%2011%20guest%20best%20practices Windows 11 guest best practices].
 
Please note that all Linux commands shown are assumed to be run as a privileged user. For example, directly as the <code>root</code> user, or prefixed with <code>sudo</code>.
 
=== Remote Desktop Software ===
 
Since the built-in VNC and SPICE consoles cannot display the virtual display provided by the vGPU, you need some kind of remote desktop software installed in the guest to access it.
There are many options available, see the NVIDIA documentation<ref name="nvidia-docs" /> or the [https://en.wikipedia.org/wiki/Comparison_of_remote_desktop_software Wikipedia Comparison of Remote Desktop Software] for examples.
 
We show two examples here: Remote Desktop on Windows 10/11, and VNC (via x11vnc) on Linux.
 
==== Remote Desktop on Windows 10/11 ====
 
To enable Remote Desktop on Windows 10/11, go to Settings → System → Remote Desktop and enable the Remote Desktop option.
 
<gallery>
Windows rdp.png
Win11-remote-desktop.png
</gallery>


==== VNC on Linux via x11vnc (Ubuntu/Rocky Linux) ====

Note that this is just an example; how you configure remote desktop access on Linux will depend on your use case.
 
Ubuntu 24.04 and Rocky Linux 9 ship with GDM3 + GNOME by default, which makes it a bit harder to share the screen with x11vnc.
So the first step is to install a different display manager. We successfully tested LightDM here, but others may work as well.
 
Note that for Rocky Linux you might need to enable the EPEL repository beforehand with:

 # dnf install epel-release
 
First, we install and activate the new display manager:
 
Ubuntu:
 
 # apt install lightdm
 
Select 'LightDM' as default login manager when prompted.
 
Rocky Linux:
 
 # dnf install lightdm
 # systemctl disable --now gdm.service
 # systemctl enable --now lightdm.service


After that, install <code>x11vnc</code>.

Ubuntu:

 # apt install x11vnc

Rocky Linux:

 # dnf install x11vnc


We then added a systemd service that starts the VNC server on the X.org server provided by LightDM, in <code>/etc/systemd/system/x11vnc.service</code>:

<pre>
[Unit]
Description=Start x11vnc
After=multi-user.target

[Service]
Type=simple
ExecStart=/usr/bin/x11vnc -display :0 -auth /var/run/lightdm/root/:0 -forever -loop -repeat -rfbauth /etc/x11vnc.passwd -rfbport 5900 -shared -noxdamage

[Install]
WantedBy=multi-user.target
</pre>


You can set the password by executing:

 # x11vnc -storepasswd /etc/x11vnc.passwd
 # chmod 0400 /etc/x11vnc.passwd

On Rocky Linux, you might need to allow VNC in the firewall:

 # firewall-cmd --permanent --add-port=5900/tcp

After setting up LightDM and x11vnc and restarting the VM, you should now be able to connect via VNC.


=== vGPU Configuration ===

After configuring the VM to your liking, shut down the VM and add a vGPU by selecting one of the virtual functions and the appropriate mediated device type.

For example, via the CLI:

 qm set VMID -hostpci0 01:00.4,mdev=nvidia-660

Via the web interface:

[[File:PVE select a vgpu with mapping.png|none|Selecting a vGPU model]]

To find the correct mediated device type, you can use <code>pvesh get /nodes/NODENAME/hardware/pci/MAPPINGNAME/mdev</code>.
This queries sysfs for all supported types that can be created. Note that, depending on the driver and kernel versions in use, not all models may be visible here, but only those that are currently available.
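The same information can also be read directly from sysfs. The following sketch, adapted from an earlier revision of this article, prints the type, name and description of every supported mediated device type of a given virtual function; adjust the PCI path to your setup:

```shell
# Print type, name and description of all mediated device types supported by
# a virtual function. Adjust the PCI path (0000:01:00.4 is an example).
for i in /sys/bus/pci/devices/0000:01:00.4/mdev_supported_types/*; do
    [ -d "$i" ] || continue    # skip when the path does not exist on this machine
    basename "$i"
    cat "$i/name"
    cat "$i/description"
    echo
done
```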


=== NVIDIA Guest Driver Installation ===

==== Windows 10/11 ====

Refer to the NVIDIA documentation<ref name="driver-versions" /> to find a compatible guest driver for your host driver version.
For example:

 553.24_grid_win10_win11_server2022_dch_64bit_international.exe

Start the installer and follow the instructions; after it has finished, restart the guest as prompted.


<gallery>
Windows nv install01.png|Starting NVIDIA driver installation on Windows 10
Windows nv install02.png|Accepting the license agreement on Windows 10
Windows nv install03.png|Finishing the installation on Windows 10
</gallery>


<gallery>
Win11-nv-install01.jpg|Starting NVIDIA driver installation on Windows 11
Win11-nv-install02.jpg|Accepting the license agreement on Windows 11
Win11-nv-install03.jpg|Finishing the installation on Windows 11
</gallery>


From this point on, Proxmox VE's built-in noVNC console will no longer work, so use your desktop sharing software to connect to the guest.
Now you can use the vGPU to run 3D applications such as Blender, 3D games, etc.

<gallery>
Windows valley.png|Unigine Valley on Windows 10
Windows supertuxkart.png|SuperTuxKart on Windows 10
Windows blender.png|Blender on Windows 10
Win11-blender.jpg|Blender on Windows 11
Win11-superposition.jpg|Unigine Superposition on Windows 11
Win11-supertuxkart.jpg|SuperTuxKart on Windows 11
</gallery>


==== Ubuntu Desktop ====

To install the NVIDIA driver on Ubuntu, use <code>apt</code> to install the <code>.deb</code> package that NVIDIA provides for Ubuntu.
Check the NVIDIA documentation<ref name="driver-versions" /> for a compatible guest driver to host driver mapping.

In our case, this was <code>nvidia-linux-grid-550_550.127.05_amd64.deb</code>.
For <code>apt</code> to install from a local file, you must prefix the relative path, for example <code>./</code> if the <code>.deb</code> file is located in the current directory:

 # apt install ./nvidia-linux-grid-550_550.127.05_amd64.deb


<gallery>
Ubuntu-nv-install01.jpg|List NVIDIA driver files
Ubuntu-nv-install02.jpg|Start NVIDIA driver installation
Ubuntu-nv-install03.jpg|Finished NVIDIA driver installation
</gallery>

Then you must use NVIDIA's tool to generate the X.org configuration:

 # nvidia-xconfig

Now you can reboot and use a VNC client to connect and use the vGPU for 3D applications.

<gallery>
Ubuntu valley.png|Unigine Valley (Ubuntu 22.04)
Nv Ubuntu supertuxkart.png|SuperTuxKart (Ubuntu 22.04)
Ubuntu blender.png|Blender (Ubuntu 22.04)
Ubuntu-24-superposition.jpg|Unigine Superposition (Ubuntu 24.04)
Ubuntu-24-supertuxkart.jpg|SuperTuxKart (Ubuntu 24.04)
Ubuntu-24-blender.jpg|Blender Classroom (Ubuntu 24.04)
</gallery>


==== Rocky Linux ====

To install the NVIDIA driver on Rocky Linux, use <code>dnf</code> to install the <code>.rpm</code> package that NVIDIA provides for Red Hat based distributions.
Check the NVIDIA documentation<ref name="driver-versions" /> for a compatible guest driver to host driver mapping.

In our case, this was <code>nvidia-linux-grid-550-550.127.05-1.x86_64.rpm</code>.
If the file is located in the current directory, run:

 # dnf install nvidia-linux-grid-550-550.127.05-1.x86_64.rpm

<gallery>
Rocky-9-nv-install01.jpg|Start driver installation
Rocky-9-nv-install02.jpg|Finished driver installation
</gallery>


Then you must use NVIDIA's tool to generate the X.org configuration:

  # nvidia-xconfig

Now you can reboot and use a VNC client to connect and use the vGPU for 3D applications.


<gallery>
Rocky-9-blender.jpg|Blender Classroom render
Rocky-9-superposition.jpg|Unigine Superposition
Rocky-9-supertuxkart.jpg|SuperTuxKart
</gallery>


{{Note| If you want to use CUDA on a Linux Guest, you must install the CUDA Toolkit manually<ref>NVIDIA CUDA Toolkit Download https://developer.nvidia.com/cuda-downloads</ref>.
==== CUDA on Linux ====
 
If you want to use CUDA on a Linux Guest, you might need to install the CUDA Toolkit manually<ref>NVIDIA CUDA Toolkit Download https://developer.nvidia.com/cuda-downloads</ref>.
Check the NVIDIA documentation which version of CUDA is supported for your vGPU drivers.
Check the NVIDIA documentation which version of CUDA is supported for your vGPU drivers.


In our case we needed to install CUDA 11.6 (only the toolkit, not the driver) with the file:
=== Guest vGPU Licensing ===
 
To use the vGPU without restriction, you must adhere to NVIDIA's licensing.
Check the NVIDIA vGPU documentation<ref>NVIDIA vGPU Licensing User Guide: https://docs.nvidia.com/vgpu/latest/grid-licensing-user-guide/index.html</ref> for instructions on how to do so.
 
For setting up a DLS (Delegated License Service), see NVIDIAs DLS Documentation<ref>NVIDIA DLS Documentation https://docs.nvidia.com/license-system/latest/</ref>.
 
'''Tip''': Ensure that the guest system time is properly synchronized using NTP. Otherwise, the guest will be unable to request a license for the vGPU.
 
=== Troubleshooting ===
 
A warning like the following might get logged by QEMU on VM startup. This usually only happens on consumer hardware which does not support PCIe AER<ref>PCI Express Advanced Error Reporting Driver Guide: https://www.kernel.org/doc/html/v6.12-rc4/PCI/pcieaer-howto.html</ref> error recovery properly, it generally should not have any adverse effects on normal operation, but PCIe link errors might not be (soft-)recoverable in such cases.


cuda_11.6.2_510.47.03_linux.run
  kvm: -device vfio-pci,host=0000:09:00.5,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: warning: vfio 0000:09:00.5: Could not enable error recovery for the device
|warn}}


=== Guest vGPU Licensing ===
=== Known Issues ===
 
==== Windows 10/11 'Fast Startup' ====
 
In Windows 10/11 'fast startup' is enabled by default. When enabled, a shutdown via ACPI or the start menu will use 'hybrid shutdown'. The next boot will fail with a blue screen and the vGPU will be disabled.
 
Disable 'fast boot' in Windows to prevent this. In the Control Panel → Power Options → Choose what the power button does → Uncheck 'fast startup'.
 
Alternatively, disable hybrid shutdown in a command prompt with admin privileges:
Powercfg -h off
 
== Secure Boot ==
 
When booting the host with secure boot, kernel modules must be signed with a trusted key. We will show you how to set up your host so that the NVIDIA driver is signed and ready to load. For more details, see [[Secure Boot Setup]].
To be able to enroll the keys into the UEFI, make sure you have access to the physical display output during boot. This is necessary for confirming the key import. On servers, this can usually be achieved with IPMI/iKVM/etc.
 
Before installing the NVIDIA Host driver, we need to install a few prerequisites to enroll the DKMS signing key into UEFI:
<syntaxhighlight lang="bash">apt install shim-signed grub-efi-amd64-signed mokutil</syntaxhighlight>
 
Now you can install the NVIDIA driver, but with an additional parameter:
./NVIDIA-Linux-x86_64-525.105.14-vgpu-kvm.run --dkms --skip-module-load
 
When asked if the installer should sign the module, select 'no'.
 
After the installer is finished, we now want to rebuild the kernel modules with DKMS, which will sign the kernel module for us with a generated key.
First, check what module version is installed with:
dkms status
 
Which will output a line like this:
 
nvidia/550.144.02, 6.8.12-6-pve, x86_64: installed
 
You need to rebuild and reinstall the listed module with (replace the version with the one on your system)
dkms build -m nvidia -v 550.144.02 --force
dkms install -m nvidia -v 550.144.02 --force
 
This will ensure that the modules are signed with the DKMS key located in /var/lib/dkms/mok.pub
If you have not already done so, enroll the DKMS key as described in [[Secure_Boot_Setup#Using_DKMS_with_Secure_Boot|Using DKMS with Secure Boot]].
 
You should then be able to load the signed NVIDIA kernel module. You can verify this by checking if the PCI devices have their driver loaded, e.g. with
lspci -d 10de: -nnk
 
It should say
Kernel driver in use: nvidia


To use the vGPU unrestricted, you must adhere to NVIDIA's licensing. Check the NVIDIA
You can now continue with the next step after the driver installation.
documentation<ref>NVIDIA GRID Licensing User Guide. https://docs.nvidia.com/grid/latest/grid-licensing-user-guide/index.html</ref> for how to do that.
Make sure the guest time is properly synced, otherwise the guest will not be able to request a license for the vGPU.


== Notes ==
== Notes ==

Latest revision as of 14:52, 12 March 2025

Introduction

NVIDIA vGPU Software enables multiple virtual machines to use a single supported[1] physical GPU.

This article describes how to use NVIDIA vGPU software with the Proxmox Virtual Environment (Proxmox VE). The instructions were tested using an RTX A5000.

More information can also be found in the NVIDIA documentation[2][3].

Support

Beginning with NVIDIA vGPU Software 18, Proxmox Virtual Environment is an officially supported platform.

To be eligible for support tickets, you must have an active and valid NVIDIA vGPU entitlement as well as an active and valid Proxmox VE subscription on your cluster, with level Basic, Standard or Premium. See the Proxmox VE Subscription Agreement[4] and the Proxmox Support Offerings[5] for more details.

For a list of supported hardware, see NVIDIA's Qualified System Catalog [6].

To get support, open a ticket on the Proxmox Server Solutions Support Portal.

Hardware Setup

For optimal performance in production workloads, we recommend using appropriate enterprise-grade hardware.

All GPUs on NVIDIA's list of supported GPUs are supported on Proxmox VE. Please refer to NVIDIA's support pages to verify server and version compatibility[7][6][1].

Some NVIDIA workstation GPUs that support vGPU, such as the RTX A5000, do not have it enabled by default. To enable vGPU for these models, switch the display mode using the NVIDIA Display Mode Selector Tool[8]. Note that this disables the display ports.

For a list of GPUs where this is necessary, check the NVIDIA documentation[9]. This should be the exception and is only necessary for workstation GPUs.

Software Versions

The installation is supported on the following versions of Proxmox VE, Linux kernel, and NVIDIA drivers:

pve-manager   Kernel                 vGPU Software Branch   NVIDIA Host drivers
8.3.4         6.8.12-8-pve           18.0                   570.124.03
8.3.4         6.11.11-1-pve          18.0                   570.124.03

Older, now outdated, tested versions:

pve-manager   Kernel                 vGPU Software Branch   NVIDIA Host drivers
7.2-7         5.15.39-2-pve          14.1                   510.73.06
7.2-7         5.15.39-2-pve          14.2                   510.85.03
7.4-3         5.15.107-2-pve         15.2                   525.105.14
7.4-17        6.2.16-20-bpo11-pve    16.0                   535.54.06
8.1.4         6.5.11-8-pve           16.3                   535.154.02
8.1.4         6.5.13-1-pve           16.3                   535.154.02
8.2.8         6.8.12-4-pve           17.3                   550.90.05
8.2.8         6.11.0-1-pve           17.3                   550.90.05
8.2.8         6.8.12-4-pve           17.4                   550.127.06
8.2.8         6.11.0-1-pve           17.4                   550.127.06
Note: With 6.8+ based kernels / GRID version 17.3+, the lower-level interface of the driver changed and requires qemu-server ≥ 8.2.6 to be installed on the host.

It is recommended to use the latest stable and supported version of Proxmox VE and NVIDIA drivers. However, newer versions in one vGPU Software Branch should also work for the same or older kernel version.

A mapping of which NVIDIA vGPU software version corresponds to which driver version is available in the official documentation [10].
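As a concrete illustration of that mapping, the branch-to-driver pairing can be encoded in a small helper. The pairs below are taken only from the versions listed in the tables above; consult the official driver-version mapping before relying on them:

```python
# Branch-to-driver-major pairs as seen in the tested-versions tables above
# (assumption: other releases in a branch share the same driver major).
BRANCH_BY_MAJOR = {510: 14, 525: 15, 535: 16, 550: 17, 570: 18}

def same_branch(host_version: str, guest_version: str) -> bool:
    """True if both driver versions map to the same vGPU software branch."""
    def branch(version):
        return BRANCH_BY_MAJOR.get(int(version.split(".")[0]))
    return branch(host_version) is not None and branch(host_version) == branch(guest_version)

print(same_branch("550.127.06", "550.127.05"))  # host and guest both from branch 17
print(same_branch("550.127.06", "570.124.03"))  # mixed branches
```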

Since version 16.0, certain cards are no longer supported by the NVIDIA vGPU driver, but are supported by NVIDIA AI Enterprise [1] [11]. The NVIDIA AI Enterprise driver behaves similarly to the vGPU driver. Therefore, the following steps should also apply.

Note that vGPU and NVIDIA AI Enterprise are different products with different licenses, and NVIDIA AI Enterprise is currently not officially supported with Proxmox VE.

Preparation

Before actually installing the host drivers, there are a few steps that need to be done on the Proxmox VE host.

Note: If you need a root shell, you can open one by connecting via SSH or by using the node shell on the Proxmox VE web interface.

Enable PCIe Passthrough

Make sure that your system is compatible with PCIe passthrough. See the PCI(e) Passthrough documentation for details.

Additionally, confirm that the following features are enabled in your firmware settings (BIOS/UEFI):

  • VT-d for Intel, or AMD-v for AMD (sometimes named IOMMU)
  • SR-IOV (this may not be necessary for older pre-Ampere GPU generations)
  • Above 4G decoding
  • Alternative Routing ID Interpretation (ARI) (not necessary for pre-Ampere GPUs)

The firmware of your host might use different naming. If you are unable to locate some of these options, refer to the documentation provided by your firmware or motherboard manufacturer.

Note: It is crucial to ensure that the IOMMU is enabled both in your firmware and in the kernel. See General Requirements for PCI(e) Passthrough for how to do that.
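For the kernel side, on an Intel system booting via GRUB, this typically amounts to the following config fragment (a sketch only; AMD systems and systemd-boot setups differ, see the passthrough documentation for details):

```shell
# /etc/default/grub (assumption: Intel CPU, GRUB bootloader):
# add the IOMMU options to the kernel command line, e.g.
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# then regenerate the GRUB configuration and reboot:
update-grub
```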

Setup Proxmox VE Repositories

Proxmox VE ships with the enterprise repository set up by default, as this repository provides better tested software and is recommended for production use. The enterprise repository needs a valid subscription per node. For evaluation or non-production use cases, you can switch to the public no-subscription repository. Packages in the no-subscription repository get updated more frequently but are not as well tested. There is no difference in available features.

You can use the Repositories management panel in the Proxmox VE web UI for managing package repositories, see the documentation for details.

Update to Latest Package Versions

Proxmox VE uses a rolling release model and should be updated frequently to ensure that your Proxmox VE installation has the latest bug and security fixes, and features available.

You can update your Proxmox VE node using the Updates panel on the web UI.

Prepare using pve-nvidia-vgpu-helper

Since pve-manager version 8.3.4, the pve-nvidia-vgpu-helper tool is included. If you're on an older version, upgrade to the latest version or install the tool manually with

apt install pve-nvidia-vgpu-helper

The pve-nvidia-vgpu-helper tool will set up some basics, such as blacklisting the nouveau driver and installing the kernel header and DKMS packages. You can start the setup with

pve-nvidia-vgpu-helper setup

You will be asked if you want to install missing packages, answer 'y'. Once all the required packages have been successfully installed, you should see this message

All done, you can continue with the NVIDIA vGPU driver installation.


If the nouveau driver was loaded previously, you have to reboot after this step so it isn't loaded afterward.

Note: If you install an opt-in kernel later, you also have to install the corresponding proxmox-headers-X.Y package for DKMS to work.

Host Driver Installation

Note: The driver/file versions shown in this section are examples only; use the correct file names for the driver you're installing.
Note: If you're using Secure Boot, refer to the chapter Secure Boot before continuing.

To get started, you will need the appropriate host and guest drivers; see the NVIDIA Virtual GPU Software Quick Start Guide[12] for instructions on how to obtain them. Choose Linux KVM as the target hypervisor when downloading.

In our case, we got the following host driver file:

NVIDIA-Linux-x86_64-525.105.14-vgpu-kvm.run

Copy this file over to your Proxmox VE node, for example with SCP or an SSH file copy tool. If you are on Windows, WinSCP can be used for this step.

To start the installation, you need to make the installer executable first, and then pass the --dkms option when running it, to ensure that the module is rebuilt after a kernel upgrade:

chmod +x NVIDIA-Linux-x86_64-525.105.14-vgpu-kvm.run
./NVIDIA-Linux-x86_64-525.105.14-vgpu-kvm.run --dkms

Follow the steps of the installer.

When you're asked if you want to register the kernel module sources with DKMS, answer 'yes'.

After the installer has finished successfully, you will need to reboot your system, either using the web interface or by executing reboot.

Enabling SR-IOV

On newer NVIDIA GPUs (based on the Ampere architecture and later), you must first enable SR-IOV before being able to use vGPU. This can be done manually with NVIDIA's sriov-manage script, but the setting is lost on reboot.
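For reference, the manual invocation looks like the following fragment (the script path is as shipped by the NVIDIA host driver; the effect does not persist across reboots):

```shell
# Enable the SR-IOV virtual functions on all NVIDIA GPUs; a specific PCI ID
# (e.g. 0000:01:00.0) can be passed instead of ALL. Not persistent across boots.
/usr/lib/nvidia/sriov-manage -e ALL
```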

Alternatively, the pve-nvidia-vgpu-helper package comes with a systemd service template which calls it automatically on every boot.

To enable it, use

systemctl enable --now pve-nvidia-sriov@ALL.service

You can replace ALL with a specific PCI ID (like 0000:01:00.0) if you only want to enable it for a specific card.

This will then run before the NVIDIA vGPU daemons and the Proxmox VE virtual guest auto start-up. Due to the --now parameter, it will be started immediately.

Verify that there are multiple virtual functions for your device with:

# lspci -d 10de:

In our case, there are now 24 virtual functions in addition to the physical card (01:00.0):

01:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:00.4 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:00.5 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:00.6 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:00.7 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:01.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:01.1 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:01.2 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:01.3 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:01.4 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:01.5 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:01.6 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:01.7 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:02.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:02.1 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:02.2 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:02.3 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:02.4 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:02.5 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:02.6 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:02.7 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:03.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:03.1 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:03.2 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:03.3 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
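If you want to check this in a script rather than by eye, the listing can simply be counted. The sample text below is an embedded stand-in; on a real host you would pipe `lspci -d 10de:` instead:

```shell
# Count NVIDIA PCI functions. The sample stands in for real lspci output;
# on a live host, use:  lspci -d 10de: | wc -l
sample='01:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:00.4 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:00.5 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)'

total=$(printf '%s\n' "$sample" | wc -l)                 # physical + virtual functions
vfs=$(printf '%s\n' "$sample" | grep -cv '^01:00\.0 ')   # everything except the physical function
echo "functions=$total vfs=$vfs"
```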

Create a PCI Resource Mapping

For convenience and privilege separation, you can now create resource mappings for PCI devices. A mapping can contain multiple PCI IDs, such as all virtual functions; the first available ID is automatically selected when the guest is started. Go to Datacenter → Resource Mappings to create a new one. For details, see Resource Mapping.

In the resource mapping you need to enable 'Use with mediated devices'[13] and select all relevant devices. For GPUs with SR-IOV (Ampere and later), this means the virtual functions which the card exposes. For example, using an RTX A5000 card, we want to select all virtual functions:

Pve-vgpu-mapping.png

Guest Configuration

General Setup

First, set up a VM as you normally would, without adding a vGPU. This can be done either with the Virtual Machine wizard in the web UI or via the CLI tool qm. For guest-specific notes, see for example Windows 11 guest best practices.

Please note that all Linux commands shown are assumed to be run as a privileged user. For example, directly as the root user, or prefixed with sudo.

Remote Desktop Software

Since the built-in VNC and SPICE consoles cannot display the virtual display provided by the vGPU, you need some kind of remote desktop software installed in the guest to access its display. There are many options available; see the NVIDIA documentation[2] or the Wikipedia Comparison of Remote Desktop Software for examples.

We show how to enable two examples here, Remote Desktop for Windows 10/11, and VNC (via x11vnc) on Linux:

Remote Desktop on Windows 10/11

To enable Remote Desktop on Windows 10/11, go to Settings → System → Remote Desktop and enable the Remote Desktop option.

VNC on Linux via x11vnc (Ubuntu/Rocky Linux)

Note that this is just an example; how you want to configure remote desktops on Linux will depend on your use case.

Ubuntu 24.04 and Rocky Linux 9 ship with GDM3 + GNOME by default, which makes it a bit harder to share the screen with x11vnc. The first step is therefore to install a different display manager. We successfully tested LightDM here, but others may work as well.

Note that for Rocky Linux you might need to enable the EPEL repository beforehand with:

# dnf install epel-release

First, we install and activate the new display manager:

Ubuntu:

# apt install lightdm

Select 'LightDM' as default login manager when prompted.

Rocky Linux:

# dnf install lightdm
# systemctl disable --now gdm.service
# systemctl enable --now lightdm.service

After that install x11vnc with

Ubuntu:

# apt install x11vnc

Rocky Linux:

# dnf install x11vnc

Then add a systemd service that starts the VNC server on the X.Org server provided by LightDM. Create /etc/systemd/system/x11vnc.service with the following content:

[Unit]
Description=Start x11vnc
After=multi-user.target

[Service]
Type=simple
ExecStart=/usr/bin/x11vnc -display :0 -auth /var/run/lightdm/root/:0 -forever -loop -repeat -rfbauth /etc/x11vnc.passwd -rfbport 5900 -shared -noxdamage

[Install]
WantedBy=multi-user.target
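After saving the unit file, the service still needs to be activated; this is done with the standard systemd commands (shown here as a setup fragment):

```shell
# Reload systemd so it picks up the new unit file, then enable and start it.
systemctl daemon-reload
systemctl enable --now x11vnc.service
```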

You can set the password by executing:

# x11vnc -storepasswd /etc/x11vnc.passwd
# chmod 0400 /etc/x11vnc.passwd

On Rocky Linux, you might need to allow VNC in the firewall:

# firewall-cmd --permanent --add-port=5900/tcp

After setting up LightDM and x11vnc and restarting the VM, you should now be able to connect via VNC.

vGPU Configuration

After configuring the VM to your liking, shut down the VM and add a vGPU by selecting one of the virtual functions and choosing the appropriate mediated device type.

For example:

Via the CLI:

qm set VMID -hostpci0 01:00.4,mdev=nvidia-660
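The CLI variant also lends itself to scripting several VMs at once; the dry-run sketch below uses the resource mapping created earlier (the VM IDs and the mapping name nvidia-a5000 are hypothetical examples):

```shell
# Dry-run sketch: assign the same vGPU profile to several VMs.
# DRYRUN=echo prints the qm commands instead of executing them;
# set DRYRUN= (empty) on a real host to actually apply them.
DRYRUN=echo
for vmid in 101 102 103; do
  $DRYRUN qm set "$vmid" -hostpci0 mapping=nvidia-a5000,mdev=nvidia-660
done
```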

Via the web interface:

Selecting a vGPU model

To find the correct mediated device type, you can use pvesh get /nodes/NODENAME/hardware/pci/MAPPINGNAME/mdev. This will query sysfs for all supported types that can be created. Note that, depending on the driver and kernel versions in use, not all models may be visible here, but only those that are currently available.
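When scripting this, the JSON output of pvesh can be filtered for types that still have free instances. The sample data and its field names ("type", "available") below are assumptions about the pvesh output format; adapt them to what your pvesh actually returns:

```python
import json

# Stand-in for: pvesh get /nodes/NODENAME/hardware/pci/MAPPINGNAME/mdev \
#                   --output-format json
# (sample entries and field names are assumed, not authoritative)
sample = json.loads('''[
  {"type": "nvidia-660", "available": 4, "description": "example profile A"},
  {"type": "nvidia-661", "available": 0, "description": "example profile B"}
]''')

def usable_types(entries):
    """Return mdev type names that still have free instances."""
    return [e["type"] for e in entries if e.get("available", 0) > 0]

print(usable_types(sample))
```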

NVIDIA Guest Driver Installation

Windows 10/11

Refer to the NVIDIA documentation[10] to find a compatible guest driver to host driver mapping. For example:

553.24_grid_win10_win11_server2022_dch_64bit_international.exe

Start the installer and follow the instructions; after it has finished, restart the guest as prompted.

From this point on, Proxmox VE's built-in noVNC console will no longer work, so use your desktop sharing software to connect to the Guest. Now you can use the vGPU for starting 3D applications such as Blender, 3D games, etc.


Ubuntu Desktop

To install the NVIDIA driver on Ubuntu, use apt to install the .deb package that NVIDIA provides for Ubuntu. Check the NVIDIA documentation[10] for a compatible guest driver to host driver mapping.

In our case, this was nvidia-linux-grid-550_550.127.05_amd64.deb. For that to work you must prefix the relative path, for example ./ if the .deb file is located in the current directory.

# apt install ./nvidia-linux-grid-550_550.127.05_amd64.deb

Then you must use NVIDIA's tools to configure the x.org configuration with:

# nvidia-xconfig

Now you can reboot and use a VNC client to connect and use the vGPU for 3D applications.

Rocky Linux

To install the NVIDIA driver on Rocky Linux, use dnf to install the .rpm package that NVIDIA provides for Red Hat based distributions. Check the NVIDIA documentation[10] for a compatible guest driver to host driver mapping.

In our case, this was nvidia-linux-grid-550-550.127.05-1.x86_64.rpm. If the file is located in the current directory, run:

# dnf install nvidia-linux-grid-550-550.127.05-1.x86_64.rpm

Then you must use NVIDIA's tools to configure the x.org configuration with:

# nvidia-xconfig

Now you can reboot and use a VNC client to connect and use the vGPU for 3D applications.

CUDA on Linux

If you want to use CUDA on a Linux Guest, you might need to install the CUDA Toolkit manually[14]. Check the NVIDIA documentation which version of CUDA is supported for your vGPU drivers.

Guest vGPU Licensing

To use the vGPU without restriction, you must adhere to NVIDIA's licensing. Check the NVIDIA vGPU documentation[15] for instructions on how to do so.

For setting up a DLS (Delegated License Service), see NVIDIA's DLS documentation[16].

Tip: Ensure that the guest system time is properly synchronized using NTP. Otherwise, the guest will be unable to request a license for the vGPU.

Troubleshooting

A warning like the following might get logged by QEMU on VM startup. This usually only happens on consumer hardware that does not properly support PCIe AER[17] error recovery. It generally should not have any adverse effects on normal operation, but PCIe link errors might not be (soft-)recoverable in such cases.

 kvm: -device vfio-pci,host=0000:09:00.5,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: warning: vfio 0000:09:00.5: Could not enable error recovery for the device

Known Issues

Windows 10/11 'Fast Startup'

In Windows 10/11 'fast startup' is enabled by default. When enabled, a shutdown via ACPI or the start menu will use 'hybrid shutdown'. The next boot will fail with a blue screen and the vGPU will be disabled.

Disable 'fast startup' in Windows to prevent this: in the Control Panel, go to Power Options → Choose what the power button does and uncheck 'fast startup'.

Alternatively, disable hybrid shutdown in a command prompt with admin privileges:

Powercfg -h off

Secure Boot

When booting the host with secure boot, kernel modules must be signed with a trusted key. We will show you how to set up your host so that the NVIDIA driver is signed and ready to load. For more details, see Secure Boot Setup. To be able to enroll the keys into the UEFI, make sure you have access to the physical display output during boot. This is necessary for confirming the key import. On servers, this can usually be achieved with IPMI/iKVM/etc.

Before installing the NVIDIA Host driver, we need to install a few prerequisites to enroll the DKMS signing key into UEFI:

apt install shim-signed grub-efi-amd64-signed mokutil

Now you can install the NVIDIA driver, but with an additional parameter:

./NVIDIA-Linux-x86_64-525.105.14-vgpu-kvm.run --dkms --skip-module-load

When asked if the installer should sign the module, select 'no'.

After the installer is finished, we now want to rebuild the kernel modules with DKMS, which will sign the kernel module for us with a generated key. First, check what module version is installed with:

dkms status

This will output a line like this:

nvidia/550.144.02, 6.8.12-6-pve, x86_64: installed 

You need to rebuild and reinstall the listed module with the following commands (replace the version with the one on your system):

dkms build -m nvidia -v 550.144.02 --force
dkms install -m nvidia -v 550.144.02 --force
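Instead of copying module name and version by hand, both can be derived from the dkms status line; a sketch, where the sample line stands in for real `dkms status` output:

```shell
# Parse "module/version, kernel, arch: state" from a dkms status line.
status_line='nvidia/550.144.02, 6.8.12-6-pve, x86_64: installed'
module=${status_line%%/*}                          # text before the first "/"
version=${status_line#*/}; version=${version%%,*}  # between "/" and the first ","
echo "dkms build -m $module -v $version --force"
echo "dkms install -m $module -v $version --force"
```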

This will ensure that the modules are signed with the DKMS key located in /var/lib/dkms/mok.pub. If you have not already done so, enroll the DKMS key as described in Using DKMS with Secure Boot.

You should then be able to load the signed NVIDIA kernel module. You can verify this by checking if the PCI devices have their driver loaded, e.g. with

lspci -d 10de: -nnk

It should say

Kernel driver in use: nvidia

You can now continue with the next step after the driver installation.

Notes

  1. NVIDIA GPUs supported by vGPU https://docs.nvidia.com/vgpu/gpus-supported-by-vgpu.html
  2. Latest NVIDIA vGPU Documentation https://docs.nvidia.com/vgpu/latest/index.html
  3. NVIDIA vGPU Linux with KVM Documentation https://docs.nvidia.com/vgpu/latest/grid-vgpu-release-notes-generic-linux-kvm/index.html
  4. Proxmox VE Subscription Agreement https://www.proxmox.com/en/downloads/proxmox-virtual-environment/agreements/proxmox-ve-subscription-agreement
  5. Proxmox VE Subscriptions https://www.proxmox.com/en/products/proxmox-virtual-environment/pricing
  6. NVIDIA Qualified System Catalog https://marketplace.nvidia.com/en-us/enterprise/qualified-system-catalog/
  7. NVIDIA vGPU Support Matrix https://docs.nvidia.com/vgpu/latest/product-support-matrix/index.html
  8. NVIDIA Display Mode Selector Tool https://developer.nvidia.com/displaymodeselector
  9. NVIDIA vGPU User Guide: Switching the Mode of a GPU that Supports Multiple Display Modes https://docs.nvidia.com/vgpu/latest/grid-vgpu-user-guide/index.html#displaymodeselector
  10. NVIDIA vGPU Driver Versions https://docs.nvidia.com/vgpu/#driver-versions
  11. NVIDIA GPUs supported by NVIDIA AI Enterprise https://docs.nvidia.com/ai-enterprise/latest/product-support-matrix/index.html
  12. Getting your NVIDIA vGPU Software https://docs.nvidia.com/vgpu/latest/grid-software-quick-start-guide/index.html#getting-your-nvidia-grid-software
  13. Mediated Devices https://pve.proxmox.com/wiki/PCI(e)_Passthrough#_mediated_devices_vgpu_gvt_g
  14. NVIDIA CUDA Toolkit Download https://developer.nvidia.com/cuda-downloads
  15. NVIDIA vGPU Licensing User Guide https://docs.nvidia.com/vgpu/latest/grid-licensing-user-guide/index.html
  16. NVIDIA DLS Documentation https://docs.nvidia.com/license-system/latest/
  17. PCI Express Advanced Error Reporting Driver Guide https://www.kernel.org/doc/html/v6.12-rc4/PCI/pcieaer-howto.html