Saturday, February 18, 2023

Dynamic VFIO binding and unbinding for GPU passthrough

Recently, I installed Proxmox with a desktop environment on an HP Z240 SFF computer to learn how to do dynamic VFIO binding and unbinding of a discrete GPU for passthrough to a virtual machine.

First, a bit about VFIO binding and GPU passthrough. To pass a GPU (or some other PCI device) to a VM, the host must not be using that device. The device is bound to the vfio-pci driver so that it can be handed to the VM. The usual method, given in the Proxmox docs, is to bind the devices to vfio-pci at boot. It is straightforward, but it means the device will never be available to the host at all.

(Another disadvantage of the method in the official docs shows up when you have several devices with the same vendor:device ID pair and only want to pass through some of them. Binding by vendor:device ID pair at boot means that all devices with that ID pair become unavailable to the host.)
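For contrast, the static method looks roughly like this; the vendor:device pairs below are placeholders (get the real ones from lspci -nn), and every device matching them would be claimed by vfio-pci at boot:

```shell
# /etc/modprobe.d/vfio.conf -- static boot-time binding (the method this
# post avoids). Replace the placeholder IDs with the pairs from `lspci -nn`.
options vfio-pci ids=10de:xxxx,10de:yyyy
```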

In my case, the HP Z240 SFF has an Intel Xeon E3-1245v5 CPU, which comes with Intel HD Graphics P530 as its integrated GPU. At the same time, I installed a Nvidia Quadro P600 as a discrete GPU in its PCIe slot. Here, I want to use the integrated GPU for the host, and passthrough the discrete GPU to a VM.

1. Make sure that the integrated GPU is used during boot. There is usually a BIOS setting to select the integrated GPU as the primary video device; select it.

2. Make sure X11 uses the integrated GPU. This is done by creating a file called /etc/X11/xorg.conf.d/10-i915.conf with the following:
Section "ServerLayout"
    Identifier "layout"
    Screen 0 "intel"
EndSection

Section "Device"
    Identifier "intel"
    Driver "modesetting"
    BusID "PCI:0@0:2:0"
EndSection

Section "Screen"
    Identifier "intel"
    Device "intel"
EndSection


Replace the BusID value with the correct PCI address of your integrated GPU. In my case, running
lspci | grep VGA
gives me
00:02.0 VGA compatible controller: Intel Corporation HD Graphics P530 (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P600] (rev a1)
My integrated GPU is at 00:02.0, so
BusID "PCI:0@0:2:0"
was what I used.
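The translation from lspci's hex "bus:device.function" notation to the decimal BusID string X11 expects can trip people up on higher bus numbers. A small helper (hypothetical, not part of any tool; the PCI domain is assumed to be 0) sketches the conversion:

```shell
#!/bin/bash
# Convert an lspci slot ("bus:device.function", hex fields) into the
# xorg.conf BusID form "PCI:bus@domain:device:function" (decimal fields).
to_busid() {
    local slot=$1
    local bus=${slot%%:*}
    local rest=${slot#*:}
    local dev=${rest%%.*}
    local fn=${rest#*.}
    printf 'PCI:%d@0:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

to_busid 00:02.0   # the integrated GPU above -> PCI:0@0:2:0
```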

3. Follow the Proxmox docs to set the necessary kernel boot parameters for passthrough. Open /etc/default/grub and change the line containing GRUB_CMDLINE_LINUX_DEFAULT to
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on iommu=pt pcie_acs_override=downstream"
Then, run
update-grub
to make sure the new parameters are reflected.
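After rebooting, it is worth confirming that the new parameters actually made it onto the kernel command line. On the real host this is just a grep over /proc/cmdline; the sketch below runs the same check against a sample cmdline string so it stands alone:

```shell
#!/bin/bash
# Check that intel_iommu=on is present on the kernel command line.
# On a real host, read it with: cmdline=$(cat /proc/cmdline)
# A sample string (illustrative paths) is used here instead.
cmdline='BOOT_IMAGE=/boot/vmlinuz root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt pcie_acs_override=downstream'

case " $cmdline " in
    *" intel_iommu=on "*) echo "IOMMU enabled on kernel command line" ;;
    *)                    echo "intel_iommu=on is missing" ;;
esac
```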
Create /etc/modprobe.d/kvm.conf with the following
options kvm ignore_msrs=1 report_ignored_msrs=0
As the Quadro P600 has HDMI audio, I also created /etc/modprobe.d/snd-hda-intel.conf with the following
options snd-hda-intel enable_msi=1
Then run
update-initramfs -u -k all
to make sure the modules are properly reflected.
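It also helps to confirm that the discrete GPU sits in its own IOMMU group, since devices sharing a group must be passed through together. On the real host the groups appear as symlinks under /sys/kernel/iommu_groups; because that tree only exists on IOMMU-enabled hardware, this sketch builds a mock tree (group numbers are illustrative) to show what the layout and listing look like:

```shell
#!/bin/bash
# On a real host, list IOMMU groups with:
#   find /sys/kernel/iommu_groups/ -type l
# A mock sysfs tree stands in for it below.
root=$(mktemp -d)
mkdir -p "$root/1/devices" "$root/13/devices"
ln -s /dev/null "$root/1/devices/0000:00:02.0"   # iGPU, its own group
ln -s /dev/null "$root/13/devices/0000:01:00.0"  # dGPU video
ln -s /dev/null "$root/13/devices/0000:01:00.1"  # dGPU audio (same group)

for link in "$root"/*/devices/*; do
    group=$(basename "$(dirname "$(dirname "$link")")")
    printf 'group %s: %s\n' "$group" "$(basename "$link")"
done
rm -rf "$root"
```

If the GPU's video and audio functions share a group with unrelated devices, those devices must be passed through as well (or the ACS override used, as in the GRUB line above).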

4. For switching between the integrated and discrete GPU, I also installed bumblebee using
apt install bumblebee
which also blacklists the Nvidia modules so that the Nvidia drivers do not get loaded automatically.

5. Next, I created a directory called /root/passthrough to hold the various scripts needed.
mkdir /root/passthrough

6. Create the configuration file /root/passthrough/kvm.conf, which will hold the PCI addresses of the devices to pass through and the host drivers to rebind them to.
## Contents of kvm.conf
VIRSH_GPU_VIDEO=0000:01:00.0
VIRSH_GPU_AUDIO=0000:01:00.1
GPU_VIDEO_DRIVER=nvidia
GPU_AUDIO_DRIVER=snd_hda_intel
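The *_DRIVER values are whatever host driver normally claims each function; `lspci -k` shows this as "Kernel driver in use". The sketch below extracts the driver names from a sample of that output (inlined here so it runs standalone; on a real host, pipe `lspci -k -s 01:00` in instead):

```shell
#!/bin/bash
# Extract the "Kernel driver in use" values from lspci -k style output.
sample='01:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P600]
	Kernel driver in use: nvidia
01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller
	Kernel driver in use: snd_hda_intel'

printf '%s\n' "$sample" | awk -F': ' '/Kernel driver in use/ {print $2}'
```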


7. Create the VFIO binding/unbinding helper script called /root/passthrough/vfio.sh with the following:
#!/bin/bash

set -x

function unbind() {
    busid=$1

    vendor=$(cat /sys/bus/pci/devices/$busid/vendor)
    device=$(cat /sys/bus/pci/devices/$busid/device)

    modprobe vfio-pci #>/dev/null 2>/dev/null

    if [ -e /sys/bus/pci/devices/$busid/driver ]; then
        printf "Unbinding %s (%s:%s)\n" "$busid" "$vendor" "$device"
        echo $busid > /sys/bus/pci/devices/$busid/driver/unbind
        #echo vfio-pci > /sys/bus/pci/devices/$dev/driver/driver_override
    fi
    echo -n "$vendor $device" > /sys/bus/pci/drivers/vfio-pci/new_id && echo "$busid -> vfio-pci"
}

function rebind() {
    busid=$1; shift
    drv=$1; shift

    vendor=$(cat /sys/bus/pci/devices/$busid/vendor)
    device=$(cat /sys/bus/pci/devices/$busid/device)

    printf "Rebinding %s (%s:%s)\n" "$busid" "$vendor" "$device"
    echo -n "$vendor $device" > /sys/bus/pci/drivers/vfio-pci/remove_id
    echo $busid > /sys/bus/pci/devices/$busid/driver/unbind
    echo $busid > /sys/bus/pci/drivers/$drv/bind && echo "$busid -> $drv"
}

cmd=$1; shift || true
case $cmd in
    unbind)
        unbind $@
        ;;
    rebind)
        rebind $@
        ;;
    *)
        printf "Usage: %s [unbind <busid> | rebind <busid> <drv>]\n" "$0"
        exit 1
        ;;
esac


Remember to make it executable by
chmod +x /root/passthrough/vfio.sh

8. Create the script to bind the GPU to VFIO, called /root/passthrough/bind_vfio.sh, with the following:
#!/bin/bash

## Load the config file
source "/root/passthrough/kvm.conf"

systemctl stop bumblebeed
systemctl stop nvidia-persistenced

# Unload Nvidia
modprobe -r nvidia_drm
modprobe -r nvidia_modeset
modprobe -r drm_kms_helper
modprobe -r nvidia
modprobe -r i2c_nvidia_gpu
modprobe -r drm
modprobe -r nvidia_uvm

## Load vfio
modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio_pci
modprobe vfio_virqfd

## Unbind gpu from nvidia and bind to vfio
/root/passthrough/vfio.sh unbind $VIRSH_GPU_VIDEO
/root/passthrough/vfio.sh unbind $VIRSH_GPU_AUDIO

 
Remember to make it executable by
chmod +x /root/passthrough/bind_vfio.sh
 
9. Create the script to unbind the GPU from VFIO, called /root/passthrough/unbind_vfio.sh, with the following:
#!/bin/bash

## Load the config file
source "/root/passthrough/kvm.conf"

## Unbind gpu from vfio and bind to nvidia
/root/passthrough/vfio.sh rebind $VIRSH_GPU_VIDEO $GPU_VIDEO_DRIVER
/root/passthrough/vfio.sh rebind $VIRSH_GPU_AUDIO $GPU_AUDIO_DRIVER

## Unload vfio
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio
modprobe -r vfio_virqfd

# Reload nvidia modules
modprobe nvidia_drm
modprobe nvidia_modeset
modprobe drm_kms_helper
modprobe nvidia
modprobe drm
modprobe nvidia_uvm

systemctl start nvidia-persistenced
systemctl start bumblebeed


Remember to make it executable by
chmod +x /root/passthrough/unbind_vfio.sh

10. Test the scripts by first running bind_vfio.sh, after which
lspci -v -s 01:00
should show
01:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P600] (rev a1) (prog-if 00 [VGA controller])
    Kernel driver in use: vfio-pci
    Kernel modules: nvidia

01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel


11. Make sure unbinding works by running unbind_vfio.sh, after which
lspci -v -s 01:00
should show
01:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P600] (rev a1) (prog-if 00 [VGA controller])
    Kernel driver in use: nvidia
    Kernel modules: nvidia

01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel


12. If this works, create a hook script
/var/lib/vz/snippets/gpu-hookscript.pl
with the following:
#!/usr/bin/perl

# Example hook script for PVE guests (hookscript config option)
# You can set this via pct/qm with
# pct set <vmid> -hookscript <volume-id>
# qm set <vmid> -hookscript <volume-id>
# where <volume-id> has to be an executable file in the snippets folder
# of any storage with directories e.g.:
# qm set 100 -hookscript local:snippets/hookscript.pl

use strict;
use warnings;

print "GUEST HOOK: " . join(' ', @ARGV). "\n";

# First argument is the vmid

my $vmid = shift;

# Second argument is the phase

my $phase = shift;

if ($phase eq 'pre-start') {

    # First phase 'pre-start' will be executed before the guest
    # is started. Exiting with a code != 0 will abort the start

    print "$vmid is starting, doing preparations.\n";

    system("/root/passthrough/bind_vfio.sh");

    # print "preparations failed, aborting."
    # exit(1);

} elsif ($phase eq 'post-start') {

    # Second phase 'post-start' will be executed after the guest
    # successfully started.

    print "$vmid started successfully.\n";

} elsif ($phase eq 'pre-stop') {

    # Third phase 'pre-stop' will be executed before stopping the guest
    # via the API. Will not be executed if the guest is stopped from
    # within e.g., with a 'poweroff'

    print "$vmid will be stopped.\n";

} elsif ($phase eq 'post-stop') {

    # Last phase 'post-stop' will be executed after the guest stopped.
    # This should even be executed in case the guest crashes or stopped
    # unexpectedly.

    print "$vmid stopped. Doing cleanup.\n";

    system("/root/passthrough/unbind_vfio.sh");

} else {
    die "got unknown phase '$phase'\n";
}

exit(0);


Remember to make it executable by
chmod +x /var/lib/vz/snippets/gpu-hookscript.pl

13. If you have not already created a VM, create one (VM 100 in this example).

14. This hook script can then be added to a VM by running
qm set 100 --hookscript local:snippets/gpu-hookscript.pl
Here, 100 is the VM number (which can be replaced with the actual VM number).

15. Now, when you start the VM, it should bind the GPU to vfio, and after stopping the VM, it should unbind the GPU from vfio.

Time to test... so I created a Linux Mint VM on this server. After the post-install tasks, such as updating packages, I shut down the VM. Then, on the host server, as root, I added the hook script using
qm set 100 --hookscript local:snippets/gpu-hookscript.pl
and started the VM.
 
On the host server, I ran
lspci -vv -s 01:00 | grep Kernel
and got:
    Kernel driver in use: vfio-pci
    Kernel modules: nvidia
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel

Good! The Quadro P600 was bound to VFIO.

Next, I shut down the VM. After the VM was stopped, on the host server, I ran
lspci -vv -s 01:00 | grep Kernel
and got:
    Kernel driver in use: nvidia
    Kernel modules: nvidia
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel


Good! The Quadro P600 had been unbound from VFIO. I could even run nvidia-smi to check that it was available to the host server.
 
I started the VM again, and installed the proprietary Nvidia drivers from within the VM, rebooted the VM, and could confirm that the Nvidia drivers were loaded correctly.
 
Hurray! Dynamic VFIO binding and unbinding!

Note: If your GPU has more functions, such as USB or serial controllers, just add them as additional variables to the /root/passthrough/kvm.conf file, and edit /root/passthrough/bind_vfio.sh and /root/passthrough/unbind_vfio.sh to add the commands to bind/unbind these functions.
For example, in /root/passthrough/kvm.conf:
VIRSH_GPU_USB=0000:01:00.2
GPU_USB_DRIVER=xhci_pci
Then, in /root/passthrough/bind_vfio.sh, add:
/root/passthrough/vfio.sh unbind $VIRSH_GPU_USB
And in /root/passthrough/unbind_vfio.sh, add:
/root/passthrough/vfio.sh rebind $VIRSH_GPU_USB $GPU_USB_DRIVER
This should ensure that all of the GPU's functions are passed to the guest VM.

References used in developing the scripts and such:
https://pve.proxmox.com/wiki/Pci_passthrough
https://github.com/bryansteiner/gpu-passthrough-tutorial  (for hints on the bind_vfio.sh and unbind_vfio.sh scripts)
