Split lock detection

From Proxmox VE

Symptoms

If the host CPU supports split lock detection and VM guests perform misaligned atomic memory accesses, the host Linux kernel may artificially slow down the virtual CPUs of the guests as part of the split lock mitigation. From inside the VM, it looks as if a virtual CPU core freezes for 10 milliseconds or more every time a misaligned atomic memory access is performed. In the case of Windows VMs, in-guest latency monitors may report a high amount of time spent handling Deferred Procedure Calls (DPCs).

Your VM is affected by this if all of the following are true:

  • Your Proxmox VE host is running kernel 5.19 or higher. You can check the running kernel with the following command:
pveversion
  • Your Proxmox VE host CPU supports the split_lock_detect flag. This is the case if the following command prints "1":
lscpu | grep -c 'split_lock_detect'
  • The journal of your Proxmox VE host contains warnings like the following:
Apr 15 09:20:19 myhost kernel: x86/split lock detection: #AC: CPU 0/KVM/5677 took a split_lock trap at address: 0x56339a9888c3
  • The following bpftrace command reports slowdowns for your VM. Install the bpftrace package on the Proxmox VE host (apt install bpftrace), then run:
bpftrace -e 'kfunc:vmlinux:split_lock_warn /0<*(uint32*)kaddr("sysctl_sld_mitigate")/{time("%H:%M:%S: "); printf("slowing down %s (pid=%d) for >10ms\n", comm, pid);}'

If the bpftrace command prints many lines mentioning CPU n/KVM and the PID of your VM while the VM workload is running, split lock mitigation is artificially slowing down your VM.

For example:

09:49:02: slowing down CPU 0/KVM (pid=10433) for >10ms
09:49:02: slowing down CPU 2/KVM (pid=10433) for >10ms
09:49:03: slowing down CPU 3/KVM (pid=10433) for >10ms
09:49:03: slowing down CPU 1/KVM (pid=10433) for >10ms
09:49:03: slowing down CPU 0/KVM (pid=10433) for >10ms
09:49:03: slowing down CPU 2/KVM (pid=10433) for >10ms
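
When several VMs run on the same host, output like the above can be tallied per PID to see which VM is slowed down most often. The following is a minimal sketch, not part of the original instructions: the function name tally_slowdowns and the log file name split_lock.log are hypothetical, and it assumes the message format printed by the bpftrace command above.

```shell
# Tally "slowing down" messages per PID from saved bpftrace output.
# Reads the log on stdin and prints "<pid> <count>", highest count first.
tally_slowdowns() {
    # Extract the number inside "(pid=...)" from each matching line.
    sed -n 's/.*slowing down .*(pid=\([0-9][0-9]*\)).*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

# Example usage (hypothetical file name), after saving the bpftrace output
# with "... | tee split_lock.log":
#   tally_slowdowns < split_lock.log
```

A PID that shows up with a steadily growing count while the workload runs points to the affected VM.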

See the Options section below for ways to remove these artificial slowdowns.

Note: If the bpftrace command only sporadically prints a message (e.g., every few minutes), split lock detection probably does not have a large impact on VM performance.
Note: For VMs using OVMF, the bpftrace command may print some messages during boot. This is not problematic and does not impact performance of the running VM. Split lock mitigation only affects VM performance if a large number of messages continues to appear after boot.
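
The host-side checks from the list at the top of this section (kernel version, CPU flag, journal warnings) can be bundled into one script. This is only a sketch: the helper name kernel_at_least is hypothetical, it reads the kernel release from uname -r instead of pveversion, and it only counts journal warnings since the last boot.

```shell
#!/bin/sh
# Sketch: check the host-side preconditions for split lock slowdowns.

# Return 0 if kernel release string $1 is at least version $2.$3.
kernel_at_least() {
    release=$1; want_major=$2; want_minor=$3
    major=${release%%.*}          # "5.19.17-2-pve" -> "5"
    rest=${release#*.}            # "5.19.17-2-pve" -> "19.17-2-pve"
    minor=${rest%%[!0-9]*}        # "19.17-2-pve"   -> "19"
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

release=$(uname -r)
if kernel_at_least "$release" 5 19; then
    echo "kernel $release: 5.19 or higher"
else
    echo "kernel $release: older than 5.19"
fi

# Same check as "lscpu | grep -c 'split_lock_detect'" in the list above.
if lscpu 2>/dev/null | grep -q 'split_lock_detect'; then
    echo "CPU flag split_lock_detect: present"
else
    echo "CPU flag split_lock_detect: not found"
fi

# Count split lock warnings in the kernel log of the current boot.
if command -v journalctl >/dev/null 2>&1; then
    count=$(journalctl -k -b 2>/dev/null | grep -c 'took a split_lock trap')
    echo "split lock warnings since boot: $count"
fi
```

The script only reports what it finds. Whether a VM is actually being slowed down is still best judged with the bpftrace command.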

Background

As explained by this LWN article, split locks occur when an atomic instruction accesses memory that spans two cache lines, for example because of a misaligned memory access. To ensure that the instruction sees consistent data, the processor core acquires a global bus lock. This is much slower than an access within one cache line, and also slows down other cores. If the host is a hypervisor running potentially untrusted VMs, this opens the door for denial-of-service attacks: A VM that performs many misaligned atomic memory accesses can force the processor to repeatedly acquire the global bus lock, which indirectly slows down other VMs.

Starting with Linux kernel 5.19, every time a thread takes a split lock, the Linux kernel artificially slows down the thread by making it sleep for 10 milliseconds and synchronizing it with other threads taking split locks. This makes denial-of-service attacks infeasible and keeps the offending thread from affecting other running processes; see the related LWN article for more details. The slowdown is controlled by the split_lock_mitigate sysctl parameter. The first time a split lock is detected for a thread, the kernel prints a warning to the journal (x86/split lock detection: #AC: [...] took a split_lock trap at address: [...]).

If a VM performs many misaligned atomic memory accesses, the host kernel will artificially slow down the VM. The bpftrace command above prints a message every time the kernel artificially slows down a thread. Inside the VM, this is noticeable as virtual CPU freezes, which can have various effects. In the case of Windows VMs, in-guest latency monitors may report a high amount of time spent handling DPCs during that time.
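
To see whether the mitigation is currently active on a host, the split_lock_mitigate sysctl mentioned above can be read directly. The sketch below assumes the parameter is exposed at /proc/sys/kernel/split_lock_mitigate, which is where kernel.* sysctls normally live; the function name and the "unavailable" fallback are hypothetical.

```shell
# Print the current split lock mitigation setting (1 = enabled, 0 = disabled),
# or "unavailable" if the running kernel does not expose the sysctl.
# $1 (optional): file to read instead of the real sysctl path.
split_lock_mitigate_state() {
    file=${1:-/proc/sys/kernel/split_lock_mitigate}
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unavailable"
    fi
}

split_lock_mitigate_state
```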

Options

If a VM is artificially slowed down by split lock mitigation, you have the following options:

  • The preferred solution is to fix the in-guest workload that is responsible for the misaligned memory accesses. The responsible in-guest process should be adjusted to only perform aligned memory accesses, which will eliminate the split locks and thus the artificial slowdowns. However, this is only possible if you control the responsible in-guest process (e.g., because you can modify its source code).
  • If you cannot fix the in-guest workload, you can consider disabling the split lock mitigation. However, if the split lock mitigation is disabled, malicious VM guests can perform denial-of-service attacks against the host and other VM guests, as described on LWN. There are two ways to disable it:
    • If you are aware of the risks, you can disable split lock mitigation temporarily by running sysctl -w kernel.split_lock_mitigate=0 (see the kernel docs). You can disable it permanently by creating a file /etc/sysctl.d/50-split-lock.conf with the contents kernel.split_lock_mitigate=0, and then running sysctl -p /etc/sysctl.d/50-split-lock.conf. You will still get the warnings in the journal, but the bpftrace script should report no more artificial slowdowns.
    • Alternatively, if you are aware of the risks, you can disable split lock detection entirely by adding the split_lock_detect=off parameter to the kernel command line and rebooting. See the kernel docs for more information on the split_lock_detect parameter.
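
The permanent sysctl change and a check for the kernel command line option could be scripted roughly as follows. This is a sketch under assumptions: both function names are hypothetical, the commands that modify the host are left commented out, and they need to be run as root on the Proxmox VE host.

```shell
# Write the sysctl drop-in that disables split lock mitigation.
# $1: target file (/etc/sysctl.d/50-split-lock.conf on a real host).
write_split_lock_conf() {
    printf '%s\n' 'kernel.split_lock_mitigate=0' > "$1"
}

# Return 0 if the given kernel command line file (default: the running
# kernel's command line) contains split_lock_detect=off.
sld_off_on_cmdline() {
    grep -qw 'split_lock_detect=off' "${1:-/proc/cmdline}"
}

# On the host, as root (commented out so nothing changes by accident):
#   write_split_lock_conf /etc/sysctl.d/50-split-lock.conf
#   sysctl -p /etc/sysctl.d/50-split-lock.conf
#   sysctl kernel.split_lock_mitigate    # should now report 0
#
# After rebooting with split_lock_detect=off on the command line:
#   sld_off_on_cmdline && echo "split lock detection is off"
```

Keeping the target path as a parameter makes it possible to try the function against a scratch file before touching /etc/sysctl.d.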