[pve-devel] [PATCH qemu-server v4 1/2] config: QEMU AMD SEV enable

Thomas Lamprecht t.lamprecht at proxmox.com
Thu Apr 18 12:39:19 CEST 2024


On 18/04/2024 at 10:25, Markus Frank wrote:
> This patch is for enabling AMD SEV (Secure Encrypted
> Virtualization) support in QEMU.
> 
> VM-Config-Examples:
> amd_sev: type=std,nodbg=1,noks=1
> amd_sev: es,nodbg=1,kernel-hashes=1
> 
> Node-Config-Example (gets generated automatically):
> amd_sev: cbitpos=47,reduced-phys-bits=1
> 
> kernel-hashes, reduced-phys-bits & cbitpos correspond to the variables
> with the same name in qemu.
> 
> kernel-hashes=1 adds kernel-hashes to enable measured linux kernel
> launch since it is per default off for backward compatibility.
> 
> reduced-phys-bits and cbitpos are system specific and can be read out
> with QMP. If not set by the user, a dummy-vm gets started to read these
> variables out via QMP and save them to the node config.
> Afterwards the dummy-vm gets stopped.
> 
> type=std stands for standard sev to differentiate it from sev-es (es)
> or sev-snp (snp) when support is upstream.
> 
> QEMU's sev-guest policy gets calculated with the parameters nodbg & noks
> These parameters correspond to policy-bits 0 & 1.
> If type is 'es' then policy-bit 2 gets set to 1 to activate SEV-ES.
> Policy bit 3 (nosend) is always set to 1, because migration
> features for sev are not upstream yet and are attackable.
> 
> SEV-ES is very experimental since it could not be tested.
> 
> see coherent doc patch
> 
> Signed-off-by: Markus Frank <m.frank at proxmox.com>
> ---
> v4: 
> * reduced lines of code
> * added text that SEV-ES is experimental
> 
>  PVE/API2/Qemu.pm  |  10 ++++
>  PVE/QemuServer.pm | 117 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 127 insertions(+)
> 
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index 497987f..23d3cd7 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -4616,6 +4616,10 @@ __PACKAGE__->register_method({
>  	# test if VM exists
>  	my $conf = PVE::QemuConfig->load_config($vmid);
>  
> +	if ($conf->{amd_sev}) {
> +	    die "AMD SEV does not support migration\n";
> +	}
> +
>  	# try to detect errors early
>  
>  	PVE::QemuConfig->check_lock($conf);
> @@ -5170,6 +5174,12 @@ __PACKAGE__->register_method({
>  	die "unable to use snapshot name 'pending' (reserved name)\n"
>  	    if lc($snapname) eq 'pending';
>  
> +	my $conf = PVE::QemuConfig->load_config($vmid);
> +
> +	if ($conf->{amd_sev}) {
> +	    die "AMD SEV does not support snapshots\n"
> +	}
> +
>  	my $realcmd = sub {
>  	    PVE::Cluster::log_msg('info', $authuser, "snapshot VM $vmid: $snapname");
>  	    PVE::QemuConfig->snapshot_create($vmid, $snapname, $param->{vmstate},
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 6e2c805..ca26fc5 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -61,6 +61,7 @@ use PVE::QemuServer::Monitor qw(mon_cmd);
>  use PVE::QemuServer::PCI qw(print_pci_addr print_pcie_addr print_pcie_root_port parse_hostpci);
>  use PVE::QemuServer::QMPHelpers qw(qemu_deviceadd qemu_devicedel qemu_objectadd qemu_objectdel);
>  use PVE::QemuServer::USB;
> +use PVE::NodeConfig;

This module is from pve-manager, which itself depends on qemu-server, so using
it here adds a cyclic dependency.

>  
>  my $have_sdn;
>  eval {
> @@ -185,6 +186,59 @@ my $agent_fmt = {
>      },
>  };
>  
> +my $sev_fmt = {
> +    type => {
> +	description => "Enable standard SEV with type='std' or enable experimental SEV-ES"
> +	." with the 'es' option.",
> +	type => 'string',
> +	default_key => 1,
> +	format_description => "qemu-sev-type",
> +	enum => ['std', 'es'],
> +	maxLength => 3,
> +    },
> +    nodbg => {
> +	description => "Sets policy bit 0 to 1 to disallow debugging of guest",
> +	type => 'boolean',
> +	format_description => "qemu-sev-nodbg",
> +	default => 0,
> +	optional => 1,
> +    },
> +    noks => {
> +	description => "Sets policy bit 1 to 1 to disallow key sharing with other guests",
> +	type => 'boolean',
> +	format_description => "qemu-sev-noks",
> +	default => 0,
> +	optional => 1,
> +    },
> +    "kernel-hashes" => {
> +	description => "Add kernel hashes to guest firmware for measured linux kernel launch",
> +	type => 'boolean',
> +	format_description => "qemu-sev-kernel-hashes",
> +	default => 0,
> +	optional => 1,
> +    },
> +};
> +PVE::JSONSchema::register_format('pve-qemu-sev-fmt', $sev_fmt);
> +
> +my $sev_node_fmt = {
> +    cbitpos => {
> +	description => "C-bit: marks if a memory page is protected. System dependent",
> +	type => 'integer',
> +	default => 47,
> +	optional => 1,
> +	minimum => 0,
> +	maximum => 100,
> +    },
> +    'reduced-phys-bits' => {
> +	description => "Number of bits the physical address space is reduced by. System dependent",
> +	type => 'integer',
> +	default => 1,
> +	optional => 1,
> +	minimum => 0,
> +	maximum => 100,
> +    },
> +};

Even if it were OK to use PVE::NodeConfig here, which it currently isn't due to
the cyclic dependency mentioned above, the above format should then live in
NodeConfig, as it's rather ugly to have this at a completely different, and
unrelated, source code location than the rest of the actual config.

In the mid-term we could split out the node-specific management from pve-manager
into an implementation (and possibly also a separate API) package, which could be
placed higher up our dependency stack and could then be used here too.

But that's a significant amount of work, at least if it should be done somewhat
right and also follow the rat-tails of other clean-ups (like splitting the ACME
plugin config out from the API-specific Perl module into a non-API-specific one).

Anyhow, there is a potential alternative without any config (for now) further
below, in get_sev_parameters_from_node.

> +
>  my $vga_fmt = {
>      type => {
>  	description => "Select the VGA type.",
> @@ -366,6 +420,12 @@ my $confdesc = {
>  	description => "Memory properties.",
>  	format => $PVE::QemuServer::Memory::memory_fmt
>      },
> +    amd_sev => {
> +	description => "Secure Encrypted Virtualization (SEV) features by AMD CPUs",
> +	optional => 1,
> +	format => 'pve-qemu-sev-fmt',
> +	type => 'string',
> +    },
>      balloon => {
>  	optional => 1,
>  	type => 'integer',
> @@ -4084,6 +4144,26 @@ sub config_to_command {
>      }
>      push @$machineFlags, "type=${machine_type_min}";
>  
> +    if ($conf->{amd_sev}) {
> +	if ($conf->{bios} && $conf->{bios} ne 'ovmf') {
> +	    die "For using SEV you need to change your guest bios to ovmf.\n";
> +	}
> +	my $amd_sev_conf = parse_property_string($sev_fmt, $conf->{amd_sev});
> +	my $node_config = get_sev_parameters_from_node($nodename, $arch);
> +	my $memobjcmd = 'sev-guest,id=sev0,cbitpos='.$node_config->{cbitpos}
> +	    .',reduced-phys-bits='.$node_config->{'reduced-phys-bits'};

ineedsomeseparatorstoreadcodeandvariableswithoutwastingmentalenergyplease!

As you see, it's not easy to read such things if one isn't already accustomed
to the code, e.g., because they wrote it themselves, so use something like
$sev_mem_object here (it isn't really a command on its own after all).

> +	my $policy = 0b0;
> +	$policy += 0b1 if ($amd_sev_conf->{nodbg});
> +	$policy += 0b10 if ($amd_sev_conf->{noks});
> +	$policy += 0b100 if ($amd_sev_conf->{type} eq 'es');
> +	# disable migration with bit 3 nosend to prevent amd-sev-migration-attack
> +	$policy += 0b1000;
> +	$memobjcmd .= ',policy='.sprintf("%#x", $policy);
> +	$memobjcmd .= ',kernel-hashes=on' if ($amd_sev_conf->{'kernel-hashes'});
> +	push @$devices, '-object' , $memobjcmd;
> +	push @$machineFlags, 'confidential-guest-support=sev0';
> +    }
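
The policy computation above can be illustrated outside of Perl. A minimal sketch in C, not part of the patch: the names `sev_policy` and `SEV_POLICY_*` are made up for illustration, and only bits 0 to 3 as described in the commit message are covered. It uses bitwise OR rather than `+=`, which yields the same value here but cannot carry into a neighbouring bit if a flag were ever applied twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the policy bits listed in the commit message. */
enum {
    SEV_POLICY_NODBG  = 1u << 0, /* disallow debugging of the guest */
    SEV_POLICY_NOKS   = 1u << 1, /* disallow key sharing with other guests */
    SEV_POLICY_ES     = 1u << 2, /* SEV-ES requested */
    SEV_POLICY_NOSEND = 1u << 3, /* disallow sending the guest (migration) */
};

/* Compile-time check that the bit positions match the description. */
static_assert(SEV_POLICY_NOSEND == 0x8, "nosend must be policy bit 3");

/* Build the guest policy value; nosend is always set, as in the patch. */
uint32_t sev_policy(bool nodbg, bool noks, bool es)
{
    uint32_t policy = SEV_POLICY_NOSEND;

    if (nodbg)
        policy |= SEV_POLICY_NODBG;
    if (noks)
        policy |= SEV_POLICY_NOKS;
    if (es)
        policy |= SEV_POLICY_ES;

    return policy;
}
```

For the first VM config example from the commit message (type=std,nodbg=1,noks=1) this gives 0b1011, i.e. policy=0xb; for type 'es' with only nodbg=1 it gives 0b1101, i.e. policy=0xd.
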
> +
>      push @$cmd, @$devices;
>      push @$cmd, '-rtc', join(',', @$rtcFlags) if scalar(@$rtcFlags);
>      push @$cmd, '-machine', join(',', @$machineFlags) if scalar(@$machineFlags);
> @@ -4127,6 +4207,43 @@ sub check_rng_source {
>      }
>  }
>  
> +sub get_sev_parameters_from_node {
> +    my ($nodename, $arch) = @_;
> +    # Get reduced-phys-bits & cbitpos from QMP, if not set
> +    my $node_config = PVE::NodeConfig::load_config($nodename);
> +    my $sev_node_config;
> +    if ($node_config->{amd_sev}) {
> +	$sev_node_config = parse_property_string($sev_node_fmt, $node_config->{amd_sev});
> +    }
> +    if (
> +	!$sev_node_config->{'reduced-phys-bits'}
> +	|| !$sev_node_config->{cbitpos}
> +    ) {
> +	my $fakevmid = -1;
> +	my $qemu_cmd = get_command_for_arch($arch);
> +	my $pidfile = PVE::QemuServer::Helpers::pidfile_name($fakevmid);
> +	my $default_machine = $default_machines->{$arch};
> +	my $cmd = [
> +	    $qemu_cmd,
> +	    '-machine', $default_machine,
> +	    '-display', 'none',
> +	    '-chardev', "socket,id=qmp,path=/var/run/qemu-server/$fakevmid.qmp,server=on,wait=off",
> +	    '-mon', 'chardev=qmp,mode=control',
> +	    '-pidfile', $pidfile,
> +	    '-S', '-daemonize'
> +	];
> +	my $rc = run_command($cmd, noerr => 1, quiet => 0);
> +	die "QEMU flag querying VM exited with code " . $rc . "\n" if $rc;

This seems relatively expensive, given that the result does not seem to change
at all for a specific HW, or at least for a specific boot?

So this should rather:
- be done once and cached, at least in-memory (module-wide variable) here, or
  even in some file in /run (e.g., written at boot).
- avoid starting a VM just for this, as you only need a small subset of the
  values returned by sev_get_capabilities. One is reduced_phys_bits, which
  is hard-coded to 1 if SEV is supported (which we do not really have to care
  about here at the moment). The second is cbitpos, which can be determined by
  a cpuid call, which is trivial assembly (but not trivial to do directly
  from Perl).

So, I'd prefer a tiny C program that gets executed on boot, calls cpuid and
writes the resulting info into /run. It would be easy to throw in whether SEV,
SEV-ES, ... is supported along the way, i.e., something like (untested):


```
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t eax, ebx, ecx, edx;

    // query Encrypted Memory Capabilities, see:
    // https://en.wikipedia.org/wiki/CPUID#EAX=8000001Fh:_Encrypted_Memory_Capabilities
    uint32_t query_function = 0x8000001F;
    asm volatile("cpuid" : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx) : "0"(query_function));

    bool sev_support = (eax & (1<<1)) != 0;
    bool sev_es_support = (eax & (1<<3)) != 0;
    bool sev_snp_support = (eax & (1<<4)) != 0;

    // EBX[5:0] holds the position of the C-bit
    uint8_t c_bit = ebx & 0x3f;

    printf("{\"c-bit\": %u, \"sev\": %s, ...}", c_bit, sev_support ? "true" : "false"/*, ...*/);
    return 0;
}
```
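
One caveat with a raw cpuid call like the one above: if the CPU does not implement leaf 0x8000001F, the returned registers are not meaningful, so it is safer to check the maximum supported extended leaf first. A minimal sketch, assuming GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid()` and performs that check; the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical container for the decoded capabilities. */
struct sev_caps {
    bool sev;
    bool sev_es;
    bool sev_snp;
    uint8_t c_bit;
};

/* Decode EAX/EBX of CPUID leaf 0x8000001F; pure function, no HW access. */
struct sev_caps sev_caps_decode(uint32_t eax, uint32_t ebx)
{
    struct sev_caps caps = {
        .sev     = (eax & (1u << 1)) != 0,
        .sev_es  = (eax & (1u << 3)) != 0,
        .sev_snp = (eax & (1u << 4)) != 0,
        .c_bit   = (uint8_t)(ebx & 0x3f), /* EBX[5:0] */
    };
    return caps;
}

/* Query the host CPU; returns false if the leaf is not available. */
bool sev_caps_query(struct sev_caps *out)
{
    assert(out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the leaf exceeds the maximum supported one */
    if (!__get_cpuid(0x8000001F, &eax, &ebx, &ecx, &edx))
        return false;

    *out = sev_caps_decode(eax, ebx);
    return true;
#else
    return false; /* not an x86 host, so no such cpuid leaf */
#endif
}
```

Keeping the decoding separate from the hardware query makes it possible to test the bit handling with fixed register values on any machine.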

I put a full-fledged example on my staff repo under host-cpu-capability-detector.
I'd keep it from being too focused on just SEV, as this could be used to detect
other capabilities as well.
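
For the part about writing the result to a file at boot, a minimal sketch of one possible approach: write to a temporary file in the same directory and rename() it into place, so that readers never see a partially written file. The function name is an assumption for illustration and is not taken from the actual host-cpu-capability-detector code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Write `content` to `path` atomically: write "<path>.tmp" first, then
 * rename() it over the final name. Returns true on success.
 */
bool write_file_atomic(const char *path, const char *content)
{
    assert(path != NULL && content != NULL);

    char tmp[4096];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return false; /* path too long for the temporary name */

    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return false;

    bool ok = fputs(content, f) != EOF;
    ok = (fclose(f) == 0) && ok; /* always close, keep earlier failure */

    if (ok)
        ok = rename(tmp, path) == 0;
    if (!ok)
        remove(tmp); /* best-effort cleanup of the temporary file */

    return ok;
}
```

The detector would call this once with the JSON string it would otherwise print, and the target path under /run would be whatever location the consumers agree on.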



> +	my $res = mon_cmd($fakevmid, 'query-sev-capabilities');
> +	vm_stop(undef, $fakevmid, 1, 1, 10, 0, 1);
> +	$sev_node_config->{'reduced-phys-bits'} = $res->{'reduced-phys-bits'};
> +	$sev_node_config->{cbitpos} = $res->{cbitpos};
> +	$node_config->{amd_sev} = PVE::JSONSchema::print_property_string($sev_node_config, $sev_node_fmt);
> +	PVE::NodeConfig::write_config($nodename, $node_config);

Yuck, writing to an unrelated config in order to cache (?!) some HW-related info
combines multiple things that one should try hard to avoid doing!

And to be sure: the node config was then never used as actual config, but
really just to cache the HW SEV capabilities? Meh..

> +    }
> +    return $sev_node_config;
> +}
> +
>  sub spice_port {
>      my ($vmid) = @_;
>  
 



