[pve-devel] applied: [PATCH container v2] lxc_config: mount /sys as mixed for unprivileged by default

Wolfgang Bumiller w.bumiller at proxmox.com
Thu Mar 19 09:19:18 CET 2020


applied

On 3/18/20 10:46 AM, Thomas Lamprecht wrote:
> CONTAINER_INTERFACE[0] is something systemd people call their API and
> we need to adapt to it a bit, even if it means doing stupid
> unnecessary things, as otherwise systemd decides to regress and suddenly
> breaks the network stack in CTs after an upgrade[1].
> 
> This mounts the parent /sys as mixed, which is:
>> mount /sys as read-only but with /sys/devices/virtual/net writable.
> -- man 5 lxc.container.conf
> 
> Allow users to override that with a features knob, as some will surely
> run into other issues otherwise, and manually adding a "lxc.mount.auto"
> entry in the container .conf is not a nice user experience for most.
> 
> Fixes the system regression in up-to-date Arch installations
> introduced by [2].
> 
> [0]: https://systemd.io/CONTAINER_INTERFACE/
> [1]: https://github.com/systemd/systemd/issues/15101#issuecomment-598607582
> [2]: https://github.com/systemd/systemd/commit/bf331d87171b7750d1c72ab0b140a240c0cf32c3
> 
> Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
> ---
> 
> changes v1 -> v2:
> * use sys:mixed and only do this for unpriv. CTs
> * add knob to allow easier opting out of this
> 
>   src/PVE/LXC.pm        | 6 ++++++
>   src/PVE/LXC/Config.pm | 7 +++++++
>   2 files changed, 13 insertions(+)
> 
> diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
> index 0742a53..df52afa 100644
> --- a/src/PVE/LXC.pm
> +++ b/src/PVE/LXC.pm
> @@ -662,6 +662,12 @@ sub update_lxc_config {
>   	$raw .= "lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file 0 0\n";
>       }
>   
> +    if ($unprivileged && !$features->{force_rw_sys}) {
> +	# unpriv. CTs default to sys:rw, but that doesn't always play well with
> +	# systemd, e.g., systemd-networkd https://systemd.io/CONTAINER_INTERFACE/
> +	$raw .= "lxc.mount.auto = sys:mixed\n";
> +    }
> +
>       # WARNING: DO NOT REMOVE this without making sure that loop device nodes
>       # cannot be exposed to the container with r/w access (cgroup perms).
>       # When this is enabled mounts will still remain in the monitor's namespace
> diff --git a/src/PVE/LXC/Config.pm b/src/PVE/LXC/Config.pm
> index e88ba0b..0909773 100644
> --- a/src/PVE/LXC/Config.pm
> +++ b/src/PVE/LXC/Config.pm
> @@ -331,6 +331,13 @@ my $features_desc = {
>   	    ." This requires a kernel with seccomp trap to user space support (5.3 or newer)."
>   	    ." This is experimental.",
>       },
> +    force_rw_sys => {
> +	optional => 1,
> +	type => 'boolean',
> +	default => 0,
> +	description => "Mount /sys in unprivileged containers as `rw` instead of `mixed`."
> +	    ." This can break networking with newer (>= v245) systemd-networkd."
> +    },
>   };
>   
>   my $confdesc = {
> 
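
For illustration only, a sketch of how the new knob would look from the
user side. The config file path, the "unprivileged: 1" line and the
comma-separated "features:" property syntax are assumptions based on the
usual Proxmox VE container config layout; only "force_rw_sys" and the
"lxc.mount.auto = sys:mixed" line come from the patch above.

```
# /etc/pve/lxc/<vmid>.conf  (hypothetical unprivileged container)
unprivileged: 1

# Default after this patch: no knob set, so the generated LXC config
# gains the line
#   lxc.mount.auto = sys:mixed

# Opt out and keep /sys mounted rw, as before the patch:
features: force_rw_sys=1
```

With the knob set, update_lxc_config skips the sys:mixed entry, so the
container falls back to the previous rw /sys for unprivileged CTs.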




