[pve-devel] [RFC v0/2 manager] statd: cpuset scheduling for containers

Wolfgang Bumiller w.bumiller at proxmox.com
Thu Oct 20 13:43:51 CEST 2016


These patches are two versions of cpuset scheduling:

The first is similar to what lxd does: it simply spreads containers across
cores, balancing only by the number of containers per core. This is simple and
effective, but can lead to cases where two busy containers share the same cores
while other containers sit idle on underutilized cores.
It takes containers with fixed cpusets into account (i.e. ones where the user
manually set lxc.cgroup.cpuset.cpus).
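
To illustrate the count-based idea, here is a minimal Python sketch. It is not
the code from the patch; the function name spread_by_count, the argument
layout and the tie-breaking by core id are assumptions made for the example.
Containers with a fixed cpuset are not reassigned, but still count towards the
load of the cores they occupy.

```python
def spread_by_count(containers, cores, fixed=None):
    """Pick cores for each container, preferring the least-populated cores.

    containers: {name: number of cores wanted}   (hypothetical input layout)
    cores:      list of usable core ids
    fixed:      {name: set of cores} for containers with a manual cpuset
    """
    fixed = fixed or {}
    # Number of containers currently assigned to each core.
    load = {c: 0 for c in cores}

    # Fixed containers are left alone but still occupy their cores.
    for cpus in fixed.values():
        for c in cpus:
            if c in load:
                load[c] += 1

    result = {}
    for name, want in containers.items():
        if name in fixed:
            continue
        # Sort by (container count, core id) so ties resolve deterministically.
        chosen = sorted(cores, key=lambda c: (load[c], c))[:want]
        for c in chosen:
            load[c] += 1
        result[name] = sorted(chosen)
    return result


plan = spread_by_count(
    {"ct101": 2, "ct102": 2},
    cores=[0, 1, 2, 3],
    fixed={"ct100": {0}},
)
```

With core 0 already taken by the fixed container, ct101 lands on cores 1 and 2,
and ct102 gets the still-empty core 3 plus one shared core.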

The second is a more reactive approach which responds to high core usage:
If a core is used >90% of the time and multiple containers are assigned to it,
it'll try to spread them across less-utilized cores, unless most of the
utilization comes from the host itself (determined by comparing the stats of
the cpuacct cgroup's /lxc subdirectory to the stats from its root (/)
directory).
If all cores are >90% utilized it'll simply do nothing.
It does not take the number of threads/processes of each container into
account, so it is not perfect and can cause pointless reordering when your
system is under high-ish load most, but not all, of the time. However, we're
looking at time frames of at least 10 seconds, so it should be fine most of
the time.
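
The decision logic described above could be sketched roughly as follows. This
is an illustration in Python, not the patch itself: the function name
rebalance, the input dictionaries, the "keep the first tenant, move the rest"
policy and the reading of "most" as "more than half" are all assumptions. Only
the 90% threshold and the three rules (multiple containers on a busy core,
host-dominated cores are skipped, nothing happens if every core is busy) come
from the description.

```python
THRESHOLD = 0.90  # the ">90%" limit from the description


def rebalance(core_usage, lxc_usage, assignment):
    """Return a new {container: set of cores} mapping.

    core_usage: {core: total utilization, 0..1}  (cpuacct root cgroup)
    lxc_usage:  {core: utilization caused by /lxc, 0..1}
    assignment: {container: set of cores}
    """
    new = {ct: set(cpus) for ct, cpus in assignment.items()}
    busy = {c for c, u in core_usage.items() if u > THRESHOLD}

    # If every core is above the threshold there is nowhere to move to.
    if len(busy) == len(core_usage):
        return new

    # Running estimate so that several moves do not all pick the same target.
    est = dict(core_usage)

    for core in sorted(busy):
        tenants = sorted(ct for ct, cpus in new.items() if core in cpus)
        if len(tenants) < 2:
            continue
        lxc = lxc_usage.get(core, 0.0)
        # Skip cores where the host itself causes most of the load
        # (assumption: "most" means more than half).
        if lxc * 2 < core_usage[core]:
            continue
        share = lxc / len(tenants)  # crude per-container estimate
        for ct in tenants[1:]:      # keep the first tenant in place
            candidates = [c for c in est if c not in busy and c not in new[ct]]
            if not candidates:
                continue
            target = min(candidates, key=lambda c: (est[c], c))
            new[ct].discard(core)
            new[ct].add(target)
            est[target] += share
            est[core] -= share
    return new


moved = rebalance(
    core_usage={0: 0.95, 1: 0.20, 2: 0.50, 3: 0.10},
    lxc_usage={0: 0.90, 1: 0.10, 2: 0.40, 3: 0.05},
    assignment={"ct100": {0}, "ct101": {0}, "ct102": {2}},
)
```

Like the approach it illustrates, this sketch ignores how many
threads/processes each container runs and simply splits the /lxc share of a
core evenly among its containers.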

Both should cover the case where the host administrator manually changes
/sys/fs/cgroup/cpuset/lxc/cpuset.cpus to restrict the cores available to all
containers.
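
For reference, a cpuset.cpus file contains a comma-separated list of core
numbers and ranges such as "0-3,8". A small Python sketch of how a scheduler
might read that list and clamp a container's cores to it is shown below. The
helper name parse_cpu_list is made up for the example, and only the plain
"N" and "N-M" forms are handled.

```python
def parse_cpu_list(text):
    """Parse a cpuset list such as "0-3,8" into a set of core ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Cores the host allows for containers, as read from the lxc cpuset.cpus file.
allowed = parse_cpu_list("0-3,8\n")

# A container's previous assignment is clamped to what is still allowed.
wanted = {2, 3, 4, 5}
usable = wanted & allowed
```

Here usable ends up as cores 2 and 3; cores 4 and 5 are dropped because the
host no longer allows them for containers.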



