[pve-devel] [PATCH] add numa options

Cesar Peschiera brain at click.com.py
Tue Jan 6 17:33:42 CET 2015


Hi Alexandre

Please excuse me if I did not express myself properly; I meant the CPU pinning
that pve-manager and QEMU will have in the next release. That is, I would like
to have an option in the PVE GUI to enable or disable the CPU pinning that QEMU
can apply to each VM; that way I can choose whether QEMU or the application
inside the VM manages the CPU pinning across the NUMA nodes. The DBA says that
MS-SQL Server will manage the CPU pinning better than QEMU, and I would like to
run some tests to confirm it.

Moreover, as I have 2 servers with identical hardware, on which this single VM
runs, I would also like to keep the option of live migration available.

>I'm interested to see the results of both methods
With pleasure, I will report the results.

Moreover, regarding the download of the qemu-server deb from git: as this
server will be in production very soon, I would prefer to wait until the
package is in the "pve-no-subscription" repository before applying an upgrade.
That way I run less risk of downtime, unless you tell me you have already
tested it and it is very stable.


----- Original Message ----- 
From: "Alexandre DERUMIER" <aderumier at odiso.com>
To: "Cesar Peschiera" <brain at click.com.py>
Cc: "dietmar" <dietmar at proxmox.com>; "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Tuesday, January 06, 2015 5:02 AM
Subject: Re: [pve-devel] [PATCH] add numa options


Hi,

>>As I have a VM running MS-SQL Server (with 246 GB of RAM exclusively for
>>MS-SQL Server), and the DBA of MS-SQL Server says that MS-SQL Server can
>>manage its own numa-processes better than QEMU, and as I guess many other
>>applications will also manage their own numa-processes better than QEMU,
>>I would like to request that the PVE GUI have an option to enable or
>>disable the automatic administration of the numa-processes, while still
>>allowing live migration.

I'm not sure I understand what you mean by
"MS-SQL Server can manage its own numa-processes better than QEMU".


NUMA is not a process; it's an architecture that groups CPUs with memory
banks, for fast memory access.


There are 2 parts:

1) Currently, QEMU exposes the virtual NUMA nodes to the guest
(each NUMA node = X cores with X memory).

This can simply be enabled with numa: 1 with the latest patches.
(It will create 1 NUMA node per virtual socket, and split the RAM amount
between the nodes.)
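
For illustration, a minimal sketch of what this could look like in a VM config
(example values only, matching the 2-socket / 2-core / 2 GB case quoted in the
older mails further down in this thread):

sockets: 2
cores: 2
memory: 2048
numa: 1

With those values, this should roughly translate into the
"-object memory-backend-ram,... -numa node,..." QEMU arguments shown below.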


Or, if you want to customize the memory access, the cores per node, or map
specific virtual NUMA nodes to specific host NUMA nodes, you can do it with
numa0: ...
numa1: ...
using the syntax
"cpus=<id[-id]>,memory=<mb>[[,hostnodes=<id[-id]>][,policy=<preferred|bind|interleave>]]"


But it is still always the application inside the guest which manages the
memory access.


2) Now, with kernel 3.10, we also have auto NUMA balancing on the host side.
It will try, if possible, to map the virtual NUMA nodes to host NUMA nodes.

You can disable this feature with "echo 0 > /proc/sys/kernel/numa_balancing".
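
For reference, to check the current state and to make the change persistent
across reboots (standard sysctl usage, nothing PVE-specific):

cat /proc/sys/kernel/numa_balancing             # 1 = enabled, 0 = disabled
sysctl -w kernel.numa_balancing=0               # same effect as the echo above
echo "kernel.numa_balancing = 0" >> /etc/sysctl.conf   # persist across reboots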


So, from my point of view, numa: 1 + auto NUMA balancing should already give
you good results,
and it allows live migration between hosts with different NUMA architectures.


With only 1 VM, you could perhaps try to manually map the virtual nodes to
specific host nodes.

I'm interested to see the results of both methods. (Maybe you want the latest
qemu-server deb from git?)



I plan to add a GUI for part 1.




----- Original Message -----
From: "Cesar Peschiera" <brain at click.com.py>
To: "aderumier" <aderumier at odiso.com>, "dietmar" <dietmar at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Tuesday, January 06, 2015 06:35:15
Subject: Re: [pve-devel] [PATCH] add numa options

Hi Alexandre and developers team.

I would like to request a feature for the next release of pve-manager:

As I have a VM running MS-SQL Server (with 246 GB of RAM exclusively for
MS-SQL Server), and the DBA of MS-SQL Server says that MS-SQL Server can manage
its own numa-processes better than QEMU, and as I guess many other applications
will also manage their own numa-processes better than QEMU, I would like to
request that the PVE GUI have an option to enable or disable the automatic
administration of the numa-processes, while still allowing live migration.

Moreover, if you can add such a feature, I will be able to run a test with
MS-SQL Server to find out which of the two options gives me better results, and
publish it (with the wait times for each case).

@Alexandre:
Moreover, with your temporary patches for managing the numa-processes, in
MS-SQL Server I saw results returned between two and three times faster (which
is fantastic, a great difference), but as I have not yet finished the tests
(there are still some changes to make in the hardware BIOS, HugePages managed
by Windows Server, etc.), I have not yet published a very detailed summary of
the tests. I guess I will do it soon (I depend on third parties, and the PVE
host must not lose cluster communication).

And speaking of losing cluster communication: since I enabled the "I/OAT DMA
engine" in the hardware BIOS, the node has never lost cluster communication
again, but I must do some extensive testing to confirm it.

Best regards
Cesar

----- Original Message ----- 
From: "Alexandre DERUMIER" <aderumier at odiso.com>
To: "Dietmar Maurer" <dietmar at proxmox.com>
Cc: <pve-devel at pve.proxmox.com>
Sent: Tuesday, December 02, 2014 8:17 PM
Subject: Re: [pve-devel] [PATCH] add numa options


> Ok,
>
> Finally I found the last pieces of the puzzle:
>
> to have auto NUMA balancing, we just need:
>
> 2 sockets, 2 cores, 2 GB of RAM
>
> -object memory-backend-ram,size=1024M,id=ram-node0
> -numa node,nodeid=0,cpus=0-1,memdev=ram-node0
> -object memory-backend-ram,size=1024M,id=ram-node1
> -numa node,nodeid=1,cpus=2-3,memdev=ram-node1
>
> Like this, the host kernel will try to balance the NUMA nodes.
> This command line works even if the host doesn't support NUMA.
>
>
>
> Now, if we want to bind guest NUMA nodes to specific host NUMA nodes,
>
> -object
> memory-backend-ram,size=1024M,id=ram-node0,host-nodes=0,policy=preferred
> -numa node,nodeid=0,cpus=0-1,memdev=ram-node0
> -object
> memory-backend-ram,size=1024M,id=ram-node1,host-nodes=1,policy=bind \
> -numa node,nodeid=1,cpus=2-3,memdev=ram-node1
>
> This requires that the host-nodes=X actually exist on the physical host,
> and it also needs the qemu-kvm --enable-numa flag.
>
>
>
> So,
> I think we could add:
>
> numa:0|1.
>
> which generates the first config, creates 1 NUMA node per socket, and shares
> the RAM across the nodes
>
>
>
> and also, for advanced users who need manual pinning:
>
>
> numa0: cpus=<X-X>,memory=<mb>,hostnode=<X-X>,policy=<bind|preferred|...>
> numa1: ...
>
>
>
> What do you think about it?
>
>
>
>
> BTW, about pc-dimm hotplug, it's possible to add the NUMA nodeid in
> "device_add pc-dimm,node=X"
>
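> For example, a rough sketch of the full monitor sequence (the memory backend
> object has to be added first; the ids and the 1G size are just placeholders,
> and the VM must have been started with free slots, i.e. -m ...,slots=N,maxmem=...):
>
> (qemu) object_add memory-backend-ram,id=mem-dimm0,size=1G
> (qemu) device_add pc-dimm,id=dimm0,memdev=mem-dimm0,node=0
>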
>
> ----- Original Message -----
>
> De: "Alexandre DERUMIER" <aderumier at odiso.com>
> À: "Dietmar Maurer" <dietmar at proxmox.com>
> Cc: pve-devel at pve.proxmox.com
> Envoyé: Mardi 2 Décembre 2014 20:25:51
> Objet: Re: [pve-devel] [PATCH] add numa options
>
>>>shared? That looks strange to me.
> I mean split across both nodes.
>
>
> I have checked libvirt a little,
> and I'm not sure, but I think that memory-backend-ram is optional to
> have auto NUMA.
>
> It's more about CPU pinning/memory pinning on a selected host node.
>
> Here is an example for libvirt:
> http://www.redhat.com/archives/libvir-list/2014-July/msg00715.html
> "qemu: pass numa node binding preferences to qemu"
>
> +-object
> memory-backend-ram,size=20M,id=ram-node0,host-nodes=3,policy=preferred \
> +-numa node,nodeid=0,cpus=0,memdev=ram-node0 \
> +-object
> memory-backend-ram,size=645M,id=ram-node1,host-nodes=0-7,policy=bind \
> +-numa node,nodeid=1,cpus=1-27,cpus=29,memdev=ram-node1 \
> +-object memory-backend-ram,size=23440M,id=ram-node2,\
> +host-nodes=1-2,host-nodes=5,host-nodes=7,policy=bind \
> +-numa node,nodeid=2,cpus=28,cpus=30-31,memdev=ram-node2 \
>
> ----- Original Message -----
>
> De: "Dietmar Maurer" <dietmar at proxmox.com>
> À: "Alexandre DERUMIER" <aderumier at odiso.com>
> Cc: pve-devel at pve.proxmox.com
> Envoyé: Mardi 2 Décembre 2014 19:42:45
> Objet: RE: [pve-devel] [PATCH] add numa options
>
>> "When do memory hotplug, if there is numa node, we should add the memory
>> size to the corresponding node memory size.
>>
>> For now, it mainly affects the result of hmp command "info numa"."
>>
>>
>> So, it seems to be done automatically.
>> Not sure on which node the pc-dimm is assigned, but maybe the free slots
>> are shared at start between the numa nodes.
>
> shared? That looks strange to me.
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>



