[pve-devel] question/idea : managing big proxmox cluster (100nodes), get rid of corosync ?

Alexandre DERUMIER aderumier at odiso.com
Wed Sep 21 10:51:12 CEST 2016


>>Note that I have around 1000 VMs, so I don't know the impact on the number of messages/s.

A simple tcpdump gives me an average of:

udp/5404: 500 packets/s
udp/5405: 1300 packets/s
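
An average like this is just the packet count per port divided by the length of the capture window. A minimal Python sketch, assuming the capture (e.g. a fixed-length tcpdump run) has already been reduced to a list of UDP destination ports; the parsing step and the function name are hypothetical:

```python
from collections import Counter

def packets_per_second(dst_ports, duration_s):
    """Average packet rate per UDP destination port.

    `dst_ports` is a hypothetical list with one destination port per
    captured packet; `duration_s` is the length of the capture window.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    counts = Counter(dst_ports)
    return {port: count / duration_s for port, count in counts.items()}

# A 10 s capture with 5000 packets to 5404 and 13000 to 5405.
rates = packets_per_second([5404] * 5000 + [5405] * 13000, 10.0)
print(rates)  # {5404: 500.0, 5405: 1300.0}
```

Averaging over the whole window hides short bursts, so a retransmit storm lasting a few seconds would barely move these numbers.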

----- Original Message -----
From: "Alexandre Derumier" <aderumier at odiso.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday, 21 September 2016 09:57:42
Subject: Re: [pve-devel] question/idea : managing big proxmox cluster (100nodes), get rid of corosync ?

>>@Alexandre, you say that with 16 nodes the cluster is quite at its maximum.
>>Can I get some more info from you, as I currently do not have the
>>hardware to test this :)
>>
>>Do you use IGMP snooping/queriers?
>>Which network does corosync communicate on, an independent one? And how
>>fast is it?
>>Redundant rings too?

I have a full 2x10Gb network through LACP (no redundant ring).
Dedicated VLAN for the nodes, but sharing the same physical links (which are far from saturated).
Cluster nodes are 2x 10-core 3.1 GHz Xeons, with SSDs for local storage.
MTU is currently 1500, but I'm planning to increase it to 9000, as it seems that allows more messages.
I'm using IGMP snooping/queriers (multicast is stable).
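
A back-of-envelope way to see why a larger MTU could help: every IPv4 UDP datagram spends 28 bytes on headers (20 for IP without options, 8 for UDP), so bigger frames leave room for more payload per packet. A minimal sketch, assuming a hypothetical fixed message size, assuming several messages are packed into one frame, and ignoring corosync's own protocol headers:

```python
import math

IPV4_HEADER_BYTES = 20  # IPv4 header without options
UDP_HEADER_BYTES = 8

def messages_per_frame(mtu, message_size):
    """Whole messages of `message_size` bytes that fit in one UDP datagram.

    Hypothetical model: ignores corosync's own headers and assumes
    messages are packed back to back into a single frame.
    """
    payload = mtu - IPV4_HEADER_BYTES - UDP_HEADER_BYTES
    return payload // message_size

def frames_per_second(message_rate, mtu, message_size):
    """Frames needed per second to carry `message_rate` messages/s."""
    per_frame = messages_per_frame(mtu, message_size)
    if per_frame < 1:
        raise ValueError("message does not fit in a single frame")
    return math.ceil(message_rate / per_frame)
```

With an assumed 200-byte message and 1300 messages/s, MTU 1500 fits 7 messages per frame (186 frames/s) while MTU 9000 fits 44 (30 frames/s). Real gains depend on how corosync actually fills its frames.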

And I'm seeing a lot of retransmits from time to time (around 5-10 s of retransmits), once or twice per hour :/

So I'm really scared to increase the cluster size.

Note that I have around 1000 VMs, so I don't know the impact on the number of messages/s.

Question: do you think streaming all VM statistics could impact the number of messages/s?
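
One way to frame the question: if every VM and every node emits status updates at a fixed interval, the aggregate message rate grows linearly with the VM count. A minimal sketch with hypothetical parameters (the update interval and messages per update are assumptions, not measured pvestatd/pmxcfs behaviour):

```python
def status_messages_per_second(nodes, vms, interval_s, msgs_per_vm=1, msgs_per_node=1):
    """Rough aggregate rate of status messages in the cluster.

    Hypothetical model: every VM and every node produces a fixed number
    of messages once per `interval_s` seconds.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (vms * msgs_per_vm + nodes * msgs_per_node) / interval_s

# Today: 16 nodes, ~1000 VMs, assumed 10 s interval.
print(status_messages_per_second(16, 1000, 10))
# 100 nodes at the same density (62.5 VMs per node -> 6250 VMs).
print(status_messages_per_second(100, 6250, 10))
```

Under these assumptions the rate goes from 101.6 to 635.0 messages/s, so per-VM statistics would dominate the node-level traffic.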

----- Original Message -----
From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday, 21 September 2016 09:40:01
Subject: Re: [pve-devel] question/idea : managing big proxmox cluster (100nodes), get rid of corosync ?

On 09/21/2016 08:50 AM, Alexandre DERUMIER wrote: 
>>> Forgot to mention that consul supports multiple clusters and/or multi 
>>> center clusters out of the box. 
> Yes, I read the docs yesterday. It seems very interesting.
> 
> The biggest piece of work would be replacing pmxcfs with the Consul KV store. I have seen some Consul FUSE filesystem implementations,
> but they don't have all of pmxcfs's features (symlinks, for example).
> 
> ZooKeeper seems to be lower level.
> 
> Reading the sheepdog plugin (1500 LOC):
> 
> https://github.com/sheepdog/sheepdog/blob/8772904509ce6b10c5edca4f497022686aecc18f/sheep/cluster/zookeeper.c 
> vs 
> https://github.com/sheepdog/sheepdog/blob/8772904509ce6b10c5edca4f497022686aecc18f/sheep/cluster/corosync.c 
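
To illustrate the symlink gap mentioned above: a flat key/value store has no notion of links, so a FUSE layer on top of it would have to store and resolve them itself. A minimal sketch using an in-memory dict as a stand-in (this is not the Consul client API; class, method names and paths are hypothetical, with `local` mimicking the pmxcfs link to the node's own directory):

```python
class FlatKV:
    """In-memory stand-in for a flat key/value store with emulated symlinks.

    A real KV backend would need its own encoding for links, e.g. a
    reserved key prefix (assumption); here they live in a separate dict.
    """

    def __init__(self):
        self._values = {}  # key -> file contents
        self._links = {}   # key -> link target

    def put(self, key, value):
        self._values[key.strip("/")] = value

    def symlink(self, link, target):
        self._links[link.strip("/")] = target.strip("/")

    def resolve(self, key, max_hops=8):
        # Rewrite the first path prefix that is a link, repeat until no
        # prefix is a link; give up after max_hops to avoid link loops.
        parts = key.strip("/").split("/")
        for _ in range(max_hops):
            for i in range(1, len(parts) + 1):
                target = self._links.get("/".join(parts[:i]))
                if target is not None:
                    parts = target.split("/") + parts[i:]
                    break
            else:
                return "/".join(parts)
        raise OSError("too many levels of symbolic links")

    def get(self, key):
        return self._values[self.resolve(key)]

kv = FlatKV()
kv.put("nodes/node1/qemu-server/100.conf", "memory: 2048\n")
kv.symlink("local", "nodes/node1")
print(kv.get("local/qemu-server/100.conf"))
```

Every read then costs extra prefix lookups, which is part of why a plain KV-backed FUSE filesystem does not map one-to-one onto pmxcfs.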
Discussing and evaluating options is good, but instantly throwing everything away
and switching to another (not necessarily better) cluster stack is
maybe a bit of an overreaction. :) I also think that our current cluster stack,
corosync + pve-cluster (pmxcfs), is quite stable, and a lot of things
depend on it.

Also, corosync is very well-tested software and works really well, at least
with small to mid-size clusters (< 60 nodes, which I find is quite an
achievement for a cluster!). You also have to consider
that quite a lot of overhead, and thus the node limit, may come from the
database used by pmxcfs: each transaction needs to be synced to disk to
make everything reliable, and while this is quite optimized, it makes things
slower (placing the DB on really fast storage could help here).
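
The cost of syncing every transaction can be observed directly. pmxcfs keeps its data in SQLite, so a minimal sketch is to time one-commit-per-write inserts under different `PRAGMA synchronous` levels; the table layout and row size here are hypothetical and unrelated to the real pmxcfs schema:

```python
import os
import sqlite3
import tempfile
import time

SYNC_LEVELS = {"OFF", "NORMAL", "FULL", "EXTRA"}

def timed_commits(n, synchronous):
    """Insert n rows, committing after each one, on a throwaway database.

    Returns (row_count, elapsed_seconds) for the given synchronous level.
    """
    if synchronous not in SYNC_LEVELS:
        raise ValueError(f"unknown synchronous level: {synchronous}")
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"))
        try:
            conn.execute(f"PRAGMA synchronous = {synchronous}")
            conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
            conn.commit()
            start = time.perf_counter()
            for i in range(n):
                conn.execute("INSERT INTO kv (k, v) VALUES (?, ?)", (i, "x" * 64))
                conn.commit()  # one transaction per write, as a durable store would
            elapsed = time.perf_counter() - start
            (count,) = conn.execute("SELECT COUNT(*) FROM kv").fetchone()
        finally:
            conn.close()
    return count, elapsed

for level in ("FULL", "OFF"):
    print(level, timed_commits(500, level))
```

The absolute numbers depend heavily on the underlying disk, which is the point about placing the DB on fast storage.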

I, personally, would prefer to keep corosync and introduce a protocol which
allows connecting multiple clusters (easier said than done, but still less change and
work than adapting to another cluster stack, which is most likely not
better, or has other drawbacks).

Also, taking a look at the corosync satellite approach sounds interesting.

Connecting multiple clusters is also a different approach than a small cluster
with a lot of satellite nodes per cluster node. I see the former as better, as
it's more decentralized and seems to fit better into our current design. :)

> 
> Note that for scaling, ZooKeeper, Consul, ... have some kind of master nodes for the quorum, plus client nodes (same as corosync satellites).
> I don't think it's technically possible to scale a full mesh of master nodes to a lot of nodes.

No, with a full mesh you won't really overcome the limits and problems corosync
has here; corosync already makes quite good use of multicast.
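
The multicast advantage can be made concrete with simple arithmetic: if every node sends one message to all others, multicast needs one packet per sender (the network replicates it), while a unicast full mesh needs one copy per peer. A minimal sketch of this simplified model, which ignores corosync's token passing, acknowledgements and retransmits:

```python
def packets_per_broadcast_round(nodes, multicast):
    """Packets sent when every node sends one message to all other nodes."""
    if nodes < 1:
        raise ValueError("need at least one node")
    per_sender = 1 if multicast else nodes - 1
    return nodes * per_sender

def full_mesh_links(nodes):
    """Point-to-point links in a full mesh of `nodes` members."""
    return nodes * (nodes - 1) // 2

for n in (16, 100):
    print(n, packets_per_broadcast_round(n, True),
          packets_per_broadcast_round(n, False), full_mesh_links(n))
```

At 16 nodes this is 16 versus 240 packets per round; at 100 nodes it is 100 versus 9900, across 4950 mesh links.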

@Alexandre, you say that with 16 nodes the cluster is quite at its maximum.
Can I get some more info from you, as I currently do not have the
hardware to test this :)

Do you use IGMP snooping/queriers?
Which network does corosync communicate on, an independent one? And how
fast is it?
Redundant rings too?


> 
> ----- Original Message -----
> From: "datanom.net" <mir at datanom.net>
> To: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Wednesday, 21 September 2016 07:49:06
> Subject: Re: [pve-devel] question/idea : managing big proxmox cluster (100nodes), get rid of corosync ?
> 
> On Wed, 21 Sep 2016 01:45:18 +0200 
> Michael Rasmussen <mir at datanom.net> wrote: 
> 
>> https://github.com/hashicorp/consul 
>> 
> Forgot to mention that consul supports multiple clusters and/or multi 
> center clusters out of the box. 
> 


_______________________________________________ 
pve-devel mailing list 
pve-devel at pve.proxmox.com 
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
