[pve-devel] need help, lost quorum on all nodes

Alexandre DERUMIER aderumier at odiso.com
Wed Jan 16 13:59:02 CET 2013


I have just tried with the latest corosync from git, with my 3.7 kernel: same problem.
I'll test with the latest pve-kernel this afternoon.

I think it's really sending some crap to multicast, maybe flooding it,

because I see the problem on my test cluster (same LAN as my production server) when I use kernel 3.7 on my production server!

I see a lot of these (on each node):
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c60 c61 c62 c5c c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c54 c5d c5f c63 c64 c5e c6e c55 c56 c57 c58 c59 c5a c5b 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c5d c5f c63 c64 c5e c48 c49 a45 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c5a c5b c5c c60 c61 c62 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c5d c5f c63 c64 c5e c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c54 c5c c60 c61 c62 c6e c55 c56 c57 c58 c59 c5a c5b 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c54 c5c c60 c61 c62 c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c55 c56 c57 c58 c59 c5a c5b c5d c5e c5f c63 c64 c65 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c5c c60 c61 c62 c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c5d c5e c5f c63 c64 c54 c6e c55 c56 c57 c58 c59 c5a c5b 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c5e c5f c63 c64 c54 c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c55 c56 c57 c58 c59 c5a c5b c5c c5d c60 c61 c62 c65 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c5f c63 c64 c54 c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c5c c5d c60 c61 c62 c5e c6e c55 c56 c57 c58 c59 c5a c5b 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c5d c60 c61 c62 c5e c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c5a c5b c5c c5f c63 c64 c65 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c60 c61 c62 c5e c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c54 c5c c5f c63 c64 c5d c6e c55 c56 c57 c58 c59 c5a c5b 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c5c c5f c63 c64 c5d c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c5a c5b c5e c60 c61 c62 c65 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c5f c63 c64 c5d c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c54 c5e c60 c61 c62 c5c c6e c55 c56 c57 c58 c59 c5a c5b 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c5e c60 c61 c62 c5c c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c5a c5b c5d c5f c63 c64 c65 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c60 c61 c62 c5c c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c54 c5d c5f c63 c64 c5e c6e c55 c56 c57 c58 c59 c5a c5b 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c5d c5f c63 c64 c5e c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c5a c5b c5c c60 c61 c62 c65 
Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c5f c63 c64 c5e c48 c49 c4a c4b c4c c4d c4e c4f c50 c51 c52 c53 c54 c5c c60 c61 c62 c5d c6e c55 c56 c57 c58 c59 c5a c5b 
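
To get a rough idea of how bad this is, lines like the ones above can be counted per second and per sequence number. This is only a minimal sketch in Python, assuming the syslog format shown above; the function name summarize_retransmits is my own, not part of corosync.

```python
import re
from collections import Counter

# Matches lines in the format pasted above, e.g.
# "Jan 16 13:39:35 corosync [TOTEM ] Retransmit List: c60 c61 c62"
# (assumption: timestamp, "corosync", "[TOTEM ]", then hex sequence numbers).
RETRANSMIT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+corosync\s+\[TOTEM\s*\]\s+"
    r"Retransmit List:(?P<seqs>[0-9a-f\s]*)$"
)


def summarize_retransmits(lines):
    """Count retransmit lines per timestamp and requests per sequence number.

    Returns two Counters: one keyed by the timestamp string, one keyed by
    the sequence number (parsed from hex to int). Other lines are ignored.
    """
    per_ts = Counter()
    per_seq = Counter()
    for line in lines:
        match = RETRANSMIT_RE.match(line.strip())
        if match is None:
            continue  # not a "Retransmit List" line
        per_ts[match.group("ts")] += 1
        for seq in match.group("seqs").split():
            per_seq[int(seq, 16)] += 1
    return per_ts, per_seq
```

Feeding it the lines of whatever log file the corosync messages end up in gives, for each second, how many retransmit lists were logged, and which sequence numbers keep being requested again.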

Then, after that, the nodes from the test cluster and the production server don't see the other nodes.

After rebooting the node with the 3.7 kernel, each node sees the other nodes again.

This is very strange....


----- Original Message -----

From: "Dietmar Maurer" <dietmar at proxmox.com>
To: "Alexandre DERUMIER" <aderumier at odiso.com>, "Stefan Priebe" <s.priebe at profihost.ag>
Cc: pve-devel at pve.proxmox.com
Sent: Wednesday, 16 January 2013 13:01:40
Subject: RE: [pve-devel] need help, lost quorum on all nodes


> >>Any news on this? 
> No, I really don't know what the problem is.
> I have tried with an updated corosync, it doesn't help, and my custom kernel
> doesn't use CONFIG_RT_GROUP_SCHED.


Can you please also test with the latest kernel/corosync from git?


