[pve-devel] corosync problems - need help

Alexandre DERUMIER aderumier at odiso.com
Mon Sep 15 18:58:38 CEST 2014


Some news:

I forgot to mention that I'm using Open vSwitch.

On the faulty node, I see in
/var/log/openvswitch/ovs-vswitchd.log

a lot of entries like these:

2014-09-15T15:44:07.536Z|77368|poll_loop|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:253 (56% CPU usage)
2014-09-15T15:44:07.536Z|77369|poll_loop|INFO|wakeup due to [POLLIN] on fd 28 (FIFO pipe:[29855]) at ../lib/seq.c:157 (56% CPU usage)
2014-09-15T15:44:07.536Z|77370|poll_loop|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:253 (56% CPU usage)
2014-09-15T15:44:07.537Z|77371|poll_loop|INFO|wakeup due to [POLLIN] on fd 28 (FIFO pipe:[29855]) at ../lib/seq.c:157 (56% CPU usage)
2014-09-15T15:44:10.535Z|77375|poll_loop|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:253 (54% CPU usage)
2014-09-15T15:44:19.535Z|77379|poll_loop|INFO|wakeup due to [POLLIN] on fd 28 (FIFO pipe:[29855]) at ../lib/seq.c:157 (51% CPU usage)
2014-09-15T15:44:28.537Z|77385|poll_loop|INFO|wakeup due to [POLLIN] on fd 28 (FIFO pipe:[29855]) at ../lib/seq.c:157 (53% CPU usage)
2014-09-15T15:44:28.537Z|77386|poll_loop|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:253 (53% CPU usage)
2014-09-15T15:44:34.535Z|77390|poll_loop|INFO|wakeup due to [POLLIN] on fd 28 (FIFO pipe:[29855]) at ../lib/seq.c:157 (52% CPU usage)


I'm not sure it's related, but the CPU usage of the ovs-vswitchd daemon is indeed high (50-70% of one core), although I see no packet loss in the VMs or on the host.

I found a patch for this:
http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=commit;h=9b32ece62481706b0a340f7a100fe79ad9caad9e


It's possibly related to the number of taps/ports on the OVS bridge (I have a lot of them).

But it seems the patch is not yet in the current ovs 2.0.1.

So I'm going to test ovs 2.3 (it seems to work with the kernel 3.10 OVS module).



----- Original Message -----

From: "Alexandre DERUMIER" <aderumier at odiso.com>
To: "Dietmar Maurer" <dietmar at proxmox.com>
Cc: pve-devel at pve.proxmox.com
Sent: Monday, 15 September 2014 07:26:52
Subject: Re: [pve-devel] corosync problems - need help

Also, about the pmxcfs segfaults,

I have seen these messages:

Sep 14 09:06:33 kvm1 pmxcfs[65403]: [dcdb] notice: cpg_join retry 62840

Sep 14 10:57:25 kvm11 pmxcfs[13112]: [dcdb] notice: cpg_join retry 65090

with the retry count around 65000 (close to the 16-bit limit of 65535),



and this code:

    int retries = 0;
loop:
    result = cpg_join(dfsm->cpg_handle, &dfsm->cpg_group_name);
    if (result == CPG_ERR_TRY_AGAIN) {
        nanosleep(&tvreq, NULL);
        ++retries;
        /* log every 10th retry */
        if ((retries % 10) == 0)
            cfs_dom_message(dfsm->log_domain, "cpg_join retry %d", retries);
        goto loop;
    }


Could it be related to the integer type of retries?
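
To make the integer-type question concrete, here is a minimal sketch comparing a 16-bit counter with a plain int under the same kind of increment loop. The helper names count_u16 and count_int are hypothetical and not part of pmxcfs, and the sketch assumes a platform where int is at least 32 bits wide (the usual case on Linux). It only illustrates counter behaviour; it says nothing about where the segfault actually comes from.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: value held by a 16-bit unsigned counter
 * after n increments. It wraps back to 0 after 65535. */
uint16_t count_u16(long n)
{
    uint16_t retries = 0;
    for (long i = 0; i < n; i++)
        ++retries;
    return retries;
}

/* Hypothetical helper: the same loop with a plain int, the type
 * used for retries in the quoted snippet. */
int count_int(long n)
{
    assert(n >= 0 && n <= INT_MAX);  /* stay clear of signed overflow */
    int retries = 0;
    for (long i = 0; i < n; i++)
        ++retries;
    return retries;
}
```

With a 32-bit int, count_int(70000) simply returns 70000, while count_u16(65536) returns 0. So if retries is a plain int as quoted, the counter itself would not wrap near 65000 on such a platform, and any 16-bit limit would have to come from somewhere else.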



----- Original Message -----

From: "Alexandre DERUMIER" <aderumier at odiso.com>
To: "Dietmar Maurer" <dietmar at proxmox.com>
Cc: pve-devel at pve.proxmox.com
Sent: Monday, 15 September 2014 07:06:40
Subject: Re: [pve-devel] corosync problems - need help

>>This just indicates that corosync does not work as expected.

My understanding is that the faulty node joins the multicast group and the other nodes see it,

but when the other nodes try to talk to it, they get no response?



I'm going to capture some network traces with Wireshark today.

I'll also try to update all the other nodes to kernel 3.10 (not sure it's related).


----- Original Message -----

From: "Dietmar Maurer" <dietmar at proxmox.com>
To: "Alexandre DERUMIER" <aderumier at odiso.com>
Cc: pve-devel at pve.proxmox.com
Sent: Monday, 15 September 2014 05:43:56
Subject: RE: [pve-devel] corosync problems - need help

> seem to be in:
> data/src/dfsm.c
>
> result = cpg_mcast_joined(dfsm->cpg_handle, CPG_TYPE_AGREED, iov, len);
> if (retry && result == CPG_ERR_TRY_AGAIN) {

This just indicates that corosync does not work as expected.
_______________________________________________
pve-devel mailing list
pve-devel at pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel