[PVE-User] Cman crash problem

Alexandre DERUMIER aderumier at odiso.com
Mon Jul 8 15:31:52 CEST 2013


>>It appeared after the network team replaced the active network equipment in 
>>the building (but this might not be the cause of the problem).

Hi, what were the previous and the new equipment? (I have had some Cisco problems in the past.)


----- Original Message ----- 

From: "Jonathan Schaeffer" <jonathan.schaeffer at univ-brest.fr> 
To: pve-user at pve.proxmox.com 
Sent: Monday, July 8, 2013 12:56:48 
Subject: [PVE-User] Cman crash problem 

Hi all, 

I'm experiencing a serious problem on our 4-node cluster (PVE 3.0). 

It appeared after the network team replaced the active network equipment in 
the building (but this might not be the cause of the problem). 

The symptoms are: 

- The nodes appear in red in the web GUI, except the one hosting the web 
service IP 
- The VMs, while still running correctly, do not show any information 
(running status, RRD graphs, etc.) 

- clustat shows the nodes as "online" 
- some nodes seem to have been fenced (without being restarted) 
(see log extracts: barbossa_fenced.log and jim_fenced.log) 

- /var/log/cluster/corosync.log shows a LOT of messages like this: 
Jul 08 07:06:49 corosync [TOTEM ] Retransmit List: 13f54a 13f54b 13f54c 
13f54d 13f54e 13f54f 13f550 13f551 13f552 13f553 13f554 13f555 13f556 
13f557 13f558 13f559 13f55a 13f55b 13f55c 13f55d 13f55e 
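
To get a feel for how large these retransmit requests are, an entry like the one above can be broken down with a few lines of Python. This is only a rough sketch: it assumes the values after "Retransmit List:" are hexadecimal message sequence numbers, and the helper name parse_retransmit_list is made up for illustration.

```python
def parse_retransmit_list(entry):
    """Return the values listed after 'Retransmit List:' as integers.

    Assumption: the values are hexadecimal sequence numbers.
    """
    _, marker, tail = entry.partition("Retransmit List:")
    if not marker:
        return []  # not a retransmit entry
    return [int(token, 16) for token in tail.split()]


# The entry quoted above, with the wrapped lines joined back together.
entry = (
    "Jul 08 07:06:49 corosync [TOTEM ] Retransmit List: 13f54a 13f54b 13f54c "
    "13f54d 13f54e 13f54f 13f550 13f551 13f552 13f553 13f554 13f555 13f556 "
    "13f557 13f558 13f559 13f55a 13f55b 13f55c 13f55d 13f55e"
)

seqs = parse_retransmit_list(entry)
# True when every value is exactly one higher than the previous one.
contiguous = all(b - a == 1 for a, b in zip(seqs, seqs[1:]))
print(len(seqs), hex(seqs[0]), hex(seqs[-1]), contiguous)
```

For this entry the list holds 21 consecutive values, so a whole run of messages is being requested again rather than one isolated message.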

If I restart one node, fencing will kick in and the other nodes will 
reboot, along with all the VMs they host. I don't want this to happen. 

I can provide more logs if necessary. Do you have any idea that could help me 
understand what is going on here? 

Thanks, 

Jonathan 


barbossa_fenced.log : 
Jul 03 12:07:21 fenced fencing deferred to jim 
Jul 03 13:45:40 fenced receive_start 1:15 add node with started_count 8 
Jul 03 13:45:40 fenced receive_start 2:11 add node with started_count 4 
Jul 03 13:45:40 fenced receive_start 3:7 add node with started_count 1 
Jul 04 00:29:35 fenced receive_start 1:16 add node with started_count 8 
Jul 04 00:29:35 fenced receive_start 3:8 add node with started_count 1 
Jul 04 00:38:31 fenced receive_start 2:17 add node with started_count 4 
Jul 04 00:38:31 fenced receive_start 3:13 add node with started_count 1 
Jul 04 00:38:31 fenced receive_start 1:21 add node with started_count 8 
Jul 04 10:44:12 fenced receive_start 1:22 add node with started_count 8 
Jul 04 10:44:12 fenced receive_start 3:14 add node with started_count 1 
Jul 04 10:44:24 fenced receive_start 1:23 add node with started_count 8 
Jul 04 10:44:24 fenced telling cman to remove nodeid 2 from cluster 


jim_fenced.log : 
Jul 03 12:07:21 fenced fencing node longjohn 
Jul 03 12:07:32 fenced fence longjohn success 
Jul 03 13:45:40 fenced receive_start 5:13 add node with started_count 6 
Jul 03 13:45:40 fenced receive_start 2:11 add node with started_count 4 
Jul 03 13:45:40 fenced receive_start 3:7 add node with started_count 1 
Jul 04 00:29:35 fenced receive_start 3:8 add node with started_count 1 
Jul 04 00:29:35 fenced receive_start 5:14 add node with started_count 6 
Jul 04 00:38:31 fenced receive_start 2:17 add node with started_count 4 
Jul 04 00:38:31 fenced receive_start 3:13 add node with started_count 1 
Jul 04 00:38:31 fenced receive_start 5:19 add node with started_count 6 
Jul 04 10:44:12 fenced receive_start 5:20 add node with started_count 6 
Jul 04 10:44:12 fenced receive_start 3:14 add node with started_count 1 
Jul 04 10:44:24 fenced telling cman to remove nodeid 2 from cluster 
Jul 04 10:44:24 fenced receive_start 2:23 add node with started_count 4 
Jul 04 10:44:24 fenced receive_start 3:15 add node with started_count 1 
Jul 04 10:44:24 fenced receive_start 5:21 add node with started_count 6 
Jul 04 10:44:46 fenced receive_start 5:22 add node with started_count 6 
Jul 04 10:44:46 fenced receive_start 3:16 add node with started_count 1 

longjohn_fenced.log : 
Jul 03 09:47:12 fenced fenced 1352871249 started 
Jul 03 11:28:46 fenced cluster is down, exiting 
Jul 03 11:28:46 fenced daemon cpg_dispatch error 2 
Jul 03 12:11:43 fenced fenced 1364188437 started 
Jul 03 13:45:40 fenced receive_start 5:13 add node with started_count 6 
Jul 03 13:45:40 fenced receive_start 1:15 add node with started_count 8 
Jul 03 13:45:40 fenced receive_start 2:11 add node with started_count 4 
Jul 04 00:29:35 fenced receive_start 1:16 add node with started_count 8 
Jul 04 00:29:35 fenced receive_start 5:14 add node with started_count 6 
Jul 04 00:38:31 fenced receive_start 2:17 add node with started_count 4 
Jul 04 00:38:31 fenced receive_start 1:21 add node with started_count 8 
Jul 04 00:38:31 fenced receive_start 5:19 add node with started_count 6 
Jul 04 10:44:12 fenced receive_start 1:22 add node with started_count 8 
Jul 04 10:44:12 fenced receive_start 5:20 add node with started_count 6 
Jul 04 10:44:24 fenced receive_start 1:23 add node with started_count 8 
Jul 04 10:44:24 fenced telling cman to remove nodeid 2 from cluster 
Jul 04 10:44:24 fenced receive_start 2:23 add node with started_count 4 
Jul 04 10:44:24 fenced receive_start 5:21 add node with started_count 6 
Jul 04 10:44:46 fenced receive_start 5:22 add node with started_count 6 
Jul 04 10:44:46 fenced receive_start 1:24 add node with started_count 8 

flint_fenced.log : 
Jul 03 11:18:30 fenced fenced 1364188437 started 
Jul 03 12:07:21 fenced fencing deferred to jim 
Jul 03 13:45:40 fenced receive_start 5:13 add node with started_count 6 
Jul 03 13:45:40 fenced receive_start 1:15 add node with started_count 8 
Jul 03 13:45:40 fenced receive_start 3:7 add node with started_count 1 
Jul 04 00:38:31 fenced receive_start 3:13 add node with started_count 1 
Jul 04 00:38:31 fenced receive_start 1:21 add node with started_count 8 
Jul 04 00:38:31 fenced receive_start 5:19 add node with started_count 6 
Jul 04 10:44:24 fenced receive_start 1:23 add node with started_count 8 
Jul 04 10:44:24 fenced receive_start 3:15 add node with started_count 1 
Jul 04 10:44:24 fenced receive_start 5:21 add node with started_count 6 
Jul 04 10:44:24 fenced cluster is down, exiting 
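
The four extracts above are easier to compare once the receive_start noise is filtered out and the remaining events are merged into one timeline. The following is a minimal sketch in Python, not an official tool: the function name key_events and the list of "interesting" message prefixes are my own choices, and only a few lines from each extract are repeated here as sample input.

```python
import re

# Matches lines such as "Jul 03 12:07:21 fenced fencing node longjohn".
LINE_RE = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) fenced (.*)$")

# Message prefixes treated as key events (an arbitrary selection).
KEY_PREFIXES = (
    "fencing node",
    "fencing deferred",
    "fence ",
    "telling cman to remove",
    "cluster is down",
)


def key_events(host, lines):
    """Return (timestamp, host, message) tuples for the key events only."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(2).startswith(KEY_PREFIXES):
            events.append((match.group(1), host, match.group(2)))
    return events


logs = {
    "jim": [
        "Jul 03 12:07:21 fenced fencing node longjohn",
        "Jul 03 12:07:32 fenced fence longjohn success",
        "Jul 03 13:45:40 fenced receive_start 5:13 add node with started_count 6",
        "Jul 04 10:44:24 fenced telling cman to remove nodeid 2 from cluster",
    ],
    "flint": [
        "Jul 03 12:07:21 fenced fencing deferred to jim",
        "Jul 04 10:44:24 fenced cluster is down, exiting",
    ],
    "barbossa": [
        "Jul 03 12:07:21 fenced fencing deferred to jim",
        "Jul 04 10:44:24 fenced telling cman to remove nodeid 2 from cluster",
    ],
}

# Sorting the timestamp strings is only valid while all entries are in the
# same month, which is the case for these extracts.
timeline = sorted(e for host, lines in logs.items() for e in key_events(host, lines))
for stamp, host, message in timeline:
    print(stamp, host, message)
```

With the sample lines, the merged output shows jim fencing longjohn on Jul 03 while the other nodes defer to it, and then the "remove nodeid 2" messages on Jul 04 at the same second as flint's "cluster is down, exiting".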




-- 
IUEM - Service Informatique 
rue Dumont D'Urville 
Technopôle Brest-Iroise 
29280 Plouzané 
France 
http://www-iuem.univ-brest.fr/feiri 
tel: +33 2 98 49 87 94 


