[PVE-User] 3-node cluster sync stops after a few days

Aaron C. de Bruyn aaron at heyaaron.com
Tue Sep 28 20:33:17 CEST 2010


I have a 3-node Proxmox cluster.
When I boot all the servers, they show as synced.
After a few days to a week, they stop syncing and the
master's web interface becomes extremely slow.

When I am able to pull up the cluster page, the master
says "ERROR: 500 read timeout" and the other two nodes
say "nosync".

Restarting the /etc/init.d/pv* services on each of the
nodes doesn't fix the problem.  If I shut down all the
VMs, then shut down the Proxmox servers and restart
them, the problem goes away for a few days.

In syslog on the master I see this:
Sep 28 10:06:03 kvm1 pvemirror[3318]: starting cluster syncronization
Sep 28 10:06:13 kvm1 pvemirror[3318]: syncing vzlist from '10.47.0.181' failed: 500 read timeout
Sep 28 10:06:13 kvm1 pvemirror[3318]: syncing templates
Sep 28 10:06:13 kvm1 pvemirror[3318]: cluster syncronization finished (10.04 seconds (files 0.00, config 0.00))

10.47.0.181 is the IP of the master, kvm1, so the master is timing out while syncing from itself.

I also see:
kvm1:/var/log# pveca -l
CID----IPADDRESS----ROLE-STATE--------UPTIME---LOAD----MEM---DISK
 2 : 10.47.0.181     M     ERROR: 500 Can't connect to 127.0.0.1:83 (connect: timeout)

 3 : 10.47.0.182     N     S   12 days 23:10   0.35    70%     2%
 4 : 10.47.0.183     N     S   12 days 23:08   0.16    31%     2%
kvm1:/var/log# 

To top it off, netstat shows the pvedaemon process listening on port 83, the same port pveca times out connecting to:

tcp        0      0 127.0.0.1:83            0.0.0.0:*               LISTEN      13195/pvedaemon wor
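
One way to narrow this down might be to attempt a TCP connect to that port with a short timeout, to tell apart "nothing listening" (refused) from "listening but never accepting" (timeout). The sketch below is only an illustration and is not part of Proxmox; the function name probe_port is made up, and the host and port are simply the ones from the pveca error above.

```python
import socket


def probe_port(host, port, timeout=5.0):
    """Try a TCP connect and report how it ended.

    Returns "open" if the connection was established, "refused" if the
    kernel rejected it (no listener), "timeout" if nothing answered in
    time, or "error: ..." for any other socket error.
    """
    try:
        # create_connection raises on failure; the with-block closes the socket.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Running probe_port("127.0.0.1", 83) on the master while the cluster is in the broken state would show whether the connect itself hangs, or whether the connection is accepted and the daemon is just slow to respond afterwards.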

Any pointers?

-A


