[PVE-User] System hangs / CPU 100% Windows 2008 Server

Martin Schuchmann ms at city-pc.de
Wed Sep 5 13:52:07 CEST 2012


>> We have a cluster of 3 Proxmox servers and one serious problem on a
>> Win 2008 Std (not R2) guest: approximately every 5-15 days, at
>> varying times, CPU usage rises to 100% and the system hangs. The
>> most recent failure occurred today at 11:59:57 am. We have also had
>> the failure on a Sunday, when no one was working on the machine, so
>> we do not think that any software installed on the Windows server
>> itself causes the problem. The Windows event logs do not show
>> anything either.
>>
>> The Proxmox syslog shows the following (nodes 301 and 501 are on
>> Server 1 (local storage); the hanging Win2008 machine runs as node
>> 402 on Server 2, also on local storage):
>>
>> Sep 5 11:42:12 promo2 rrdcached[1847]: removing old journal 
>> /var/lib/rrdcached/journal//rrd.journal.1346830932.227122
>> Sep 5 11:59:24 promo2 pmxcfs[1869]: [dcdb] notice: data verification 
>> successful
>> Sep 5 12:00:01 promo2 /USR/SBIN/CRON[348613]: (root) CMD (vzdump 301 
>> --quiet 1 --mode snapshot --compress lzo --maxfiles 18 --dumpdir 
>> /backup_sftp/vz/host1/hourly/)
>> Sep 5 12:00:01 promo2 /USR/SBIN/CRON[348614]: (root) CMD (vzdump 501 
>> --quiet 1 --mode snapshot --compress lzo --maxfiles 12 --dumpdir 
>> /backup_sftp/vz/elvis/hourly/)
>> Sep 5 12:00:02 promo2 pmxcfs[1869]: [status] notice: received log
>> Sep 5 12:00:02 promo2 pmxcfs[1869]: [status] notice: received log
>> Sep 5 12:00:38 promo2 pmxcfs[1869]: [status] notice: received log
>> Sep 5 12:05:01 promo2 pmxcfs[1869]: [status] notice: received log
>>
>>
>> In the past there also seemed to be a possible connection between
>> snapshots starting and node 402 hanging.
>> The destination for the backups is an SFTP server in another datacenter.
>>
>> Does anyone have experience with this behaviour?
>
>
> Yes, we did, many, many times. Everything was solved (really!) after a
> BIOS update (we have many HP and Dell servers with Xeon 3xxx and 5xxx
> series CPUs, and all suffered from a CPU microcode problem that was
> fixed at the end of 2010 / beginning of 2011). Look for a BIOS update.
>
> Massimo Santoro

Hi Massimo,

Thanks for that advice!

I have checked the BIOS and, according to HP Support, it is already a
corrected version from May 2011.

If this were caused by a hardware error, I would expect the problem to
occur on other guests on this host as well, or the whole host to
freeze up.

A Win 2008 SBS guest has been running on the same machine for 6 months
without any error.
The node that freezes is used as a terminal server with about 5-10
active users.
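
To check the suspected link between snapshot starts and the hang against
the syslog, a rough sketch like the following could list the vzdump cron
jobs that started within a time window around a hang. The log lines are
taken from the excerpt quoted above (rejoined onto single lines); the
helper names, the 60-second window, and the assumption that the hang time
and the syslog timestamps come from comparable clocks are illustrative
only, not something established in this thread.

```python
import re
from datetime import datetime, timedelta

# Syslog lines from the quoted excerpt, rejoined onto single lines.
SYSLOG = """\
Sep 5 11:59:24 promo2 pmxcfs[1869]: [dcdb] notice: data verification successful
Sep 5 12:00:01 promo2 /USR/SBIN/CRON[348613]: (root) CMD (vzdump 301 --quiet 1 --mode snapshot --compress lzo --maxfiles 18 --dumpdir /backup_sftp/vz/host1/hourly/)
Sep 5 12:00:01 promo2 /USR/SBIN/CRON[348614]: (root) CMD (vzdump 501 --quiet 1 --mode snapshot --compress lzo --maxfiles 12 --dumpdir /backup_sftp/vz/elvis/hourly/)
Sep 5 12:00:02 promo2 pmxcfs[1869]: [status] notice: received log
"""

# Matches a cron line that launches vzdump; captures timestamp and guest ID.
VZDUMP_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+\S*CRON\[\d+\]:.*CMD \(vzdump (\d+)"
)


def vzdump_starts(text, year):
    """Return (timestamp, guest_id) for every vzdump cron start in text.

    Syslog timestamps carry no year, so the caller has to supply it.
    """
    starts = []
    for line in text.splitlines():
        m = VZDUMP_RE.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            starts.append((ts, int(m.group(2))))
    return starts


def starts_near(hang, starts, window):
    """Return the vzdump starts that lie within +/- window of the hang time."""
    return [(ts, vmid) for ts, vmid in starts if abs(ts - hang) <= window]


# Hang time as reported in the original message (Sep 5 2012, 11:59:57).
hang = datetime(2012, 9, 5, 11, 59, 57)
starts = vzdump_starts(SYSLOG, 2012)
near = starts_near(hang, starts, timedelta(seconds=60))

for ts, vmid in near:
    print(f"vzdump {vmid} started at {ts}, {(ts - hang).total_seconds():+.0f}s from hang")
```

Running the same check over the syslog around earlier hangs would show
whether a backup start falls close to the hang each time or only
occasionally.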

Regards, Martin






