[PVE-User] Constant crashes with high IO load from FreeBSD guests

Myke G mykel at mWare.ca
Tue Dec 28 19:24:33 CET 2010


Crowbar:~# pveversion -v
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.7-30
pve-kernel-2.6.32-4-pve: 2.6.32-30
pve-kernel-2.6.24-8-pve: 2.6.24-16
qemu-server: 1.1-28
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-10
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-3
ksm-control-daemon: 1.0-4
Crowbar:~#

I have 3 SunFire X2200s in my cluster, with 8-16GB RAM and 4 or 8 CPU 
cores. I just upgraded one of them to v1.7; the rest are still running 
v1.5. The version doesn't seem to matter - the problem remains:

If the guest fires up a tar of / or a particularly large ./configure of 
something, the Proxmox node hangs within seconds to minutes. Usually 
there are no console messages, and I have to use the IPMI interface to 
reset the machine. (I'm 500 km from the datacenter.)

We've tried FreeBSD 7 and 8, with SCSI and IDE, Realtek and Intel 
NICs... no difference. We've only tried with i386 builds (stock from 
ISO, and also updated to -STABLE). I've moved the guest around to 
different nodes in the cluster, and the VM-IO-induced crashing is 
universal. AFAIK the hardware is fine; the crashes follow this user's 
work patterns - which aren't exceptional IMO. I think we've recreated 
this instance 4 times now.

Sometimes the guest machine just ends up being "stopped" in the Proxmox 
management interface, but this is only about 1 in 20 occurrences. Even 
this is undesirable, but nowhere near as bad as taking down the whole 
node - which is the typical mode of failure. I should add that sometimes 
the node reboots itself...

I have other FreeBSD guests, but they're all mostly idle... now I'm 
worried about actually making them do some real work.

OpenVZ VMs seem to be fine, and I haven't struggled with Linux machines 
running in QEMU... 32-bit Windows XP occasionally caused some problems 
like this, but that went away after v1.5. (Hey, I only use it to manage 
unruly IE-only webapps!)

Any advice? Suggestions? Would someone like to audit, or are there 
reports/logs I can provide? I don't really see anything of interest to 
report.

Thanks for any help,

Myke


