[PVE-User] PVE 5.1 - Intel <-> AMD migration crash with Debian 9

Eneko Lacunza elacunza at binovo.es
Fri Feb 2 10:14:33 CET 2018


Hi all,

We have replaced an old node in our office Proxmox 5.1 cluster with a 
Ryzen 7 1700 machine with 64GB non-ECC RAM, just moving the disks from 
the old Intel server to the new AMD machine. So far so good: everything 
booted OK, the Ceph OSD started OK after adjusting the network, and the 
replacement went really smoothly.

But we have found _one_ Debian 9 VM that kernel panics shortly after 
being live-migrated between the Intel nodes and the AMD node, in either 
direction. Sometimes it crashes within seconds, sometimes after a few 
minutes, or, rarely, after one or two hours.

The strange thing is that we have done that kind of migration with other 
VMs (several Windows VMs of different versions, another CentOS VM, a 
Debian 8 VM) and they work perfectly.

If we restart this problematic VM after the migration and crash, it works 
flawlessly (no more crashes until the next migration to a node with the 
other CPU vendor). Migration between the Intel CPUs (which have ECC 
memory) works OK too. We don't have a second AMD machine to test 
migration between AMD nodes.
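To reproduce, we just use the standard live-migration commands (the VMID 
and node names below are placeholders, not our real ones):

```shell
# Live-migrate the VM from an Intel node to the AMD node (placeholder
# VMID 100 and node names; adjust to your cluster)
qm migrate 100 amd-node --online

# The guest then panics within seconds to hours. After a reset it runs
# fine until the next cross-vendor migration:
qm reset 100
```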

The VM has 1 socket / 2 cores of CPU type kvm64, 3GB of RAM, Standard 
VGA, a CD-ROM at IDE2, virtio-scsi with scsi0 8G on ceph-rbd and scsi1 
50G on ceph-rbd, a virtio network card, OS type Linux 4.x, hotplug for 
disk/network/USB, ACPI support enabled, SeaBIOS, KVM hardware 
virtualization enabled, and no QEMU agent. We have tried with 
virtio-block too.
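For completeness, those settings roughly correspond to a 
/etc/pve/qemu-server/<vmid>.conf like the following (a sketch with 
placeholder VMID, storage and MAC values, not our actual file):

```
# Sketch of the problematic VM's config (placeholder values)
acpi: 1
agent: 0
bios: seabios
bootdisk: scsi0
cores: 2
cpu: kvm64
hotplug: disk,network,usb
ide2: none,media=cdrom
kvm: 1
memory: 3072
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0
ostype: l26
scsi0: ceph-rbd:vm-100-disk-1,size=8G
scsi1: ceph-rbd:vm-100-disk-2,size=50G
scsihw: virtio-scsi-pci
sockets: 1
vga: std
```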

# pveversion -v
proxmox-ve: 5.1-35 (running kernel: 4.13.13-4-pve)
pve-manager: 5.1-42 (running version: 5.1-42/724a6cb3)
pve-kernel-4.4.83-1-pve: 4.4.83-96
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.4.76-1-pve: 4.4.76-94
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.4.67-1-pve: 4.4.67-92
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-19
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-16
pve-qemu-kvm: 2.9.1-5
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
ceph: 12.2.2-1~bpo90+1

Any ideas? This is a production VM but it isn't critical, so we can 
experiment with it. We can also live with the problem, but I think it 
would be worth trying to debug it.

Thanks a lot
Eneko

-- 
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es
