[PVE-User] PVE 5.1 - Intel <-> AMD migration crash with Debian 9

Gilberto Nunes gilberto.nunes32 at gmail.com
Fri Feb 2 10:42:08 CET 2018


Hi,
I think it would be nice if you could send us the kernel panic message, or
even the dmesg output.
Do you have any modules in this system that were compiled by hand?
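A panic trace often scrolls off the VGA console before it can be copied. One
way to capture the full message (a sketch; VMID 100 is a placeholder for your
VM's actual ID) is to attach a serial console to the guest:

```
# On the Proxmox host: add a serial port to the VM
qm set 100 -serial0 socket

# Inside the Debian 9 guest: send the kernel console to ttyS0 by
# appending to GRUB_CMDLINE_LINUX in /etc/default/grub, then
# running update-grub and rebooting:
#   console=tty0 console=ttyS0,115200

# On the host, attach to the serial console before migrating, so the
# panic text stays visible even after the guest locks up:
qm terminal 100
```

With that in place the whole panic can be copied out of the terminal session
rather than photographed off the VGA screen.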


On Fri, Feb 2, 2018 at 07:14, Eneko Lacunza <elacunza at binovo.es> wrote:

> Hi all,
>
> We have replaced an old node in our office Proxmox 5.1 cluster, with a
> Ryzen 7 1700 machine with 64GB non-ECC RAM, just moving the disks from
> the old Intel server to the new AMD machine. So far so good, everything
> booted OK, Ceph OSD started OK after adjusting network, replacement went
> really nice.
>
> But we have found _one_ Debian 9 VM that kernel panics shortly after
> migrating to/from Intel nodes from/to AMD node. Sometimes it is a matter
> of seconds, sometimes it needs some minutes or even rarely one or two
> hours.
>
> The strange thing is that we have done that kind of migration with other
> VMs (several Windows VMs with different versions, another CentOS VM, a
> Debian 8 VM) and it works perfectly.
>
> If we restart this problematic VM after the migration+crash, it works
> flawlessly (no more crashes until migration to another CPU maker).
> Migration between Intel CPUs (with ECC memory) works OK too. We don't
> have a second AMD machine to test migration between AMD nodes.
>
> The VM has 1 socket / 2 cores, CPU type kvm64, 3GB of RAM, Standard VGA,
> CD-ROM at IDE2, virtio-scsi controller, scsi0 8GB on ceph-rbd, scsi1 50GB
> on ceph-rbd, virtio network, OS type Linux 4.x, hotplug for Disk, Network
> and USB, ACPI support yes, BIOS SeaBIOS, KVM hardware virtualization yes,
> qemu agent no. We have tried with virtio-blk too.
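> For reference, a VM config matching that description would look roughly
> like the following (a sketch of /etc/pve/qemu-server/<vmid>.conf; the
> VMID, MAC address, bridge and volume names are made up):
>
> ```
> bios: seabios
> cores: 2
> cpu: kvm64
> ide2: none,media=cdrom
> memory: 3072
> net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0
> ostype: l26
> scsi0: ceph-rbd:vm-100-disk-1,size=8G
> scsi1: ceph-rbd:vm-100-disk-2,size=50G
> scsihw: virtio-scsi-pci
> sockets: 1
> vga: std
> ```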
>
> # pveversion -v
> proxmox-ve: 5.1-35 (running kernel: 4.13.13-4-pve)
> pve-manager: 5.1-42 (running version: 5.1-42/724a6cb3)
> pve-kernel-4.4.83-1-pve: 4.4.83-96
> pve-kernel-4.13.4-1-pve: 4.13.4-26
> pve-kernel-4.4.76-1-pve: 4.4.76-94
> pve-kernel-4.13.13-4-pve: 4.13.13-35
> pve-kernel-4.4.67-1-pve: 4.4.67-92
> libpve-http-server-perl: 2.0-8
> lvm2: 2.02.168-pve6
> corosync: 2.4.2-pve3
> libqb0: 1.0.1-1
> pve-cluster: 5.0-19
> qemu-server: 5.0-19
> pve-firmware: 2.0-3
> libpve-common-perl: 5.0-25
> libpve-guest-common-perl: 2.0-14
> libpve-access-control: 5.0-7
> libpve-storage-perl: 5.0-17
> pve-libspice-server1: 0.12.8-3
> vncterm: 1.5-3
> pve-docs: 5.1-16
> pve-qemu-kvm: 2.9.1-5
> pve-container: 2.0-18
> pve-firewall: 3.0-5
> pve-ha-manager: 2.0-4
> ksm-control-daemon: 1.2-2
> glusterfs-client: 3.8.8-1
> lxc-pve: 2.1.1-2
> lxcfs: 2.0.8-1
> criu: 2.11.1-1~bpo90
> novnc-pve: 0.6-4
> smartmontools: 6.5+svn4324-1
> zfsutils-linux: 0.7.3-pve1~bpo9
> ceph: 12.2.2-1~bpo90+1
>
> Any ideas? This is a production VM but it isn't critical, we can play
> with it. We can also live with the problem, but I think it could be of
> interest to try to debug the problem.
>
> Thanks a lot
> Eneko
>
> --
> Zuzendari Teknikoa / Director Técnico
> Binovo IT Human Project, S.L.
> Telf. 943569206
> Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
> www.binovo.es
>
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>



More information about the pve-user mailing list