[PVE-User] PVE 5.1 - Intel <-> AMD migration crash with Debian 9

Gilberto Nunes gilberto.nunes32 at gmail.com
Fri Feb 2 11:53:12 CET 2018


Hi

Well.... My best shot is about the CPU type you are using in your VM... I am
not a kernel expert nor a Linux expert, but in 3 of the screenshots that you
sent, kvm_kick_cpu appears...
Perhaps you can try changing the kernel inside the VM, using a LiveCD for
that, or even changing the CPU model to host or another AMD CPU...
Again, it is just a shot in the dark!
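
If it helps, the CPU model can also be changed from the host's command line; a
rough sketch, where VMID 100 is just a placeholder and the VM has to be stopped
and started again for the change to take effect:

# qm set 100 --cpu host

or, to go back to the default model:

# qm set 100 --cpu kvm64

With cpu: host a live migration between Intel and AMD nodes is even less likely
to work, so this is mainly useful to check whether the panic still happens
after a cold boot on the AMD node.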

Good luck!
Cheers

---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36




2018-02-02 8:16 GMT-02:00 Eneko Lacunza <elacunza at binovo.es>:

> Hi,
>
> I have some screenshots; they aren't complete, as the console shows about 4
> crashes in a few seconds:
>
> ftp://ftp.binovo.es/elacunza/migration-crash/Captura%20de%20
> pantalla%20de%202018-02-02%2009-33-24.png
> ftp://ftp.binovo.es/elacunza/migration-crash/Captura%20de%20
> pantalla%20de%202018-02-02%2009-56-29.png
> ftp://ftp.binovo.es/elacunza/migration-crash/Captura%20de%20
> pantalla%20de%202018-02-02%2009-57-05.png
> ftp://ftp.binovo.es/elacunza/migration-crash/Captura%20de%20
> pantalla%20de%202018-02-02%2009-57-27.png
>
> Crashes don't get logged to syslog/debug/dmesg.
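>
> If it would help, one thing we could still try is attaching a serial console
> to the VM to capture the full panic next time; a rough sketch, with VMID 100
> as a placeholder:
>
> # qm set 100 -serial0 socket
>
> then add console=ttyS0 to the guest's kernel command line (e.g. via GRUB),
> and watch the output from the host with:
>
> # qm terminal 100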
>
> Cheers
>
> On 02/02/18 at 10:42, Gilberto Nunes wrote:
>
> Hi
>> I think it would be nice if you could send us the kernel panic message or
>> even the dmesg output.
>> Do you have any modules that were compiled by hand on this system?
>>
>>
>> On Fri, Feb 2, 2018 at 07:14, Eneko Lacunza <elacunza at binovo.es>
>> wrote:
>>
>> Hi all,
>>>
>>> We have replaced an old node in our office Proxmox 5.1 cluster with a
>>> Ryzen 7 1700 machine with 64GB of non-ECC RAM, just moving the disks from
>>> the old Intel server to the new AMD machine. So far so good: everything
>>> booted OK, the Ceph OSD started OK after adjusting the network, and the
>>> replacement went really smoothly.
>>>
>>> But we have found _one_ Debian 9 VM that kernel panics shortly after
>>> migrating from the Intel nodes to the AMD node or back. Sometimes it is a
>>> matter of seconds, sometimes it takes some minutes or, rarely, even one or
>>> two hours.
>>>
>>> The strange thing is that we have done that kind of migration with other
>>> VMs (several Windows VMs of different versions, another CentOS VM, a
>>> Debian 8 VM) and it works perfectly.
>>>
>>> If we restart this problematic VM after the migration+crash, it works
>>> flawlessly (no more crashes until it is migrated to the other CPU vendor
>>> again). Migration between Intel CPUs (with ECC memory) works OK too. We
>>> don't have a second AMD machine to test migration between AMD nodes.
>>>
>>> The VM has 1 socket / 2 cores of type kvm64, 3GB of RAM, Standard VGA, a
>>> CD-ROM drive at IDE2, a VirtIO SCSI controller, scsi0 8GB on ceph-rbd,
>>> scsi1 50GB on ceph-rbd, a VirtIO network card, OS type Linux 4.x, Hotplug
>>> for Disk/Network/USB, ACPI support yes, BIOS SeaBIOS, KVM hw virt yes,
>>> QEMU agent no. We have tried with virtio-block too.
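>>>
>>> In config-file terms (/etc/pve/qemu-server/<vmid>.conf) that corresponds
>>> roughly to the following; the volume names and MAC address are
>>> placeholders:
>>>
>>> sockets: 1
>>> cores: 2
>>> cpu: kvm64
>>> memory: 3072
>>> vga: std
>>> ide2: none,media=cdrom
>>> scsihw: virtio-scsi-pci
>>> scsi0: ceph-rbd:vm-100-disk-1,size=8G
>>> scsi1: ceph-rbd:vm-100-disk-2,size=50G
>>> net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0
>>> ostype: l26
>>> hotplug: disk,network,usb
>>> acpi: 1
>>> bios: seabios
>>> kvm: 1
>>> agent: 0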
>>>
>>> # pveversion -v
>>> proxmox-ve: 5.1-35 (running kernel: 4.13.13-4-pve)
>>> pve-manager: 5.1-42 (running version: 5.1-42/724a6cb3)
>>> pve-kernel-4.4.83-1-pve: 4.4.83-96
>>> pve-kernel-4.13.4-1-pve: 4.13.4-26
>>> pve-kernel-4.4.76-1-pve: 4.4.76-94
>>> pve-kernel-4.13.13-4-pve: 4.13.13-35
>>> pve-kernel-4.4.67-1-pve: 4.4.67-92
>>> libpve-http-server-perl: 2.0-8
>>> lvm2: 2.02.168-pve6
>>> corosync: 2.4.2-pve3
>>> libqb0: 1.0.1-1
>>> pve-cluster: 5.0-19
>>> qemu-server: 5.0-19
>>> pve-firmware: 2.0-3
>>> libpve-common-perl: 5.0-25
>>> libpve-guest-common-perl: 2.0-14
>>> libpve-access-control: 5.0-7
>>> libpve-storage-perl: 5.0-17
>>> pve-libspice-server1: 0.12.8-3
>>> vncterm: 1.5-3
>>> pve-docs: 5.1-16
>>> pve-qemu-kvm: 2.9.1-5
>>> pve-container: 2.0-18
>>> pve-firewall: 3.0-5
>>> pve-ha-manager: 2.0-4
>>> ksm-control-daemon: 1.2-2
>>> glusterfs-client: 3.8.8-1
>>> lxc-pve: 2.1.1-2
>>> lxcfs: 2.0.8-1
>>> criu: 2.11.1-1~bpo90
>>> novnc-pve: 0.6-4
>>> smartmontools: 6.5+svn4324-1
>>> zfsutils-linux: 0.7.3-pve1~bpo9
>>> ceph: 12.2.2-1~bpo90+1
>>>
>>> Any ideas? This is a production VM but it isn't critical, so we can play
>>> with it. We can also live with the problem, but I think it could be
>>> interesting to try to debug it.
>>>
>>> Thanks a lot
>>> Eneko
>>>
>>> --
>>> Zuzendari Teknikoa / Director Técnico
>>> Binovo IT Human Project, S.L.
>>> Telf. 943569206
>>> Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
>>> www.binovo.es
>>>
>>> _______________________________________________
>>> pve-user mailing list
>>> pve-user at pve.proxmox.com
>>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>>
>>> _______________________________________________
>> pve-user mailing list
>> pve-user at pve.proxmox.com
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>
>
>
> --
> Zuzendari Teknikoa / Director Técnico
> Binovo IT Human Project, S.L.
> Telf. 943569206
> Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
> www.binovo.es
>
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>


