[PVE-User] PVE 5.1 - Intel <-> AMD migration crash with Debian 9

Eneko Lacunza elacunza at binovo.es
Fri Feb 2 12:09:23 CET 2018


Supposedly the kvm64 CPU model exists precisely for this (abstracting away 
the CPU model/brand). :-)
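
For reference, a VM's CPU type can be inspected and changed with the qm 
CLI (the VMID 100 below is just a placeholder, not the actual VM):

  # show the VM's current cpu setting, if any
  qm config 100 | grep cpu

  # pin the VM to the generic kvm64 model
  qm set 100 --cpu kvm64

kvm64 only exposes a baseline feature set, so it trades some guest 
performance for migratability between CPU vendors.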

On 02/02/18 at 11:53, Gilberto Nunes wrote:
> Hi
>
> Well... My best guess is about the CPU type you are using in your VM. I am
> not a kernel expert nor a Linux expert, but in 3 of the screenshots you
> sent, kvm_kick_cpu appears...
> Perhaps you can try changing the kernel inside the VM, using a LiveCD for
> that, or even changing the CPU model to host or to another AMD CPU...
> Again, it is just a shot in the dark!
>
> Good luck!
> Cheers
>
> ---
> Gilberto Nunes Ferreira
>
> (47) 3025-5907
> (47) 99676-7530 - Whatsapp / Telegram
>
> Skype: gilberto.nunes36
>
>
>
>
> 2018-02-02 8:16 GMT-02:00 Eneko Lacunza <elacunza at binovo.es>:
>
>> Hi,
>>
>> I have some screenshots; they aren't complete, as the console shows about
>> 4 crashes in a few seconds:
>>
>> ftp://ftp.binovo.es/elacunza/migration-crash/Captura%20de%20pantalla%20de%202018-02-02%2009-33-24.png
>> ftp://ftp.binovo.es/elacunza/migration-crash/Captura%20de%20pantalla%20de%202018-02-02%2009-56-29.png
>> ftp://ftp.binovo.es/elacunza/migration-crash/Captura%20de%20pantalla%20de%202018-02-02%2009-57-05.png
>> ftp://ftp.binovo.es/elacunza/migration-crash/Captura%20de%20pantalla%20de%202018-02-02%2009-57-27.png
>>
>> Crashes don't get logged to syslog/debug/dmesg.
>>
>> Cheers
>>
>> On 02/02/18 at 10:42, Gilberto Nunes wrote:
>>
>>> Hi,
>>> I think it would be nice if you could send us the kernel panic message or
>>> even the dmesg output.
>>> Do you have any modules that were compiled by hand on this system?
>>>
>>>
>>> On Fri, Feb 2, 2018 at 07:14, Eneko Lacunza <elacunza at binovo.es>
>>> wrote:
>>>
>>> Hi all,
>>>> We have replaced an old node in our office Proxmox 5.1 cluster with a
>>>> Ryzen 7 1700 machine with 64GB of non-ECC RAM, just moving the disks from
>>>> the old Intel server to the new AMD machine. So far so good: everything
>>>> booted OK, the Ceph OSD started OK after adjusting the network, and the
>>>> replacement went really smoothly.
>>>>
>>>> But we have found _one_ Debian 9 VM that kernel panics shortly after
>>>> migrating to/from Intel nodes from/to AMD node. Sometimes it is a matter
>>>> of seconds, sometimes it needs some minutes or even rarely one or two
>>>> hours.
>>>>
>>>> The strange thing is that we have done that kind of migration with other
>>>> VMs (several Windows VMs with different versions, another CentOS VM, a
>>>> Debian 8 VM) and it works perfectly.
>>>>
>>>> If we restart this problematic VM after the migration+crash, it works
>>>> flawlessly (no more crashes until migration to another CPU maker).
>>>> Migration between Intel CPUs (with ECC memory) works OK too. We don't
>>>> have a second AMD machine to test migration between AMD nodes.
>>>>
>>>> The VM has 1 socket / 2 cores of type kvm64, 3GB of RAM, Standard VGA,
>>>> CD-ROM at IDE2, virtio-scsi controller, scsi0 8G on ceph-rbd, scsi1 50GB
>>>> on ceph-rbd, virtio network, OS type Linux 4.x, hotplug for Disk,
>>>> Network and USB, ACPI support yes, BIOS SeaBIOS, KVM hw virt yes, qemu
>>>> agent no. We have tried with virtio-block too.
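>>>>
>>>> For illustration only, those settings would roughly correspond to an
>>>> /etc/pve/qemu-server/<vmid>.conf like this (the storage name, disk
>>>> names and MAC address are made up):
>>>>
>>>>   sockets: 1
>>>>   cores: 2
>>>>   cpu: kvm64
>>>>   memory: 3072
>>>>   ostype: l26
>>>>   bios: seabios
>>>>   agent: 0
>>>>   hotplug: disk,network,usb
>>>>   scsihw: virtio-scsi-pci
>>>>   ide2: none,media=cdrom
>>>>   scsi0: ceph-rbd:vm-100-disk-1,size=8G
>>>>   scsi1: ceph-rbd:vm-100-disk-2,size=50G
>>>>   net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0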
>>>>
>>>> # pveversion -v
>>>> proxmox-ve: 5.1-35 (running kernel: 4.13.13-4-pve)
>>>> pve-manager: 5.1-42 (running version: 5.1-42/724a6cb3)
>>>> pve-kernel-4.4.83-1-pve: 4.4.83-96
>>>> pve-kernel-4.13.4-1-pve: 4.13.4-26
>>>> pve-kernel-4.4.76-1-pve: 4.4.76-94
>>>> pve-kernel-4.13.13-4-pve: 4.13.13-35
>>>> pve-kernel-4.4.67-1-pve: 4.4.67-92
>>>> libpve-http-server-perl: 2.0-8
>>>> lvm2: 2.02.168-pve6
>>>> corosync: 2.4.2-pve3
>>>> libqb0: 1.0.1-1
>>>> pve-cluster: 5.0-19
>>>> qemu-server: 5.0-19
>>>> pve-firmware: 2.0-3
>>>> libpve-common-perl: 5.0-25
>>>> libpve-guest-common-perl: 2.0-14
>>>> libpve-access-control: 5.0-7
>>>> libpve-storage-perl: 5.0-17
>>>> pve-libspice-server1: 0.12.8-3
>>>> vncterm: 1.5-3
>>>> pve-docs: 5.1-16
>>>> pve-qemu-kvm: 2.9.1-5
>>>> pve-container: 2.0-18
>>>> pve-firewall: 3.0-5
>>>> pve-ha-manager: 2.0-4
>>>> ksm-control-daemon: 1.2-2
>>>> glusterfs-client: 3.8.8-1
>>>> lxc-pve: 2.1.1-2
>>>> lxcfs: 2.0.8-1
>>>> criu: 2.11.1-1~bpo90
>>>> novnc-pve: 0.6-4
>>>> smartmontools: 6.5+svn4324-1
>>>> zfsutils-linux: 0.7.3-pve1~bpo9
>>>> ceph: 12.2.2-1~bpo90+1
>>>>
>>>> Any ideas? This is a production VM but it isn't critical, we can play
>>>> with it. We can also live with the problem, but I think it could be of
>>>> interest to try to debug the problem.
>>>>
>>>> Thanks a lot
>>>> Eneko
>>>>
>>>> --
>>>> Zuzendari Teknikoa / Director Técnico
>>>> Binovo IT Human Project, S.L.
>>>> Telf. 943569206
>>>> Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
>>>> www.binovo.es
>>>>
>>>> _______________________________________________
>>>> pve-user mailing list
>>>> pve-user at pve.proxmox.com
>>>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>>>
>>





