[PVE-User] Ceph jewel to luminous upgrade problem

Eneko Lacunza elacunza at binovo.es
Mon Nov 13 16:44:56 CET 2017


Hi again,

It seems we hit this known bug, which is marked won't-fix:
http://tracker.ceph.com/issues/16211

I managed to start an affected VM by following step #12; I'll keep applying 
the fix to see whether all the affected VMs can be recovered this way.
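
For anyone following along, here is a minimal sketch of one way to see which 
images in the pool still fail to open (not taken from the tracker; it simply 
loops "rbd info" over the pool, assuming the pool name 'proxmox' used below):

for img in $(rbd -p proxmox ls); do
    # 'rbd info' exits non-zero on the affected images, so list those
    rbd -p proxmox info "$img" >/dev/null 2>&1 || echo "still affected: $img"
done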

Thanks

On 13/11/17 at 16:26, Eneko Lacunza wrote:
> Hi all,
>
> We're in the process of upgrading our office Proxmox v4.4 cluster to 
> v5.1.
>
> For that, we first followed the instructions at
> https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous
> to upgrade Ceph from Jewel to Luminous.
>
> Upgrade was apparently a success:
> # ceph -s
>   cluster:
>     id:     8ee074d4-005c-4bd6-a077-85eddde543b5
>     health: HEALTH_OK
>
>   services:
>     mon: 3 daemons, quorum 0,2,3
>     mgr: butroe(active), standbys: guadalupe, sanmarko
>     osd: 12 osds: 12 up, 12 in
>
>   data:
>     pools:   2 pools, 640 pgs
>     objects: 518k objects, 1966 GB
>     usage:   4120 GB used, 7052 GB / 11172 GB avail
>     pgs:     640 active+clean
>
>   io:
>     client:   644 kB/s rd, 3299 kB/s wr, 61 op/s rd, 166 op/s wr
>
> And versions seem good too:
> # ceph mon versions
> {
>     "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) 
> luminous (stable)": 3
> }
> # ceph osd versions
> {
>     "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) 
> luminous (stable)": 12
> }
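>
> (As a side check, two more commands can confirm the upgrade is complete, 
> assuming the usual Luminous upgrade finished with "ceph osd 
> require-osd-release luminous": "ceph versions" reports every daemon type 
> in one go, and the osd dump should show "require_osd_release luminous":)
>
> # ceph versions
> # ceph osd dump | grep require_osd_release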
>
> But this weekend there were problems backing up some VMs, all failing 
> with the same error:
> no such volume 'ceph-proxmox:vm-120-disk-1'
>
> The "missing" volumes don't show in storage content, but they DO if we 
> do a "rbd -p proxmox ls".
>
> When we try an info command on one of them, though, we get an error:
> # rbd -p proxmox info vm-120-disk-1
> 2017-11-13 16:04:02.979006 7f99d8ff9700 -1 librbd::image::OpenRequest: 
> failed to retreive immutable metadata: (2) No such file or directory
> rbd: error opening image vm-120-disk-1: (2) No such file or directory
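>
> (A minimal sketch of the rados-level checks behind that error, assuming 
> format-2 object naming: the "immutable metadata" is stored as omap keys on 
> an rbd_header.<id> object, and the internal <id> can be read from the 
> rbd_id.<name> object; the <id> placeholder below has to be filled in by 
> hand:)
>
> # rados -p proxmox stat rbd_id.vm-120-disk-1
> # rados -p proxmox get rbd_id.vm-120-disk-1 /tmp/id && strings /tmp/id
> # rados -p proxmox stat rbd_header.<id>
> # rados -p proxmox listomapkeys rbd_header.<id>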
>
> Other VM disk images behave normally:
> # rbd -p proxmox info vm-119-disk-1
> rbd image 'vm-119-disk-1':
>     size 3072 MB in 768 objects
>     order 22 (4096 kB objects)
>     block_name_prefix: rbd_data.575762ae8944a
>     format: 2
>     features: layering
>     flags:
>
> I don't really know what to look at to diagnose this further. I recall 
> that there was a version 1 format for rbd, but I doubt the "missing" disk 
> images are in that old format (and I don't really know how to check that 
> if "info" doesn't work...)
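>
> (A minimal check for that, assuming the usual rbd object naming: a 
> format-1 image keeps its header in a "<name>.rbd" object, while a 
> format-2 image has an "rbd_id.<name>" object instead, so whichever of the 
> two exists tells you the format even when "rbd info" fails:)
>
> # rados -p proxmox stat vm-120-disk-1.rbd
> # rados -p proxmox stat rbd_id.vm-120-disk-1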
>
> Some of the "missing" disk images are still in use by already-running 
> qemu processes and work correctly; but once we stop such a VM, it won't 
> start again, failing with the error reported above. VMs whose disk images 
> are not "missing" start and stop normally.
>
> Any hints about what to try next?
>
> OSDs are filestore on XFS (created from the GUI).
>
> # pveversion -v
> proxmox-ve: 4.4-96 (running kernel: 4.4.83-1-pve)
> pve-manager: 4.4-18 (running version: 4.4-18/ef2610e8)
> pve-kernel-4.4.67-1-pve: 4.4.67-92
> pve-kernel-4.4.76-1-pve: 4.4.76-94
> pve-kernel-4.4.83-1-pve: 4.4.83-96
> lvm2: 2.02.116-pve3
> corosync-pve: 2.4.2-2~pve4+1
> libqb0: 1.0.1-1
> pve-cluster: 4.0-53
> qemu-server: 4.0-113
> pve-firmware: 1.1-11
> libpve-common-perl: 4.0-96
> libpve-access-control: 4.0-23
> libpve-storage-perl: 4.0-76
> pve-libspice-server1: 0.12.8-2
> vncterm: 1.3-2
> pve-docs: 4.4-4
> pve-qemu-kvm: 2.9.0-5~pve4
> pve-container: 1.0-101
> pve-firewall: 2.0-33
> pve-ha-manager: 1.0-41
> ksm-control-daemon: 1.2-1
> glusterfs-client: 3.5.2-2+deb8u3
> lxc-pve: 2.0.7-4
> lxcfs: 2.0.6-pve1
> criu: 1.6.0-1
> novnc-pve: 0.5-9
> smartmontools: 6.5+svn4324-1~pve80
> zfsutils: 0.6.5.9-pve15~bpo80
> ceph: 12.2.1-1~bpo80+1
>
> Thanks a lot
> Eneko
>

-- 
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es



