[PVE-User] Ceph: PANIC or DON'T PANIC? ;-)

Alwin Antreich sysadmin-pve at cognitec.com
Mon Nov 28 15:50:21 CET 2016


Hi Marco,

On 11/28/2016 03:31 PM, Marco Gaiarin wrote:
> Hello, Alwin Antreich!
>   On that day, it was said...
> 
>> What did the full ceph status show?
> 
> Do you mean 'ceph status'? I haven't saved it, but it was OK, same as now:
> 
>  root at thor:~# ceph status
>     cluster 8794c124-c2ec-4e81-8631-742992159bd6
>      health HEALTH_OK
>      monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
>             election epoch 94, quorum 0,1,2,3 0,1,2,3
>      osdmap e114: 6 osds: 6 up, 6 in
>       pgmap v2524432: 768 pgs, 3 pools, 944 GB data, 237 kobjects
>             1874 GB used, 7435 GB / 9310 GB avail
>                  768 active+clean
>   client io 7693 B/s rd, 302 kB/s wr, 65 op/s
> 

It would have been interesting to see whether all OSDs were up & in at the time. Depending on the pool configuration, the
pool's min_size might have prevented the storage from serving data out of that pool.
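
For reference, the pool's replica settings can be queried like this (the
pool name "rbd" is only an example, substitute your own pools):

    # ceph osd pool get rbd size
    # ceph osd pool get rbd min_size

If the number of OSDs holding a PG drops below min_size, I/O to that PG
blocks until enough OSDs are up & in again.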

> 
>> Did you add all the monitors to your storage config in Proxmox?
>> A client speaks to a monitor first to get the proper maps and then connects to the OSDs. The storage would not be
>> available if you only have one monitor configured on the storage tab in Proxmox and that mon is not available (e.g.
>> that one mon is down).
> 
> I currently have 4 nodes in my cluster: all nodes are in the PVE
> cluster, 2 are CPU-only (Ceph mon), and 2 (with one more to come) are
> storage nodes (mon + OSD(s)).
> 
> Yes, I haven't changed the storage configuration, and when the CPU nodes
> started, at least the two storage nodes were online.

I see from your ceph status that you have 4 mons. Are they all in your storage conf? And are your storage nodes also mons?

It is important to have the monitors online, as they are contacted first; if none of them is reachable, no storage is
available. With only one OSD node running, the storage could still be available, albeit with a HEALTH_WARN.
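
For reference, a minimal RBD entry in /etc/pve/storage.cfg listing all
four of your mons could look like this (the storage ID "ceph-vm" and the
pool name are only placeholders, the IPs are taken from your monmap):

    rbd: ceph-vm
         monhost 10.27.251.7 10.27.251.8 10.27.251.11 10.27.251.12
         pool rbd
         username admin
         content images

Depending on the PVE version, the monitor list may also be
semicolon-separated.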

> 
> 
>> Did you configure timesyncd properly?
>> On reboot the time has to be synced by the host, so all ceph hosts share the same time. The ceph map updates require the
>> proper time, so every host knows which map is the current one.
> 
> Now, yes. As stated, I had it configured with only one NTP server, which was
> a VM in the same cluster; now they use two NTP servers, one of them remote.

Then a reboot should not do any harm.
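
Just as a sketch, /etc/systemd/timesyncd.conf with a local and a remote
server could look like this (both addresses are placeholders):

    [Time]
    NTP=10.27.251.1 0.debian.pool.ntp.org

Afterwards restart the service and verify the sync state:

    # systemctl restart systemd-timesyncd
    # timedatectl status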

> 
> With the NTP server fixed, the hosts got in sync and ceph status went OK,
> but the mons did not start to peer with each other ('pgmap' logs).

If your mons weren't peering, the status wouldn't be OK, so they must have peered after a while. Could you please
show us the logs?
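
The mon logs are usually found under /var/log/ceph/ceph-mon.<id>.log
(your mon ids appear to be 0-3, judging from the monmap). Something like
this should show the election/peering messages, e.g. on mon.0:

    # grep -iE 'election|probing|peer' /var/log/ceph/ceph-mon.0.log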

> 
> 
> Thanks.
> 

-- 
Cheers,
Alwin


