[PVE-User] Ceph Cluster with proxmox failure

Ronny Aasen ronny+pve-user at aasen.cx
Fri Sep 28 22:52:35 CEST 2018


On 28.09.2018 21:49, Gilberto Nunes wrote:
> Hi there
> I have a 6-server Ceph cluster built with Proxmox 5.2.
> Suddenly, after a power failure, I have only 3 servers up, but even with 3
> servers the Ceph cluster doesn't work.
> pveceph status gives me a timeout:
> pveceph status got timeout
>
> Any advice?


out of your 6 servers, how many were mon hosts, and how many mon hosts
are running at this time?

does ceph -s work on the command line of the servers? do you have a
mgr running?

you need a quorum of the mon hosts alive, i.e. a strict majority of them.
if you had 3 mon hosts, you need 2 live ones, and can lose 1
if you had 5 mon hosts, you need 3 live ones, and can lose 2
if you had 6 mon hosts, you would need 4 live ones, and can still only
lose 2.
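
the majority rule above is just integer arithmetic. a small sketch to check
it for any mon count (the function names quorum_needed and
tolerated_failures are made up for this example, they are not ceph
commands):

```shell
#!/bin/sh
# Smallest strict majority of N monitors: floor(N / 2) + 1.
quorum_needed() {
  echo $(( $1 / 2 + 1 ))
}

# How many monitors may be down while a majority still remains.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 5 6; do
  echo "$n mons: need $(quorum_needed "$n") alive, can lose $(tolerated_failures "$n")"
done
```

this is also why an even number of mons buys nothing: 6 mons tolerate the
same 2 failures as 5 do.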

if a mon is not running, try to restart it, then read the logs to find
out why it stopped.
if the logs do not show a reason, increase the log verbosity and try the
restart again.
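
a rough sketch of what that can look like on one node. the helper below only
prints the commands so you can review them before running anything. it
assumes the mon id is pve1 (a made-up name; on proxmox the mon id is usually
the node's hostname) and the usual systemd unit naming ceph-mon@<id>. the
helper name mon_debug_cmds is invented for this example.

```shell
#!/bin/sh
# Print (do not run) the usual inspection steps for one monitor.
# $1 is the mon id; "pve1" below is a hypothetical placeholder.
mon_debug_cmds() {
  id="$1"
  cat <<EOF
systemctl status ceph-mon@${id}.service
journalctl -u ceph-mon@${id}.service -n 200 --no-pager
less /var/log/ceph/ceph-mon.${id}.log
systemctl restart ceph-mon@${id}.service
ceph daemon mon.${id} mon_status
EOF
}

mon_debug_cmds pve1
```

the last command talks to the mon's local admin socket, so it should answer
even when there is no quorum and ceph -s just hangs. for more verbose logs,
one option is to put "debug mon = 10" in the [mon] section of ceph.conf
before restarting the mon, and remove it again afterwards.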

once you have a quorum of mon hosts (and a running mgr), you can start
looking at osd recovery and backfilling. if you have the default 3x
replication, pgs should come online as soon as they have 2 whole copies.
pay attention to the fill level of the disks, since you do not want to
make a bad situation worse by filling up your osds.
use things like
ceph osd tree
ceph osd df
ceph -s


good luck
Ronny Aasen



