[PVE-User] PVE, Ceph, OSD in stop/out state: how to restart from commandline?

Fabian Grünbichler f.gruenbichler at proxmox.com
Tue Jun 6 10:20:49 CEST 2017


On Mon, Jun 05, 2017 at 10:04:47AM +0200, Marco Gaiarin wrote:
> 
> Once again my Ceph cluster suffered a mains power outage. ;-(
> 
> The cluster went down cleanly, but after that the power came back a
> bit intermittently, so the servers booted and shut down a few times...
> 
> 
> When the power came back, all servers were running and the cluster
> worked as expected, but I had 5 OSDs (out of 12) down/out.
> 
> I was away, so I connected via SSH, but I found there was no way to
> restart the OSDs, because systemd did not have the ''stanza'' (unit)
> for them. E.g. on a server where I had 2 OSDs down out of 4, I was
> able to do:
> 
> 	systemctl start ceph-osd.<TAB>
> 
> and I only saw the two running ones, not the others. I tried other
> systemd commands (enable, restart, ...) but there was no stanza for
> the faulty OSDs.
> 
> 
> I was forced to set up some SSH port forwarding, connect to the web
> interface and restart the faulty OSDs with the 'Start' button on the
> node's Ceph -> OSD page.
> 
> 
> Why?! Thanks.
> 

OSDs are supposed to be enabled by UDEV rules automatically. This does
not work on all systems, so PVE installs a ceph.service which triggers a
scan for OSDs on all available disks.

Calling either "systemctl restart ceph.service" or "ceph-disk
activate-all" should start all available OSDs which haven't been started
yet.
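
For example (a quick sketch; the device path below is just a
placeholder, not taken from your setup):

	# re-scan all disks and start any OSDs that UDEV missed on boot
	systemctl restart ceph.service

	# or let ceph-disk do the scan directly
	ceph-disk activate-all

	# if you already know which data partition belongs to the stopped
	# OSD, you can also activate only that disk (path is an example)
	ceph-disk activate /dev/sdb1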

The reason why you are not seeing ceph-osd@X systemd units for OSDs
which haven't been available on this boot is that these units are
purposely lost on a reboot, and only re-enabled for the current boot
when ceph-disk starts the OSD (in systemd terms, they are "runtime"
enabled). This makes sense in a way, since an OSD service can only be
started if its disk is there, and if the disk is there it is supposed to
have already been started via the UDEV rule.
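
To illustrate (assuming OSD id 2 just as an example, and the unit's
usual "WantedBy=ceph-osd.target" wiring - adjust to your setup):

	# a runtime enablement reports "enabled-runtime", and its symlink
	# lives under /run, so it disappears on the next reboot
	systemctl is-enabled ceph-osd@2.service
	ls /run/systemd/system/ceph-osd.target.wants/

	# manually re-enable and start a single OSD for this boot only
	systemctl enable --runtime ceph-osd@2.service
	systemctl start ceph-osd@2.service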

Which PVE and Ceph versions are you on? Is there anything out of the
ordinary about your setup? Could you provide a log of the boot where the
OSDs failed to start? The ceph.service should catch all the OSDs missed
by UDEV on boot.
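
Something like the following would help (a rough sketch; exact output
will differ between versions, and the journalctl unit filtering assumes
a reasonably recent systemd):

	# version information
	pveversion -v
	ceph --version

	# boot log of the disk-scan service and all OSD units
	journalctl -b -u ceph.service -u 'ceph-osd@*'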



