[PVE-User] PVE, Ceph, OSD in stop/out state: how to restart from commandline?

Fabian Grünbichler f.gruenbichler at proxmox.com
Wed Jun 7 11:22:17 CEST 2017


On Wed, Jun 07, 2017 at 11:17:09AM +0200, Marco Gaiarin wrote:
> Hello! Fabian Grünbichler
>   In that message you wrote...
> 
> > OSDs are supposed to be enabled by UDEV rules automatically. This does
> > not work on all systems, so PVE installs a ceph.service which triggers a
> > scan for OSDs on all available disks.
> > Calling either "systemctl restart ceph.service" or "ceph-disk
> > activate-all" should start all available OSDs that haven't been started
> > yet.
> 
> Good. I've missed that.
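
(For the record, on a node where that mechanism applies, either of the
two commands mentioned above should do, e.g.:

    systemctl restart ceph.service
    # or, equivalently, trigger the OSD scan directly:
    ceph-disk activate-all

assuming the PVE-shipped ceph.service is installed.)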
> 
> 
> > The reason why you are not seeing ceph-osd@X systemd units for OSDs
> > which haven't been available on this boot is that these units are
> > purposely lost on a reboot, and only re-enabled for the current boot
> > when ceph-disk starts the OSD (in systemd terms, they are "runtime"
> > enabled). This kind of makes sense, since an OSD service can only be
> > started if its disk is there, and if the disk is there it is supposed to
> > have already been started via the UDEV rule.
> > Which PVE and Ceph versions are you on? Is there anything out of the
> > ordinary about your setup? Could you provide a log of the boot where the
> > OSDs failed to start? The ceph.service should catch all the OSDs missed
> > by UDEV on boot...
> 
> I'm using latest PVE 4.4, ceph hammer.
> 
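
As an aside, on a node that does use per-OSD systemd units, the
runtime-enabled units described above can be inspected and started along
these lines (the OSD id "12" is only a placeholder):

    # list the per-OSD units that were runtime-enabled during this boot
    systemctl list-units 'ceph-osd@*'
    # start a single OSD by hand, provided its disk is present
    systemctl start ceph-osd@12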

Hammer, however, does not use per-instance systemd units, but a single
init script. You can simply use that init script (directly or via the
"service" wrapper) to start individual service instances. My guess is
that your cluster is overloaded when cold-booting, so not all OSDs
manage to start within their timeout.
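
Under Hammer, starting a single OSD by hand would look something like
this (osd.12 is again just an example id):

    # start one specific OSD via the init script's "service" wrapper
    service ceph start osd.12
    # or call the init script directly
    /etc/init.d/ceph start osd.12
    # show the status of all local ceph daemons
    service ceph status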



