[PVE-User] Ceph: PANIC or DON'T PANIC? ;-)

Marco Gaiarin gaio at sv.lnf.it
Tue Nov 29 12:17:44 CET 2016


Greetings, Alwin Antreich!
  On that day, it was said...

> May you please show us the logs?

OK, here I am, with the logs.

A quick legend: 10.27.251.7 and 10.27.251.8 are the 'ceph' nodes
(mon+osd); 10.27.251.11 and 10.27.251.12 are the 'cpu' nodes (mon
only). In that order they are mon.0, mon.1, mon.2 and mon.3.
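
For reference, that topology corresponds to a ceph.conf along these
lines (a sketch reconstructed from the logs; only capitanamerica, the
mon.0 box, is a real host name, the other hosts are placeholders):

  [mon.0]
      host = capitanamerica
      mon addr = 10.27.251.7:6789
  [mon.1]
      host = storage-2             # placeholder
      mon addr = 10.27.251.8:6789
  [mon.2]
      host = cpu-1                 # placeholder
      mon addr = 10.27.251.11:6789
  [mon.3]
      host = cpu-2                 # placeholder
      mon addr = 10.27.251.12:6789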

These are the logs of 10.27.251.7 (mon.0); the ceph logs seem much the
same on all nodes, so I hope these suffice.


I started my maintenance at 15:00, but before taking everything down
I did a P2V of my last physical server, my Asterisk PBX box. Clearly,
the cluster was working:

[...]
2016-11-26 16:45:51.900445 osd.4 10.27.251.8:6804/3442 5016 : cluster [INF] 3.68 scrub starts
2016-11-26 16:45:52.047932 osd.4 10.27.251.8:6804/3442 5017 : cluster [INF] 3.68 scrub ok
2016-11-26 16:45:52.741334 mon.0 10.27.251.7:6789/0 2317313 : cluster [INF] pgmap v2410312: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 20533 B/s rd, 945 kB/s wr, 127 op/s
2016-11-26 16:45:54.825603 mon.0 10.27.251.7:6789/0 2317314 : cluster [INF] pgmap v2410313: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 261 kB/s wr, 7 op/s
[...]
2016-11-26 16:47:52.741749 mon.0 10.27.251.7:6789/0 2317382 : cluster [INF] pgmap v2410381: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 11563 B/s rd, 687 kB/s wr, 124 op/s
2016-11-26 16:47:55.002485 mon.0 10.27.251.7:6789/0 2317383 : cluster [INF] pgmap v2410382: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 922 kB/s rd, 292 kB/s wr, 28 op/s


With the P2V finished, I started to power off the cluster, beginning
with the cpu nodes. After powering down one node, I realized I still
needed it for one more thing, so I powered it back on. ;-)

2016-11-26 16:48:05.018514 mon.1 10.27.251.8:6789/0 129 : cluster [INF] mon.1 calling new monitor election
2016-11-26 16:48:05.031761 mon.2 10.27.251.11:6789/0 120 : cluster [INF] mon.2 calling new monitor election
2016-11-26 16:48:05.053262 mon.0 10.27.251.7:6789/0 2317384 : cluster [INF] mon.0 calling new monitor election
2016-11-26 16:48:10.091773 mon.0 10.27.251.7:6789/0 2317385 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2
2016-11-26 16:48:10.104535 mon.0 10.27.251.7:6789/0 2317386 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2
2016-11-26 16:48:10.143625 mon.0 10.27.251.7:6789/0 2317387 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 16:48:10.143731 mon.0 10.27.251.7:6789/0 2317388 : cluster [INF] pgmap v2410382: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 922 kB/s rd, 292 kB/s wr, 28 op/s
2016-11-26 16:48:10.144828 mon.0 10.27.251.7:6789/0 2317389 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 16:48:10.148407 mon.0 10.27.251.7:6789/0 2317390 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
2016-11-26 16:48:11.208968 mon.0 10.27.251.7:6789/0 2317391 : cluster [INF] pgmap v2410383: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2174 kB/s rd, 646 kB/s wr, 130 op/s
2016-11-26 16:48:13.309644 mon.0 10.27.251.7:6789/0 2317392 : cluster [INF] pgmap v2410384: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2210 kB/s rd, 652 kB/s wr, 135 op/s
[...]
2016-11-26 16:50:04.665220 mon.0 10.27.251.7:6789/0 2317466 : cluster [INF] pgmap v2410458: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2579 B/s rd, 23217 B/s wr, 5 op/s
2016-11-26 16:50:05.707271 mon.0 10.27.251.7:6789/0 2317467 : cluster [INF] pgmap v2410459: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 157 kB/s rd, 445 kB/s wr, 82 op/s
2016-11-26 16:50:16.786716 mon.1 10.27.251.8:6789/0 130 : cluster [INF] mon.1 calling new monitor election
2016-11-26 16:50:16.815156 mon.0 10.27.251.7:6789/0 2317468 : cluster [INF] mon.0 calling new monitor election
2016-11-26 16:52:51.536024 osd.0 10.27.251.7:6800/3166 7755 : cluster [INF] 1.e8 scrub starts
2016-11-26 16:52:53.771169 osd.0 10.27.251.7:6800/3166 7756 : cluster [INF] 1.e8 scrub ok
2016-11-26 16:54:34.558607 osd.0 10.27.251.7:6800/3166 7757 : cluster [INF] 1.ed scrub starts
2016-11-26 16:54:36.682207 osd.0 10.27.251.7:6800/3166 7758 : cluster [INF] 1.ed scrub ok
2016-11-26 16:57:07.816187 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election
2016-11-26 16:57:13.242951 mon.0 10.27.251.7:6789/0 2317469 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3
2016-11-26 16:57:13.252424 mon.0 10.27.251.7:6789/0 2317470 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3
2016-11-26 16:57:13.253143 mon.0 10.27.251.7:6789/0 2317471 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.155786s > max 0.05s
2016-11-26 16:57:13.302934 mon.0 10.27.251.7:6789/0 2317472 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 16:57:13.302998 mon.0 10.27.251.7:6789/0 2317473 : cluster [INF] pgmap v2410460: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 77940 B/s rd, 208 kB/s wr, 38 op/s
2016-11-26 16:57:13.303055 mon.0 10.27.251.7:6789/0 2317474 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 16:57:13.303141 mon.0 10.27.251.7:6789/0 2317475 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
2016-11-26 16:57:13.304000 mon.0 10.27.251.7:6789/0 2317476 : cluster [WRN] message from mon.3 was stamped 0.156822s in the future, clocks not synchronized
2016-11-26 16:57:14.350452 mon.0 10.27.251.7:6789/0 2317477 : cluster [INF] pgmap v2410461: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 43651 B/s rd, 15067 B/s wr, 2 op/s
[...]
2016-11-26 16:57:30.901532 mon.0 10.27.251.7:6789/0 2317483 : cluster [INF] pgmap v2410467: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1539 kB/s rd, 316 kB/s wr, 172 op/s
2016-11-26 16:51:13.939571 osd.4 10.27.251.8:6804/3442 5018 : cluster [INF] 4.91 deep-scrub starts
2016-11-26 16:52:03.663961 osd.4 10.27.251.8:6804/3442 5019 : cluster [INF] 4.91 deep-scrub ok
2016-11-26 16:57:33.003398 mon.0 10.27.251.7:6789/0 2317484 : cluster [INF] pgmap v2410468: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 20384 kB/s rd, 2424 kB/s wr, 1163 op/s
[...]
2016-11-26 16:57:41.523421 mon.0 10.27.251.7:6789/0 2317489 : cluster [INF] pgmap v2410473: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3654 kB/s rd, 732 kB/s wr, 385 op/s
2016-11-26 16:57:43.284475 mon.0 10.27.251.7:6789/0 2317490 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.155191s > max 0.05s
2016-11-26 16:57:43.624090 mon.0 10.27.251.7:6789/0 2317491 : cluster [INF] pgmap v2410474: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2140 kB/s rd, 391 kB/s wr, 233 op/s
[...]
2016-11-26 16:58:02.688789 mon.0 10.27.251.7:6789/0 2317503 : cluster [INF] pgmap v2410486: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4675 kB/s rd, 184 kB/s wr, 281 op/s
2016-11-26 16:52:48.308292 osd.3 10.27.251.8:6812/4377 8761 : cluster [INF] 1.55 scrub starts
2016-11-26 16:52:50.718814 osd.3 10.27.251.8:6812/4377 8762 : cluster [INF] 1.55 scrub ok
2016-11-26 16:52:59.309398 osd.3 10.27.251.8:6812/4377 8763 : cluster [INF] 4.c7 scrub starts
2016-11-26 16:53:10.848883 osd.3 10.27.251.8:6812/4377 8764 : cluster [INF] 4.c7 scrub ok
2016-11-26 16:58:03.759643 mon.0 10.27.251.7:6789/0 2317504 : cluster [INF] pgmap v2410487: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 8311 kB/s rd, 65182 B/s wr, 334 op/s
[...]
2016-11-26 16:58:11.183400 mon.0 10.27.251.7:6789/0 2317510 : cluster [INF] pgmap v2410493: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 11880 kB/s rd, 507 kB/s wr, 1006 op/s
2016-11-26 16:58:13.265908 mon.0 10.27.251.7:6789/0 2317511 : cluster [INF] HEALTH_WARN; clock skew detected on mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected 
2016-11-26 16:58:13.290893 mon.0 10.27.251.7:6789/0 2317512 : cluster [INF] pgmap v2410494: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 9111 kB/s rd, 523 kB/s wr, 718 op/s
[...]
2016-11-26 16:58:42.309990 mon.0 10.27.251.7:6789/0 2317529 : cluster [INF] pgmap v2410511: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 22701 kB/s rd, 4773 kB/s wr, 834 op/s
2016-11-26 16:58:43.285715 mon.0 10.27.251.7:6789/0 2317530 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.154781s > max 0.05s
2016-11-26 16:58:43.358508 mon.0 10.27.251.7:6789/0 2317531 : cluster [INF] pgmap v2410512: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 19916 kB/s rd, 4439 kB/s wr, 741 op/s
[...]
2016-11-26 16:59:17.933355 mon.0 10.27.251.7:6789/0 2317552 : cluster [INF] pgmap v2410533: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4400 kB/s rd, 2144 kB/s wr, 276 op/s
2016-11-26 16:59:18.981605 mon.0 10.27.251.7:6789/0 2317553 : cluster [WRN] message from mon.3 was stamped 0.155111s in the future, clocks not synchronized
2016-11-26 16:59:21.064651 mon.0 10.27.251.7:6789/0 2317554 : cluster [INF] pgmap v2410534: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3909 kB/s rd, 1707 kB/s wr, 232 op/s
[...]
2016-11-26 16:59:58.729775 mon.0 10.27.251.7:6789/0 2317576 : cluster [INF] pgmap v2410556: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4067 kB/s rd, 1372 kB/s wr, 125 op/s
2016-11-26 17:00:00.000396 mon.0 10.27.251.7:6789/0 2317577 : cluster [INF] HEALTH_WARN; clock skew detected on mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected 
2016-11-26 17:00:00.807659 mon.0 10.27.251.7:6789/0 2317578 : cluster [INF] pgmap v2410557: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 7894 kB/s rd, 1245 kB/s wr, 552 op/s
[...]
2016-11-26 17:00:11.359226 mon.0 10.27.251.7:6789/0 2317585 : cluster [INF] pgmap v2410564: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2416 kB/s rd, 376 kB/s wr, 191 op/s
2016-11-26 17:00:13.286867 mon.0 10.27.251.7:6789/0 2317586 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.153666s > max 0.05s
2016-11-26 17:00:13.481830 mon.0 10.27.251.7:6789/0 2317587 : cluster [INF] pgmap v2410565: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 6266 kB/s rd, 492 kB/s wr, 265 op/s
[...]
2016-11-26 17:00:15.559867 mon.0 10.27.251.7:6789/0 2317588 : cluster [INF] pgmap v2410566: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 5107 kB/s rd, 176 kB/s wr, 133 op/s

OK, at this point the server was shut down, so the logs stop.


At power-up I got, as said, clock skew troubles, so the status went to
HEALTH_WARN:

2016-11-26 18:16:19.623440 mon.1 10.27.251.8:6789/0 1311 : cluster [INF] mon.1 calling new monitor election
2016-11-26 18:16:19.729689 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election
2016-11-26 18:16:19.848291 mon.0 10.27.251.7:6789/0 1183 : cluster [INF] mon.0 calling new monitor election
2016-11-26 18:16:29.613075 mon.2 10.27.251.11:6789/0 20 : cluster [WRN] message from mon.0 was stamped 0.341880s in the future, clocks not synchronized
2016-11-26 18:16:29.742328 mon.1 10.27.251.8:6789/0 1332 : cluster [WRN] message from mon.0 was stamped 0.212611s in the future, clocks not synchronized
2016-11-26 18:16:29.894351 mon.0 10.27.251.7:6789/0 1202 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2
2016-11-26 18:16:29.901079 mon.0 10.27.251.7:6789/0 1203 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2
2016-11-26 18:16:29.902069 mon.0 10.27.251.7:6789/0 1204 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.347176s > max 0.05s
2016-11-26 18:16:29.928249 mon.0 10.27.251.7:6789/0 1205 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.203948s > max 0.05s
2016-11-26 18:16:29.955001 mon.0 10.27.251.7:6789/0 1206 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 18:16:29.955115 mon.0 10.27.251.7:6789/0 1207 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 18:16:29.955195 mon.0 10.27.251.7:6789/0 1208 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 18:16:29.955297 mon.0 10.27.251.7:6789/0 1209 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
2016-11-26 18:16:36.965739 mon.2 10.27.251.11:6789/0 23 : cluster [WRN] message from mon.0 was stamped 0.347450s in the future, clocks not synchronized
2016-11-26 18:16:37.091476 mon.1 10.27.251.8:6789/0 1335 : cluster [WRN] message from mon.0 was stamped 0.221680s in the future, clocks not synchronized
2016-11-26 18:16:59.929488 mon.0 10.27.251.7:6789/0 1212 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.347736s > max 0.05s
2016-11-26 18:16:59.929541 mon.0 10.27.251.7:6789/0 1213 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.222216s > max 0.05s
2016-11-26 18:17:02.770378 mon.2 10.27.251.11:6789/0 24 : cluster [WRN] message from mon.0 was stamped 0.345763s in the future, clocks not synchronized
2016-11-26 18:17:02.902756 mon.1 10.27.251.8:6789/0 1336 : cluster [WRN] message from mon.0 was stamped 0.213372s in the future, clocks not synchronized
2016-11-26 18:17:29.901353 mon.0 10.27.251.7:6789/0 1218 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2; 1 mons down, quorum 0,1,2 0,1,2; Monitor clock skew detected 
2016-11-26 18:17:59.930852 mon.0 10.27.251.7:6789/0 1219 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.348437s > max 0.05s
2016-11-26 18:17:59.930923 mon.0 10.27.251.7:6789/0 1220 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.223381s > max 0.05s
2016-11-26 18:18:24.383970 mon.2 10.27.251.11:6789/0 25 : cluster [INF] mon.2 calling new monitor election
2016-11-26 18:18:24.459941 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election
2016-11-26 18:18:24.506084 mon.3 10.27.251.12:6789/0 2 : cluster [WRN] message from mon.0 was stamped 0.271532s in the future, clocks not synchronized
2016-11-26 18:18:24.508845 mon.1 10.27.251.8:6789/0 1337 : cluster [INF] mon.1 calling new monitor election
2016-11-26 18:18:24.733137 mon.0 10.27.251.7:6789/0 1221 : cluster [INF] mon.0 calling new monitor election
2016-11-26 18:18:24.764445 mon.0 10.27.251.7:6789/0 1222 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
2016-11-26 18:18:24.770743 mon.0 10.27.251.7:6789/0 1223 : cluster [INF] HEALTH_OK
2016-11-26 18:18:24.771644 mon.0 10.27.251.7:6789/0 1224 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.34865s > max 0.05s
2016-11-26 18:18:24.771763 mon.0 10.27.251.7:6789/0 1225 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.272024s > max 0.05s
2016-11-26 18:18:24.778105 mon.0 10.27.251.7:6789/0 1226 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 18:18:24.778168 mon.0 10.27.251.7:6789/0 1227 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 18:18:24.778217 mon.0 10.27.251.7:6789/0 1228 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 18:18:24.778309 mon.0 10.27.251.7:6789/0 1229 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
2016-11-26 18:18:24.778495 mon.0 10.27.251.7:6789/0 1230 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.217754s > max 0.05s
2016-11-26 18:18:31.609426 mon.3 10.27.251.12:6789/0 5 : cluster [WRN] message from mon.0 was stamped 0.272441s in the future, clocks not synchronized
2016-11-26 18:18:54.779742 mon.0 10.27.251.7:6789/0 1231 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.272617s > max 0.05s
2016-11-26 18:18:54.779795 mon.0 10.27.251.7:6789/0 1232 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.224392s > max 0.05s
2016-11-26 18:18:54.779834 mon.0 10.27.251.7:6789/0 1233 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.349151s > max 0.05s
2016-11-26 18:18:57.598098 mon.3 10.27.251.12:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.272729s in the future, clocks not synchronized
2016-11-26 18:19:09.612371 mon.2 10.27.251.11:6789/0 26 : cluster [WRN] message from mon.0 was stamped 0.349322s in the future, clocks not synchronized
2016-11-26 18:19:09.736830 mon.1 10.27.251.8:6789/0 1338 : cluster [WRN] message from mon.0 was stamped 0.224812s in the future, clocks not synchronized
2016-11-26 18:19:24.770966 mon.0 10.27.251.7:6789/0 1234 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected 
2016-11-26 18:19:54.781002 mon.0 10.27.251.7:6789/0 1235 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.273372s > max 0.05s
2016-11-26 18:19:54.781078 mon.0 10.27.251.7:6789/0 1236 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.225574s > max 0.05s
2016-11-26 18:19:54.781120 mon.0 10.27.251.7:6789/0 1237 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.349896s > max 0.05s
2016-11-26 18:21:03.602890 mon.3 10.27.251.12:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.274203s in the future, clocks not synchronized
2016-11-26 18:21:24.782299 mon.0 10.27.251.7:6789/0 1238 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.27444s > max 0.05s
2016-11-26 18:21:24.782359 mon.0 10.27.251.7:6789/0 1239 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.351099s > max 0.05s
2016-11-26 18:21:24.782397 mon.0 10.27.251.7:6789/0 1240 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.227465s > max 0.05s
2016-11-26 18:23:24.783511 mon.0 10.27.251.7:6789/0 1241 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.275852s > max 0.05s
2016-11-26 18:23:24.783572 mon.0 10.27.251.7:6789/0 1242 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.352701s > max 0.05s
2016-11-26 18:23:24.783614 mon.0 10.27.251.7:6789/0 1243 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.229936s > max 0.05s
2016-11-26 18:25:54.784800 mon.0 10.27.251.7:6789/0 1244 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.277662s > max 0.05s
2016-11-26 18:25:54.784861 mon.0 10.27.251.7:6789/0 1245 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.354716s > max 0.05s
2016-11-26 18:25:54.785102 mon.0 10.27.251.7:6789/0 1246 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.232739s > max 0.05s
2016-11-26 18:28:54.786183 mon.0 10.27.251.7:6789/0 1248 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.27989s > max 0.05s
2016-11-26 18:28:54.786243 mon.0 10.27.251.7:6789/0 1249 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.23634s > max 0.05s
2016-11-26 18:28:54.786284 mon.0 10.27.251.7:6789/0 1250 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.35715s > max 0.05s
2016-11-26 18:29:36.721250 mon.2 10.27.251.11:6789/0 27 : cluster [WRN] message from mon.0 was stamped 0.357750s in the future, clocks not synchronized
2016-11-26 18:29:36.841757 mon.1 10.27.251.8:6789/0 1339 : cluster [WRN] message from mon.0 was stamped 0.237207s in the future, clocks not synchronized
2016-11-26 18:31:30.725507 mon.3 10.27.251.12:6789/0 8 : cluster [WRN] message from mon.0 was stamped 0.281799s in the future, clocks not synchronized
2016-11-26 18:32:24.787410 mon.0 10.27.251.7:6789/0 1264 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.282481s > max 0.05s
2016-11-26 18:32:24.787462 mon.0 10.27.251.7:6789/0 1265 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.360058s > max 0.05s
2016-11-26 18:32:24.787500 mon.0 10.27.251.7:6789/0 1266 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.240569s > max 0.05s
2016-11-26 18:33:20.594196 mon.3 10.27.251.12:6789/0 9 : cluster [INF] mon.3 calling new monitor election
2016-11-26 18:33:20.635816 mon.1 10.27.251.8:6789/0 1340 : cluster [INF] mon.1 calling new monitor election
2016-11-26 18:33:20.894625 mon.0 10.27.251.7:6789/0 1273 : cluster [INF] mon.0 calling new monitor election
2016-11-26 18:33:25.919955 mon.0 10.27.251.7:6789/0 1274 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3
2016-11-26 18:33:25.929393 mon.0 10.27.251.7:6789/0 1275 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3
2016-11-26 18:33:25.930715 mon.0 10.27.251.7:6789/0 1276 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.282884s > max 0.05s
2016-11-26 18:33:25.947280 mon.0 10.27.251.7:6789/0 1277 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.234203s > max 0.05s
2016-11-26 18:33:25.964223 mon.0 10.27.251.7:6789/0 1278 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 18:33:25.964283 mon.0 10.27.251.7:6789/0 1279 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 18:33:25.964326 mon.0 10.27.251.7:6789/0 1280 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 18:33:25.964418 mon.0 10.27.251.7:6789/0 1281 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
2016-11-26 18:33:55.948613 mon.0 10.27.251.7:6789/0 1283 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.28349s > max 0.05s
2016-11-26 18:33:55.948680 mon.0 10.27.251.7:6789/0 1284 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.242253s > max 0.05s
2016-11-26 18:34:25.929710 mon.0 10.27.251.7:6789/0 1287 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected 
2016-11-26 18:34:55.950050 mon.0 10.27.251.7:6789/0 1288 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.284225s > max 0.05s
2016-11-26 18:34:55.950117 mon.0 10.27.251.7:6789/0 1289 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.243421s > max 0.05s
2016-11-26 18:36:25.951267 mon.0 10.27.251.7:6789/0 1290 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.285389s > max 0.05s
2016-11-26 18:36:25.951393 mon.0 10.27.251.7:6789/0 1291 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.245253s > max 0.05s
2016-11-26 18:38:25.952573 mon.0 10.27.251.7:6789/0 1294 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.286907s > max 0.05s
2016-11-26 18:38:25.952836 mon.0 10.27.251.7:6789/0 1295 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.247648s > max 0.05s
2016-11-26 18:40:55.954179 mon.0 10.27.251.7:6789/0 1296 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.288735s > max 0.05s
2016-11-26 18:40:55.954233 mon.0 10.27.251.7:6789/0 1297 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.2506s > max 0.05s
2016-11-26 18:43:32.915408 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election
2016-11-26 18:43:32.916835 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election
2016-11-26 18:43:32.951384 mon.2 10.27.251.11:6789/0 3 : cluster [WRN] message from mon.0 was stamped 0.388792s in the future, clocks not synchronized
2016-11-26 18:43:33.014026 mon.3 10.27.251.12:6789/0 10 : cluster [INF] mon.3 calling new monitor election
2016-11-26 18:43:33.050896 mon.1 10.27.251.8:6789/0 1341 : cluster [INF] mon.1 calling new monitor election
2016-11-26 18:43:33.305330 mon.0 10.27.251.7:6789/0 1298 : cluster [INF] mon.0 calling new monitor election
2016-11-26 18:43:33.324492 mon.0 10.27.251.7:6789/0 1299 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
2016-11-26 18:43:33.333626 mon.0 10.27.251.7:6789/0 1300 : cluster [INF] HEALTH_OK
2016-11-26 18:43:33.334234 mon.0 10.27.251.7:6789/0 1301 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.290845s > max 0.05s
2016-11-26 18:43:33.334321 mon.0 10.27.251.7:6789/0 1302 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.388745s > max 0.05s
2016-11-26 18:43:33.340638 mon.0 10.27.251.7:6789/0 1303 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 18:43:33.340703 mon.0 10.27.251.7:6789/0 1304 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 18:43:33.340763 mon.0 10.27.251.7:6789/0 1305 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 18:43:33.340858 mon.0 10.27.251.7:6789/0 1306 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
2016-11-26 18:43:33.341044 mon.0 10.27.251.7:6789/0 1307 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.247914s > max 0.05s
2016-11-26 18:43:40.064299 mon.2 10.27.251.11:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.388889s in the future, clocks not synchronized
2016-11-26 18:44:03.342137 mon.0 10.27.251.7:6789/0 1308 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.291226s > max 0.05s
2016-11-26 18:44:03.342225 mon.0 10.27.251.7:6789/0 1309 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.254342s > max 0.05s
2016-11-26 18:44:03.342281 mon.0 10.27.251.7:6789/0 1310 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.389057s > max 0.05s
2016-11-26 18:44:06.047499 mon.2 10.27.251.11:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.389102s in the future, clocks not synchronized
2016-11-26 18:44:33.333908 mon.0 10.27.251.7:6789/0 1311 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected 
2016-11-26 18:45:03.343358 mon.0 10.27.251.7:6789/0 1313 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.291989s > max 0.05s
2016-11-26 18:45:03.343435 mon.0 10.27.251.7:6789/0 1314 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.255536s > max 0.05s
2016-11-26 18:45:03.343540 mon.0 10.27.251.7:6789/0 1315 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.38983s > max 0.05s
2016-11-26 18:46:11.549947 mon.2 10.27.251.11:6789/0 8 : cluster [WRN] message from mon.0 was stamped 0.390678s in the future, clocks not synchronized
2016-11-26 18:46:33.344570 mon.0 10.27.251.7:6789/0 1329 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.29311s > max 0.05s
2016-11-26 18:46:33.344642 mon.0 10.27.251.7:6789/0 1330 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.257389s > max 0.05s
2016-11-26 18:46:33.344707 mon.0 10.27.251.7:6789/0 1331 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.391036s > max 0.05s
2016-11-26 18:48:33.345909 mon.0 10.27.251.7:6789/0 1354 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.294607s > max 0.05s
2016-11-26 18:48:33.345973 mon.0 10.27.251.7:6789/0 1355 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.392611s > max 0.05s
2016-11-26 18:48:33.346016 mon.0 10.27.251.7:6789/0 1356 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.259781s > max 0.05s
2016-11-26 18:51:03.347074 mon.0 10.27.251.7:6789/0 1357 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.296507s > max 0.05s
2016-11-26 18:51:03.347259 mon.0 10.27.251.7:6789/0 1358 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.394627s > max 0.05s
2016-11-26 18:51:03.347311 mon.0 10.27.251.7:6789/0 1359 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.262662s > max 0.05s
2016-11-26 18:54:03.348471 mon.0 10.27.251.7:6789/0 1360 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.298756s > max 0.05s
2016-11-26 18:54:03.348533 mon.0 10.27.251.7:6789/0 1361 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.397086s > max 0.05s
2016-11-26 18:54:03.348580 mon.0 10.27.251.7:6789/0 1362 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.266196s > max 0.05s
2016-11-26 18:56:39.053369 mon.2 10.27.251.11:6789/0 9 : cluster [WRN] message from mon.0 was stamped 0.399300s in the future, clocks not synchronized
2016-11-26 18:57:33.349690 mon.0 10.27.251.7:6789/0 1363 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192948s > max 0.05s
2016-11-26 18:57:33.349743 mon.0 10.27.251.7:6789/0 1364 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.270457s > max 0.05s
2016-11-26 18:57:33.349788 mon.0 10.27.251.7:6789/0 1365 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.400016s > max 0.05s
2016-11-26 19:00:00.000400 mon.0 10.27.251.7:6789/0 1370 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected 
2016-11-26 19:01:33.350738 mon.0 10.27.251.7:6789/0 1389 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192183s > max 0.05s
2016-11-26 19:01:33.350800 mon.0 10.27.251.7:6789/0 1390 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.275208s > max 0.05s
2016-11-26 19:01:33.350856 mon.0 10.27.251.7:6789/0 1391 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.40334s > max 0.05s
2016-11-26 19:06:03.351908 mon.0 10.27.251.7:6789/0 1478 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192207s > max 0.05s
2016-11-26 19:06:03.351997 mon.0 10.27.251.7:6789/0 1479 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.280431s > max 0.05s
2016-11-26 19:06:03.352110 mon.0 10.27.251.7:6789/0 1480 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.251491s > max 0.05s

But after adding the new NTP server and waiting some time, the clocks
finally got in sync and the status went to OK.
But (this is the PANIC time) despite the fact that 'ceph status' and
the PVE interface said 'all OK', the cluster did not work.
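
For what it's worth, this is more or less what I ran on each node to
check the skew and the cluster state (nothing exotic, just the usual
tools):

  # compare against the NTP peers; the offsets must settle near zero
  ntpq -p
  # overall state as the monitors see it
  ceph status
  ceph health detail
  # per-monitor view, useful when quorum looks odd (here on the mon.0 box)
  ceph daemon mon.0 mon_status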

So I started rebooting the CPU nodes (mon.2 and mon.3):

2016-11-26 19:11:33.343818 mon.0 10.27.251.7:6789/0 1581 : cluster [INF] HEALTH_OK
2016-11-26 19:12:43.854404 mon.1 10.27.251.8:6789/0 1342 : cluster [INF] mon.1 calling new monitor election
2016-11-26 19:12:43.856032 mon.3 10.27.251.12:6789/0 11 : cluster [INF] mon.3 calling new monitor election
2016-11-26 19:12:43.870922 mon.0 10.27.251.7:6789/0 1590 : cluster [INF] mon.0 calling new monitor election
2016-11-26 19:12:48.895683 mon.0 10.27.251.7:6789/0 1591 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3
2016-11-26 19:12:48.905245 mon.0 10.27.251.7:6789/0 1592 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3
2016-11-26 19:12:48.951654 mon.0 10.27.251.7:6789/0 1593 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 19:12:48.951715 mon.0 10.27.251.7:6789/0 1594 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 19:12:48.951766 mon.0 10.27.251.7:6789/0 1595 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 19:12:48.951848 mon.0 10.27.251.7:6789/0 1596 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
2016-11-26 19:15:48.583382 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election
2016-11-26 19:15:48.584865 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election
2016-11-26 19:15:48.589714 mon.0 10.27.251.7:6789/0 1616 : cluster [INF] mon.0 calling new monitor election
2016-11-26 19:15:48.589965 mon.1 10.27.251.8:6789/0 1343 : cluster [INF] mon.1 calling new monitor election
2016-11-26 19:15:48.591671 mon.3 10.27.251.12:6789/0 12 : cluster [INF] mon.3 calling new monitor election
2016-11-26 19:15:48.614007 mon.0 10.27.251.7:6789/0 1617 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
2016-11-26 19:15:48.620602 mon.0 10.27.251.7:6789/0 1618 : cluster [INF] HEALTH_OK
2016-11-26 19:15:48.633199 mon.0 10.27.251.7:6789/0 1619 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 19:15:48.633258 mon.0 10.27.251.7:6789/0 1620 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 19:15:48.633322 mon.0 10.27.251.7:6789/0 1621 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 19:15:48.633416 mon.0 10.27.251.7:6789/0 1622 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
2016-11-26 19:18:12.415679 mon.0 10.27.251.7:6789/0 1639 : cluster [INF] mon.0 calling new monitor election
2016-11-26 19:18:17.444444 mon.0 10.27.251.7:6789/0 1640 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2
2016-11-26 19:18:17.453618 mon.0 10.27.251.7:6789/0 1641 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2
2016-11-26 19:18:17.468577 mon.0 10.27.251.7:6789/0 1642 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 19:18:17.468636 mon.0 10.27.251.7:6789/0 1643 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 19:18:17.468679 mon.0 10.27.251.7:6789/0 1644 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 19:18:17.468755 mon.0 10.27.251.7:6789/0 1645 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
2016-11-26 19:21:25.457997 mon.2 10.27.251.11:6789/0 5 : cluster [INF] mon.2 calling new monitor election
2016-11-26 19:21:25.458923 mon.0 10.27.251.7:6789/0 1648 : cluster [INF] mon.0 calling new monitor election
2016-11-26 19:21:25.459240 mon.1 10.27.251.8:6789/0 1344 : cluster [INF] mon.1 calling new monitor election
2016-11-26 19:21:25.489206 mon.0 10.27.251.7:6789/0 1649 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
2016-11-26 19:21:25.498421 mon.0 10.27.251.7:6789/0 1650 : cluster [INF] HEALTH_OK
2016-11-26 19:21:25.505645 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election
2016-11-26 19:21:25.508232 mon.0 10.27.251.7:6789/0 1651 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 19:21:25.508377 mon.0 10.27.251.7:6789/0 1652 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 19:21:25.508466 mon.0 10.27.251.7:6789/0 1653 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 19:21:25.508556 mon.0 10.27.251.7:6789/0 1654 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
2016-11-26 19:44:00.306113 mon.0 10.27.251.7:6789/0 1672 : cluster [INF] mon.0 calling new monitor election
2016-11-26 19:44:05.343631 mon.0 10.27.251.7:6789/0 1673 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3
2016-11-26 19:44:05.353082 mon.0 10.27.251.7:6789/0 1674 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3
2016-11-26 19:44:05.373799 mon.0 10.27.251.7:6789/0 1675 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 19:44:05.373860 mon.0 10.27.251.7:6789/0 1676 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 19:44:05.373904 mon.0 10.27.251.7:6789/0 1677 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 19:44:05.373983 mon.0 10.27.251.7:6789/0 1678 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
2016-11-26 19:47:20.297661 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election
2016-11-26 19:47:20.299406 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election
2016-11-26 19:47:20.357274 mon.2 10.27.251.11:6789/0 3 : cluster [WRN] message from mon.0 was stamped 0.404381s in the future, clocks not synchronized
2016-11-26 19:47:20.716116 mon.3 10.27.251.12:6789/0 4 : cluster [INF] mon.3 calling new monitor election
2016-11-26 19:47:20.719435 mon.0 10.27.251.7:6789/0 1679 : cluster [INF] mon.0 calling new monitor election
2016-11-26 19:47:20.719853 mon.1 10.27.251.8:6789/0 1345 : cluster [INF] mon.1 calling new monitor election
2016-11-26 19:47:20.747017 mon.0 10.27.251.7:6789/0 1680 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
2016-11-26 19:47:20.755302 mon.0 10.27.251.7:6789/0 1681 : cluster [INF] HEALTH_OK
2016-11-26 19:47:20.755943 mon.0 10.27.251.7:6789/0 1682 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.420346s > max 0.05s
2016-11-26 19:47:20.762042 mon.0 10.27.251.7:6789/0 1683 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 19:47:20.762100 mon.0 10.27.251.7:6789/0 1684 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 19:47:20.762146 mon.0 10.27.251.7:6789/0 1685 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 19:47:20.762226 mon.0 10.27.251.7:6789/0 1686 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
2016-11-26 19:47:27.462603 mon.2 10.27.251.11:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.420329s in the future, clocks not synchronized
2016-11-26 19:47:50.763598 mon.0 10.27.251.7:6789/0 1687 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.420661s > max 0.05s
2016-11-26 19:47:53.438750 mon.2 10.27.251.11:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.420684s in the future, clocks not synchronized
2016-11-26 19:48:20.755382 mon.0 10.27.251.7:6789/0 1688 : cluster [INF] HEALTH_WARN; clock skew detected on mon.2; Monitor clock skew detected 
2016-11-26 19:49:20.755732 mon.0 10.27.251.7:6789/0 1697 : cluster [INF] HEALTH_OK


With no luck. So finally I set the 'nodown' and 'noout' flags and
rebooted the storage nodes (mon.0 and mon.1). And suddenly everything
got back to normal, as the logs below show.
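
For reference, the flag handling was roughly this (set from a mon node
before the reboots, cleared once the OSDs had rejoined; the final
HEALTH_OK below is from after clearing them):

  ceph osd set nodown
  ceph osd set noout
  # ... reboot the storage nodes, wait for the OSDs to come back ...
  ceph osd unset nodown
  ceph osd unset noout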

2016-11-26 19:57:20.090836 mon.0 10.27.251.7:6789/0 1722 : cluster [INF] osdmap e99: 6 osds: 6 up, 6 in
2016-11-26 19:57:20.110743 mon.0 10.27.251.7:6789/0 1723 : cluster [INF] pgmap v2410578: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 19:57:20.758100 mon.0 10.27.251.7:6789/0 1724 : cluster [INF] HEALTH_WARN; noout flag(s) set
2016-11-26 19:57:24.617480 mon.0 10.27.251.7:6789/0 1727 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in
2016-11-26 19:57:24.641974 mon.0 10.27.251.7:6789/0 1728 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:00:00.000180 mon.1 10.27.251.8:6789/0 1353 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set; 1 mons down, quorum 1,2,3 1,2,3
2016-11-26 20:01:49.705122 mon.0 10.27.251.7:6789/0 1 : cluster [INF] mon.0 calling new monitor election
2016-11-26 20:01:49.731728 mon.0 10.27.251.7:6789/0 4 : cluster [INF] mon.0 calling new monitor election
2016-11-26 20:01:49.751119 mon.0 10.27.251.7:6789/0 5 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
2016-11-26 20:01:49.762503 mon.0 10.27.251.7:6789/0 6 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set
2016-11-26 20:01:49.788619 mon.0 10.27.251.7:6789/0 7 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.243513s > max 0.05s
2016-11-26 20:01:49.788699 mon.0 10.27.251.7:6789/0 8 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.240216s > max 0.05s
2016-11-26 20:01:49.788796 mon.0 10.27.251.7:6789/0 9 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.243912s > max 0.05s
2016-11-26 20:01:49.797382 mon.0 10.27.251.7:6789/0 10 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 20:01:49.797669 mon.0 10.27.251.7:6789/0 11 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:01:49.797850 mon.0 10.27.251.7:6789/0 12 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 20:01:49.797960 mon.0 10.27.251.7:6789/0 13 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in
2016-11-26 20:01:49.798248 mon.0 10.27.251.7:6789/0 14 : cluster [WRN] message from mon.1 was stamped 0.294517s in the future, clocks not synchronized
2016-11-26 20:01:50.014131 mon.3 10.27.251.12:6789/0 6 : cluster [INF] mon.3 calling new monitor election
2016-11-26 20:01:50.016998 mon.2 10.27.251.11:6789/0 9 : cluster [INF] mon.2 calling new monitor election
2016-11-26 20:01:50.017895 mon.1 10.27.251.8:6789/0 1354 : cluster [INF] mon.1 calling new monitor election
2016-11-26 20:01:57.737260 mon.0 10.27.251.7:6789/0 19 : cluster [WRN] message from mon.3 was stamped 0.291444s in the future, clocks not synchronized
2016-11-26 20:02:19.789732 mon.0 10.27.251.7:6789/0 20 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.294864s > max 0.05s
2016-11-26 20:02:19.789786 mon.0 10.27.251.7:6789/0 21 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.290951s > max 0.05s
2016-11-26 20:02:19.789824 mon.0 10.27.251.7:6789/0 22 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.29396s > max 0.05s
2016-11-26 20:02:20.949515 mon.0 10.27.251.7:6789/0 23 : cluster [INF] osdmap e101: 6 osds: 4 up, 6 in
2016-11-26 20:02:20.985891 mon.0 10.27.251.7:6789/0 24 : cluster [INF] pgmap v2410580: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:02:21.965798 mon.0 10.27.251.7:6789/0 25 : cluster [INF] osd.0 10.27.251.7:6804/3291 boot
2016-11-26 20:02:21.965879 mon.0 10.27.251.7:6789/0 26 : cluster [INF] osd.1 10.27.251.7:6800/2793 boot
2016-11-26 20:02:21.975031 mon.0 10.27.251.7:6789/0 27 : cluster [INF] osdmap e102: 6 osds: 6 up, 6 in
2016-11-26 20:02:22.022415 mon.0 10.27.251.7:6789/0 28 : cluster [INF] pgmap v2410581: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:02:23.026342 mon.0 10.27.251.7:6789/0 29 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in
2016-11-26 20:02:23.026417 mon.0 10.27.251.7:6789/0 30 : cluster [WRN] message from mon.2 was stamped 0.275306s in the future, clocks not synchronized
2016-11-26 20:02:23.046210 mon.0 10.27.251.7:6789/0 31 : cluster [INF] pgmap v2410582: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:02:25.819773 mon.0 10.27.251.7:6789/0 32 : cluster [INF] pgmap v2410583: 768 pgs: 169 stale+active+clean, 143 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1467 kB/s wr, 276 op/s
2016-11-26 20:02:26.896658 mon.0 10.27.251.7:6789/0 33 : cluster [INF] pgmap v2410584: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3337 kB/s wr, 630 op/s
2016-11-26 20:02:49.763887 mon.0 10.27.251.7:6789/0 34 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; nodown,noout flag(s) set; Monitor clock skew detected 
2016-11-26 20:02:55.636643 osd.1 10.27.251.7:6800/2793 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.511571 secs
2016-11-26 20:02:55.636653 osd.1 10.27.251.7:6800/2793 2 : cluster [WRN] slow request 30.511571 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg
2016-11-26 20:03:04.727273 osd.0 10.27.251.7:6804/3291 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.147061 secs
2016-11-26 20:03:04.727281 osd.0 10.27.251.7:6804/3291 2 : cluster [WRN] slow request 30.147061 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg
2016-11-26 20:03:25.648743 osd.1 10.27.251.7:6800/2793 3 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.523708 secs
2016-11-26 20:03:25.648758 osd.1 10.27.251.7:6800/2793 4 : cluster [WRN] slow request 60.523708 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg
2016-11-26 20:03:34.737588 osd.0 10.27.251.7:6804/3291 3 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.157392 secs
2016-11-26 20:03:34.737597 osd.0 10.27.251.7:6804/3291 4 : cluster [WRN] slow request 60.157392 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg
2016-11-26 20:03:49.765365 mon.0 10.27.251.7:6789/0 35 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; nodown,noout flag(s) set
2016-11-26 20:04:25.850414 mon.0 10.27.251.7:6789/0 36 : cluster [INF] pgmap v2410585: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:04:26.890251 mon.0 10.27.251.7:6789/0 37 : cluster [INF] pgmap v2410586: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:04:25.668335 osd.1 10.27.251.7:6800/2793 5 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.543296 secs
2016-11-26 20:04:25.668343 osd.1 10.27.251.7:6800/2793 6 : cluster [WRN] slow request 120.543296 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg
2016-11-26 20:04:34.757570 osd.0 10.27.251.7:6804/3291 5 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.177368 secs
2016-11-26 20:04:34.757595 osd.0 10.27.251.7:6804/3291 6 : cluster [WRN] slow request 120.177368 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg
2016-11-26 20:04:49.766694 mon.0 10.27.251.7:6789/0 38 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set
2016-11-26 20:05:41.864203 mon.0 10.27.251.7:6789/0 39 : cluster [INF] mon.0 calling new monitor election
2016-11-26 20:05:46.887853 mon.0 10.27.251.7:6789/0 40 : cluster [INF] mon.0 at 0 won leader election with quorum 0,2,3
2016-11-26 20:05:46.897914 mon.0 10.27.251.7:6789/0 41 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set; 1 mons down, quorum 0,2,3 0,2,3
2016-11-26 20:05:46.898803 mon.0 10.27.251.7:6789/0 42 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 20:05:46.898873 mon.0 10.27.251.7:6789/0 43 : cluster [INF] pgmap v2410586: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:05:46.898930 mon.0 10.27.251.7:6789/0 44 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 20:05:46.899022 mon.0 10.27.251.7:6789/0 45 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in
2016-11-26 20:06:25.875860 mon.0 10.27.251.7:6789/0 46 : cluster [INF] pgmap v2410587: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:06:26.902246 mon.0 10.27.251.7:6789/0 47 : cluster [INF] pgmap v2410588: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:06:25.708241 osd.1 10.27.251.7:6800/2793 7 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 240.583204 secs
2016-11-26 20:06:25.708251 osd.1 10.27.251.7:6800/2793 8 : cluster [WRN] slow request 240.583204 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg
2016-11-26 20:06:34.798235 osd.0 10.27.251.7:6804/3291 7 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 240.218041 secs
2016-11-26 20:06:34.798247 osd.0 10.27.251.7:6804/3291 8 : cluster [WRN] slow request 240.218041 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg
2016-11-26 20:07:20.410986 mon.3 10.27.251.12:6789/0 7 : cluster [INF] mon.3 calling new monitor election
2016-11-26 20:07:20.414159 mon.2 10.27.251.11:6789/0 10 : cluster [INF] mon.2 calling new monitor election
2016-11-26 20:07:20.421808 mon.0 10.27.251.7:6789/0 48 : cluster [INF] mon.0 calling new monitor election
2016-11-26 20:07:20.448582 mon.0 10.27.251.7:6789/0 49 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
2016-11-26 20:07:20.459304 mon.0 10.27.251.7:6789/0 50 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set
2016-11-26 20:07:20.465502 mon.0 10.27.251.7:6789/0 51 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
2016-11-26 20:07:20.465571 mon.0 10.27.251.7:6789/0 52 : cluster [INF] pgmap v2410588: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:07:20.465650 mon.0 10.27.251.7:6789/0 53 : cluster [INF] mdsmap e1: 0/0/0 up
2016-11-26 20:07:20.465750 mon.0 10.27.251.7:6789/0 54 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in
2016-11-26 20:07:20.465934 mon.0 10.27.251.7:6789/0 55 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.10054s > max 0.05s
2016-11-26 20:07:20.478961 mon.0 10.27.251.7:6789/0 56 : cluster [WRN] message from mon.1 was stamped 0.109909s in the future, clocks not synchronized
2016-11-26 20:07:20.522400 mon.1 10.27.251.8:6789/0 1 : cluster [INF] mon.1 calling new monitor election
2016-11-26 20:07:20.541271 mon.1 10.27.251.8:6789/0 2 : cluster [INF] mon.1 calling new monitor election
2016-11-26 20:07:32.641565 mon.0 10.27.251.7:6789/0 61 : cluster [INF] osdmap e104: 6 osds: 5 up, 6 in
2016-11-26 20:07:32.665552 mon.0 10.27.251.7:6789/0 62 : cluster [INF] pgmap v2410589: 768 pgs: 72 stale+active+clean, 312 peering, 384 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:07:33.658567 mon.0 10.27.251.7:6789/0 63 : cluster [INF] osd.5 10.27.251.8:6812/4116 boot
2016-11-26 20:07:33.676112 mon.0 10.27.251.7:6789/0 64 : cluster [INF] osdmap e105: 6 osds: 6 up, 6 in
2016-11-26 20:07:33.726565 mon.0 10.27.251.7:6789/0 65 : cluster [INF] pgmap v2410590: 768 pgs: 72 stale+active+clean, 312 peering, 384 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:07:34.722585 mon.0 10.27.251.7:6789/0 66 : cluster [INF] osdmap e106: 6 osds: 5 up, 6 in
2016-11-26 20:07:34.785966 mon.0 10.27.251.7:6789/0 67 : cluster [INF] pgmap v2410591: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:07:35.737328 mon.0 10.27.251.7:6789/0 68 : cluster [INF] osd.4 10.27.251.8:6804/3430 boot
2016-11-26 20:07:35.757111 mon.0 10.27.251.7:6789/0 69 : cluster [INF] osdmap e107: 6 osds: 6 up, 6 in
2016-11-26 20:07:35.794812 mon.0 10.27.251.7:6789/0 70 : cluster [INF] pgmap v2410592: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:07:36.797846 mon.0 10.27.251.7:6789/0 71 : cluster [INF] osdmap e108: 6 osds: 6 up, 6 in
2016-11-26 20:07:36.842861 mon.0 10.27.251.7:6789/0 72 : cluster [INF] pgmap v2410593: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:07:38.854149 mon.0 10.27.251.7:6789/0 73 : cluster [INF] pgmap v2410594: 768 pgs: 88 stale+active+clean, 312 peering, 368 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1992 kB/s rd, 683 kB/s wr, 117 op/s
2016-11-26 20:07:39.923063 mon.0 10.27.251.7:6789/0 74 : cluster [INF] pgmap v2410595: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1466 kB/s wr, 257 op/s
2016-11-26 20:07:41.012515 mon.0 10.27.251.7:6789/0 75 : cluster [INF] osdmap e109: 6 osds: 5 up, 6 in
2016-11-26 20:07:41.039741 mon.0 10.27.251.7:6789/0 76 : cluster [INF] pgmap v2410596: 768 pgs: 142 stale+active+clean, 312 peering, 314 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1110 kB/s wr, 211 op/s
2016-11-26 20:07:38.817104 osd.0 10.27.251.7:6804/3291 9 : cluster [INF] 1.b7 scrub starts
2016-11-26 20:07:41.429461 osd.0 10.27.251.7:6804/3291 10 : cluster [INF] 1.b7 scrub ok
2016-11-26 20:07:42.043092 mon.0 10.27.251.7:6789/0 77 : cluster [INF] osd.2 10.27.251.8:6800/3073 boot
2016-11-26 20:07:42.074005 mon.0 10.27.251.7:6789/0 78 : cluster [INF] osdmap e110: 6 osds: 5 up, 6 in
2016-11-26 20:07:42.150211 mon.0 10.27.251.7:6789/0 79 : cluster [INF] pgmap v2410597: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 940 B/s rd, 1 op/s
2016-11-26 20:07:43.084122 mon.0 10.27.251.7:6789/0 80 : cluster [INF] osd.3 10.27.251.8:6808/3714 boot
2016-11-26 20:07:43.104296 mon.0 10.27.251.7:6789/0 81 : cluster [INF] osdmap e111: 6 osds: 6 up, 6 in
2016-11-26 20:07:35.733073 osd.1 10.27.251.7:6800/2793 9 : cluster [INF] 3.37 scrub starts
2016-11-26 20:07:35.841829 osd.1 10.27.251.7:6800/2793 10 : cluster [INF] 3.37 scrub ok
2016-11-26 20:07:36.733564 osd.1 10.27.251.7:6800/2793 11 : cluster [INF] 3.7c scrub starts
2016-11-26 20:07:36.852120 osd.1 10.27.251.7:6800/2793 12 : cluster [INF] 3.7c scrub ok
2016-11-26 20:07:41.764388 osd.1 10.27.251.7:6800/2793 13 : cluster [INF] 3.fc scrub starts
2016-11-26 20:07:41.830597 osd.1 10.27.251.7:6800/2793 14 : cluster [INF] 3.fc scrub ok
2016-11-26 20:07:42.736376 osd.1 10.27.251.7:6800/2793 15 : cluster [INF] 4.9 scrub starts
2016-11-26 20:07:43.149808 mon.0 10.27.251.7:6789/0 82 : cluster [INF] pgmap v2410598: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 982 B/s rd, 1 op/s
2016-11-26 20:07:44.135066 mon.0 10.27.251.7:6789/0 83 : cluster [INF] osdmap e112: 6 osds: 6 up, 6 in
2016-11-26 20:07:44.178743 mon.0 10.27.251.7:6789/0 84 : cluster [INF] pgmap v2410599: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
2016-11-26 20:07:46.774607 mon.0 10.27.251.7:6789/0 85 : cluster [INF] pgmap v2410600: 768 pgs: 154 stale+active+clean, 223 peering, 390 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2157 kB/s wr, 466 op/s
2016-11-26 20:07:47.846499 mon.0 10.27.251.7:6789/0 86 : cluster [INF] pgmap v2410601: 768 pgs: 223 peering, 544 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4603 kB/s wr, 748 op/s
2016-11-26 20:07:48.919366 mon.0 10.27.251.7:6789/0 87 : cluster [INF] pgmap v2410602: 768 pgs: 99 peering, 667 active+clean, 2 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4235 kB/s wr, 495 op/s
2016-11-26 20:07:49.986068 mon.0 10.27.251.7:6789/0 88 : cluster [INF] pgmap v2410603: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1607 kB/s rd, 30552 B/s wr, 127 op/s
2016-11-26 20:07:50.468852 mon.0 10.27.251.7:6789/0 89 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.105319s > max 0.05s
2016-11-26 20:07:43.076810 osd.0 10.27.251.7:6804/3291 11 : cluster [INF] 1.17 scrub starts
2016-11-26 20:07:45.709439 osd.0 10.27.251.7:6804/3291 12 : cluster [INF] 1.17 scrub ok
2016-11-26 20:07:52.746601 mon.0 10.27.251.7:6789/0 90 : cluster [INF] pgmap v2410604: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 628 kB/s rd, 25525 B/s wr, 139 op/s
[...]
2016-11-26 20:08:03.325584 mon.0 10.27.251.7:6789/0 98 : cluster [INF] pgmap v2410612: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 387 kB/s rd, 61530 B/s wr, 90 op/s
2016-11-26 20:08:03.523958 osd.1 10.27.251.7:6800/2793 16 : cluster [INF] 4.9 scrub ok
2016-11-26 20:08:04.398784 mon.0 10.27.251.7:6789/0 99 : cluster [INF] pgmap v2410613: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2975 kB/s rd, 401 kB/s wr, 419 op/s
[...]
2016-11-26 20:08:20.340826 mon.0 10.27.251.7:6789/0 112 : cluster [INF] pgmap v2410626: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 384 kB/s rd, 95507 B/s wr, 31 op/s
2016-11-26 20:08:20.458392 mon.0 10.27.251.7:6789/0 113 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1; nodown,noout flag(s) set; Monitor clock skew detected 
2016-11-26 20:08:22.429360 mon.0 10.27.251.7:6789/0 114 : cluster [INF] pgmap v2410627: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 256 kB/s rd, 65682 B/s wr, 18 op/s
[...]
2016-11-26 20:09:19.885573 mon.0 10.27.251.7:6789/0 160 : cluster [INF] pgmap v2410671: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 33496 kB/s rd, 3219 kB/s wr, 317 op/s
2016-11-26 20:09:20.458837 mon.0 10.27.251.7:6789/0 161 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set
2016-11-26 20:09:20.921396 mon.0 10.27.251.7:6789/0 162 : cluster [INF] pgmap v2410672: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 10498 kB/s rd, 970 kB/s wr, 46 op/s
[...]
2016-11-26 20:09:40.156783 mon.0 10.27.251.7:6789/0 178 : cluster [INF] pgmap v2410688: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 16202 kB/s rd, 586 kB/s wr, 64 op/s
2016-11-26 20:09:41.231992 mon.0 10.27.251.7:6789/0 181 : cluster [INF] osdmap e113: 6 osds: 6 up, 6 in
2016-11-26 20:09:41.260099 mon.0 10.27.251.7:6789/0 182 : cluster [INF] pgmap v2410689: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 13734 kB/s rd, 561 kB/s wr, 58 op/s
[...]
2016-11-26 20:09:46.764432 mon.0 10.27.251.7:6789/0 187 : cluster [INF] pgmap v2410693: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4388 kB/s rd, 97979 B/s wr, 18 op/s
2016-11-26 20:09:46.764614 mon.0 10.27.251.7:6789/0 189 : cluster [INF] osdmap e114: 6 osds: 6 up, 6 in
2016-11-26 20:09:46.793173 mon.0 10.27.251.7:6789/0 190 : cluster [INF] pgmap v2410694: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1709 kB/s rd, 75202 B/s wr, 4 op/s
[...]
2016-11-26 20:10:19.919396 mon.0 10.27.251.7:6789/0 216 : cluster [INF] pgmap v2410719: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 404 kB/s wr, 4 op/s
2016-11-26 20:10:20.459279 mon.0 10.27.251.7:6789/0 217 : cluster [INF] HEALTH_OK
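
Note that between the HEALTH_WARN at 20:08 and this HEALTH_OK the
clock skew on mon.1 resolved on its own (NTP re-syncing after boot, I
suppose) and the nodown/noout flags got cleared. If the skew had
persisted, checking the time daemon on each mon node would have been
the next step; a sketch, assuming ntpd as on a stock PVE install:

  # on every mon node: is ntpd synced, and how big is the offset?
  ntpq -p
  # clear the maintenance flags once all nodes are back (presumably
  # what was done here; the log does not show the command):
  ceph osd unset noout
  ceph osd unset nodown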


Another thing to note: in the syslog (not the ceph log) of mon.0 I've
found, for the first (failed) boot:

Nov 26 18:05:43 capitanamerica ceph[1714]: === mon.0 ===
Nov 26 18:05:43 capitanamerica ceph[1714]: Starting Ceph mon.0 on capitanamerica...
Nov 26 18:05:43 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Nov 26 18:05:43 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Nov 26 18:05:43 capitanamerica ceph[1714]: Running as unit ceph-mon.0.1480179943.905192147.service.
Nov 26 18:05:43 capitanamerica ceph[1714]: Starting ceph-create-keys on capitanamerica...
Nov 26 18:05:44 capitanamerica ceph[1714]: === osd.1 ===
Nov 26 18:05:44 capitanamerica ceph[1714]: 2016-11-26 18:05:44.939844 7f7f2478c700  0 -- :/2046852810 >> 10.27.251.7:6789/0 pipe(0x7f7f20061550 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f2005a990).fault
Nov 26 18:05:46 capitanamerica bash[1874]: starting mon.0 rank 0 at 10.27.251.7:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 8794c124-c2ec-4e81-8631-742992159bd6
Nov 26 18:05:52 capitanamerica ceph[1714]: 2016-11-26 18:05:52.234086 7f7f2478c700  0 -- 10.27.251.7:0/2046852810 >> 10.27.251.11:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400b0c0).fault
Nov 26 18:05:58 capitanamerica ceph[1714]: 2016-11-26 18:05:58.234163 7f7f2458a700  0 -- 10.27.251.7:0/2046852810 >> 10.27.251.12:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400d240).fault
Nov 26 18:06:04 capitanamerica ceph[1714]: 2016-11-26 18:06:04.234037 7f7f2468b700  0 -- 10.27.251.7:0/2046852810 >> 10.27.251.11:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400d310).fault
Nov 26 18:06:14 capitanamerica ceph[1714]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 1.82 host=capitanamerica root=default'
Nov 26 18:06:14 capitanamerica ceph[1714]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.1']' returned non-zero exit status 1
Nov 26 18:06:15 capitanamerica ceph[1714]: === osd.0 ===
Nov 26 18:06:22 capitanamerica ceph[1714]: 2016-11-26 18:06:22.238039 7f8bb46b2700  0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000b0c0).fault
Nov 26 18:06:28 capitanamerica ceph[1714]: 2016-11-26 18:06:28.241918 7f8bb44b0700  0 -- 10.27.251.7:0/3291965862 >> 10.27.251.12:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000d240).fault
Nov 26 18:06:34 capitanamerica ceph[1714]: 2016-11-26 18:06:34.242060 7f8bb45b1700  0 -- 10.27.251.7:0/3291965862 >> 10.27.251.12:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000d310).fault
Nov 26 18:06:38 capitanamerica ceph[1714]: 2016-11-26 18:06:38.242035 7f8bb44b0700  0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0000c00 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000de50).fault
Nov 26 18:06:44 capitanamerica ceph[1714]: 2016-11-26 18:06:44.242157 7f8bb46b2700  0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000e0d0).fault
Nov 26 18:06:45 capitanamerica ceph[1714]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 --keyring=/var/lib/ceph/osd/ceph-0/keyring osd crush create-or-move -- 0 1.82 host=capitanamerica root=default'
Nov 26 18:06:45 capitanamerica ceph[1714]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.0']' returned non-zero exit status 1
Nov 26 18:06:45 capitanamerica ceph[1714]: ceph-disk: Error: One or more partitions failed to activate
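
The 'pipe(...).fault' lines here are the osd init script trying to
reach the monitors on 10.27.251.11 and 10.27.251.12, which apparently
were not reachable yet, so the 'timeout 30 ... osd crush
create-or-move' expires and ceph-disk bails out. Had the OSDs stayed
down after quorum came back, a manual retry along these lines should
have worked (a sketch, following the same init path the boot log
above uses):

  # re-activate any prepared-but-not-started OSD partitions
  ceph-disk activate-all
  # or restart the single failed OSDs explicitly
  /usr/sbin/service ceph --cluster ceph start osd.0
  /usr/sbin/service ceph --cluster ceph start osd.1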

And for the second (working) boot:

Nov 26 20:01:49 capitanamerica ceph[1716]: === mon.0 ===
Nov 26 20:01:49 capitanamerica ceph[1716]: Starting Ceph mon.0 on capitanamerica...
Nov 26 20:01:49 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Nov 26 20:01:49 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Nov 26 20:01:49 capitanamerica ceph[1716]: Running as unit ceph-mon.0.1480186909.457328760.service.
Nov 26 20:01:49 capitanamerica ceph[1716]: Starting ceph-create-keys on capitanamerica...
Nov 26 20:01:49 capitanamerica bash[1900]: starting mon.0 rank 0 at 10.27.251.7:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 8794c124-c2ec-4e81-8631-742992159bd6
Nov 26 20:01:50 capitanamerica ceph[1716]: === osd.1 ===
Nov 26 20:01:50 capitanamerica ceph[1716]: create-or-move updated item name 'osd.1' weight 1.82 at location {host=capitanamerica,root=default} to crush map
Nov 26 20:01:50 capitanamerica ceph[1716]: Starting Ceph osd.1 on capitanamerica...
Nov 26 20:01:50 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Nov 26 20:01:50 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Nov 26 20:01:50 capitanamerica ceph[1716]: Running as unit ceph-osd.1.1480186910.254183695.service.
Nov 26 20:01:50 capitanamerica bash[2765]: starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Nov 26 20:01:50 capitanamerica ceph[1716]: === osd.0 ===
Nov 26 20:01:51 capitanamerica ceph[1716]: create-or-move updated item name 'osd.0' weight 1.82 at location {host=capitanamerica,root=default} to crush map
Nov 26 20:01:51 capitanamerica ceph[1716]: Starting Ceph osd.0 on capitanamerica...
Nov 26 20:01:51 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Nov 26 20:01:51 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Nov 26 20:01:51 capitanamerica ceph[1716]: Running as unit ceph-osd.0.1480186910.957564523.service.
Nov 26 20:01:51 capitanamerica bash[3281]: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal


So it seems to me that at the first boot (some) OSDs fail to start,
presumably because the other monitors were still down and there was
no quorum to answer the 'osd crush create-or-move'. But, again, PVE
and 'ceph status' now report all OSDs as up & in.
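
A way to cross-check what the monitors claim against what is really
running on a node (a sketch; the systemd unit names are the
'ceph-osd.N....service' ones from the boot log above):

  # cluster-map view: are all OSDs up, in, and correctly weighted?
  ceph osd tree
  ceph -s
  # node view: are the daemons actually running?
  ps aux | grep [c]eph-osd
  systemctl list-units 'ceph*' --no-pager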


Thanks.

-- 
dott. Marco Gaiarin				        GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''          http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

		Donate your 5 PER MILLE to LA NOSTRA FAMIGLIA!
    http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123
	(tax code 00307430132, category ONLUS or RICERCA SANITARIA)


