[PVE-User] High ceph OSD latency

Fabrizio Cuseo f.cuseo at panservice.it
Fri Jan 16 16:36:33 CET 2015


Following up on my problem: is it correct that Proxmox uses "barrier=1" on the Ceph OSDs and "barrier=0" on /var/lib/vz?

With barriers enabled, the fsyncs/second values are really different:

root@proxmox:~# pveperf /var/lib/vz
CPU BOGOMIPS:      40000.24
REGEX/SECOND:      932650
HD SIZE:           325.08 GB (/dev/mapper/pve-data)
BUFFERED READS:    97.43 MB/sec
AVERAGE SEEK TIME: 11.57 ms
FSYNCS/SECOND:     20.88
DNS EXT:           69.87 ms
DNS INT:           63.98 ms (test.panservice)


root@proxmox:~# mount -o remount -o barrier=0 /var/lib/vz

root@proxmox:~# pveperf /var/lib/vz
CPU BOGOMIPS:      40000.24
REGEX/SECOND:      980519
HD SIZE:           325.08 GB (/dev/mapper/pve-data)
BUFFERED READS:    82.29 MB/sec
AVERAGE SEEK TIME: 12.10 ms
FSYNCS/SECOND:     561.09
DNS EXT:           64.09 ms
DNS INT:           77.50 ms (test.panservice)
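As context for the comparison above: disabling barriers trades crash safety for fsync throughput, so it is usually only considered when the disk sits behind a battery/flash-backed write cache. A minimal sketch of applying and persisting the setting; the device path mirrors the pveperf output above, and the fstab line is illustrative only, not taken from this thread:

```shell
# Runtime change only (lost at reboot), equivalent to the remount above:
mount -o remount,barrier=0 /var/lib/vz

# To persist across reboots, the fstab entry would carry the option, e.g.
# (hypothetical line; actual fields depend on the installation):
# /dev/mapper/pve-data  /var/lib/vz  ext3  defaults,barrier=0  0  1

# WARNING: with barrier=0, a power loss can corrupt recently-fsynced
# data unless the controller cache is battery- or flash-backed.
```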

Regards, Fabrizio 


----- Original Message -----
From: "Lindsay Mathieson" <lindsay.mathieson at gmail.com>
To: pve-user at pve.proxmox.com, "Fabrizio Cuseo" <f.cuseo at panservice.it>
Sent: Thursday, 15 January 2015 13:17:07
Subject: Re: [PVE-User] High ceph OSD latency

On Thu, 15 Jan 2015 11:25:44 AM Fabrizio Cuseo wrote:
> What is strange is that in the OSD tree I see high latency: apply latency
> is typically between 5 and 25 ms, but commit latency is between 150 and
> 300 ms (and sometimes 500/600 ms), with 5-10 op/s and a few B/s rd/wr (I
> have only 3 VMs, and only 1 is active now, so the cluster is really
> unloaded).
> 
> I am using a pool with 3 copies, and I have increased pg_num to 256 (the
> default value of 64 is too low); but OSD latency is the same with a
> different pg_num value.
> 
> I have other clusters (similar configuration, using Dell 2950s, dual
> ethernet for Ceph and Proxmox, 4 x OSD with 1 TByte drives, PERC 5i
> controller) with several VMs, and there the commit and apply latency is
> 1-2 ms.
> 
> Another cluster (a test cluster) with 3 x Dell PE860, with only 1 OSD per
> node, has better latency (10-20 ms).
> 
> What can I check?


POOMA U, but if you have one drive or controller that is marginal or failing, 
it can slow down the whole cluster.

Might be worthwhile benching individual OSDs.
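Expanding on that suggestion (not from the original message): per-OSD benchmarking can be done with standard Ceph CLI commands, run on a node with admin access; OSD ids and output depend on the cluster:

```shell
# Per-OSD apply/commit latency as reported by the cluster itself:
ceph osd perf

# Write a short test load to each OSD in turn and report its throughput;
# one OSD far slower than its peers points at a marginal drive or
# controller:
for id in $(ceph osd ls); do
    echo "--- osd.$id ---"
    ceph tell osd.$id bench
done
```

If one OSD stands out, checking its drive with SMART and the controller logs would be the next step.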

-- 
---
Fabrizio Cuseo - mailto:f.cuseo at panservice.it
Direzione Generale - Panservice InterNetWorking
Servizi Professionali per Internet ed il Networking
Panservice e' associata AIIP - RIPE Local Registry
Phone: +39 0773 410020 - Fax: +39 0773 470219
http://www.panservice.it  mailto:info at panservice.it
Numero verde nazionale: 800 901492
