[PVE-User] Poor CEPH performance? or normal?

Mark Adams mark at openvs.co.uk
Thu Jul 26 12:25:45 CEST 2018


Hi Ronny,

Thanks for your suggestions. Do you know if it is possible to change an
existing RBD pool/image to use striping, or does this have to be done at
first setup?
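
If it can only be set when an image is created, I guess the workaround
would be to create a new image with the striping options and copy the data
across, something like this (untested, and the pool/image names are just
placeholders):

   qemu-img convert -n -p -f raw -O raw \
       rbd:rbd/vm-100-disk-1 rbd:rbd/vm-100-disk-1-striped

with vm-100-disk-1-striped created beforehand via rbd create with the
stripe options, then swapping the disk in the VM config. Please correct me
if there is a better way.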

Regards,
Mark

On Wed, 25 Jul 2018, 19:20 Ronny Aasen, <ronny+pve-user at aasen.cx> wrote:

> On 25 July 2018 02:19, Mark Adams wrote:
> > Hi All,
> >
> > I have a Proxmox 5.1 + Ceph cluster of 3 nodes, each with 12 x WD 10TB
> > GOLD drives. Network is 10Gbps on X550-T2, with a separate network for
> > the Ceph cluster.
> >
> > I have 1 VM currently running on this cluster, which is Debian Stretch
> > with a zpool on it. I'm zfs sending in to it, but only getting around
> > ~15MiB/s write speed. Does this sound right? It seems very slow to me.
> >
> > Not only that, but when this zfs send is running I cannot do any
> > parallel sends to any other zfs datasets inside the same VM. They just
> > seem to hang, then eventually say "dataset is busy".
> >
> > Any pointers or insights greatly appreciated!
>
> Greetings
>
> Alwin gave you some good advice about filesystems and VMs; I wanted to
> say a little about Ceph.
>
> With 3 nodes and the default (and recommended) size=3 pools, you cannot
> tolerate any node failures. IOW, if you lose a node or need to do
> lengthy maintenance on it, you are running degraded. I always have a
> 4th "failure domain" node, so my cluster can self-heal (one of Ceph's
> killer features) after a node failure. Your cluster should be
>
> 3 + [how-many-node-failures-I-want-to-be-able-to-survive-and-still-operate-sanely]
> nodes.
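>
> (You can check and adjust this per pool; roughly like the below, where
> the pool name is just an example:
>
>    ceph osd pool get rbd size        # number of replicas
>    ceph osd pool get rbd min_size    # copies needed to keep serving I/O
>    ceph osd pool set rbd min_size 2
>
> keep size=3 / min_size=2 unless you know exactly why you want otherwise.)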
>
> Spinning OSDs with BlueStore benefit greatly from an SSD DB/WAL. If your
> OSDs have their DB/WAL on the data disk, you can gain a lot of
> performance by moving the DB/WAL to an SSD or better.
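>
> For reference, creating an OSD with its DB on a separate SSD looks
> roughly like this with ceph-volume (device names are only examples, and
> afaik an existing on-disk DB/WAL cannot simply be moved, the OSD has to
> be recreated):
>
>    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1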
>
> Ceph gains performance with scale (number of OSD nodes). So while Ceph's
> aggregate performance is awesome, an individual single thread will not
> be amazing. With size=3 on 3 nodes, a given set of data exists on all 3
> nodes, and every write hits 100% of the nodes, so by using Ceph with 3
> nodes you give Ceph the worst case for performance. E.g. with 4 nodes a
> write would hit 75% of the cluster, with 6 nodes it would hit 50%. You
> see where this is going...
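>
> You can see the difference between single-threaded and aggregate
> performance with rados bench, e.g. (the pool name is an example; it
> writes real data, so use a test pool):
>
>    rados bench -p testpool 30 write -t 1     # one op in flight
>    rados bench -p testpool 30 write -t 16    # many ops in parallel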
>
> But a single write will only hit one disk in each of the 3 nodes, and
> will not perform better than the disk it hits. You can cheat some extra
> performance with RBD caching, and it is important for performance to get
> a higher queue depth. AFAIK zfs send uses a queue depth of 1, which is
> the worst possible case for Ceph. You may have some success by buffering
> on one or both ends of the transfer [1].
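>
> The mbuffer approach from [1] is roughly this (dataset names, host name
> and buffer sizes are only examples):
>
>    zfs send tank/data@snap | mbuffer -s 128k -m 1G | \
>        ssh vm-host 'mbuffer -s 128k -m 1G | zfs receive tank/data'
>
> and for RBD caching: iirc setting the VM disk to cache=writeback in
> Proxmox enables the librbd client cache.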
>
> If the VM has an RBD disk, you may (or may not) benefit from RBD fancy
> striping [2], since operations can hit more OSDs in parallel.
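>
> Afaik the striping parameters can only be set when an image is created,
> not changed afterwards, so it would be something like this (values are
> examples only; iirc the stripe unit has to divide the object size, 4M by
> default):
>
>    rbd create rbd/vm-100-disk-2 --size 100G --stripe-unit 65536 --stripe-count 8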
>
>
> good luck
> Ronny Aasen
>
>
> [1]
>
> https://everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/
> [2] http://docs.ceph.com/docs/master/architecture/#data-striping


