[PVE-User] problems with 3.2-beta

Martin Maurer martin at proxmox.com
Mon Feb 3 13:57:44 CET 2014


Hello Adam,

Thanks for your feedback. For help with your migration or HA issues, please open ONE thread per problem; multiple complex issues packed into a single thread tend to get no answers at all.

Here I will just answer your Ceph feedback:

> 3. The Wiki page on setting up CEPH Server doesn't mention that you can
> do most of the setup from within the GUI.  Since I have write access
> there, I guess I should fix it myself :-).

[Martin Maurer] 
The wiki tells you at which step you can start using the GUI:
http://pve.proxmox.com/wiki/Ceph_Server#Creating_Ceph_Monitors

The video tutorial also shows what is possible via the GUI.
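
For reference, a rough sketch of the command-line steps the wiki describes before the GUI takes over (the network below is only an example, adjust it to your Ceph cluster network; run the monitor/OSD steps on each node as described in the wiki):

> pveceph install
> pveceph init --network 10.10.10.0/24
> pveceph createmon

Once the first monitor exists, further monitors and the OSDs can be created either via the GUI or with pveceph createmon / pveceph createosd.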
 
> Ceph speeds are barely acceptable (10-20MB/sec) but that's typical of
> Ceph in my experience so far, even with caching turned on. (Still a bit
> of a letdown compared to Sheepdog's 300MB/sec burst throughput,
> though.)

[Martin Maurer] 
20 MB/s? When you report benchmark results, you need to specify exactly what you measured and how.
Our Ceph test cluster - described in the wiki page - reached about 260 MB/s read and write speed (with replication 3) inside a single KVM guest, e.g. Windows (tested with CrystalDiskMark). Ceph is not designed for maximum single-client performance; it is designed to scale out, which means you get good aggregate performance with a large number of VMs, and you can always add more servers to increase storage capacity and speed.
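
If your test guest is Linux rather than Windows, a rough equivalent of CrystalDiskMark's sequential test is fio; the parameters below are only an example, adjust block size and file size to your setup and run it on a disk that is actually backed by your Ceph storage:

> fio --name=seqtest --rw=write --bs=4M --size=4G --direct=1 --ioengine=libaio
> fio --name=seqtest --rw=read --bs=4M --size=4G --direct=1 --ioengine=libaio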

You can also run a simple benchmark directly against your Ceph cluster, using the rados command:

First, create a new pool via the GUI; e.g. I name it test2 with replication 2.
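
If you prefer the command line, the same pool can also be created like this (128 placement groups is just an example value; pick a pg_num that fits your number of OSDs):

> ceph osd pool create test2 128
> ceph osd pool set test2 size 2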

Now, run a write test:
> rados -p test2 bench 60 write --no-cleanup
__
Total writes made:      5742
Write size:             4194304
Bandwidth (MB/sec):     381.573
__

Now, do a read test:
> rados -p test2 bench 60 seq

__
Total reads made:     5742
Read size:            4194304
Bandwidth (MB/sec):    974.951
__
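
Because the write test was run with --no-cleanup (so the read test has data to work on), the benchmark objects stay in the pool. Once you are done, you can simply remove the whole test pool again, either via the GUI or on the command line:

> ceph osd pool delete test2 test2 --yes-i-really-really-mean-it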

The most important factors for performance:
- 10 Gbit network (you can verify the raw link speed between your nodes with iperf, see the example below)
- at least 4 OSDs per node (and at least 3 nodes)
- a fast SSD for the journal
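
A quick way to check that the network between two nodes really delivers 10 Gbit is iperf: start the server on one node and connect from another (the IP below is just an example, use the Ceph/cluster network address of the first node):

> iperf -s
> iperf -c 10.10.10.1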


> One thing I'm not sure of is OSD placement... if I have two drives per
> host dedicated to Ceph (and thus two OSDs), and my pool "size" is 2,
> does that mean a single node failure could render some data
> unreachable?

[Martin Maurer] 
Size 2 means that your data is stored in two copies, and the default CRUSH rule places those copies on different nodes. So if you lose one node, your data is still accessible from the other one.
Take a look at the Ceph docs; there is a lot of explanation there.
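
To see for yourself how Ceph places the copies, you can look at the OSD/host tree and ask Ceph where an arbitrary object name would end up ('testobject' below is just a made-up name, the object does not need to exist):

> ceph osd tree
> ceph osd map test2 testobject

The 'osd map' output lists the OSDs that would hold the object; with the default CRUSH rule they belong to different hosts.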

> I've adjusted my "size" to 3 just in case, but I don't
> understand how this works.  Sheepdog guarantees that multiple copies of
> an object won't be stored on the same host for exactly this reason, but
> I can't tell what Ceph does.

[Martin Maurer] 
Size=3 will be the new default setting for Ceph (the change will come with the Firefly release, AFAIK).

Via the GUI you can easily stop, start, add and remove OSDs, so you can see how the cluster behaves in all these scenarios.
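
If you want to try the same from the command line: take an OSD out, watch how the cluster rebalances, and put it back in again (the OSD id 3 below is just an example):

> ceph osd out 3
> ceph -w
> ceph osd in 3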

Martin




