[PVE-User] (Very) basic question regarding PVE Ceph integration

Alwin Antreich alwin at antreich.com
Sun Dec 16 15:39:05 CET 2018


Hello Frank,

On Sun, Dec 16, 2018 at 02:28:19PM +0100, Frank Thommen wrote:
> Hi,
> 
> I understand that with the new PVE release PVE hosts (hypervisors) can be
> used as Ceph servers.  But it's not clear to me if (or when) that makes
> sense.  Do I really want to have Ceph MDS/OSD on the same hardware as my
> hypervisors?  Doesn't that a) accumulate multiple POFs on the same hardware
> and b) occupy computing resources (CPU, RAM), that I'd rather use for my VMs
> and containers?  Wouldn't I rather want to have a separate Ceph cluster?
The integration of Ceph services in PVE started with Proxmox VE 3.0.
With PVE 5.3 (the current release) we added CephFS services to PVE. So
you can run a hyper-converged Ceph with RBD/CephFS on the same servers
as your VMs/CTs.

a) Can you please be more specific about what you see as multiple
points of failure?

b) That depends on the workload of your nodes. Modern server hardware
has enough power to run multiple services. It all comes down to having
enough resources for each domain (e.g. Ceph, KVM, CT, host).

I recommend using a simple calculation for a start, just to get a
direction; a rough sketch of such a calculation follows each list
below.

In principle:

==CPU==
core='CPU with HT on'

* reserve a core for each Ceph daemon
  (preferably on the same NUMA node as the network; higher frequency is
  better)
* one core for the network card (higher frequency = lower latency)
* the rest of the cores for the OS (incl. monitoring, backup, ...) and
  KVM/CT usage
* don't overcommit
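
For illustration, a back-of-the-envelope core budget in Python (all
numbers are made-up examples, not a recommendation):

# Rough core budget for one hyper-converged node (example numbers only).
logical_cores = 32              # e.g. 16 physical cores with HT on
ceph_daemon_cores = 4 + 1 + 1   # e.g. 4 OSDs + 1 MON + 1 MGR on this node
nic_cores = 1                   # one core reserved for the network card
os_cores = 2                    # OS, monitoring, backup, ...

guest_cores = logical_cores - ceph_daemon_cores - nic_cores - os_cores
print("cores left for KVM/CT:", guest_cores)   # don't overcommit this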

==Memory==
* 1 GB per TB of used disk space on an OSD (more during recovery)
* enough memory for KVM/CT
* free memory for OS, backup, monitoring, live migration
* don't overcommit
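
The same kind of rough budget works for memory (again, only placeholder
numbers):

# Rough memory budget for one node (example numbers only).
ram_gb = 128
osd_used_tb = 4 * 2.0          # 4 OSDs with ~2 TB of used space each
ceph_gb = osd_used_tb * 1.0    # rule of thumb: ~1 GB per TB used; keep extra for recovery
guest_gb = 64                  # sum of KVM/CT memory on this node
os_gb = 8                      # OS, backup, monitoring, live migration

headroom_gb = ram_gb - ceph_gb - guest_gb - os_gb
print("memory headroom:", headroom_gb, "GB")   # keep this clearly positive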

==Disk==
* one OSD daemon per disk, even disk sizes throughout the cluster
* more disks, more hosts, better distribution
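
And for capacity, a rough raw vs. usable estimate (this assumes the
usual 3-way replicated pools; adjust to your own pool settings):

# Raw vs. usable capacity (assuming replicated pools with size 3).
hosts = 4
disks_per_host = 4
disk_tb = 2.0
replica_size = 3

raw_tb = hosts * disks_per_host * disk_tb
usable_tb = raw_tb / replica_size
print("raw:", raw_tb, "TB, roughly usable:", round(usable_tb, 1), "TB")
# Keep some of it free, so the cluster can re-replicate after a host failure.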

==Network==
* at least 10 GbE for storage traffic (the more the better),
  see our benchmark paper
  https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/
  (a rough recovery-time estimate follows below)
* separate networks for cluster, storage and client traffic;
  additionally, separate the migration network from any other
* use two physical networks for corosync (ring0 & ring1)
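
To see why more bandwidth helps, here is a rough estimate of the time
needed to re-replicate one failed OSD when the storage network is the
bottleneck (illustrative numbers only):

# Best-case time to re-replicate one failed OSD over the storage network.
osd_used_tb = 4.0              # used data on the failed OSD
link_gbit = 10                 # 10 GbE storage network
link_gbyte_s = link_gbit / 8   # ~1.25 GB/s theoretical maximum

hours = osd_used_tb * 1000 / link_gbyte_s / 3600
print("best case:", round(hours, 1), "h")   # real recovery is slower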

This list doesn't cover every aspect (e.g. how much failure is
allowed), but I think it is a good start. With the above points for the
sizing of your cluster, the question of whether to run a separate Ceph
cluster or a hyper-converged setup might be a little easier to answer.

--
Cheers,
Alwin


