[PVE-User] (Very) basic question regarding PVE Ceph integration

Alwin Antreich alwin at antreich.com
Sun Dec 16 19:47:40 CET 2018


On Sun, Dec 16, 2018 at 05:16:50PM +0100, Frank Thommen wrote:
> Hi Alwin,
> 
> On 16/12/18 15:39, Alwin Antreich wrote:
> > Hello Frank,
> > 
> > On Sun, Dec 16, 2018 at 02:28:19PM +0100, Frank Thommen wrote:
> > > Hi,
> > > 
> > > I understand that with the new PVE release PVE hosts (hypervisors) can be
> > > used as Ceph servers.  But it's not clear to me if (or when) that makes
> > > sense.  Do I really want to have Ceph MDS/OSD on the same hardware as my
> > > hypervisors?  Doesn't that a) accumulate multiple POFs on the same hardware
> > > and b) occupy computing resources (CPU, RAM) that I'd rather use for my VMs
> > > and containers?  Wouldn't I rather want to have a separate Ceph cluster?
> > The integration of Ceph services in PVE started with Proxmox VE 3.0.
> > With PVE 5.3 (current) we added CephFS services to PVE. So you can
> > run a hyper-converged Ceph cluster with RBD/CephFS on the same servers as
> > your VMs/CTs.
> > 
> > a) can you please be more specific about what you see as multiple points
> > of failure?
> 
> Not only do I run the hypervisor which controls the containers and virtual
> machines on the server, but also the file service which is used to store the
> VM and container images.
Sorry, I am still not quite sure what your question/concern is.
Failure tolerance needs to be planned into the system design, irrespective
of service distribution.

Proxmox VE has an HA stack that restarts all services from a failed node
(if configured) on another node.
https://pve.proxmox.com/pve-docs/chapter-ha-manager.html

Ceph heals itself (if enough nodes are available) or keeps working in a
degraded state.
http://docs.ceph.com/docs/luminous/start/intro/

> 
> 
> > b) depends on the workload of your nodes. Modern server hardware has
> > enough power to run multiple services. It all comes down to having
> > enough resources for each domain (e.g. Ceph, KVM, CT, host).
> > 
> > I recommend using a simple calculation as a starting point, just to get a
> > direction.
> > 
> > In principle:
> > 
> > ==CPU==
> > core='CPU with HT on'
> > 
> > * reserve a core for each Ceph daemon
> >    (preferably on the same NUMA node as the network; higher frequency is
> >    better)
> > * one core for the network card (higher frequency = lower latency)
> > * the rest of the cores for the OS (incl. monitoring, backup, ...) and
> >    KVM/CT usage
> > * don't overcommit
> > 
> > ==Memory==
> > * 1 GB of RAM per TB of used disk space on an OSD (more during recovery)
> > * enough memory for KVM/CT
> > * free memory for OS, backup, monitoring, live migration
> > * don't overcommit
> > 
> > ==Disk==
> > * one OSD daemon per disk; use even disk sizes throughout the cluster
> > * more disks, more hosts, better distribution
> > 
> > ==Network==
> > * at least 10 GbE for storage traffic (the more the better),
> >    see our benchmark paper
> >    https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/
> > * separate networks for cluster, storage, and client traffic;
> >    additionally, separate the migration network from any other
> > * use two physical networks for corosync (ring0 & ring1)
> > 
> > This list doesn't cover every aspect (e.g. how much failure is allowed),
> > but I think it is a good start. With the above points for sizing your
> > cluster, the question of whether to separate out the hyper-converged
> > services might be a little easier to answer.
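
To make the rules above a bit more concrete, here is a rough back-of-envelope
sketch in Python. Every hardware figure in it is a made-up example, not a
recommendation; plug in your own numbers.

# Back-of-envelope sizing for one hyper-converged node, following the
# rules above. All input values are made-up examples.
cores_total  = 32       # logical cores ("CPU with HT on")
ram_total_gb = 256
osd_count    = 6        # one OSD daemon per disk
used_tb_per_osd = 4     # used space per OSD, in TB
other_ceph_daemons = 2  # e.g. MON + MGR/MDS running on this node

# CPU: one core per Ceph daemon, one for the NIC, a few for the OS
# (monitoring, backup, ...), the rest for KVM/CT -- no overcommit.
cores_ceph = osd_count + other_ceph_daemons
cores_nic  = 1
cores_os   = 2
cores_guests = cores_total - cores_ceph - cores_nic - cores_os

# Memory: ~1 GB per TB of used OSD space (more during recovery),
# plus headroom for the OS, backup, monitoring and live migration.
ram_ceph_gb   = osd_count * used_tb_per_osd * 1
ram_os_gb     = 8
ram_guests_gb = ram_total_gb - ram_ceph_gb - ram_os_gb

print(f"cores left for VMs/CTs: {cores_guests}")      # 21 in this example
print(f"RAM left for VMs/CTs:   {ram_guests_gb} GB")  # 224 GB in this example

If the guest share of that budget looks too small for the VMs/CTs you plan to
run, that is a good hint that bigger nodes or a separate Ceph cluster would be
the better fit.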
> 
> Thanks a lot.  This sure helps in our planning.
> 
> frank
You're welcome.

--
Cheers,
Alwin


