[PVE-User] Better understanding CEPH Pool definition

Alwin Antreich a.antreich at proxmox.com
Wed Mar 11 14:12:23 CET 2020


Hello Gregor,

On Wed, Mar 11, 2020 at 10:57:28AM +0100, Gregor Burck wrote:
> Hi,
> 
> I still have problems understanding the pool definition size/min in Ceph and what it means for us.
> 
> We have a 3-node cluster with 4 SSDs (the smallest sensible setup in the documentation).
:)

> 
> When I define a Ceph pool with 3/2, every image on it requires triple the storage.
> When I define a Ceph pool with 2/1, every image on it requires double the storage.
Oi. :) Never go with 2/1. See below for the explanation.

> 
> Does size/min mean the number of nodes over which the data is distributed?
Not quite. The default failure domain is 'host', see the CRUSH map
[0]. Ceph will not place two replicas of the same PG on one host, so a
node can fail without losing all replicas at once.
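
If you want to verify that yourself, a quick sketch (the rule name
'replicated_rule' is the default one, adjust if yours differs):

    # show the CRUSH hierarchy: root -> host -> osd
    ceph osd tree

    # dump the rule; the chooseleaf step should list "type": "host"
    ceph osd crush rule dump replicated_rule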

> 
> But when I take a node down for maintenance, on the 3/2 setup there are still 2 copies of each image available, on 2/1 maybe only one.
> On a 3/2 setup the filesystem becomes read-only when 2 nodes are down at the same time; on a 2/1 setup could it be that the storage is not available at all any more?
size tells Ceph how many replicas it needs to create to reach a
healthy state, or in other words, how often an object should be
duplicated.

min_size is the minimum number of replicas that must still exist to
allow writes. If any PG in a pool drops below that, the pool is placed
in read-only mode.
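
To see what a pool is currently set to, something like this should work
(the pool name 'vm-pool' is just an example, use your own):

    ceph osd pool get vm-pool size
    ceph osd pool get vm-pool min_size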

You can also change those values for an existing pool [1] later on. It
will create some extra I/O on the cluster while the data is rebalanced.
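
For example, moving a hypothetical pool 'vm-pool' from 2/1 to 3/2 would
roughly look like this:

    # add a third replica for every object (triggers recovery I/O)
    ceph osd pool set vm-pool size 3

    # require at least two replicas before accepting writes
    ceph osd pool set vm-pool min_size 2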

> 
> I am trying to understand the impact for us if we set up two pools: one with 3/2 for important VMs and one with 2/1 for VMs which could be down for a while or have no need to be up to date at all times, ...
As said above, never go with X/1, especially in small clusters. While
in a degraded state with min_size = 1, the risk of losing the only
remaining copy through a subsequent failure, or while a write is in
flight (update), is quite high.

For the latter pool, run it with 2/2. This way two copies are always
required, while not using the extra space for a third replica. The
downside is, of course, that the pool will become unavailable as soon
as it has fewer than 2 replicas.
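
A rough sketch of creating such a second pool with the plain Ceph tools
(pool name and PG count are made up, on PVE you would normally do this
via the GUI or pveceph):

    # create the pool with 128 placement groups
    ceph osd pool create unimportant-vms 128

    # two replicas, and both must exist for the pool to accept writes
    ceph osd pool set unimportant-vms size 2
    ceph osd pool set unimportant-vms min_size 2

    # tag it for RBD use so Ceph does not warn about the pool
    ceph osd pool application enable unimportant-vms rbd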

--
Cheers,
Alwin

[0] https://docs.ceph.com/docs/nautilus/rados/operations/crush-map/
[1] https://docs.ceph.com/docs/nautilus/rados/operations/pools/#set-pool-values



