[PVE-User] Proxmox CEPH 6 servers failures!

Lindsay Mathieson lindsay.mathieson at gmail.com
Fri Oct 5 18:16:21 CEST 2018


Your Ceph cluster requires quorum to operate and that is based on your 
monitor nodes, not the OSD ones, which your diagram earlier doesn't detail.

How many monitor nodes do you have, and where are they located?

N.B. You should only run an odd number of monitor nodes.
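
To make the quorum arithmetic concrete, here is a minimal Python sketch. It assumes the usual strict-majority rule (more than half of all monitors must be reachable). The function names are made up for illustration and are not part of any Ceph tool. The six-monitor case corresponds to the six [mon.*] sections in the ceph.conf quoted further down.

```python
def quorum_size(total_mons: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    return total_mons // 2 + 1


def has_quorum(total_mons: int, reachable_mons: int) -> bool:
    """True if the reachable monitors are more than half of all monitors."""
    return reachable_mons >= quorum_size(total_mons)


def tolerated_failures(total_mons: int) -> int:
    """How many monitors can fail while the rest still hold quorum."""
    return total_mons - quorum_size(total_mons)


if __name__ == "__main__":
    # Six monitors split 3/3 across two buildings: neither half has quorum.
    print(has_quorum(6, 3))          # False
    # Five monitors split 3/2: the side with three keeps quorum.
    print(has_quorum(5, 3))          # True
    # An even count buys no extra tolerance over the odd count below it.
    print(tolerated_failures(5), tolerated_failures(6))  # 2 2
```

Under that assumption, a 3/3 split can never survive the loss of the link on either side, which is why an odd monitor count (with the extra monitor placed deliberately) is the usual advice.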


On 5/10/2018 10:53 PM, Gilberto Nunes wrote:
> Folks...
>
> All the Ceph servers are on the same network: 10.10.10.0/24.
> There is an optical fiber link between the two buildings, which I call
> buildA and buildB just to identify them.
> When I first created the cluster, the 3 servers in buildB went down
> and the remaining Ceph servers continued to work properly.
> I do not understand why this no longer happens!
> Sorry if I sound like a newbie! I am still learning about it!
> ---
> Gilberto Nunes Ferreira
>
> (47) 3025-5907
> (47) 99676-7530 - Whatsapp / Telegram
>
> Skype: gilberto.nunes36
>
>
>
>
>
> On Fri, 5 Oct 2018 at 09:44, Marcus Haarmann <
> marcus.haarmann at midoco.de> wrote:
>
>> Gilberto,
>>
>> the underlying problem is a Ceph problem and not related to VMs or
>> Proxmox.
>> The Ceph system requires a majority of monitor nodes to be active.
>> Your setup seems to have 3 mon nodes, which results in a loss of quorum
>> when two of these servers are gone.
>> Run "ceph -s" on each side to see whether Ceph responds at all.
>> If it does not, there are probably not enough mons present.
>>
>> Also, when one side is down you should see that some OSD instances
>> are missing.
>> In this case, Ceph might be up, but your VMs, which are spread over the
>> OSD disks, might block because the primary storage is inaccessible.
>> The distribution of data over the OSD instances is controlled by the
>> CRUSH map.
>> You should make sure to have enough copies configured and the CRUSH map
>> set up so that each side of your cluster holds at least one copy.
>> If the CRUSH map is misconfigured, all copies of your data may be on
>> the wrong side, resulting in Proxmox not being able to access the VM
>> data.
>>
>> Marcus Haarmann
>>
>>
>> From: "Gilberto Nunes" <gilberto.nunes32 at gmail.com>
>> To: "pve-user" <pve-user at pve.proxmox.com>
>> Sent: Friday, 5 October 2018 14:31:20
>> Subject: Re: [PVE-User] Proxmox CEPH 6 servers failures!
>>
>> Nice. Perhaps if I create a VM on Proxmox01 and Proxmox02 and join it
>> to the Ceph cluster, would that solve the quorum problem?
>>
>> On Fri, 5 Oct 2018 at 09:23, dorsy <dorsyka at yahoo.com> wrote:
>>
>>> Your question has already been answered. You need a majority to have
>>> quorum.
>>> On 2018. 10. 05. 14:10, Gilberto Nunes wrote:
>>>> Hi
>>>> Perhaps this can help:
>>>>
>>>> https://imageshack.com/a/img921/6208/X7ha8R.png
>>>>
>>>> I was thinking about it, and perhaps if I deploy a VM with Proxmox on
>>>> both sides and add it to the Ceph cluster, maybe this can help!
>>>>
>>>> thanks
>>>>
>>>> On Fri, 5 Oct 2018 at 03:55, Alexandre DERUMIER <aderumier at odiso.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Can you resend your diagram? It is impossible to read.
>>>>>
>>>>>
>>>>> But you need to have quorum among the monitors for the cluster to
>>>>> work.
>>>>>
>>>>> ----- Original message -----
>>>>> From: "Gilberto Nunes" <gilberto.nunes32 at gmail.com>
>>>>> To: "proxmoxve" <pve-user at pve.proxmox.com>
>>>>> Sent: Thursday, 4 October 2018 22:05:16
>>>>> Subject: [PVE-User] Proxmox CEPH 6 servers failures!
>>>>>
>>>>> Hi there
>>>>>
>>>>> I have something like this:
>>>>>
>>>>> CEPH01 ----|                                      |---- CEPH04
>>>>>            |                                      |
>>>>> CEPH02 ----|--------------------------------------|---- CEPH05
>>>>>            |             Optic Fiber              |
>>>>> CEPH03 ----|                                      |---- CEPH06
>>>>>
>>>>> Sometimes, when the optic fiber link fails and only CEPH01, CEPH02
>>>>> and CEPH03 remain, the entire cluster fails!
>>>>> I find out the cause!
>>>>>
>>>>> ceph.conf
>>>>>
>>>>> [global]
>>>>> auth client required = cephx
>>>>> auth cluster required = cephx
>>>>> auth service required = cephx
>>>>> cluster network = 10.10.10.0/24
>>>>> fsid = e67534b4-0a66-48db-ad6f-aa0868e962d8
>>>>> keyring = /etc/pve/priv/$cluster.$name.keyring
>>>>> mon allow pool delete = true
>>>>> osd journal size = 5120
>>>>> osd pool default min size = 2
>>>>> osd pool default size = 3
>>>>> public network = 10.10.10.0/24
>>>>>
>>>>> [osd]
>>>>> keyring = /var/lib/ceph/osd/ceph-$id/keyring
>>>>>
>>>>> [mon.pve-ceph01]
>>>>> host = pve-ceph01
>>>>> mon addr = 10.10.10.100:6789
>>>>> mon osd allow primary affinity = true
>>>>>
>>>>> [mon.pve-ceph02]
>>>>> host = pve-ceph02
>>>>> mon addr = 10.10.10.110:6789
>>>>> mon osd allow primary affinity = true
>>>>>
>>>>> [mon.pve-ceph03]
>>>>> host = pve-ceph03
>>>>> mon addr = 10.10.10.120:6789
>>>>> mon osd allow primary affinity = true
>>>>>
>>>>> [mon.pve-ceph04]
>>>>> host = pve-ceph04
>>>>> mon addr = 10.10.10.130:6789
>>>>> mon osd allow primary affinity = true
>>>>>
>>>>> [mon.pve-ceph05]
>>>>> host = pve-ceph05
>>>>> mon addr = 10.10.10.140:6789
>>>>> mon osd allow primary affinity = true
>>>>>
>>>>> [mon.pve-ceph06]
>>>>> host = pve-ceph06
>>>>> mon addr = 10.10.10.150:6789
>>>>> mon osd allow primary affinity = true
>>>>>
>>>>> Any help will be welcome!
>>>>>
>>>>> _______________________________________________
>>>>> pve-user mailing list
>>>>> pve-user at pve.proxmox.com
>>>>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


-- 
Lindsay



