[PVE-User] Proxmox CEPH 6 servers failures!

Woods, Ken A (DNR) ken.woods at alaska.gov
Fri Oct 5 18:35:41 CEST 2018


Gilberto, 

I have a few questions, which I think many of us have, given your recent and not-so-recent history. Please don’t take them as insults; they’re not intended as such. I’m just trying to figure out how best to help you solve the problems you keep having.

Have you read any documentation?
At all? Even just a quick-start guide? If so, did you retain any of it? (Odd numbers, quorum, etc.)

Or do you fire off an email to the list without first trying to find the solution yourself?

Additionally, how many times do you need to receive the same answer before you believe it?
Have you considered buying a full service maintenance subscription?

Thanks. I’m pretty sure that if we can figure out how you think about these issues, we can help you better. Because at this point, I’m ready to start telling you to STFU&RTFM.

Compassionately,

Ken



> On Oct 5, 2018, at 07:49, Gilberto Nunes <gilberto.nunes32 at gmail.com> wrote:
> 
> I have 6 monitors.
> What if I reduce them to 5? Or 4? Would that help?
> ---
> Gilberto Nunes Ferreira
> 
> (47) 3025-5907
> (47) 99676-7530 - Whatsapp / Telegram
> 
> Skype: gilberto.nunes36
> 
> On Fri, Oct 5, 2018 at 11:46, Marcus Haarmann <marcus.haarmann at midoco.de> wrote:
> 
>> This is corosync you are talking about. There, too, a quorum is needed to
>> work properly.
>> It needs to be configured in the same way as ceph.
>> You will always need a majority (e.g. 4 out of 6; 3 out of 6 won't do).
>> 
>> Your main problem may be that when you lose one location, the part that
>> held the majority of servers is down.
>> In my opinion, in your situation a 7th server would give you 7 active
>> servers with 4 needed,
>> so 3 can be offline (remember to check your crush map so that you will
>> have a working ceph cluster on the remaining servers).
>> Depending on which side goes offline, one side will be able to operate
>> without the other, but the other side won't.
>> 
>> Marcus Haarmann
>> 
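Marcus's arithmetic can be checked with a minimal Python sketch. It assumes one vote per node and the plain majority rule, quorum = floor(total / 2) + 1, which agrees with the "Quorum: 4" that pvecm status reports for 6 expected votes later in this thread. It does not model a QDevice or changed vote weights, and the function names are made up for illustration.

```python
# Minimal sketch: which building keeps quorum after the fiber link between
# the two buildings fails? Assumes one vote per node and the plain majority
# rule; QDevice and weighted votes are not modelled.


def quorum(total_votes):
    """Smallest strict majority of total_votes."""
    return total_votes // 2 + 1


def quorate_sides(nodes_a, nodes_b):
    """For each building, report whether it stays quorate on its own."""
    needed = quorum(nodes_a + nodes_b)
    return {
        "needed": needed,
        "buildA": nodes_a >= needed,
        "buildB": nodes_b >= needed,
    }


if __name__ == "__main__":
    # Current layout: 3 nodes in each building.
    print(quorate_sides(3, 3))
    # Hypothetical layout: a 7th node added to buildA.
    print(quorate_sides(4, 3))
```

With 3 nodes per building, neither building reaches the 4 votes needed, so a fiber outage stops both sides. With a 7th node in buildA, buildA keeps quorum and buildB does not, which is the "only one side" behaviour Marcus describes. The same majority rule applies to the Ceph monitors.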
>> 
>> From: "Gilberto Nunes" <gilberto.nunes32 at gmail.com>
>> To: "pve-user" <pve-user at pve.proxmox.com>
>> Sent: Friday, October 5, 2018 15:08:24
>> Subject: Re: [PVE-User] Proxmox CEPH 6 servers failures!
>> 
>> OK! Now I get it!
>> pvecm shows me:
>> pve-ceph01:/etc/pve# pvecm status
>> Quorum information
>> ------------------
>> Date: Fri Oct 5 10:04:57 2018
>> Quorum provider: corosync_votequorum
>> Nodes: 6
>> Node ID: 0x00000001
>> Ring ID: 1/32764
>> Quorate: Yes
>> 
>> Votequorum information
>> ----------------------
>> Expected votes: 6
>> Highest expected: 6
>> Total votes: 6
>> Quorum: 4
>> Flags: Quorate
>> 
>> Membership information
>> ----------------------
>> Nodeid Votes Name
>> 0x00000001 1 10.10.10.100 (local)
>> 0x00000002 1 10.10.10.110
>> 0x00000003 1 10.10.10.120
>> 0x00000004 1 10.10.10.130
>> 0x00000005 1 10.10.10.140
>> 0x00000006 1 10.10.10.150
>> 
>> *Quorum: 4*
>> So I need at least 4 servers online!
>> Now, when I lose 3 of 6, I am of course left with just 3, not the 4
>> that are required...
>> I will request a new server to reach quorum. Thanks for clarifying this
>> situation!
>> 
>> On Fri, Oct 5, 2018 at 09:53, Gilberto Nunes <gilberto.nunes32 at gmail.com> wrote:
>> 
>>> Folks...
>>> 
>>> All CEPH servers are on the same network: 10.10.10.0/24...
>>> There is an optical fiber link between the buildings; I'll call them
>>> buildA and buildB, just to identify them!
>>> When I first created the cluster, 3 servers went down in buildB,
>>> and the remaining ceph servers continued to work properly...
>>> I do not understand why this no longer works!
>>> Sorry if I sound like a newbie! I'm still learning about it!
>>> 
>>> On Fri, Oct 5, 2018 at 09:44, Marcus Haarmann <marcus.haarmann at midoco.de> wrote:
>>> 
>>>> Gilberto,
>>>> 
>>>> the underlying problem is a ceph problem and not related to VMs or
>>>> Proxmox.
>>>> The ceph system requires a majority of monitor nodes to be active.
>>>> Your setup seems to have 3 mon nodes, which results in a loss of quorum
>>>> when two of these servers are gone.
>>>> Run "ceph -s" on each side to see whether ceph responds at all.
>>>> If it doesn't, there are probably not enough mons present.
>>>> 
>>>> Also, when one side is down, you should see that some OSD instances
>>>> are missing.
>>>> In this case, ceph might be up, but your VMs, whose data is spread over
>>>> the OSD disks, might block because the primary storage is not accessible.
>>>> The distribution of data over the OSD instances is steered by the crush
>>>> map.
>>>> You should make sure to have enough copies configured and the crush map
>>>> set up so that each side of your cluster holds at least one copy.
>>>> If the crush map is misconfigured, all copies of your data may be
>>>> on the wrong side,
>>>> resulting in Proxmox not being able to access the VM data.
>>>> 
>>>> Marcus Haarmann
>>>> 
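Marcus's point about copies per side can be illustrated with a minimal Python sketch that counts the surviving replicas of one placement group when a whole building is lost. It assumes the pool defaults from the ceph.conf at the bottom of this thread (osd pool default size = 3, osd pool default min size = 2) and that a placement group stops serving I/O when fewer than min_size replicas remain; monitor quorum, peering and recovery are ignored. The helper name is hypothetical.

```python
# Minimal sketch: does one placement group (PG) keep serving I/O after a
# whole building is lost? Assumes size = 3 and min_size = 2 as in the
# ceph.conf from this thread, and only counts surviving replicas (no
# monitor quorum, peering or recovery is modelled).

from itertools import combinations_with_replacement

SIZE = 3      # "osd pool default size"
MIN_SIZE = 2  # "osd pool default min size"
SITES = ("buildA", "buildB")


def pg_serves_io(replica_sites, lost_site, min_size=MIN_SIZE):
    """True if at least min_size replicas live outside the lost building."""
    surviving = sum(1 for site in replica_sites if site != lost_site)
    return surviving >= min_size


if __name__ == "__main__":
    # Every distinct way 3 replicas can be spread over the two buildings.
    for placement in combinations_with_replacement(SITES, SIZE):
        for lost in SITES:
            state = "active" if pg_serves_io(placement, lost) else "blocked"
            print(f"replicas {placement}, lose {lost}: {state}")
```

Any placement of 3 copies on 2 buildings puts at least 2 copies in one building, so losing that building leaves a single copy, which is below min_size, even when the crush map keeps one copy on each side. With four replicas, two per building (a hypothetical layout, not the current config), either building alone still holds min_size copies.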
>>>> 
>>>> From: "Gilberto Nunes" <gilberto.nunes32 at gmail.com>
>>>> To: "pve-user" <pve-user at pve.proxmox.com>
>>>> Sent: Friday, October 5, 2018 14:31:20
>>>> Subject: Re: [PVE-User] Proxmox CEPH 6 servers failures!
>>>> 
>>>> Nice... Perhaps if I create a VM on Proxmox01 and Proxmox02 and join
>>>> these VMs to the Ceph cluster, can I solve the quorum problem?
>>>> 
>>>>> On Fri, Oct 5, 2018 at 09:23, dorsy <dorsyka at yahoo.com> wrote:
>>>>> 
>>>>> Your question has already been answered. You need a majority to have
>>>>> quorum.
>>>>> 
>>>>>> On 2018. 10. 05. 14:10, Gilberto Nunes wrote:
>>>>>> Hi
>>>>>> Perhaps this can help:
>>>>>> 
>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__imageshack.com_a_img921_6208_X7ha8R.png&d=DwIGaQ&c=teXCf5DW4bHgLDM-H5_GmQ&r=THf3d3FQjCY5FQHo3goSprNAh9vsOWPUM7J0jwvvVwM&m=MgD89RsU1x3jskwZGQbL6-1NxgHQ1p8eVOUTn80Qrs0&s=ol07vaB33zwEaLWY7eR90cAScnrpD7QJI5G1zpMMlKI&e=
>>>>>> 
>>>>>> I was thinking about it: perhaps if I deploy a Proxmox VM on each side
>>>>>> and add these VMs to the CEPH cluster, that could help!
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> On Fri, Oct 5, 2018 at 03:55, Alexandre DERUMIER <aderumier at odiso.com> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Can you resend your diagram? It's impossible to read.
>>>>>>> 
>>>>>>> But you need to have quorum on the monitors for the cluster to work.
>>>>>>> 
>>>>>>> 
>>>>>>> ----- Original Message -----
>>>>>>> From: "Gilberto Nunes" <gilberto.nunes32 at gmail.com>
>>>>>>> To: "proxmoxve" <pve-user at pve.proxmox.com>
>>>>>>> Sent: Thursday, October 4, 2018 22:05:16
>>>>>>> Subject: [PVE-User] Proxmox CEPH 6 servers failures!
>>>>>>> 
>>>>>>> Hi there
>>>>>>> 
>>>>>>> I have something like this:
>>>>>>> 
>>>>>>> CEPH01 ----|                                    |----- CEPH04
>>>>>>>            |                                    |
>>>>>>> CEPH02 ----|------------------------------------|----- CEPH05
>>>>>>>            |            Optic Fiber             |
>>>>>>>            |                                    |
>>>>>>> CEPH03 ----|                                    |----- CEPH06
>>>>>>> Sometimes, when the optic fiber link fails and only CEPH01, CEPH02 and
>>>>>>> CEPH03 remain, the entire cluster fails!
>>>>>>> I need to find out the cause!
>>>>>>> 
>>>>>>> ceph.conf
>>>>>>> 
>>>>>>> [global]
>>>>>>>     auth client required = cephx
>>>>>>>     auth cluster required = cephx
>>>>>>>     auth service required = cephx
>>>>>>>     cluster network = 10.10.10.0/24
>>>>>>>     fsid = e67534b4-0a66-48db-ad6f-aa0868e962d8
>>>>>>>     keyring = /etc/pve/priv/$cluster.$name.keyring
>>>>>>>     mon allow pool delete = true
>>>>>>>     osd journal size = 5120
>>>>>>>     osd pool default min size = 2
>>>>>>>     osd pool default size = 3
>>>>>>>     public network = 10.10.10.0/24
>>>>>>> 
>>>>>>> [osd]
>>>>>>>     keyring = /var/lib/ceph/osd/ceph-$id/keyring
>>>>>>> 
>>>>>>> [mon.pve-ceph01]
>>>>>>>     host = pve-ceph01
>>>>>>>     mon addr = 10.10.10.100:6789
>>>>>>>     mon osd allow primary affinity = true
>>>>>>> 
>>>>>>> [mon.pve-ceph02]
>>>>>>>     host = pve-ceph02
>>>>>>>     mon addr = 10.10.10.110:6789
>>>>>>>     mon osd allow primary affinity = true
>>>>>>> 
>>>>>>> [mon.pve-ceph03]
>>>>>>>     host = pve-ceph03
>>>>>>>     mon addr = 10.10.10.120:6789
>>>>>>>     mon osd allow primary affinity = true
>>>>>>> 
>>>>>>> [mon.pve-ceph04]
>>>>>>>     host = pve-ceph04
>>>>>>>     mon addr = 10.10.10.130:6789
>>>>>>>     mon osd allow primary affinity = true
>>>>>>> 
>>>>>>> [mon.pve-ceph05]
>>>>>>>     host = pve-ceph05
>>>>>>>     mon addr = 10.10.10.140:6789
>>>>>>>     mon osd allow primary affinity = true
>>>>>>> 
>>>>>>> [mon.pve-ceph06]
>>>>>>>     host = pve-ceph06
>>>>>>>     mon addr = 10.10.10.150:6789
>>>>>>>     mon osd allow primary affinity = true
>>>>>>> 
>>>>>>> Any help will be welcome!
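The config above declares one [mon.<name>] section per monitor. A minimal Python sketch can count those sections with the standard-library configparser and apply the majority rule to them. It assumes that layout (configs that list monitors only through a "mon host" line are not handled), and the function name is hypothetical.

```python
# Minimal sketch: count the monitors declared in a ceph.conf and work out how
# many must stay up for monitor quorum. Assumes one [mon.<name>] section per
# monitor, as in the config above; configs that list monitors only through a
# "mon host" line are not handled.

import configparser


def monitor_quorum(conf_text):
    """Return (monitors, needed for quorum, monitors that may fail)."""
    parser = configparser.ConfigParser(interpolation=None)  # keep $cluster, $id literal
    parser.read_string(conf_text)
    mons = [name for name in parser.sections() if name.startswith("mon.")]
    needed = len(mons) // 2 + 1  # strict majority
    return len(mons), needed, len(mons) - needed


if __name__ == "__main__":
    # Abbreviated copy of the six [mon.*] sections from the config above.
    sample = "[global]\nosd pool default size = 3\n" + "".join(
        f"[mon.pve-ceph0{i}]\nhost = pve-ceph0{i}\nmon addr = 10.10.10.{90 + 10 * i}:6789\n"
        for i in range(1, 7)
    )
    total, needed, may_fail = monitor_quorum(sample)
    print(f"{total} monitors: {needed} must be up, {may_fail} may fail")
```

For the six monitors here, 4 must be up, so a 3/3 split between the buildings leaves neither side with monitor quorum. With 5 monitors (3 in one building, 2 in the other), 3 are needed, so only the building holding 3 monitors would keep running. With 4 monitors, 3 are needed and only one failure is tolerated.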
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> pve-user mailing list
>>>>>>> pve-user at pve.proxmox.com
>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__pve.proxmox.com_cgi-2Dbin_mailman_listinfo_pve-2Duser&d=DwIGaQ&c=teXCf5DW4bHgLDM-H5_GmQ&r=THf3d3FQjCY5FQHo3goSprNAh9vsOWPUM7J0jwvvVwM&m=MgD89RsU1x3jskwZGQbL6-1NxgHQ1p8eVOUTn80Qrs0&s=JjGOAMHuh_uB4EgSPjevuD3d-A4OKgqg6WKszbSuZyg&e=

