[PVE-User] Cluster issue!

Gilberto Nunes gilberto.nunes32 at gmail.com
Wed Jan 31 11:50:45 CET 2018


Hi Ian

In my case, I only installed it to run some tests, so I have it installed
inside another Proxmox host, you know? Nested virtualization.
So I guess this is the cause, because in this scenario there's no way to
use multicast, or is there?
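For what it's worth, one way to check whether multicast actually passes between the two nodes is omping, the tool the Proxmox wiki suggests for this kind of test (the package name and the pve01/pve02 hostnames below are assumptions taken from this thread):

```shell
# Install the test tool on both nodes (Debian-based PVE assumed):
apt-get install -y omping

# Start this on pve01 and pve02 at roughly the same time.
# Near-100% loss on the multicast lines means corosync's default
# multicast transport cannot work in this nested setup.
omping -c 600 -i 1 -q pve01 pve02

# A shorter, faster burst, closer to corosync's real traffic pattern:
omping -c 10000 -i 0.001 -F -q pve01 pve02
```

If multicast really is broken inside the nested hosts, switching totem to transport: udpu (unicast), as in the working config below, is the usual workaround for small clusters.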

---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36




2018-01-31 8:47 GMT-02:00 Ian Coetzee <proxmox at iancoetzee.za.net>:

> Hi Gilberto,
>
> There is no need to change it manually. You need to check that your network
> hardware (switches, NICs) supports multicast traffic.
>
> I am running a cluster without manually changing the corosync.conf file.
>
> Kind regards
>
> On 31 January 2018 at 12:35, Gilberto Nunes <gilberto.nunes32 at gmail.com>
> wrote:
>
> > >>the bindnetaddr is not the ip of the master, but is used to determine
> in
> > which network corosync sends/receives (so it should not really matter if
> it
> > is
> > >> yyy.120 or yyy.0 as long as those are in the same network)
> > Yes! I know that!
> > My question is: why do I need to change it manually?
> > I expected that pvecm would do it automatically...
> > I tried it several times... I destroyed the cluster and did a fresh
> > install of Proxmox. I created a new cluster, and every time I opened
> > /etc/pve/corosync.conf, the bindnetaddr was set to the IP of the NIC
> > installed in the first PVE cluster node;
> > only after I changed it to the network IP, instead of the host IP, did
> > the cluster work properly.
> > I do not understand why I need to change it manually!
> >
> > ---
> > Gilberto Nunes Ferreira
> >
> > (47) 3025-5907
> > (47) 99676-7530 - Whatsapp / Telegram
> >
> > Skype: gilberto.nunes36
> >
> >
> >
> >
> > 2018-01-31 5:52 GMT-02:00 Dominik Csapak <d.csapak at proxmox.com>:
> >
> > > On 01/30/2018 08:09 PM, Gilberto Nunes wrote:
> > >
> > >> Hi there
> > >>
> > >> After I changed corosync.conf, the cluster is functioning again:
> > >>
> > >> Here's the original corosync.conf, just after I create the cluster:
> > >>
> > >> logging {
> > >>    debug: off
> > >>    to_syslog: yes
> > >> }
> > >>
> > >> nodelist {
> > >>    node {
> > >>      name: pve01
> > >>      nodeid: 1
> > >>      quorum_votes: 1
> > >>      ring0_addr: 10.10.10.210
> > >>    }
> > >>    node {
> > >>      name: pve02
> > >>      nodeid: 2
> > >>      quorum_votes: 1
> > >>      ring0_addr: 10.10.10.220
> > >>    }
> > >> }
> > >>
> > >> quorum {
> > >>    provider: corosync_votequorum
> > >> }
> > >>
> > >> totem {
> > >>    cluster_name: HOMECLUSTER
> > >>    config_version: 2
> > >>    interface {
> > >>      bindnetaddr: 10.10.10.120    -----------> this is the IP of the "master" node....
> > >>
> > >
> > > the bindnetaddr is not the ip of the master, but is used to determine
> in
> > > which network corosync sends/receives (so it should not really matter
> if
> > it
> > > is yyy.120 or yyy.0 as long as those are in the same network)
> > >
> > >
> > >      ringnumber: 0
> > >>
> > >>    }
> > >>    ip_version: ipv4
> > >>    secauth: on
> > >>    version: 2
> > >> }
> > >> }
> > >>
> > >>
> > >> And this is the "now working" version:
> > >>
> > >> logging {
> > >>    debug: off
> > >>    to_syslog: yes
> > >> }
> > >>
> > >> nodelist {
> > >>    node {
> > >>      name: pve01
> > >>      nodeid: 1
> > >>      quorum_votes: 1
> > >>      ring0_addr: 10.10.10.210
> > >>    }
> > >>    node {
> > >>      name: pve02
> > >>      nodeid: 2
> > >>      quorum_votes: 1
> > >>      ring0_addr: 10.10.10.220
> > >>    }
> > >> }
> > >>
> > >> quorum {
> > >>    provider: corosync_votequorum
> > >> }
> > >>
> > >> totem {
> > >>    cluster_name: HOMECLUSTER
> > >>    config_version: 2
> > >>    interface {
> > >>      bindnetaddr: 10.10.10.0
> > >>      ringnumber: 0
> > >>      mcastport: 5405
> > >>    }
> > >>    transport: udpu
> > >>
> > >
> > > i guess this is the change that made it work, i.e. i guess that
> > > multicast does not work properly in your network
> > >
> > >    ip_version: ipv4
> > >>    secauth: on
> > >>    version: 2
> > >> }
> > >> logging {
> > >>          fileline: off
> > >>          to_logfile: yes
> > >>          to_syslog: yes
> > >>          debug: off
> > >>          logfile: /var/log/cluster/corosync.log
> > >>          debug: off
> > >>          timestamp: on
> > >>          logger_subsys {
> > >>                  subsys: AMF
> > >>                  debug: off
> > >>          }
> > >> }
> > >>
> > >>
> > >> After rebooting, everything is running smoothly
> > >>
> > >> ---
> > >> Gilberto Nunes Ferreira
> > >>
> > >> (47) 3025-5907
> > >> (47) 99676-7530 - Whatsapp / Telegram
> > >>
> > >> Skype: gilberto.nunes36
> > >>
> > >>
> > >>
> > >>
> > >> 2018-01-30 15:39 GMT-02:00 Gilberto Nunes <gilberto.nunes32 at gmail.com>:
> > >>
> > >> Hi
> > >>>
> > >>> I have a fresh installation of Proxmox 5.1.
> > >>> In the /etc/hosts I have:
> > >>>
> > >>> 127.0.0.1 localhost.localdomain localhost
> > >>> 10.10.10.210 pve01.domain.com pve01 pvelocalhost
> > >>> 10.10.10.220 pve02.domain.com pve02
> > >>>
> > >>> on both nodes, pve01 and pve02.
> > >>>
> > >>> I formed the cluster with the command pvecm create HOMECLUSTER.
> > >>> I ssh'd to pve02 and ran pvecm add pve01.
> > >>> The cluster formed as expected, but after 2 minutes I get this error
> > >>> in /var/log/syslog:
> > >>>
> > >>> Jan 30 15:23:04 pve01 corosync[1482]: error   [TOTEM ] FAILED TO
> > RECEIVE
> > >>> Jan 30 15:23:04 pve01 corosync[1482]:  [TOTEM ] FAILED TO RECEIVE
> > >>> Jan 30 15:23:05 pve01 corosync[1482]: notice  [TOTEM ] A new
> > membership (
> > >>> 10.10.10.210:12) was formed. Members left: 2
> > >>> Jan 30 15:23:05 pve01 corosync[1482]: notice  [TOTEM ] Failed to
> > receive
> > >>> the leave message. failed: 2
> > >>> Jan 30 15:23:05 pve01 corosync[1482]:  [TOTEM ] A new membership (
> > >>> 10.10.10.210:12) was formed. Members left: 2
> > >>> Jan 30 15:23:05 pve01 corosync[1482]:  [TOTEM ] Failed to receive the
> > >>> leave message. failed: 2
> > >>>
> > >>> So I stopped the cluster (systemctl stop pve-cluster; systemctl stop
> > >>> corosync) and started pmxcfs -l (locally).
> > >>> I saw this line in the /etc/pve/corosync.conf file:
> > >>>
> > >>>      bindnetaddr: 10.10.10.210
> > >>>
> > >>> So after I change this line to this:
> > >>>
> > >>>      bindnetaddr: 10.10.10.0
> > >>>
> > >>> and after restarting both nodes, the cluster went back to normal.
> > >>>
> > >>> Shouldn't the pvecm script have added this second form itself?
> > >>> Why do I need to change it to the network address myself, instead of
> > >>> the pvecm script doing it automatically?
> > >>>
> > >>> I cannot understand!
> > >>>
> > >>> Any advice?
> > >>>
> > >>> Thanks a lot.
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> ---
> > >>> Gilberto Nunes Ferreira
> > >>>
> > >>> (47) 3025-5907
> > >>> (47) 99676-7530 - Whatsapp / Telegram
> > >>>
> > >>> Skype: gilberto.nunes36
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> _______________________________________________
> > >> pve-user mailing list
> > >> pve-user at pve.proxmox.com
> > >> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> > >>
> > >>
> > >
>


