[pve-devel] [PATCH cluster v3 12/14] api/cluster: create cluster in worker

Wed Dec 20 08:01:04 CET 2017

On 12/19/2017 02:55 PM, Fabian Grünbichler wrote:
> [...]
> $ pvecm create test
> Corosync Cluster Engine Authentication key generator.
> Gathering 1024 bits for key from /dev/urandom.
> Writing corosync key to /etc/corosync/authkey.
> Writing corosync config to /etc/pve/corosync.conf
> Restart corosync and cluster filesystem
> ipcc_send_rec[4] failed: Transport endpoint is not connected
> 
> adding an artificial delay before returning from the forked worker does
> not help, so it does not seem like it's just the startup of
> pmxcfs/corosync:
> 

For clarity: PVE::IPCC caches the connection internally. Once pmxcfs gets
restarted, the next IPCC call will *always* fail, no matter whether pmxcfs
is up and ready again. On that failure the connection cache gets deleted,
so the IPCC call after the doomed-to-fail one will try to reconnect,
(normally) succeed and then continue happily as ever.

My proposal to address this is to retry the connection once, and only once,
whenever we invalidate the connection cache. If the retry succeeds, good;
if not, nothing is lost.
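
To illustrate the idea, here is a rough, self-contained sketch in Python. It
is not the PVE::IPCC code: FakeBackend, CachedClient and the retry_once flag
are hypothetical names made up for this example, and the "connection" is just
a restart counter. It only models the behaviour described above: a cached
connection goes stale on restart, the first call after that fails and drops
the cache, and an optional single retry hides that failure from the caller.

```python
class FakeBackend:
    """Hypothetical stand-in for pmxcfs; every restart bumps 'generation'."""

    def __init__(self):
        self.generation = 0

    def restart(self):
        self.generation += 1


class CachedClient:
    """Hypothetical client that caches its connection, like the text describes."""

    def __init__(self, backend, retry_once=False):
        self.backend = backend
        self.retry_once = retry_once
        self.conn = None  # cached "connection": generation we connected to

    def _send_raw(self, msg):
        if self.conn is None:
            # No cached connection: (re)connect to the current backend.
            self.conn = self.backend.generation
        if self.conn != self.backend.generation:
            # Stale connection: invalidate the cache and fail this call.
            self.conn = None
            raise ConnectionError("Transport endpoint is not connected")
        return f"ok: {msg}"

    def send(self, msg):
        try:
            return self._send_raw(msg)
        except ConnectionError:
            if not self.retry_once:
                raise
            # Cache was just invalidated: retry exactly once.
            # If this fails too, the error propagates as before.
            return self._send_raw(msg)
```

Without retry_once, the first send() after backend.restart() raises and only
the following one succeeds; with retry_once=True the caller never sees the
stale-connection error as long as the backend is reachable again.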

Besides making the above warning go away, this would also make
pve-cluster restarts generally more transparent for daemons and remove
some of the "IPCC endpoint not connected" errors from the logs.