[pve-devel] [PATCH pve-ha-manager] delete node from HA stack when deleted from cluster

Thomas Lamprecht t.lamprecht at proxmox.com
Mon Sep 28 11:34:52 CEST 2015


When a node gets deleted from the cluster with 'pvecm delnode',
we set its node state in the manager status to 'gone'.
Once a node is 'gone', the manager waits an hour after the node was
last seen online and only then deletes it from the manager status.

If some HA services were forgotten on the node (this shouldn't happen
at all!) the node will be fenced, its services migrated and then the
node's state reset to 'gone'. After an hour the node will be deleted,
unless it joined the cluster again in the meantime.

Deleting a node from the HA manager status is by no means a final
act. The ha-manager could live without deleting it, but for the user
it is confusing to see dead nodes in the interface.

Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
---
 src/PVE/HA/NodeStatus.pm | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)
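
For reviewers, the node state transitions after this patch can be
summarized roughly as in the standalone sketch below. It is only an
illustration and not part of the patch: next_node_state(), the shape of
$member, the 'deleted' pseudo-state and the '>=' comparison against
3600 seconds are assumptions made for the example. The real logic lives
in update(), $set_node_state and $delete_node.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not part of PVE::HA::NodeStatus.
# $state:        current state ('online', 'unknown', 'fence' or 'gone')
# $member:       undef if the node is missing from the member list,
#                otherwise a hash ref with an 'online' flag (assumed shape)
# $offline_secs: seconds since the node was last seen online
# Returns the next state, or the pseudo-state 'deleted' when the entry
# would be dropped from the manager status.
sub next_node_state {
    my ($state, $member, $offline_secs) = @_;

    die "detected unknown node state '$state'\n"
        if $state !~ /^(?:online|unknown|fence|gone)$/;

    # 'fence' is left alone in both branches: wait until fenced
    return 'fence' if $state eq 'fence';

    # node is an online cluster member: 'unknown' and 'gone' recover
    return 'online' if $member && $member->{online};

    # node is not online from here on
    return 'unknown' if $state eq 'online';

    if ($state eq 'unknown') {
        # still listed as a member: stay 'unknown'; vanished: mark 'gone'
        return defined($member) ? 'unknown' : 'gone';
    }

    # $state eq 'gone': delete only after one hour offline (assumed >=)
    return $offline_secs >= 3600 ? 'deleted' : 'gone';
}

print next_node_state('unknown', undef, 120), "\n";
print next_node_state('gone', undef, 7200), "\n";
print next_node_state('gone', { online => 1 }, 7200), "\n";
```

The three calls cover a node that vanished from the member list, a
'gone' node whose hour has passed, and a 'gone' node that rejoined the
cluster before being deleted.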

diff --git a/src/PVE/HA/NodeStatus.pm b/src/PVE/HA/NodeStatus.pm
index fe8c0ef..eb174cb 100644
--- a/src/PVE/HA/NodeStatus.pm
+++ b/src/PVE/HA/NodeStatus.pm
@@ -24,6 +24,7 @@ my $valid_node_states = {
     online => "node online and member of quorate partition",
     unknown => "not member of quorate partition, but possibly still running",
     fence => "node needs to be fenced",
+    gone => "node vanished from cluster members list, possibly deleted"
 };
 
 sub get_node_state {
@@ -79,6 +80,20 @@ sub list_online_nodes {
     return $res;
 }
 
+my $delete_node = sub {
+    my ($self, $node) = @_;
+
+    return undef if $self->get_node_state($node) ne 'gone';
+
+    my $haenv = $self->{haenv};
+
+    delete $self->{last_online}->{$node};
+    delete $self->{status}->{$node};
+
+    $haenv->log('notice', "deleting gone node '$node', not a cluster member".
+		" anymore.");
+};
+
 my $set_node_state = sub {
     my ($self, $node, $state) = @_;
 
@@ -113,7 +128,7 @@ sub update {
 
 	if ($state eq 'online') {
 	    # &$set_node_state($self, $node, 'online');
-	} elsif ($state eq 'unknown') {
+	} elsif ($state eq 'unknown' || $state eq 'gone') {
 	    &$set_node_state($self, $node, 'online');
 	} elsif ($state eq 'fence') {
 	    # do nothing, wait until fenced
@@ -133,9 +148,16 @@ sub update {
 	if ($state eq 'online') {
 	    &$set_node_state($self, $node, 'unknown');
 	} elsif ($state eq 'unknown') {
-	    # &$set_node_state($self, $node, 'unknown');
+
+	    # node isn't in the member list anymore, deleted from the cluster?
+	    &$set_node_state($self, $node, 'gone') if(!defined($d));
+
 	} elsif ($state eq 'fence') {
 	    # do nothing, wait until fenced
+	} elsif($state eq 'gone') {
+	    if($self->node_is_offline_delayed($node, 3600)) {
+		&$delete_node($self, $node);
+	    }
 	} else {
 	    die "detected unknown node state '$state";
 	}
-- 
2.1.4




