Quarantined node hasn't rejoined the cluster even after multiple restarts

muthukmk · September 20, 2018, 5:07am

I’m facing an issue where a quarantined node has not rejoined the cluster even after multiple restarts.

Auto-down is enabled in akka.conf and is set to 300s.
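
For reference, a setting like this would typically look as follows in HOCON. This is a minimal sketch, not a copy of the actual akka.conf: only the auto-down setting and the 300s value come from the description above, and the surrounding layout is assumed.

```hocon
# Sketch only: the real akka.conf is not shown in this post.
akka {
  cluster {
    # Automatically mark an unreachable member as DOWN after this period.
    # 300s matches the 5-minute auto-down period described in this post.
    auto-down-unreachable-after = 300s
  }
}
```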

Please refer to the sequence below, which shows what happens after I restart the quarantined node.

Akka version used here is 2.4.7.

The key observations are as follows:

1. A healthy 3-node cluster is formed with Node-1, Node-2 and Node-3 as members.
2. Node 2 is shut down. This is the test scenario.
3. Node 1 detects this and correctly moves the member to the UNREACHABLE state. After the auto-down period of 5 minutes, the node moves to the DOWN state.
4. Node 2 is fully restarted (and hence so is its ActorSystem).
5. Node 2 joins the cluster and Node 1 identifies the node.
6. Node 2 is now a full member of the cluster and is visible to all other nodes.
7. Within seconds, Node 2 gets quarantined again.
8. When the quarantined Node 2 is restarted, it joins the cluster again, and (7) and (8) keep repeating regardless of how many times Node 2 is rebooted.

A few points checked so far:

Are there any heavy GC or non-GC pauses on any node of the cluster? No.
Are there any network issues between the cluster nodes? No. There are no firewalls and no dynamic port-blocking rules, and ping latency between nodes is normal.
Is remoting used explicitly? No, only Akka Cluster.
Is auto-down on unreachable enabled? Yes, with a 300s timeout.
Is persistence used? Yes.
Is persistence querying used? No.

Questions:

Is this a reported issue in Akka 2.4.7?
Would disabling auto-down help? I see a contrary case here, reported to have persisted up to 2.4.12: https://github.com/akka/akka/issues/20296. Hence I chose not to disable auto-down.

Please let me know if any additional details are required.

Regards
Muthu
