RoleLeaderChanged appoints 2 leaders during start-up of a node

Akka 2.6.8
Java 8

I have found another source of unstable behavior in our production cluster.

The situation is as follows:

  1. I have two running nodes, A and B. Node A has been appointed leader, which I determine by evaluating the RoleLeaderChanged event.
  2. Node B (the one that is not the leader) gets restarted.
  3. During start-up, node B is appointed leader via a RoleLeaderChanged event. Node A remains leader during this time and does not receive any notification. Some actors now cause damage because they are running on both nodes at once.
  4. After a short period of time, node B receives another RoleLeaderChanged event and now recognizes node A as the leader. From then on everything is fine, but the leader on node A cannot repair the damage that node B caused, because it never learns that there was a second leader for some time.

Here are the relevant log lines. The node is leader for 5 seconds until it gets the Up status.

2020-09-22T22:08:34.077Z INFO  myown - Handle RoleLeaderChanged, selfAddress=akka://ClusterSystem@, leaderAddress=akka://ClusterSystem@, isLeader=true
2020-09-22T22:08:39.477Z INFO  akka.cluster.Cluster - Cluster Node [akka://ClusterSystem@] - Marking node as REACHABLE [Member(address = akka://ClusterSystem@, status = Up)].
2020-09-22T22:08:39.478Z INFO  akka.cluster.Cluster - Cluster Node [akka://ClusterSystem@] - is no longer leader
2020-09-22T22:08:39.479Z INFO  myown - Handle RoleLeaderChanged, selfAddress=akka://ClusterSystem@, leaderAddress=akka://ClusterSystem@, isLeader=false

Is this expected behavior? I can certainly evaluate Member.Up events as well, but this makes it much harder to rely on RoleLeaderChanged events. I would have expected that Akka would not send any RoleLeaderChanged events until a decision can be made, or, if it does send them, that they would carry no leader.
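
To make the Member.Up idea concrete, here is a minimal sketch of the gating logic in plain Java. It is not Akka API: `LeaderGate`, its method names, and the string addresses `nodeA`/`nodeB` are hypothetical and only model the two signals. In a real actor, the two callbacks would presumably be driven by the RoleLeaderChanged and MemberUp events. The sketch only narrows the window seen in the logs; it does not guarantee that a single node considers itself leader.

```java
import java.util.Objects;

// Hypothetical helper, not part of Akka: a node treats itself as the active
// leader only when (a) the last reported role leader is itself and
// (b) it has seen its own member reach the Up status.
final class LeaderGate {
    private final String selfAddress;
    private String roleLeader;   // null means "no leader known"
    private boolean selfUp;      // false until our own Up is observed

    LeaderGate(String selfAddress) {
        this.selfAddress = Objects.requireNonNull(selfAddress);
    }

    // Call for every RoleLeaderChanged-style notification; leader may be null.
    void onRoleLeaderChanged(String leaderAddress) {
        roleLeader = leaderAddress;
    }

    // Call for every MemberUp-style notification; only our own Up matters here.
    void onMemberUp(String memberAddress) {
        if (selfAddress.equals(memberAddress)) {
            selfUp = true;
        }
    }

    // Leader-only work should be started only while this returns true.
    boolean isActiveLeader() {
        return selfUp && selfAddress.equals(roleLeader);
    }

    public static void main(String[] args) {
        // Replays the order of events from the log lines above, as seen by node B.
        LeaderGate nodeB = new LeaderGate("nodeB");
        nodeB.onRoleLeaderChanged("nodeB");   // early event naming B itself
        System.out.println("before Up: " + nodeB.isActiveLeader());
        nodeB.onMemberUp("nodeB");            // B reaches Up
        nodeB.onRoleLeaderChanged("nodeA");   // corrected event naming A
        System.out.println("after Up:  " + nodeB.isActiveLeader());
    }
}
```

With this gate, the early RoleLeaderChanged event that names the restarting node would be recorded but not acted upon, because the node has not yet seen itself as Up.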

A common misconception is that there is some kind of leader election that guarantees only one leader at a time. That is not what Akka’s leader is about. See the docs for more details.

Applications rarely need to use leader events. ClusterSingleton is often what should be used instead.

Thank you, @patriknw, for clarifying. This wasn’t as obvious to me from the documentation, but I get it now. :slight_smile: