ClusterSingletonManager restart on same node causes ClusterSingletonProxy failure

ollyw · September 18, 2018, 9:21am

I have a fairly niche situation: the codebase I am working on can restart the ClusterSingletonManager itself (not just the singleton it manages), for example when the CSM's parent actor restarts. Importantly, the cluster topology does not change in this case. Since the ClusterSingletonProxy watches the singleton once it has found it, I expected it to try to identify the actor again after receiving the Terminated message and then resume forwarding messages. In my tests against this codebase, it does not.

Here is a snippet of code in ClusterSingletonProxy:

  def receive = {
    ...
    case Terminated(ref) ⇒
      if (singleton.contains(ref)) {
        // buffering mode, identification of new will start when old node is removed
        singleton = None
      }
    ...
  }

It appears to wait explicitly for a member event before restarting the identification process, which rules out recovery when a ClusterSingletonManager restarts on the same node. Superficially, the code could easily be modified to start identification independently of a cluster member-removed event. The advantage is that the proxy would be robust to CSM failure. The cost is that, in the general case where the oldest node has gone down, Identify messages would be sent while the topology is still changing.
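To make the idea concrete, here is a minimal sketch of the behaviour I have in mind. It is not the ClusterSingletonProxy code and not a patch against it: `ReidentifyingForwarder`, `targetPath` and `interval` are names made up for illustration, the target path is assumed to be fixed (there is no tracking of the oldest member), the buffer is unbounded for brevity, and the `Timers` trait assumes Akka 2.5.4 or later. The only point is that identification restarts on `Terminated` rather than on a member event.

```scala
import akka.actor.{Actor, ActorIdentity, ActorRef, Identify, Props, Terminated, Timers}
import scala.concurrent.duration._

// Hypothetical stand-in for a proxy; NOT Akka's ClusterSingletonProxy.
object ReidentifyingForwarder {
  private case object TryToIdentify
  private case object IdentifyTimerKey

  // targetPath: assumed fixed path of the singleton, e.g. "/user/manager/singleton"
  def props(targetPath: String, interval: FiniteDuration = 1.second): Props =
    Props(new ReidentifyingForwarder(targetPath, interval))
}

class ReidentifyingForwarder(targetPath: String, interval: FiniteDuration)
    extends Actor with Timers {
  import ReidentifyingForwarder._

  private var target: Option[ActorRef] = None
  // Unbounded for brevity; a real proxy would cap the buffer size.
  private var buffer = Vector.empty[(Any, ActorRef)]

  override def preStart(): Unit = identify()

  // Send one Identify and schedule a retry in case no actor answers.
  private def identify(): Unit = {
    context.actorSelection(targetPath) ! Identify(targetPath)
    timers.startSingleTimer(IdentifyTimerKey, TryToIdentify, interval)
  }

  def receive: Receive = {
    case TryToIdentify =>
      if (target.isEmpty) identify()

    case ActorIdentity(_, Some(ref)) if target.isEmpty =>
      timers.cancel(IdentifyTimerKey)
      target = Some(context.watch(ref))
      // Flush buffered messages, preserving the original senders.
      buffer.foreach { case (msg, snd) => ref.tell(msg, snd) }
      buffer = Vector.empty

    case ActorIdentity(_, _) =>
      // Nothing found yet, or a duplicate reply; the timer drives the retry.

    case Terminated(ref) =>
      if (target.contains(ref)) {
        // Back to buffering mode, but start identifying straight away
        // instead of waiting for a cluster member event.
        target = None
        identify()
      }

    case msg =>
      target match {
        case Some(ref) => ref.forward(msg)
        case None      => buffer = buffer :+ ((msg, sender()))
      }
  }
}
```

It would be created with something like `system.actorOf(ReidentifyingForwarder.props("/user/manager/singleton"))`, where the path is again only a placeholder. If a stale `ActorIdentity` carrying the old ref arrives after the restart, watching that dead ref just yields another `Terminated` and another identification round, so the sketch should converge on the new incarnation.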

So, with that said, here are my questions:

1. What bad things could happen if Identify messages are sent shortly after the Terminated message is received, and throughout the member state changes?
2. Is reconnecting after a CSM failure generally desirable behaviour?
3. Would there be any interest in a PR to fix this, assuming question 1 reveals no show-stoppers?

Thanks in advance
