Not receiving terminated event from remote Actor

This is on Akka 2.6.8:

Our clustered Akka deployment has become quite unreliable and it seems that we are not receiving all terminated events from remote actors. Today I have found a pretty clear case in our log files.

  1. Node A (100.64.4.38:2551) gets removed from the cluster
  2. Node B gets notified about this:
    INFO akka.remote.artery.Association - Association to [akka://ClusterSystem@100.64.4.38:2551] having UID [2426136702514312762] has been stopped. All messages to this UID will be delivered to dead letters. Reason: ActorSystem terminated
  3. Node B starts watching an Actor on node A, but never receives a termination event. Therefore it assumes the actor is still there. Since I retry sending the message until I get either an acknowledgement or a termination event, the functionality is broken.

Is there a bug in Akka? Or is my understanding of Akka incorrect, namely that I would ALWAYS get a termination event, even if I start watching after the Actor has died or after the node the actor was running on was removed from the cluster?

In Akka 2.6.x remote watch is disabled when watching outside the Cluster. You probably have a warning about it in the logs. When the node was previously in the Cluster, I understand that this is unfortunate behavior. Could you please create an issue? We’ll see what we can do about it.

Hi @patriknw, thanks for your quick response.

And yes, the node was part of the cluster and got removed. Did I understand correctly that the termination event will not be sent when I start watching after the node of the actor already got removed from the cluster?

Can I enable remote watching outside the cluster? (I assume not.) I could not find a WARN message about this, though.

I think it is pretty important to have this part fixed. But I can create an issue for this.

I have also created an issue here: https://github.com/akka/akka/issues/29628

Did I understand correctly that the termination event will not be sent when I start watching after the node of the actor already got removed from the cluster?

From Akka 2.6.0 the default behavior is to only support remote watch of nodes within the cluster. Watches of non-members are ignored. I will look into the issue you created because I think that we could do something better when the node has previously been in the cluster.

If this problem is related to that, you can try a workaround by setting this config:

akka.remote.use-unsafe-remote-features-outside-cluster = on
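For anyone who would rather set this in code than in application.conf, here is a minimal sketch, assuming the classic Scala API. The object name `UnsafeRemoteWatchExample` is made up for illustration, and the rest of the remoting and cluster settings are assumed to already live in your application.conf. As the name of the setting suggests, this re-enables behavior that Akka considers unsafe outside the cluster, so treat it as a stopgap.

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// Hypothetical entry point, for illustration only.
object UnsafeRemoteWatchExample {
  def main(args: Array[String]): Unit = {
    // Override only the one setting; everything else comes from
    // application.conf / reference.conf on the classpath.
    val config = ConfigFactory
      .parseString("akka.remote.use-unsafe-remote-features-outside-cluster = on")
      .withFallback(ConfigFactory.load())

    // "ClusterSystem" is the system name that appears in the log line above.
    val system = ActorSystem("ClusterSystem", config)
  }
}
```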

Thank you, @patriknw. This is really appreciated. I am now watching for cluster membership events myself to work around this. But this is also a little bit tricky and adds some complexity to an otherwise simple watch. I will also consider using the property you have mentioned.
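In case it helps someone else, this is roughly the shape of the membership-based workaround, as a sketch only and not my exact code. It assumes the classic Scala API; `MembershipAwareWatcher`, `TargetGone`, and `listener` are names invented for this example. The idea is to combine the normal watch with a `MemberRemoved` subscription, and to check the current membership once at start, because removals that happened before subscribing are not replayed.

```scala
import akka.actor.{Actor, ActorLogging, ActorRef, Props, Terminated}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.{InitialStateAsEvents, MemberRemoved}

// Hypothetical notification sent to `listener` once the target is considered gone.
final case class TargetGone(target: ActorRef, reason: String)

object MembershipAwareWatcher {
  def props(target: ActorRef, listener: ActorRef): Props =
    Props(new MembershipAwareWatcher(target, listener))
}

class MembershipAwareWatcher(target: ActorRef, listener: ActorRef)
    extends Actor with ActorLogging {

  private val cluster = Cluster(context.system)
  private val targetAddress = target.path.address

  override def preStart(): Unit = {
    cluster.subscribe(self, InitialStateAsEvents, classOf[MemberRemoved])
    context.watch(target)

    // Earlier removals are not replayed, so check the current view once.
    // Assumption: a remote address that is not a member means the node is gone.
    // This can misfire if this node's own view of the cluster is not yet populated.
    val isMember = cluster.state.members.exists(_.address == targetAddress)
    if (targetAddress.hasGlobalScope && !isMember)
      gone("node is not a member of the cluster")
  }

  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive: Receive = {
    case Terminated(ref) if ref == target =>
      gone("Terminated received")
    case MemberRemoved(member, _) if member.address == targetAddress =>
      gone("node removed from the cluster")
    case _: MemberRemoved =>
      // Removal of some other node; nothing to do.
  }

  private def gone(reason: String): Unit = {
    log.info("Treating {} as terminated: {}", target, reason)
    listener ! TargetGone(target, reason)
    context.stop(self)
  }
}
```

The retry loop then stops on either an acknowledgement or a `TargetGone` message. The membership check at start is the tricky part: it trades a missed termination for a possible false positive, which is part of why I would prefer a plain watch to handle this.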