Unnecessary "errors" from Akka Remoting when a remote ActorSystem terminates


(Alan Burlison) #1

We are using Akka Remoting to spin up relatively short-lived remote ActorSystems on other Kubernetes nodes that process work and then exit. I know about Akka Cluster, but our needs are simple and we don’t want a long-lived configuration.

Everything works fine: the slaves disconnect gracefully from the controller at the application level and then exit. However, the controller spends time waiting for the other slaves to exit, and while it is doing that it gets a load of “error” messages about the slave that has already shut down cleanly:

    13:47:56.578 ERROR [a.s.s.RestartWithBackoffFlow] Restarting graph due to failure
    akka.stream.StreamTcpException: Tcp command [Connect(test-1.test-akka.alan.svc.cluster.local:2552,None,List(),Some(5000 milliseconds),true)] failed because of test-1.test-akka.alan.svc.cluster.local

There seems to be no way of stopping these messages (log-remote-lifecycle-events = off has no effect) and no way of telling a local ActorSystem that a remote ActorSystem has in fact gracefully terminated.
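
For reference, in Akka 2.5 the setting mentioned above sits under `akka.remote`. The snippet below is a sketch of what was presumably tried in `application.conf`. The explanation in the comment is an assumption that this thread does not confirm.

```hocon
akka.remote {
  # Intended to silence association/disassociation lifecycle logging.
  # Assumption (not confirmed here): in 2.5 this key is read by the classic
  # remoting transport, so Artery may not consult it at all, which would be
  # consistent with it having no effect in this setup.
  log-remote-lifecycle-events = off
}
```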

The other issue is how long the local ActorSystem holds on to the details of an exited remote system. There may be hundreds of them, and if references to remote ActorSystems are held indefinitely, they are in effect a memory leak.


(Patrik Nordwall) #2

Is this with the Artery TCP transport? Which Akka version?


(Alan Burlison) #3

Sorry, I should have provided that info. Yes, it is with Artery TCP, Akka version 2.5.12:

    artery.enabled = true
    artery.transport = tcp
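
For context, here is a minimal sketch of where those two keys sit in a complete Akka 2.5 configuration. The `provider` line is needed for remoting. The port matches the addresses in the logs later in this thread. `POD_DNS_NAME` is a hypothetical environment variable and does not come from the poster's actual config.

```hocon
akka {
  actor {
    # Remoting requires the remote (or cluster) actor ref provider.
    provider = remote
  }
  remote.artery {
    enabled = true
    # Use TCP instead of the default Aeron UDP transport.
    transport = tcp
    # Hypothetical: each pod supplies its own resolvable DNS name, e.g.
    # cfe-test-1.cfe-test-akka.alan.svc.cluster.local, via an env var.
    # If the variable is unset, the default hostname is used.
    canonical.hostname = ${?POD_DNS_NAME}
    canonical.port = 2552
  }
}
```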

(Patrik Nordwall) #4

Can you give 2.5.13 a try? It’s published but not announced yet. I think we fixed a few things that may be related to this.


(Alan Burlison) #5

Sure, will do, thanks! It will probably be Monday before I report back.


(Alan Burlison) #6

Yes, it does appear to be a bit less shouty. This is what the server end logs at DEBUG level when a slave exits:

    16:59:24.430 INFO  Association to [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552] having UID [666184963331773213] has been stopped. All messages to this UID will be delivered to dead letters. Reason: ActorSystem terminated
    16:59:24.511 DEBUG Resolving cfe-test-1.cfe-test-akka.alan.svc.cluster.local before connecting
    16:59:24.511 DEBUG Resolution request for cfe-test-1.cfe-test-akka.alan.svc.cluster.local from Actor[akka://cfecp/system/IO-TCP/selectors/$a/10#-629842451]
    16:59:24.557 DEBUG Could not establish connection to [cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552] due to java.net.UnknownHostException: cfe-test-1.cfe-test-akka.alan.svc.cluster.local
    16:59:24.561 WARN  [outbound connection to [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552], control stream] Upstream failed, cause: StreamTcpException: Tcp command [Connect(cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552,None,List(),Some(5000 milliseconds),true)] failed because of cfe-test-1.cfe-test-akka.alan.svc.cluster.local
    16:59:24.562 WARN  Restarting graph due to failure. stack_trace:  (akka.stream.StreamTcpException: Tcp command [Connect(cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552,None,List(),Some(5000 milliseconds),true)] failed because of cfe-test-1.cfe-test-akka.alan.svc.cluster.local)
    16:59:24.562 DEBUG Restarting graph in 2010221697 nanoseconds
    16:59:26.599 DEBUG Clear system message delivery of [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552#666184963331773213]
    16:59:27.456 WARN  [outbound connection to [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552], control stream] Upstream failed, cause: Association$OutboundStreamStopQuarantinedSignal$:
    16:59:27.456 WARN  [outbound connection to [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552], message stream] Upstream failed, cause: Association$OutboundStreamStopQuarantinedSignal$:
    16:59:27.456 DEBUG Outbound control stream to [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552] was quarantined and stopped. It will be restarted if used again.
    16:59:27.456 DEBUG Outbound message stream to [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552] was quarantined and stopped. It will be restarted if used again.

As we normally have the Akka loglevel set to ERROR, that’s a lot better.
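
For comparison, the two levels mentioned in this post correspond to the standard `akka.loglevel` setting. The sketch below assumes the level is configured in `application.conf`. At ERROR, none of the INFO/DEBUG/WARN lines above would be emitted.

```hocon
akka {
  # Normal operation: only ERROR and above are logged.
  loglevel = "ERROR"
  # Presumably what was used for the capture above instead:
  # loglevel = "DEBUG"
}
```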

However, there still doesn’t seem to be any way of telling Akka that a remote ActorSystem is shutting down and that the disconnect is expected.


(Patrik Nordwall) #7

Thanks for trying that. I agree that it would be nice to make the normal client shutdown/disconnect silent (nothing logged above DEBUG). Please create an issue referring to this thread.