Supervisor restart ends in an inactive actor: messages sent to dead letter

hmf · September 30, 2020, 5:34pm

I am experimenting with a tiny cluster with 1-2 Producer seed-nodes:

      "akka://WorkStealSystem@127.0.0.1:25251",
      "akka://WorkStealSystem@127.0.0.1:25252"

I have another Consumer actor that subscribes to a Receptionist, selects one of these Producers and they exchange messages repeatedly. At one point I kill the only working seed (25251). This is detected by the Consumer and results in the following message:

17:50:54.939 [WorkStealSystem-work-dispatcher-16] INFO  concurrency.Utils$ - concurrency.Consumer$ @processJobs: Selecting producer.
17:50:54.942 [WorkStealSystem-work-dispatcher-16] ERROR concurrency.Utils$ - Supervisor RestartSupervisor saw failure: bound must be positive
java.lang.IllegalArgumentException: bound must be positive
	at java.util.Random.nextInt(Random.java:388) ~[?:?]
	at scala.util.Random.nextInt(Random.scala:96) ~[scala-library-2.13.3.jar:?]
	at concurrency.Consumer$.selectRandomly(Consumer.scala:66) ~[classes/:?]
	at concurrency.Consumer$.processJobs$$anonfun$1(Consumer.scala:102) ~[classes/:?]
	at akka.actor.typed.internal.BehaviorImpl$DeferredBehavior$$anon$1.apply(BehaviorImpl.scala:119) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.Behavior$.start(Behavior.scala:168) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.Behavior$.interpret(Behavior.scala:275) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:230) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.internal.InterceptorImpl$$anon$2.apply(InterceptorImpl.scala:57) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.internal.RestartSupervisor.aroundReceive(Supervision.scala:261) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.internal.InterceptorImpl.receive(InterceptorImpl.scala:85) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.Behavior$.interpret(Behavior.scala:274) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.Behavior$.interpretMessage(Behavior.scala:230) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.internal.adapter.ActorAdapter.handleMessage(ActorAdapter.scala:129) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.internal.adapter.ActorAdapter.$anonfun$adaptAndHandle$2(ActorAdapter.scala:178) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.internal.adapter.ActorAdapter.$anonfun$adaptAndHandle$2$adapted(ActorAdapter.scala:178) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.internal.adapter.ActorAdapter.withSafelyAdapted(ActorAdapter.scala:189) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.internal.adapter.ActorAdapter.handle$1(ActorAdapter.scala:178) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.internal.adapter.ActorAdapter.adaptAndHandle(ActorAdapter.scala:183) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.internal.adapter.ActorAdapter.$anonfun$aroundReceive$2(ActorAdapter.scala:97) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.internal.adapter.ActorAdapter.$anonfun$aroundReceive$2$adapted(ActorAdapter.scala:95) ~[akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.internal.adapter.ActorAdapter.withSafelyAdapted(ActorAdapter.scala:189) [akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.typed.internal.adapter.ActorAdapter.aroundReceive(ActorAdapter.scala:95) [akka-actor-typed_2.13-2.6.9.jar:2.6.9]
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:577) [akka-actor_2.13-2.6.9.jar:2.6.9]
	at akka.actor.ActorCell.invoke(ActorCell.scala:547) [akka-actor_2.13-2.6.9.jar:2.6.9]
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270) [akka-actor_2.13-2.6.9.jar:2.6.9]
	at akka.dispatch.Mailbox.run(Mailbox.scala:231) [akka-actor_2.13-2.6.9.jar:2.6.9]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
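
For readers hitting the same trace: `bound must be positive` is what `Random.nextInt(n)` throws when `n` is 0 or negative. Here that most likely means the set of known Producers was empty when `selectRandomly` ran. The code for `Consumer.selectRandomly` is not shown in this thread, so the following is only a minimal sketch of a guard, with a hypothetical signature and object name, assuming the candidates are held in a `Set`:

```scala
import scala.util.Random

// Hypothetical stand-in for the poster's Consumer.selectRandomly;
// the real signature is not shown in the thread.
object ProducerSelection {
  // Returns None instead of throwing when no producers are currently known.
  def selectRandomly[A](candidates: Set[A], rng: Random): Option[A] =
    if (candidates.isEmpty) None // rng.nextInt(0) would throw "bound must be positive"
    else {
      val index = rng.nextInt(candidates.size) // 0 <= index < size
      Some(candidates.iterator.drop(index).next())
    }
}
```

With an `Option` result the caller can stay in a waiting behaviour until the Receptionist reports at least one Producer, rather than failing and relying on the supervisor restart.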

The Consumer then restarts and after initialization I get the error:

17:50:54.956 [WorkStealSystem-akka.actor.default-dispatcher-6] INFO  akka.remote.artery.Association - Association to [akka://WorkStealSystem@127.0.0.1:25251] having UID [4584493602840708540] has been stopped. All messages to this UID will be delivered to dead letters. Reason: ActorSystem terminated
17:

The actor then gets a time-out as expected and I get this output:

17:50:56.948 [WorkStealSystem-work-dispatcher-16] INFO  concurrency.Utils$ - concurrency.Consumer$: @Init Unexpected consumer message = |ResponseFailure(java.util.concurrent.TimeoutException: Ask timed out on [Actor[akka://WorkStealSystem@127.0.0.1:25251/user/producer#-83987191]] after [3000 ms]. Message of type [concurrency.Consumer$Available]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.)|
17:51:04.838 [WorkStealSystem-akka.actor.default-dispatcher-6] WARN  akka.remote.artery.Association - Outbound control stream to [akka://WorkStealSystem@127.0.0.1:25252] failed. Restarting it. akka.remote.artery.OutboundHandshake$HandshakeTimeoutException: Handshake with [akka://WorkStealSystem@127.0.0.1:25252] did not complete within 20000 ms

At this point the Consumer just sits there and does nothing. If I launch a new Consumer, it finds the Producers and works correctly.

So my question is: how can I revive the Consumer? Note that I am still using the same behaviour but get no new updates on the cluster members. I assume that this is because all messages are sent to the dead-letter mailbox.

TIA

hmf · October 1, 2020, 8:51am

I think I have figured this out. Once the only peer node goes down and a seed comes back up, I in effect have two separate clusters. The questions then are:

Detecting when the cluster has split
Rejoining the clusters

Has anyone had to deal with such issues? If so, is there any literature or example about this?

EDIT:

Found this: Nodes not rejoining after cluster spilt
TIA

patriknw · October 3, 2020, 7:00pm

The IllegalArgumentException must be a bug. Could you create an issue?

hmf · October 6, 2020, 8:33am

@patriknw Thanks for taking the time to look at this but it is not an Akka bug. This is a bug on my side. I am using it to test what happens when the actor fails with an exception - I expect it to reconnect to the cluster and keep on working.

It is restarted correctly. But as I pointed out, the issue is that the actor now becomes the sole leader of its own cluster. And it does not seem to reconnect to the cluster when I restart the seed nodes. In effect I have split clusters.

From what I see, the only option I have is to shut down the actor. Am I correct in saying this? Note that I have 2 seeds. Seeing as messages to one seed are directed to the dead-letter mailbox, I had assumed that it would still connect to the 2nd seed.

EDIT: OK, I have figured out the issue: the first seed node sets the “master” cluster. Restarting seed node 1 creates a new cluster, which the now-split cluster cannot rejoin.

TIA

patriknw · October 13, 2020, 3:39pm

Yes, there are some considerations to be aware of when it comes to seed node configuration: in particular, the importance of the first entry, and that you should “never” use only one entry in the list. I’d also recommend Akka Cluster Bootstrap. See more in Cluster Usage • Akka Documentation
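
As a rough illustration of the seed node advice, here is a minimal sketch of a static seed-node configuration built in Scala. It reuses the two addresses from the first post. The object name `SeedNodeConfig` is hypothetical, and it assumes every node in the cluster is started with the same list in the same order:

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical holder for the cluster settings; only the two addresses come from the thread.
object SeedNodeConfig {
  val config: Config = ConfigFactory.parseString(
    """
    akka.actor.provider = cluster
    # Same list, same order, on every node.
    # The first entry is special: it is the only node that may join itself
    # to form a new cluster when it cannot reach any other seed node.
    akka.cluster.seed-nodes = [
      "akka://WorkStealSystem@127.0.0.1:25251",
      "akka://WorkStealSystem@127.0.0.1:25252"
    ]
    """
  ).withFallback(ConfigFactory.load()) // keep the rest of the application's settings
}
```

With more than one entry, a restarted first seed node first tries to join the other seed nodes of the existing cluster. If none of them is reachable, it joins itself and forms a separate cluster, which matches what was observed above.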

hmf · October 13, 2020, 5:28pm

@patriknw Thanks.
