What is Akka remote waiting for?

Sometimes the following steps cause Akka to get stuck:

  1. Simulate a network-unreachable event using iptables.
  2. The unreachable nodes are auto-downed.
  3. The JVM is restarted for the first time.
  4. Flush all iptables rules to allow traffic again.
  5. Sometimes some of the nodes get stuck after startup:
(akka.remote.Remoting) akka.remote.Remoting: Starting remoting
(akka.remote.Remoting) akka.remote.Remoting: Remoting started; listening on addresses :[akka.tcp://MyActorSystem@192.168.201.2:2551]
(akka.cluster.Cluster(akka://MyActorSystem)) akka.cluster.Cluster(akka://MyActorSystem): Cluster Node [akka.tcp://MyActorSystem@192.168.201.2:2551] - Starting up, Akka version [2.5.32] ...
(akka.cluster.Cluster(akka://MyActorSystem)) akka.cluster.Cluster(akka://MyActorSystem): Cluster Node [akka.tcp://MyActorSystem@192.168.201.2:2551] - Registered cluster JMX MBean [akka:type=Cluster]
(akka.cluster.Cluster(akka://MyActorSystem)) akka.cluster.Cluster(akka://MyActorSystem): Cluster Node [akka.tcp://MyActorSystem@192.168.201.2:2551] - Started up successfully
(akka.cluster.Cluster(akka://MyActorSystem)) akka.cluster.Cluster(akka://MyActorSystem): Cluster Node [akka.tcp://MyActorSystem@192.168.201.2:2551] - No seed-nodes configured, manual cluster join required, see https://doc.akka.io/docs/akka/current/cluster-usage.html#joining-to-seed-nodes
(akka.tcp://MyActorSystem@192.168.201.2:2551/system/cluster/core/daemon/downingProvider) akka.cluster.AutoDown: Don't use auto-down feature of Akka Cluster in production. See 'Auto-downing (DO NOT USE)' section of Akka Cluster documentation.

The thread dump shows:

"MyActorSystem-akka.remote.default-remote-dispatcher-33" #64 prio=5 os_prio=0 tid=0x00007f7358001000 nid=0x49cd waiting on condition [0x00007f739e9ec000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000086d893c0> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at akka.actor.ActorSystemImpl.findExtension(ActorSystem.scala:995)
        at akka.actor.ActorSystemImpl.registerExtension(ActorSystem.scala:1003)
        at akka.actor.ExtensionId.apply(Extension.scala:80)
        at akka.actor.ExtensionId.apply$(Extension.scala:79)
        at akka.serialization.SerializationExtension$.apply(SerializationExtension.scala:14)
        at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:31)
        at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:72)
        - locked <0x00000000898da808> (a scala.runtime.LazyRef)
        at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:72)
        at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:98)
        at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:1128)
        at akka.actor.Actor.aroundReceive(Actor.scala:539)
        at akka.actor.Actor.aroundReceive$(Actor.scala:537)
        at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:536)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:614)
        at akka.actor.ActorCell.invoke(ActorCell.scala:583)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:268)
        at akka.dispatch.Mailbox.run(Mailbox.scala:229)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:241)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

   Locked ownable synchronizers:
        - None

It seems that Akka remote is waiting for the lock on <0x00000000898da808> (a scala.runtime.LazyRef) to be released.

I couldn’t find anything in the rest of the thread dump that is associated with this lock. Does this suggest that the lock is held by a subprocess outside the JVM, or by some native process that the JVM called?

Restarting the JVM a second time resolves the issue, but I can’t tell whether that is deterministic.
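
For reading the dump above: a "- locked <...>" line marks a monitor that the dumped thread itself holds, while "parking to wait for <...>" names the object it is blocked on, here the java.util.concurrent.CountDownLatch$Sync reached through ActorSystemImpl.findExtension. A CountDownLatch has no owning thread, which may be why nothing else in the dump appears to be associated with it. The sketch below reproduces that shape with plain JDK classes only. It is not Akka code, and the class, field, and method names (LatchDumpSketch, lazyRefStandIn, registrationLatch, startWaiter, inspect) are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LatchDumpSketch {
    // Hypothetical stand-ins for the LazyRef monitor and the latch seen in the dump.
    static final Object lazyRefStandIn = new Object();
    static final CountDownLatch registrationLatch = new CountDownLatch(1);

    /** Starts a thread that takes the monitor and then parks on the latch. */
    static Thread startWaiter() {
        Thread t = new Thread(() -> {
            synchronized (lazyRefStandIn) {          // shows up as "- locked <...>"
                try {
                    registrationLatch.await();       // shows up as "parking to wait for <...>"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "waiter");
        t.setDaemon(true);                           // do not keep the JVM alive
        t.start();
        return t;
    }

    /** Waits until the thread is parked, then returns its ThreadInfo with lock details. */
    static ThreadInfo inspect(Thread t) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        while (t.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        // true, true: include locked monitors and locked ownable synchronizers
        return mx.getThreadInfo(new long[] { t.getId() }, true, true)[0];
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = startWaiter();
        ThreadInfo info = inspect(t);
        System.out.println("state:         " + info.getThreadState());
        System.out.println("waiting for:   " + info.getLockInfo());
        System.out.println("lock owner id: " + info.getLockOwnerId());
        for (MonitorInfo m : info.getLockedMonitors()) {
            System.out.println("holds monitor: " + m.getClassName());
        }
        registrationLatch.countDown();               // release the waiter
    }
}
```

On a HotSpot JVM this should report the thread as WAITING on a CountDownLatch$Sync with a lock owner id of -1 (no owner), while listing the plain Object as a monitor the thread holds. If the real dump has the same shape, the thing to look for is whichever code path was supposed to count the latch down, rather than a thread that "owns" the lock.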

Note that Akka 2.5 reached EOL quite a while ago (Nov 2020). It would be good if you could try a recent Akka version and see whether the problem persists.