Problems deploying Lagom to Production

I’m trying to troubleshoot these kinds of errors:

akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://application/system/sharding/CaesarProcessRegistryEntity#2053400028]] after [5000 ms]. Message of type [com.lightbend.lagom.scaladsl.persistence.CommandEnvelope]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
	at akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:671)
	at akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:692)
	at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:199)
	at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870)
	at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:109)
	at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)
	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868)
	at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:334)
	at akka.actor.LightArrayRevolverScheduler$$anon$3.executeBucket$1(LightArrayRevolverScheduler.scala:285)
	at akka.actor.LightArrayRevolverScheduler$$anon$3.nextTick(LightArrayRevolverScheduler.scala:289)
	at akka.actor.LightArrayRevolverScheduler$$anon$3.run(LightArrayRevolverScheduler.scala:241)
	at java.lang.Thread.run(Thread.java:748)
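As far as I can tell, the `[5000 ms]` in that message comes from Lagom's persistent-entity ask timeout, which defaults to 5 seconds. Tuning it doesn't fix the underlying sharding problem, but for reference, a sketch of the relevant setting (the value here is illustrative):

```hocon
# application.conf — timeout used when sending commands to persistent
# entities; the default (5s) matches the AskTimeoutException above.
lagom.persistence.ask-timeout = 10s
```

Raising this only masks the symptom: commands will still time out eventually if the ShardRegion never manages to register with its coordinator.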

I’ve seen several mentions of this kind of error before.

The above is happening in a production deployment to Kubernetes of 5 Lagom 1.5.1 services plus 1 Play web gateway.
I use JDBC persistence with AWS RDS (MySQL), since this lets us avoid deploying Cassandra to Kubernetes.
Similarly, I’ve avoided depending on Pub/Sub so that I don’t have to deploy Kafka.

Since there are 6 services in total, I configured:

akka.cluster.min-nr-of-members = 6
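If cluster formation uses Akka Cluster Bootstrap (the default for Lagom in production), `min-nr-of-members` interacts with the bootstrap contact-point settings. A sketch of how the two are typically paired — the setting names assume akka-management-cluster-bootstrap is in use, and the values are illustrative:

```hocon
akka {
  cluster {
    # The leader won't move members to Up (so sharding, singletons, etc.
    # won't start) until this many nodes have joined.
    min-nr-of-members = 6
  }
  management.cluster.bootstrap {
    contact-point-discovery {
      # How many contact points must be discovered before bootstrap
      # attempts to form (or join) a cluster.
      required-contact-point-nr = 6
    }
  }
}
```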

An Akka cluster is created by one of the 6 nodes:

- Initiating new cluster, self-joining [akka.tcp://application@10.128.52.151:2552]. Other nodes are expected to locate this cluster via continued contact-point probing.

and the other 5 are welcomed into the cluster. All nodes eventually reach consensus about which node is the “coordinator”, and the leader changes the state of all nodes to “Up”.
At this point, all indications are that the coordinator node address is akka.tcp://application@10.128.52.151:2552.

Here, the registration attempt goes to the wrong coordinator, so it makes sense that the message is buffered:

2019 09 04 16:07:13,496 WARN akka.cluster.sharding.ShardRegion [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitor, akkaTimestamp=16:07:13.495UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-3}] - CaesarProcessRegistryMonitor: Trying to register to coordinator at [ActorSelection[Anchor(akka.tcp://application@10.128.83.95:2552/), Path(/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator)]], but no acknowledgement. Total [1] buffered messages. [Coordinator [Member(address = akka.tcp://application@10.128.83.95:2552, status = Up)] is reachable.]
2019 09 04 16:07:13,496 DEBUG akka.cluster.sharding.ShardRegion [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitor, akkaTimestamp=16:07:13.495UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-3}] - CaesarProcessRegistryMonitor: Coordinator moved from [akka.tcp://application@10.128.83.95:2552] to [akka.tcp://application@10.128.52.152:2552]
2019 09 04 16:07:13,496 WARN akka.cluster.sharding.ShardRegion [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitor, akkaTimestamp=16:07:13.496UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-3}] - CaesarProcessRegistryMonitor: Trying to register to coordinator at [ActorSelection[Anchor(akka.tcp://application@10.128.52.152:2552/), Path(/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator)]], but no acknowledgement. Total [1] buffered messages. [Coordinator [Member(address = akka.tcp://application@10.128.52.152:2552, status = Up)] is reachable.]
2019 09 04 16:07:13,496 DEBUG akka.cluster.sharding.ShardRegion [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitor, akkaTimestamp=16:07:13.496UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-3}] - CaesarProcessRegistryMonitor: Coordinator moved from [akka.tcp://application@10.128.52.152:2552] to [akka.tcp://application@10.128.52.151:2552]

Here, the registration goes to the correct coordinator. Why does it fail?

2019 09 04 16:07:13,500 WARN akka.cluster.sharding.ShardRegion [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitor, akkaTimestamp=16:07:13.500UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-3}] - CaesarProcessRegistryMonitor: Trying to register to coordinator at [ActorSelection[Anchor(akka://application/), Path(/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator)]], but no acknowledgement. Total [1] buffered messages. [Coordinator [Member(address = akka.tcp://application@10.128.52.151:2552, status = Up)] is reachable.]
2019 09 04 16:07:13,503 DEBUG akka.cluster.singleton.ClusterSingletonProxy [{akkaSource=akka.tcp://application@10.128.52.151:2552/user/readSideGlobalPrepare-CaesarProcessRegistryMonitor-singletonProxy, akkaTimestamp=16:07:13.503UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-2}] - Trying to identify singleton at [akka.tcp://application@10.128.52.151:2552/user/readSideGlobalPrepare-CaesarProcessRegistryMonitor-singleton/singleton]

2019 09 04 16:07:13,506 DEBUG akka.cluster.Cluster(akka://application) [{akkaSource=akka.cluster.Cluster(akka://application), akkaTimestamp=16:07:13.506UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-4}] - Cluster Node [akka.tcp://application@10.128.52.151:2552] - Receiving gossip from [UniqueAddress(akka.tcp://application@10.128.83.92:2552,-1743993024)]
2019 09 04 16:07:13,507 INFO akka.cluster.Cluster(akka://application) [{akkaSource=akka.cluster.Cluster(akka://application), akkaTimestamp=16:07:13.507UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-2}] - Cluster Node [akka.tcp://application@10.128.52.151:2552] - event ReachabilityChanged()
2019 09 04 16:07:13,508 INFO akka.cluster.singleton.ClusterSingletonManager [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryEntityCoordinator, akkaTimestamp=16:07:13.508UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-3}] - Singleton manager starting singleton actor [akka://application/system/sharding/CaesarProcessRegistryEntityCoordinator/singleton]
2019 09 04 16:07:13,509 INFO akka.cluster.singleton.ClusterSingletonManager [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryEntityCoordinator, akkaTimestamp=16:07:13.509UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-3}] - ClusterSingletonManager state change [Start -> Oldest]
2019 09 04 16:07:13,510 INFO akka.cluster.singleton.ClusterSingletonManager [{akkaSource=akka.tcp://application@10.128.52.151:2552/user/readSideGlobalPrepare-CaesarProcessRegistryMonitor-singleton, akkaTimestamp=16:07:13.510UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-17}] - Singleton manager starting singleton actor [akka://application/user/readSideGlobalPrepare-CaesarProcessRegistryMonitor-singleton/singleton]
2019 09 04 16:07:13,510 INFO akka.cluster.singleton.ClusterSingletonManager [{akkaSource=akka.tcp://application@10.128.52.151:2552/user/readSideGlobalPrepare-CaesarProcessRegistryMonitor-singleton, akkaTimestamp=16:07:13.510UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-17}] - ClusterSingletonManager state change [Start -> Oldest]
2019 09 04 16:07:13,511 INFO akka.cluster.singleton.ClusterSingletonManager [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitorCoordinator, akkaTimestamp=16:07:13.511UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-15}] - Singleton manager starting singleton actor [akka://application/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton]
2019 09 04 16:07:13,511 INFO akka.cluster.singleton.ClusterSingletonManager [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitorCoordinator, akkaTimestamp=16:07:13.511UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-15}] - ClusterSingletonManager state change [Start -> Oldest]
2019 09 04 16:07:13,562 INFO com.lightbend.lagom.internal.persistence.cluster.ClusterStartupTaskActor [{akkaSource=akka.tcp://application@10.128.52.151:2552/user/readSideGlobalPrepare-CaesarProcessRegistryMonitor-singleton/singleton/readSideGlobalPrepare-CaesarProcessRegistryMonitor, akkaTimestamp=16:07:13.562UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-15}] - Executing cluster start task readSideGlobalPrepare-CaesarProcessRegistryMonitor.
2019 09 04 16:07:13,564 INFO com.lightbend.lagom.internal.persistence.cluster.ClusterStartupTaskActor [{akkaSource=akka.tcp://application@10.128.52.151:2552/user/readSideGlobalPrepare-CaesarProcessRegistryMonitor-singleton/singleton/readSideGlobalPrepare-CaesarProcessRegistryMonitor, akkaTimestamp=16:07:13.564UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-15}] - Cluster start task readSideGlobalPrepare-CaesarProcessRegistryMonitor done.
2019 09 04 16:07:13,575 DEBUG akka.cluster.ddata.Replicator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/replicator, akkaTimestamp=16:07:13.575UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-14}] - Received Get for key [CaesarProcessRegistryEntityCoordinatorState]
2019 09 04 16:07:13,579 DEBUG akka.cluster.ddata.Replicator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/replicator, akkaTimestamp=16:07:13.578UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-14}] - Received Get for key [CaesarProcessRegistryMonitorCoordinatorState]
2019 09 04 16:07:13,586 DEBUG akka.serialization.Serialization(akka://application) [{akkaSource=akka.serialization.Serialization(akka://application), akkaTimestamp=16:07:13.585UTC, sourceActorSystem=application, sourceThread=application-akka.remote.default-remote-dispatcher-32}] - Using serializer [akka.cluster.ddata.protobuf.ReplicatorMessageSerializer] for message [akka.cluster.ddata.Replicator$Internal$Read]
2019 09 04 16:07:13,740 WARN akka.cluster.sharding.ShardRegion [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitor, akkaTimestamp=16:07:13.740UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-14}] - CaesarProcessRegistryMonitor: Trying to register to coordinator at [ActorSelection[Anchor(akka://application/), Path(/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator)]], but no acknowledgement. Total [1] buffered messages. [Coordinator [Member(address = akka.tcp://application@10.128.52.151:2552, status = Up)] is reachable.]
2019 09 04 16:07:14,267 DEBUG akka.cluster.ddata.Replicator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/replicator, akkaTimestamp=16:07:14.267UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-15}] - Received gossip status from [akka.tcp://application@10.128.74.103:2552], chunk [1] of [1] containing []
2019 09 04 16:07:14,429 DEBUG akka.cluster.ddata.Replicator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/replicator, akkaTimestamp=16:07:14.429UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-15}] - Received gossip status from [akka.tcp://application@10.128.83.95:2552], chunk [1] of [1] containing []
2019 09 04 16:07:14,429 DEBUG akka.cluster.ddata.Replicator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/ddataReplicator, akkaTimestamp=16:07:14.429UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-15}] - Received gossip status from [akka.tcp://application@10.128.83.95:2552], chunk [1] of [1] containing []
2019 09 04 16:07:14,445 DEBUG akka.cluster.ddata.Replicator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/ddataReplicator, akkaTimestamp=16:07:14.445UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-15}] - Received gossip status from [akka.tcp://application@10.128.74.103:2552], chunk [1] of [1] containing []
2019 09 04 16:07:14,489 DEBUG akka.cluster.Cluster(akka://application) [{akkaSource=akka.cluster.Cluster(akka://application), akkaTimestamp=16:07:14.489UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-4}] - Cluster Node [akka.tcp://application@10.128.52.151:2552] - Receiving gossip from [UniqueAddress(akka.tcp://application@10.128.70.78:2552,-1958568656)]
2019 09 04 16:07:14,490 INFO akka.cluster.Cluster(akka://application) [{akkaSource=akka.cluster.Cluster(akka://application), akkaTimestamp=16:07:14.490UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-22}] - Cluster Node [akka.tcp://application@10.128.52.151:2552] - event ReachabilityChanged()
2019 09 04 16:07:14,510 DEBUG akka.cluster.singleton.ClusterSingletonProxy [{akkaSource=akka.tcp://application@10.128.52.151:2552/user/readSideGlobalPrepare-CaesarProcessRegistryMonitor-singletonProxy, akkaTimestamp=16:07:14.510UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-2}] - Trying to identify singleton at [akka.tcp://application@10.128.52.151:2552/user/readSideGlobalPrepare-CaesarProcessRegistryMonitor-singleton/singleton]
2019 09 04 16:07:14,510 INFO akka.cluster.singleton.ClusterSingletonProxy [{akkaSource=akka.tcp://application@10.128.52.151:2552/user/readSideGlobalPrepare-CaesarProcessRegistryMonitor-singletonProxy, akkaTimestamp=16:07:14.510UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-22}] - Singleton identified at [akka://application/user/readSideGlobalPrepare-CaesarProcessRegistryMonitor-singleton/singleton]
2019 09 04 16:07:14,510 DEBUG akka.cluster.singleton.ClusterSingletonProxy [{akkaSource=akka.tcp://application@10.128.52.151:2552/user/readSideGlobalPrepare-CaesarProcessRegistryMonitor-singletonProxy, akkaTimestamp=16:07:14.510UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-22}] - Sending buffered messages to current singleton instance
2019 09 04 16:07:14,602 INFO akka.cluster.sharding.DDataShardCoordinator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator, akkaTimestamp=16:07:14.602UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-2}] - ShardCoordinator was moved to the active state State(Map(),Map(),Set(),Set(),false)
2019 09 04 16:07:14,603 INFO akka.cluster.sharding.DDataShardCoordinator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryEntityCoordinator/singleton/coordinator, akkaTimestamp=16:07:14.602UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-14}] - ShardCoordinator was moved to the active state State(Map(),Map(),Set(),Set(),false)
2019 09 04 16:07:15,012 DEBUG akka.serialization.Serialization(akka://application) [{akkaSource=akka.serialization.Serialization(akka://application), akkaTimestamp=16:07:15.011UTC, sourceActorSystem=application, sourceThread=application-akka.remote.default-remote-dispatcher-32}] - Using serializer [akka.cluster.ddata.protobuf.ReplicatorMessageSerializer] for message [akka.cluster.ddata.Replicator$Internal$Status]
2019 09 04 16:07:15,270 DEBUG akka.cluster.sharding.DDataShardCoordinator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryEntityCoordinator/singleton/coordinator, akkaTimestamp=16:07:15.270UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-2}] - ShardRegion registered: [Actor[akka://application/system/sharding/CaesarProcessRegistryEntity#2053400028]]
2019 09 04 16:07:15,280 DEBUG akka.cluster.ddata.Replicator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/replicator, akkaTimestamp=16:07:15.280UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-4}] - Received Update for key [CaesarProcessRegistryEntityCoordinatorState]
2019 09 04 16:07:15,292 DEBUG akka.serialization.Serialization(akka://application) [{akkaSource=akka.serialization.Serialization(akka://application), akkaTimestamp=16:07:15.291UTC, sourceActorSystem=application, sourceThread=application-akka.remote.default-remote-dispatcher-32}] - Using serializer [akka.cluster.ddata.protobuf.ReplicatorMessageSerializer] for message [akka.cluster.ddata.Replicator$Internal$Write]
2019 09 04 16:07:15,300 DEBUG akka.serialization.Serialization(akka://application) [{akkaSource=akka.serialization.Serialization(akka://application), akkaTimestamp=16:07:15.300UTC, sourceActorSystem=application, sourceThread=application-akka.remote.default-remote-dispatcher-32}] - Using serializer [akka.cluster.ddata.protobuf.ReplicatedDataSerializer] for message [akka.cluster.ddata.LWWRegister]
2019 09 04 16:07:15,305 DEBUG akka.serialization.Serialization(akka://application) [{akkaSource=akka.serialization.Serialization(akka://application), akkaTimestamp=16:07:15.305UTC, sourceActorSystem=application, sourceThread=application-akka.remote.default-remote-dispatcher-32}] - Using serializer [akka.cluster.sharding.protobuf.ClusterShardingMessageSerializer] for message [akka.cluster.sharding.ShardCoordinator$Internal$State]
2019 09 04 16:07:15,482 DEBUG akka.serialization.Serialization(akka://application) [{akkaSource=akka.serialization.Serialization(akka://application), akkaTimestamp=16:07:15.481UTC, sourceActorSystem=application, sourceThread=application-akka.remote.default-remote-dispatcher-32}] - Using serializer [akka.cluster.protobuf.ClusterMessageSerializer] for message [akka.cluster.GossipStatus]
2019 09 04 16:07:15,740 DEBUG akka.cluster.sharding.DDataShardCoordinator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator, akkaTimestamp=16:07:15.740UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-17}] - ShardRegion registered: [Actor[akka://application/system/sharding/CaesarProcessRegistryMonitor#1863058464]]
2019 09 04 16:07:15,740 WARN akka.cluster.sharding.ShardRegion [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitor, akkaTimestamp=16:07:15.740UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-4}] - CaesarProcessRegistryMonitor: Trying to register to coordinator at [ActorSelection[Anchor(akka://application/), Path(/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator)]], but no acknowledgement. Total [1] buffered messages. [Coordinator [Member(address = akka.tcp://application@10.128.52.151:2552, status = Up)] is reachable.]
2019 09 04 16:07:15,740 DEBUG akka.cluster.ddata.Replicator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/replicator, akkaTimestamp=16:07:15.740UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-17}] - Received Update for key [CaesarProcessRegistryMonitorCoordinatorState]
2019 09 04 16:07:15,830 DEBUG akka.io.TcpListener [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/IO-TCP/selectors/$a/0, akkaTimestamp=16:07:15.829UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-17}] - New connection accepted
2019 09 04 16:07:16,109 DEBUG akka.io.TcpListener [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/IO-TCP/selectors/$a/0, akkaTimestamp=16:07:16.109UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-2}] - New connection accepted
2019 09 04 16:07:16,308 DEBUG akka.cluster.sharding.DDataShardCoordinator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryEntityCoordinator/singleton/coordinator, akkaTimestamp=16:07:16.308UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-2}] - The coordinator state was successfully updated with ShardRegionRegistered(Actor[akka://application/system/sharding/CaesarProcessRegistryEntity#2053400028])
2019 09 04 16:07:16,421 DEBUG akka.cluster.ddata.Replicator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/ddataReplicator, akkaTimestamp=16:07:16.421UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-2}] - Received gossip status from [akka.tcp://application@10.128.83.95:2552], chunk [1] of [1] containing []
2019 09 04 16:07:16,765 DEBUG akka.cluster.sharding.DDataShardCoordinator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator, akkaTimestamp=16:07:16.765UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-14}] - The coordinator state was successfully updated with ShardRegionRegistered(Actor[akka://application/system/sharding/CaesarProcessRegistryMonitor#1863058464])

At this point, the registration requests accumulate (i.e., the number of buffered messages grows: 1, 2, …):

2019 09 04 16:07:16,767 WARN akka.cluster.sharding.ShardRegion [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitor, akkaTimestamp=16:07:16.766UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-24}] - CaesarProcessRegistryMonitor: Retry request for shard [org.open.caesar.service.processRegistry.CaesarProcessRegistryEvent] homes from coordinator at [Actor[akka://application/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator#1053517018]]. [1] buffered messages.
2019 09 04 16:07:16,767 DEBUG akka.cluster.sharding.DDataShardCoordinator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator, akkaTimestamp=16:07:16.767UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-3}] - GetShardHome [org.open.caesar.service.processRegistry.CaesarProcessRegistryEvent] request ignored, because not all regions have registered yet.
2019 09 04 16:07:17,394 DEBUG akka.cluster.ddata.Replicator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/replicator, akkaTimestamp=16:07:17.393UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-22}] - Received gossip status from [akka.tcp://application@10.128.83.92:2552], chunk [1] of [1] containing [CaesarProcessRegistryEntityCoordinatorState, CaesarProcessRegistryMonitorCoordinatorState]
2019 09 04 16:07:17,748 WARN akka.cluster.sharding.ShardRegion [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitor, akkaTimestamp=16:07:17.748UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-15}] - CaesarProcessRegistryMonitor: Retry request for shard [org.open.caesar.service.processRegistry.CaesarProcessRegistryEvent] homes from coordinator at [Actor[akka://application/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator#1053517018]]. [1] buffered messages.
2019 09 04 16:07:17,748 DEBUG akka.cluster.sharding.DDataShardCoordinator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator, akkaTimestamp=16:07:17.748UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-15}] - GetShardHome [org.open.caesar.service.processRegistry.CaesarProcessRegistryEvent] request ignored, because not all regions have registered yet.
2019 09 04 16:07:19,139 DEBUG akka.cluster.ddata.Replicator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/replicator, akkaTimestamp=16:07:19.139UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-4}] - Received gossip status from [akka.tcp://application@10.128.70.78:2552], chunk [1] of [1] containing [CaesarProcessRegistryEntityCoordinatorState, CaesarProcessRegistryMonitorCoordinatorState]
2019 09 04 16:07:19,740 WARN akka.cluster.sharding.ShardRegion [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitor, akkaTimestamp=16:07:19.740UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-24}] - CaesarProcessRegistryMonitor: Retry request for shard [org.open.caesar.service.processRegistry.CaesarProcessRegistryEvent] homes from coordinator at [Actor[akka://application/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator#1053517018]]. [1] buffered messages.
2019 09 04 16:07:19,740 DEBUG akka.cluster.sharding.DDataShardCoordinator [{akkaSource=akka.tcp://application@10.128.52.151:2552/system/sharding/CaesarProcessRegistryMonitorCoordinator/singleton/coordinator, akkaTimestamp=16:07:19.740UTC, sourceActorSystem=application, sourceThread=application-akka.actor.default-dispatcher-24}] - GetShardHome [org.open.caesar.service.processRegistry.CaesarProcessRegistryEvent] request ignored, because not all regions have registered yet.

I observe a similar accumulation of registration requests across all 5 Lagom services.

For a Lagom service:

  • what determines the name of an Akka ShardRegion and the Akka cluster role?

  • what determines which node becomes the Akka shard coordinator?

  • is there a way to delay “readiness” until all shard registration requests have been successfully handled?

Nicolas.
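On the last question: I have not verified this in this deployment, but Akka Management ships a readiness check that reports ready only once the node has reached a configured member status, and it can be wired into a Kubernetes readinessProbe. A sketch, assuming akka-management-cluster-http is on the classpath (it registers a cluster-membership readiness check by default); note this gates on cluster membership, not on shard registration itself:

```hocon
akka.management {
  # The cluster-membership readiness check (from akka-management-cluster-http)
  # answers "ready" only once this node's member status is in ready-states.
  cluster.health-check {
    ready-states = ["Up"]
  }
  http {
    hostname = ${?POD_IP}   # bind address for the /ready and /alive endpoints
    port = 8558
  }
}
```

Kubernetes would then probe GET /ready on port 8558. As far as I know there is no built-in check for “all shard regions registered”; that would require a custom readiness check.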

FYI: I managed to work around this problem with a different cluster formation strategy:

akka.cluster.min-nr-of-members = 1
lagom.cluster.bootstrap.enabled = false
lagom.cluster.join-self = true
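For context, the formation strategy disabled above is Akka Cluster Bootstrap; on Kubernetes it usually looks roughly like this (a sketch — the settings assume akka-discovery-kubernetes-api, and the label selector is deployment-specific):

```hocon
akka.management.cluster.bootstrap {
  contact-point-discovery {
    discovery-method = kubernetes-api
    # Minimum number of pods that must be discovered before forming a cluster.
    required-contact-point-nr = 2
  }
}
akka.discovery.kubernetes-api {
  # Selects the pods of this service via their labels.
  pod-label-selector = "app=%s"
}
```

This method also requires RBAC rules allowing the pod to list and watch pods via the Kubernetes API.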

This doesn’t answer the questions above, but it helped me get past this hurdle. Note that with join-self each node effectively forms its own single-node cluster, so entities are no longer sharded across pods.

Nicolas.