Starting Lagom in development mode is extremely slow

I have a small-ish Lagom application with around 14 services. I am running this project using the sbt runAll task.

While starting, I get a whole lot of warnings that look like this:

akka.cluster.sharding.ShardRegion [, akkaSource=akka.tcp://service-application@, sourceActorSystem=service-application, akkaTimestamp=13:17:44.687UTC] - MyEventProcessor: Trying to register to coordinator at [ActorSelection[Anchor(akka://service-application/), Path(/system/sharding/MyEventProcessorCoordinator/singleton/coordinator)]], but no acknowledgement. Total [4] buffered messages. [Coordinator [Member(address = akka.tcp://service-application@, status = Up)] is reachable.]

or this:

15:18:56.077 [warn] akka.cluster.sharding.ShardRegion [, akkaSource=akka.tcp://service@, sourceActorSystem=service, akkaTimestamp=13:18:56.077UTC] - kafkaProducer-my-topic: Retry request for shard [...] homes from coordinator at [Actor[akka://service/system/sharding/kafkaProducer-my-topicCoordinator/singleton/coordinator#409734202]]. [1] buffered messages.

These warnings completely flood the console, so there are way too many of them to count.

There is also this occasional warning:

15:19:06.168 [warn] akka.cluster.Cluster(akka://service-application) [, akkaTimestamp=13:19:06.165UTC, akkaSource=akka.cluster.Cluster(akka://service-application), sourceActorSystem=service-application] - Cluster Node [akka.tcp://service-application@] - Scheduled sending of heartbeat was delayed. Previous heartbeat was sent [2894] ms ago, expected interval is [1000] ms. This may cause failure detection to mark members as unreachable. The reason can be thread starvation, e.g. by running blocking tasks on the default dispatcher, CPU overload, or GC.

Furthermore, I also get errors during startup:

15:18:49.809 [error] akka.cluster.sharding.PersistentShardCoordinator [, akkaSource=akka.tcp://service-application@, sourceActorSystem=service-application, akkaTimestamp=13:18:49.784UTC] - Persistence failure when replaying events for persistenceId [/sharding/MyEventProcessorCoordinator]. Last known sequence number [0]
akka.persistence.RecoveryTimedOut: Recovery timed out, didn't get snapshot within 30000 milliseconds


com.lightbend.lagom.internal.persistence.cassandra.NoServiceLocatorException: Timed out after 2 seconds while waiting for a ServiceLocator. Have you configured one?


15:28:17.633 [error] akka.cluster.sharding.PersistentShardCoordinator [, akkaTimestamp=13:28:12.779UTC, akkaSource=akka.tcp://service-application@, sourceActorSystem=service-application] - Persistence failure when replaying events for persistenceId [/sharding/MyEntityCoordinator]. Last known sequence number [0]
akka.pattern.CircuitBreaker$$anon$13: Circuit Breaker Timed out.


akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://service-application/user/cassandraOffsetStorePrepare-singleton/singleton/cassandraOffsetStorePrepare#-194060245]] after [60000 ms]. Message of type [com.lightbend.lagom.internal.persistence.cluster.ClusterStartupTaskActor$Execute$] was sent by [Actor[akka://service-application/user/cassandraOffsetStorePrepare-singleton/singleton/cassandraOffsetStorePrepare#-194060245]]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.

These errors are the most common during startup, and they are not service-specific, i.e. the service referenced in them is not always the same.

After the (Services started, press enter to stop and go back to the console...) message appears,
these errors and warnings continue to occur, slowly tapering off over the next minute or so. After that, the application runs correctly. The whole process (from running runAll to the point where I am able to use the services) easily takes up to five minutes every time. The hot reload of services also fails fairly often, forcing me to restart the whole process.

In contrast, building the application in prod mode (either as Debian packages or Docker containers), deploying it, and starting it takes at most 30 seconds in total.

More information: I am not using the embedded Cassandra and Kafka servers, but rather external instances (though running on the same local machine). I am using lagom-sbt-plugin version 1.5.1.
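For completeness, the external services are wired up in my build.sbt roughly like this (the addresses are placeholders for my local setup, not my exact configuration):

```scala
// build.sbt: disable Lagom's embedded Cassandra and Kafka in dev mode
// and register the external instances instead (local addresses shown
// as examples).
lagomCassandraEnabled in ThisBuild := false
lagomUnmanagedServices in ThisBuild :=
  Map("cas_native" -> "http://localhost:9042")

lagomKafkaEnabled in ThisBuild := false
lagomKafkaAddress in ThisBuild := "localhost:9092"
```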

Is this the normal behavior of the runAll task? Can I speed it up somehow?


What’s the memory usage on the machine? Is it swapping?

14 services are a lot for one project. In development mode, all services run in the same JVM as sbt, so you’ll need to launch it with plenty of heap allocated, and might need to adjust other JVM settings to get good performance.
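For example, one way to give the sbt JVM more headroom is via SBT_OPTS before launching (the values below are a starting point to experiment with, not a tuned recommendation):

```shell
# Raise the initial and maximum heap for the sbt JVM, which in dev mode
# also hosts all 14 services. Adjust the sizes to your machine's RAM.
export SBT_OPTS="-Xms2g -Xmx8g"
sbt runAll
```

A `.jvmopts` file in the project root (one JVM flag per line) achieves the same thing without touching the environment.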

I doubled the heap space, and now it starts far more reliably. Thank you!