Cluster Sharding, slow handover during rolling restart

We have the following situation:

  1. We have a cluster of 3 nodes
  2. The cluster is receiving numerous GET requests, each fetching the information of an entity
  3. A rolling restart starts
  4. 2 new nodes are spawned (5 nodes in the cluster)
  5. When those 2 new nodes are healthy (based on AkkaManagementHttp), the 2 oldest nodes are taken down with coordinated-shutdown (3 nodes in the cluster)
  6. 1 new node is spawned (4 nodes in the cluster)
  7. The last old node is shut down (3 nodes in the cluster).

During this rolling restart, we see a bunch of GET requests with 4+ seconds of latency (compared to the usual 30 ms).

  • We know that shutting down the oldest node first is not ideal because of the Singleton handovers, but the restart order cannot be configured at the moment.
  • We are using Akka 2.5.19
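
For reference, the settings that seem most relevant to this handover can be sketched as follows. The keys are taken from Akka's reference.conf; the values are what we understand the 2.5.x defaults to be, which is an assumption worth checking against the reference.conf of the exact version in use. They are listed to show which timers are involved, not as tuning advice, and whether any of them explains the 4+ second delay is not established here.

```hocon
# Sketch only: values are assumed 2.5.x defaults, not recommendations.
akka.cluster.sharding {
  # Interval at which a ShardRegion retries registration and shard location
  # requests when the ShardCoordinator does not reply.
  retry-interval = 2 s

  # Upper bound on how long a shard rebalance (handoff) may take.
  handoff-timeout = 60 s

  # Maximum number of messages a ShardRegion buffers while it cannot
  # deliver them.
  buffer-size = 100000
}

# The ShardCoordinator runs as a cluster singleton, so singleton handover
# settings apply to it as well (sharding's coordinator-singleton section
# points at akka.cluster.singleton by default).
akka.cluster.singleton {
  hand-over-retry-interval = 1s
}

# Time budget for the coordinated-shutdown phase in which the leaving
# node's ShardRegion hands off its shards.
akka.coordinated-shutdown.phases.cluster-sharding-shutdown-region.timeout = 10 s
```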

Is this delay caused by the ShardCoordinator handover? How can this latency increase be prevented?

Hi @milanvdm, was your problem solved later on? If so, how?