The ShardCoordinator was unable to get an initial state

Hi guys

I recently updated my Akka cluster to use a home-grown implementation of the KeepMajority split brain resolver. Since this update, I’m seeing an increasing number of “The ShardCoordinator was unable to get an initial state” errors in my logs.

Previously I used to have a specific shutdown hook:

val cluster = Cluster(actorSystem)
CoordinatedShutdown(actorSystem).addJvmShutdownHook {

But according to the docs (if I understood them correctly), I should not need to do this; the CoordinatedShutdown module should handle nodes leaving gracefully.

Is this because I am now using a custom downing-provider-class? Do I need to specifically handle members exiting in the downing provider? (Currently I listen for the MemberRemoved event and, when it arrives, schedule a quorum check after a 7-second stable-after period.)


You’re right that you don’t need to call leave yourself; CoordinatedShutdown should handle this.

A downing provider shouldn’t need to be involved in any graceful shutdown, only when nodes crash or when there are network partitions. Typically that means you’ll react to unreachability events (UnreachableMember) rather than Member* events.
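To make the point concrete, here is a minimal sketch of a keep-majority style decision that takes the set of unreachable nodes as its input. It is not your implementation and not Akka’s; the names `Decision`, `KeepMajoritySketch` and `decide` are hypothetical, node addresses are modelled as plain strings, and real concerns such as roles, joining members and the stable-after timing are left out on purpose.

```scala
// Hypothetical, simplified model: each node is identified by a plain string address.
sealed trait Decision
case object DownUnreachable extends Decision // our side survives and downs the other side
case object DownReachable extends Decision   // our side is the minority and should shut down
case object NoAction extends Decision        // nothing is unreachable, so nothing to decide

object KeepMajoritySketch {
  // members: all current cluster members as seen from this node
  // unreachable: the nodes this side currently cannot reach
  def decide(members: Set[String], unreachable: Set[String]): Decision = {
    val unreachableMembers = members.intersect(unreachable)
    val reachable = members.diff(unreachable)

    if (unreachableMembers.isEmpty) NoAction
    else if (reachable.size > unreachableMembers.size) DownUnreachable
    else if (reachable.size < unreachableMembers.size) DownReachable
    // Tie: keep the side holding the lowest address, so both sides reach the same verdict.
    else if (reachable.contains(members.min)) DownUnreachable
    else DownReachable
  }
}
```

In a real provider you would only run a decision like this once the unreachable set has been stable for your stable-after period. A node that leaves gracefully never needs to appear in the `unreachable` input at all, which is why a MemberRemoved event is not the usual trigger for a quorum check.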

I’d recommend you use an existing downing provider, as there are many edge cases. E.g