How to NOT use akka.cluster.auto-down-unreachable-after


(William) #1

Hi,

I use Akka Cluster for two reasons:

  • ClusterSharding
  • Singleton

Singletons are used as a “master/slave” setup for services that don’t support real, proper sharding.

When I start my cluster, everything is fine. But if I kill -9 my leader (simulating a machine going down suddenly), no new leader is picked automatically when I set akka.cluster.auto-down-unreachable-after=off.
With a setting like akka.cluster.auto-down-unreachable-after=10s, everything is fine: the old leader is removed and a new one takes the lead.

But this parameter is discouraged for production environments.
And if I try to implement my own version, I will definitely do a far worse job than you.

So, my question is: I understand the risk described in the documentation (https://doc.akka.io/docs/akka/current/cluster-usage.html), but is there a way to do “better” without ringing someone in the middle of the night when a machine shuts down?

Thanks for your time!


(Dmitriy Zakomirnyi) #2

Yes, you can try Split Brain Resolver. It is a commercial Lightbend tool.

I’ve seen a couple of open source alternatives on GitHub.
Actually, you can try to implement your own. Check out this conference talk: Scala Swarm 2017 | Niko Will: Akka cluster management and split brain resolution.
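
To make "implement your own" a bit more concrete, here is a minimal sketch of the decision at the heart of a keep-majority strategy, one of the approaches commonly used for split brain resolution. It is plain Java with no Akka dependency; the class and method names are made up for illustration, and it does not claim to mirror how the Lightbend tool is implemented. A real resolver would also need to wait until the membership view is stable before acting and then actually down the losing side, which this sketch leaves out.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, library-free sketch of a "keep majority" downing decision. */
final class KeepMajoritySketch {

    enum Decision { DOWN_UNREACHABLE, DOWN_SELF }

    /**
     * Decides which side of a partition should survive, as seen from one node.
     *
     * @param reachable   addresses this node can still see, including itself
     * @param unreachable addresses this node has lost contact with
     */
    static Decision decide(Set<String> reachable, Set<String> unreachable) {
        if (reachable.isEmpty()) {
            throw new IllegalArgumentException("reachable must contain at least this node");
        }
        if (reachable.size() > unreachable.size()) {
            return Decision.DOWN_UNREACHABLE; // we are the majority: remove the others
        }
        if (reachable.size() < unreachable.size()) {
            return Decision.DOWN_SELF; // we are the minority: shut ourselves down
        }
        // Tie: both sides must reach the same verdict independently,
        // so the side that holds the lowest address survives.
        String lowestMine = new TreeSet<>(reachable).first();
        String lowestTheirs = new TreeSet<>(unreachable).first();
        return lowestMine.compareTo(lowestTheirs) < 0
                ? Decision.DOWN_UNREACHABLE
                : Decision.DOWN_SELF;
    }
}
```

With three nodes and one of them killed with kill -9, the two survivors see 2 reachable vs. 1 unreachable, so decide returns DOWN_UNREACHABLE and the cluster can move on without paging anyone.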


(Johan Andrén) #3

In the normal case (scaling down the cluster, rolling out an upgrade, etc.) you should strive for nodes gracefully leaving the cluster rather than abruptly killing the machines. This works pretty much out of the box in 2.5 using the graceful shutdown, which is triggered by a JVM shutdown hook.
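
As a rough illustration of the mechanism (not of Akka's own code), this is what a JVM shutdown hook looks like in plain Java. The class name and the leaveCluster action are placeholders; in Akka 2.5 the equivalent wiring is already done for you, so you would not normally write this yourself. The point is that the hook runs on a normal exit or a SIGTERM, but never on kill -9, which is why that case needs a downing strategy.

```java
/** Hypothetical sketch: run a "leave the cluster" action when the JVM shuts down normally. */
final class GracefulLeaveSketch {

    /** Registers the hook and returns it so the caller can deregister it if needed. */
    static Thread registerLeaveHook(Runnable leaveCluster) {
        Thread hook = new Thread(leaveCluster, "leave-cluster-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        // Placeholder for the real leave logic.
        registerLeaveHook(() -> System.out.println("leaving cluster before exit"));
        System.out.println("running; a normal exit or SIGTERM runs the hook, kill -9 does not");
    }
}
```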

While interesting, I think Niko’s talk doesn’t actually mention how it is done.

One option, which might be surprising, is to have ops monitor the cluster for unreachability and make manual decisions about whether part of the cluster should be downed when a partition occurs. It depends a bit on what kind of infrastructure you are running on; in the cloud it may be less of an option.


(William) #4

Thanks a lot to both of you, and nice video, it gives me things to think about :)

Manual ops decisions are not wanted for the moment, which is why I’m looking for an automatic option :)