Self-termination for applications running in Kubernetes

Hello all!

When akka-cluster nodes run in a k8s cluster and the applications form a cluster automatically on startup, we can hit a split-brain problem, as described in this issue:
I'll recap here so nobody has to read the whole thread.
When the k8s API server is partitioned from some of the k8s nodes (machines, VMs) hosting akka-cluster nodes, it starts replacements on the k8s nodes it can still reach, but it can't kill the nodes on the other side of the partition. Under some conditions (automatic cluster start is on, all or some of the akka-cluster nodes were cut off, and SBR is used), this can leave the old nodes running behind the partition while new ones run on the k8s nodes available to the API server.

I propose an extension that probes connectivity to the Kubernetes API server (using a simplified version of the approach in akka-cluster-kubernetes-discovery) and terminates the actor system when connectivity is lost. Alternatively, it could down the self node using the Cluster API.

It would be SBR-agnostic, something one would probably use together with SBR.
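
To make the idea a bit more concrete, here is a rough sketch of the probing logic in plain Java, with no Akka or Kubernetes dependencies. Everything in it is an assumption for illustration: the class name `ApiServerWatchdog`, the `maxFailures` threshold, and the idea of tolerating a few consecutive failures before acting are mine, not part of any existing library. The probe itself (for example, an HTTP request to the API server) and the reaction are passed in as callbacks.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical watchdog: counts consecutive failed probes and runs a
 * callback once when the failure threshold is reached.
 */
final class ApiServerWatchdog {
    private final BooleanSupplier probe; // assumed to return true when the API server answered
    private final int maxFailures;       // consecutive failures tolerated before acting
    private final Runnable onLost;       // reaction, e.g. terminate the actor system
    private int consecutiveFailures = 0;
    private boolean fired = false;

    ApiServerWatchdog(BooleanSupplier probe, int maxFailures, Runnable onLost) {
        if (maxFailures < 1) {
            throw new IllegalArgumentException("maxFailures must be >= 1");
        }
        this.probe = probe;
        this.maxFailures = maxFailures;
        this.onLost = onLost;
    }

    /**
     * Runs one probe. Meant to be called at a fixed interval by a scheduler.
     * Returns true once the callback has fired; after that no more probes run.
     */
    synchronized boolean check() {
        if (fired) {
            return true;
        }
        boolean reachable;
        try {
            reachable = probe.getAsBoolean();
        } catch (RuntimeException e) {
            reachable = false; // a throwing probe counts as a failed probe
        }
        if (reachable) {
            consecutiveFailures = 0; // any success resets the counter
        } else if (++consecutiveFailures >= maxFailures) {
            fired = true;
            onLost.run();
        }
        return fired;
    }
}
```

In an Akka application, `check()` could be driven by the system scheduler, and `onLost` could call `system.terminate()` or `Cluster.get(system).down(Cluster.get(system).selfAddress())`, depending on which of the two reactions is preferred. Requiring several consecutive failures is only meant to avoid reacting to a single dropped request.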

Do you think this is something that could be used generally, or should the problem be solved in some other way?

I think it is worth considering. Some initial thoughts:

  • A K8s master outage should leave deployed applications running. With this extension, the whole cluster would shut itself down instead.
  • If the K8s master is on the smaller side of a partition, there may not be enough resources left to re-create the cluster, whereas the app would have remained available had the larger side not downed itself, assuming external connectivity was still there.