Canary type of shard allocation strategy

We are thinking of a canary type of deployment to our Kubernetes cluster to mitigate the effect of errors introduced when deploying a new version of our application. This seems doable for our api-role nodes: Kubernetes or the load-balancing proxy can probably route a small percentage of traffic to an upgraded api-role node.

Is there a way to do this for akka-persistence type of nodes? Is there a shard allocation strategy wherein we can pick one shard to transfer to a newly deployed upgraded node? Secondly, what mechanism can you recommend to handle the case of rolling back a deployment that persists new event classes? The way I can think of is deploying the event handlers ahead of the commands that trigger persisting of the new events. Is this what you recommend also?

There is nothing out of the box for canary deployments, but you could potentially implement a scheme yourself by defining a custom ShardAllocationStrategy. See docs: Cluster Sharding • Akka Documentation

Doing a two-step deploy, where you first roll out any new or changed event and message classes together with updated serializers for them, and then in a second deploy start using them, is a good idea to avoid problems. It helps both if you roll back and during the rollout itself, when one or a few canary nodes run alongside older-version nodes in the cluster that may need to communicate with them. There is a little bit about that in the docs here: Serialization • Akka Documentation
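To make the two-step idea concrete, here is a minimal sketch in plain Java with no Akka dependency. Everything in it is hypothetical and only for illustration: the Account entity, the event classes, and the emitNewEvents switch are assumptions, not Akka Persistence APIs. The point is that the event handler already understands the new FeeCharged event in the first release, while the command handler only starts producing it in the second.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical events of a tiny account entity (not Akka classes).
interface AccountEvent {}

final class Deposited implements AccountEvent {
    final long amount;
    Deposited(long amount) { this.amount = amount; }
}

final class Withdrawn implements AccountEvent {
    final long amount;
    Withdrawn(long amount) { this.amount = amount; }
}

// New event class: shipped (with its serializer) in release 1, first persisted in release 2.
final class FeeCharged implements AccountEvent {
    final long fee;
    FeeCharged(long fee) { this.fee = fee; }
}

final class Account {
    // Assumed rollout switch: false in release 1, true in release 2.
    private final boolean emitNewEvents;
    private long balance;

    Account(boolean emitNewEvents) { this.emitNewEvents = emitNewEvents; }

    // Event handler: knows every event type from release 1 onwards,
    // so replaying a journal written by a newer node still works after a rollback.
    void apply(AccountEvent event) {
        if (event instanceof Deposited) {
            balance += ((Deposited) event).amount;
        } else if (event instanceof Withdrawn) {
            balance -= ((Withdrawn) event).amount;
        } else if (event instanceof FeeCharged) {
            balance -= ((FeeCharged) event).fee;
        }
    }

    // Command handler: only produces the new event once the switch is on.
    List<AccountEvent> withdraw(long amount, long fee) {
        List<AccountEvent> events = new ArrayList<>();
        events.add(new Withdrawn(amount));
        if (emitNewEvents) {
            events.add(new FeeCharged(fee));
        }
        return events;
    }

    long balance() { return balance; }
}
```

With this shape, rolling back from release 2 to release 1 leaves FeeCharged events in the journal, but release 1 can still replay them.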

Thanks. Can I use ExternalShardAllocationStrategy to, say, define a deployment phase like “Canary” in which I only deploy one shard to the newly deployed node? And when I’m satisfied, can the deployment phase move on to “Normal”, at which point I shift to the Least Used allocation strategy?

Yes, I think a custom implementation of ShardAllocationStrategy or custom logic interacting with ExternalShardAllocationStrategy can achieve that.
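As a rough illustration of the decision such logic would have to make, here is a sketch in plain Java. It deliberately does not use the Akka API: DeploymentPhase, CanaryShardPlacement and chooseNode are hypothetical names, nodes are plain strings, and the rule "each canary node gets exactly one shard" is just one possible reading of the Canary phase described above. A real implementation would have to plug this kind of decision into a custom ShardAllocationStrategy, or drive ExternalShardAllocationStrategy from it.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical deployment phases, mirroring the "Canary" / "Normal" idea above.
enum DeploymentPhase { CANARY, NORMAL }

final class CanaryShardPlacement {
    private CanaryShardPlacement() {}

    // Picks the node that should host a newly allocated shard.
    // shardsPerNode: current shard count per node; canaryNodes: upgraded nodes.
    static String chooseNode(DeploymentPhase phase,
                             Map<String, Integer> shardsPerNode,
                             Set<String> canaryNodes) {
        if (shardsPerNode.isEmpty()) {
            throw new IllegalArgumentException("no nodes available");
        }
        if (phase == DeploymentPhase.CANARY) {
            // A canary node that hosts nothing yet receives its single shard.
            for (String node : new TreeSet<>(canaryNodes)) {
                Integer count = shardsPerNode.get(node);
                if (count != null && count == 0) {
                    return node;
                }
            }
            // Otherwise keep new shards away from canary nodes.
            String oldNode = leastLoaded(shardsPerNode, canaryNodes);
            if (oldNode != null) {
                return oldNode;
            }
        }
        // NORMAL phase (or only canary nodes left): plain least-shards placement.
        return leastLoaded(shardsPerNode, new TreeSet<String>());
    }

    // Least-loaded node not in 'excluded'; ties broken by node name.
    // Returns null when every node is excluded.
    private static String leastLoaded(Map<String, Integer> shardsPerNode,
                                      Set<String> excluded) {
        String best = null;
        for (String node : new TreeSet<>(shardsPerNode.keySet())) {
            if (excluded.contains(node)) {
                continue;
            }
            if (best == null || shardsPerNode.get(node) < shardsPerNode.get(best)) {
                best = node;
            }
        }
        return best;
    }
}
```

This only covers allocation of new shards. Moving an existing shard onto the canary node, and rebalancing once the phase switches to Normal, are separate concerns that this sketch leaves out.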

The node app version, configured through akka.cluster.app-version (possibly set from the Kubernetes deployment, see Rolling Updates • Akka Management) and available in the cluster API through Member#appVersion, might be useful to detect when a new version is canary-rolled-out.
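A sketch of how the app version could be used to spot canary nodes, again in plain Java without Akka types. CanaryDetector and its methods are hypothetical, the node-to-version map stands in for what you would read from the cluster membership, and the comparison assumes purely numeric dotted versions such as 1.10.0, which may not match your versioning scheme.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CanaryDetector {
    private CanaryDetector() {}

    // Compares dotted numeric versions; missing segments count as 0.
    // Assumption: versions contain digits and dots only.
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Returns the nodes running the newest version, but only while at least
    // one node still runs an older version; otherwise the set is empty.
    static Set<String> canaryNodes(Map<String, String> versionByNode) {
        String newest = null;
        for (String version : versionByNode.values()) {
            if (newest == null || compareVersions(version, newest) > 0) {
                newest = version;
            }
        }
        Set<String> onNewest = new TreeSet<>();
        boolean mixedVersions = false;
        for (Map.Entry<String, String> entry : versionByNode.entrySet()) {
            if (compareVersions(entry.getValue(), newest) == 0) {
                onNewest.add(entry.getKey());
            } else {
                mixedVersions = true;
            }
        }
        return mixedVersions ? onNewest : Collections.<String>emptySet();
    }
}
```

Once every node reports the same version, the result is empty, which could serve as one signal that the rollout has left the canary phase.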

Not saying that it will be easy though, probably lots of tricky aspects to solve once you dig into the problem.