Unexpected ShardAllocationStrategy behavior

Hi,

In our application we noticed that the ShardAllocationStrategy sometimes behaves in an unexpected manner. We're using Akka 2.5.27 (yes, we'll upgrade to 2.5.30).

As an example, suppose you have a cluster of 4 members, all of which participate in sharding. Sharding is configured to use the LeastShardAllocationStrategy. You then send 4 messages to the shard region ActorRef, each targeting a different shard. The behavior I would expect is that 4 new shards are created, one on each member.
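For context, a minimal sketch of that setup (the entity, message type, and extractor functions here are made up for illustration, and cluster joining plus running the region on every member is omitted):

import akka.actor.{ Actor, ActorSystem, Props }
import akka.cluster.sharding.{ ClusterSharding, ClusterShardingSettings, ShardRegion }

// Hypothetical entity and message, just to illustrate the scenario.
final case class Ping(entityId: String)

class Entity extends Actor {
  def receive: Receive = {
    case Ping(id) => // handle the message for entity `id`
  }
}

object ShardingExample {
  // One shard per entity id in this sketch, so 4 distinct ids target 4 distinct shards.
  val extractEntityId: ShardRegion.ExtractEntityId = {
    case msg @ Ping(id) => (id, msg)
  }
  val extractShardId: ShardRegion.ExtractShardId = {
    case Ping(id) => id
  }

  def main(args: Array[String]): Unit = {
    val system = ActorSystem("cluster")
    val region = ClusterSharding(system).start(
      typeName = "Entity",
      entityProps = Props[Entity],
      settings = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId = extractShardId)

    // 4 messages, 4 different shards; ideally one shard ends up on each member.
    (1 to 4).foreach(i => region ! Ping(i.toString))
  }
}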

However, that does not always happen in practice. Sometimes all shards are allocated on the first member in the cluster, and only after some time (seconds) does the shard reallocation policy distribute the shards evenly across the cluster. We want to avoid this behavior because shard reallocation is (somewhat) disruptive.

Further investigation shows that the allocateShard method can be called concurrently with other allocateShard invocations. When that happens, the currentShardAllocations parameter that is passed in is stale, because it does not reflect the allocations made by the other in-flight invocations.

def allocateShard(
    requester: ActorRef,
    shardId: ShardId,
    currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[ShardId]]): Future[ActorRef]

In the example, the allocateShard method would basically be called 4 times with a currentShardAllocations map in which every region has zero shards. The result is that the policy chooses the first member 4 times, since all members have the same number of shards.
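To make that concrete, here is a simplified stand-in for the least-shards pick (not the actual Akka source). With every region at zero shards, minBy is a tie and keeps resolving to the same entry, so 4 concurrent calls that all see the same stale map pick the same member:

import akka.actor.ActorRef
import scala.collection.immutable

// Simplified stand-in for the least-shards selection, not the actual Akka implementation.
def pickRegion(
    currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[String]]): ActorRef =
  currentShardAllocations.minBy { case (_, shards) => shards.size }._1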

My question is whether this is an undocumented feature or a bug.

Note: I can see two workarounds. First, if the allocateShard method returns an already completed future, the behavior does not seem to manifest itself. I'd prefer not to rely on this approach because it feels like depending on the implementation details of futures and thread scheduling, but it is good to be aware of if you want to reproduce the issue. Second (and this is what we'll be doing), we'll have the policy implemented by an actor that (1) remembers all allocations it made and (2) serializes all requests sent to it. Because the actor remembers what it did, it can correct the provided currentShardAllocations with allocations it previously made that are not (yet) part of the state.
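For what it's worth, here is a rough sketch of that second workaround. All names are made up, the rebalance side is delegated to an existing strategy, and error handling, pruning of the remembered allocations, and plugging the strategy into ClusterSharding.start are left out:

import akka.actor.{ Actor, ActorRef }
import akka.cluster.sharding.ShardCoordinator.ShardAllocationStrategy
import akka.cluster.sharding.ShardRegion.ShardId
import akka.pattern.ask
import akka.util.Timeout
import scala.collection.immutable
import scala.concurrent.Future
import scala.concurrent.duration._

// Hypothetical message sent to the allocator actor.
final case class Allocate(
    shardId: ShardId,
    currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[ShardId]])

// A single actor serializes all allocation decisions and remembers what it already
// handed out, so concurrent allocateShard calls can no longer race on a stale snapshot.
class AllocatorActor extends Actor {
  private var allocated: Map[ShardId, ActorRef] = Map.empty

  def receive: Receive = {
    case Allocate(shardId, current) =>
      // Merge in allocations we already made but which are not yet part of the
      // coordinator's state.
      val corrected = allocated.foldLeft(current) {
        case (acc, (id, region)) =>
          if (acc.exists { case (_, shards) => shards.contains(id) }) acc
          else acc.updated(region, acc.getOrElse(region, immutable.IndexedSeq.empty[ShardId]) :+ id)
      }
      // Pick the region with the fewest shards (same idea as LeastShardAllocationStrategy).
      val chosen = corrected.minBy { case (_, shards) => shards.size }._1
      allocated = allocated.updated(shardId, chosen)
      sender() ! chosen
  }
}

// Strategy that delegates allocation decisions to the single allocator actor.
class ActorBackedAllocationStrategy(allocator: ActorRef, fallback: ShardAllocationStrategy)
    extends ShardAllocationStrategy {

  private implicit val timeout: Timeout = 3.seconds

  override def allocateShard(
      requester: ActorRef,
      shardId: ShardId,
      currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[ShardId]]): Future[ActorRef] =
    (allocator ? Allocate(shardId, currentShardAllocations)).mapTo[ActorRef]

  // Rebalancing is left to an existing strategy in this sketch.
  override def rebalance(
      currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[ShardId]],
      rebalanceInProgress: Set[ShardId]): Future[Set[ShardId]] =
    fallback.rebalance(currentShardAllocations, rebalanceInProgress)
}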

thanks,

Bert

Hi @bert ,

Sounds like it would be nice to improve that, although I'm not sure we want to sequence all allocation requests by default. Thanks for asking here first. The next step would be for you to create an issue so we can start thinking about what solution would be possible.

Your workaround with a custom allocation strategy that uses an actor is a valid way of solving it in your application.

Thanks for the reply.

Issue created: https://github.com/akka/akka/issues/28798