Suitable usecase for Cluster Sharding

Hi,

What is the purpose of Cluster Sharding I have went through some guides but I didn’t get a clear picture on what it is.

What is the best usecase to use cluster sharding ?

Here,
I have a RunnerCluster
“akka.tcp://RunnerCluster@172.24.117.1:3001”,
“akka.tcp://RunnerCluster@172.24.121.75:3001”,
“akka.tcp://RunnerCluster@172.24.123.240:3001”

And I have a ClientCluster client which running on separate machine.

When I send start message from client to runnerCluster then anyone of the node in the RunnerCluster will receive the message.
After receiving this, my business logic is,

   if (myCondition.equals(sleep))
         create a SleepActor on 172.24.117.1:3001 
   else if ((myCondition.equals(weight))
         create a WeightActor on 172.24.121.75:3001 
   else
         create a FuncActor on 172.24.123.240:3001

How will I achieve the above usecase ?

Here I want to send mesaage to a specific remote machine if particular condition met. So remote machine means we need to know the IP address and port of the app running.

Cluster sharding says that by using a logical keys (in form of ShardId/EntityId pairs) it’s able to route our messages to the right entity without need to explicitly determine on which node in the cluster it lives.

So my question here is,

How will they know whom to send a message without the knowledge of IP and port ?
How routing works in cluster sharding ?
What is the role of ShardId, and EntityId?
By using this how will we route the message to specific remote machine ?

I have red so many guides but did’t have clear knowledge on this So please advice me on this…

When using Cluster Sharding you will not be in charge of where the entities (actors) are running. That is taken care of by Cluster Sharding. For a given entityId there will only be one single entity for that entityId. A message for a given entityId will be routed to the entity no matter from which node it is sent.

Shard is a group of entities, mosly for scalability purposes, to be able to manage millions of entities in more course grained groups.

Thanks patricknw!!

By using entityId how it is pointing the entity. Where will be the mapping happening between entity and entityId?

How entityId is allotted to the entity ?

You essentially define that yourself. One of the functions you define is extractEntityId which is how the the entityid is extracted from a message. Typically this some kind of key, but it can be determined however you want to determine it.

Thanks davidogren

I understand extractEntityId and extractShardId functions.
Say I have 3 nodes in a cluster and I created a shardRegion for Entity Actor

ClusterSharding.get(actorSystem).start(“entity”, EntityActor.props(), settings,
EntityMessage.messageExtractor()

The below code will call receive method in EntityActor class and match with message.
My doubt here is out of three node in a cluster which one will pick this message

EntityMessage.Command command = command();
shardRegion.tell(command, self());

Hope I am clear. If anything I am wrong please advice me
Thanks

From the message the shardId is defined via the extractShardId function. The location of a shards is decided by ClusterSharding automatically. By default a shard is allocated on first access to the node with least shards. There is also a re-balance process to move shards so that new nodes can share the load.

If you really want to be in charge of where to allocate the shards the allocation strategy is pluggable so you can have your own algorithm. That is mentioned in the documentation.

Thank you

I don’t know how take control on shard allocation. By default Shard coordinator does the shard allocation. If I want to take control overt there then where should I have my own algorithm or which one should I override.

I didn’t find anything regarding this in the documentation

Could you please help me out this

@anilkumble

Creating a good sharding algorithm is an interesting challenge in itself. Try to produce a uniform distribution, i.e. same amount of entities in each shard. As a rule of thumb, the number of shards should be a factor ten greater than the planned maximum number of cluster nodes. Less shards than number of nodes will result in that some nodes will not host any shards. Too many shards will result in less efficient management of the shards, e.g. rebalancing overhead, and increased latency because the coordinator is involved in the routing of the first message for each shard. The sharding algorithm must be the same on all nodes in a running cluster. It can be changed after stopping all nodes in the cluster.

The above excerpt is from the documentation and can be found here Classic Cluster Sharding • Akka Documentation

And the API can be found here Akka 2.9.0 - akka.cluster.sharding.ClusterSharding

Hi @patriknw,

I went through the document but I didn’t get idea on how to be in charge of where to allocate the shards.

I ran the sample example on different terminal and I observes that by default it creates the entity equally among all the nodes in the cluster.

If I want to take control on shard allocation means what should I supposed to do

Could you please share your thoughts

Thanks

Thanks @hungai

I went through this documentation but I didn’ find anything