Akka sharding and persistence recovery

I am trying to figure out how Akka sharding and persistence play together.
Suppose there is a SomeActor with persistenceId = "SomeActor-"+self.path.name and a Cassandra journal. We also start ClusterSharding with an extractEntityId that returns some business entity id.

What I am interested in is how the recovery process will occur if the cluster has been shut down and then restarted.
Will it try to recover the state of all actors at once after the restart? In that case akka-persistence needs to know the persistenceId to replay events from Cassandra, but how do we get the persistenceId if it is bound to the actor path itself (and the actor is only created when the first message comes)? I am just confused by it.

Thanks in advance for the explanation!

Surely using self.path.name as part of the persistenceId will result in a race condition, as the name of a persistent actor is usually the persistenceId!

I don’t get the point.
self.path.name is used to distinguish the sharded entity actors.
In Akka sharding the name of each entity will be different, e.g.
akka://local/system/sharding/some-actor/1/34f9a1d2-3a16-5fe0-96b0-73cf397106d2

So normally your persistenceId would contain the “business entity id”, so that Akka Persistence can find the events that belong to this business entity.

So if you have an account 766FF428-01E8-4D91-AB5F-261541B11767, then you’d have a persistenceId like "account-766FF428-01E8-4D91-AB5F-261541B11767", and your messages should have a field that allows "extractEntityId" to find "766FF428-01E8-4D91-AB5F-261541B11767".
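A minimal sketch of that idea in plain Scala, without the Akka dependency. The message type GetCurrentBalance, its accountId field, and the helper persistenceIdFor are made up for illustration. The partial function has the same shape as Akka's ShardRegion.ExtractEntityId, and the actor path shown earlier suggests that the sharded actor's name is the extracted entity id:

```scala
object AccountRouting {
  // Hypothetical message; every message carries the business entity id.
  final case class GetCurrentBalance(accountId: String)

  // Same shape as ShardRegion.ExtractEntityId: PartialFunction[Any, (String, Any)].
  // It returns the entity id plus the message to deliver to that entity.
  val extractEntityId: PartialFunction[Any, (String, Any)] = {
    case msg @ GetCurrentBalance(accountId) => (accountId, msg)
  }

  // Inside the entity actor this would be "account-" + self.path.name,
  // assuming the actor is named after the extracted entity id.
  def persistenceIdFor(entityId: String): String = s"account-$entityId"
}
```

Because the id comes from the message, the persistenceId is known as soon as the first message for that account arrives, before the actor exists.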

Hi there,

I am not exactly sure what you mean with

Therefore, I will give a short explanation of how the recovery process works, with or without Sharding, since Sharding is merely a (complex) mechanism for spreading the entities across multiple nodes while ensuring that each unique entity is started only once.

Will it try to recover the state of all actors at once after the restart?

No, it will not recover all actors at once. Whenever you send a message to a particular entity, it will only recover that entity.

To illustrate this, let’s look at an example using a pseudo financial application. Imagine you had a few Account entities loaded in memory: John, Jane, Jack, and Jill. You run into some issue and your entire cluster goes down. When you start it back up, Jill sends a “GetCurrentBalance” message. Jill’s account will then be recovered, but the other three won’t.
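As a toy model of this behaviour (not Akka code; the Region class and its methods are invented for the sketch), you can think of the shard region as something that starts an entity only when a message addressed to it shows up:

```scala
import scala.collection.mutable

object LazyRecoverySketch {
  // Toy stand-in for a shard region: nothing is started up front.
  final class Region {
    private val live = mutable.Set.empty[String]

    // Delivering a message starts only the addressed entity.
    // In Akka, this is the point where the entity's recovery would run.
    def deliver(entityId: String, message: String): Unit =
      if (!live.contains(entityId)) live += entityId

    // Ids of the entities that have been started so far.
    def recovered: Set[String] = live.toSet
  }
}
```

After a restart the region is empty. Delivering "GetCurrentBalance" to "Jill" leaves only "Jill" in recovered; John, Jane, and Jack stay untouched until somebody messages them.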

In that case akka-persistence needs to know the persistenceId to replay events from Cassandra, but how do we get the persistenceId if it is bound to the actor path itself (and the actor is only created when the first message comes)? I am just confused by it.

The persistenceId is just a unique ID that you assign to an entity (an actor whose state you wish to be stored permanently). Using my example from above, if you were using a relational database you might have an accounts table with an id column of type long as the primary key; the persistenceId plays the same role.

So, in order to send a message to a particular Account you will have to use the UID that you previously assigned to it. The persistence plugin will load all of the events for that account from the database, and replay them in the order they were added.
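The replay step can be sketched in plain Scala as a fold over the stored events. Everything here is an assumption for illustration: the event types, the in-memory journal map standing in for the database, and the "account-jill" id.

```scala
object ReplaySketch {
  sealed trait Event
  final case class Deposited(amount: BigDecimal) extends Event
  final case class Withdrawn(amount: BigDecimal) extends Event

  // Toy journal keyed by persistenceId; events are kept in write order.
  val journal: Map[String, Vector[Event]] = Map(
    "account-jill" -> Vector(
      Deposited(BigDecimal(100)),
      Withdrawn(BigDecimal(30)),
      Deposited(BigDecimal(5))
    )
  )

  // How a single event changes the account balance.
  def applyEvent(balance: BigDecimal, event: Event): BigDecimal = event match {
    case Deposited(amount) => balance + amount
    case Withdrawn(amount) => balance - amount
  }

  // Recovery: load the events for one persistenceId and apply them in order.
  def recover(persistenceId: String): BigDecimal =
    journal.getOrElse(persistenceId, Vector.empty[Event])
      .foldLeft(BigDecimal(0))(applyEvent)
}
```

In a real PersistentActor the same per-event logic would live in receiveRecover, and the journal plugin would feed it the events.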

Hope this clarifies things.

@chmodas Yeah, I got the point: recovery happens when a message is sent. Thanks for the detailed explanation.