Relationship among entities

Hi,

I want to see if there is any known pattern for this problem and if a solution like below is valid:

First of all, I should mention that I am using persistent entities from Lagom, but I thought the question is more relevant here.

Problem:
Let’s assume we have a persistent entity type A with a unique id, so each instance can be looked up from the registry by that id, and we have a couple of them. When we create each of these, it also gets another id (_ID) that is not unique. The question is: how can I maintain a mapping {_ID: List(id)}? The CQRS solution is to have a read-side processor create a table with that mapping, which is only eventually consistent. The problem with this approach is: WHAT IF a request comes in before the table has been updated? We lose reactivity.
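
For reference, the read-side approach would look roughly like this (just a sketch against the Cassandra read side; PostEvent, PostCreated, and the table/column names are all made up):

import com.datastax.driver.core.BoundStatement
import com.lightbend.lagom.scaladsl.persistence.{AggregateEventTag, ReadSideProcessor}
import com.lightbend.lagom.scaladsl.persistence.cassandra.{CassandraReadSide, CassandraSession}
import scala.collection.immutable
import scala.concurrent.ExecutionContext

// Maintains the {_ID: List(id)} lookup table; eventually consistent.
class PostMappingProcessor(readSide: CassandraReadSide, session: CassandraSession)
                          (implicit ec: ExecutionContext)
  extends ReadSideProcessor[PostEvent] {

  override def buildHandler(): ReadSideProcessor.ReadSideHandler[PostEvent] =
    readSide.builder[PostEvent]("postMappingOffset")
      .setGlobalPrepare(() =>
        session.executeCreateTable(
          "CREATE TABLE IF NOT EXISTS post_mapping (" +
            "shared_id text, post_id text, PRIMARY KEY (shared_id, post_id))"))
      .setEventHandler[PostCreated] { elem =>
        // For brevity this writes directly and returns no statements;
        // the usual approach is to return prepared BoundStatements instead.
        session
          .executeWrite("INSERT INTO post_mapping (shared_id, post_id) VALUES (?, ?)",
            elem.event.sharedId, elem.event.postId)
          .map(_ => immutable.Seq.empty[BoundStatement])
      }
      .build()

  override def aggregateTags: Set[AggregateEventTag[PostEvent]] = Set(PostEvent.Tag)
}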

A Solution: (which I need validation for)
Have another persistent entity that holds the mapping as its state; that way we get immediate consistency.
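
For concreteness, something along these lines: a single persistent entity holding the whole {_ID: List(id)} map as its state (a rough sketch with Lagom’s Scala API; all names are made up):

import akka.Done
import com.lightbend.lagom.scaladsl.persistence.PersistentEntity
import com.lightbend.lagom.scaladsl.persistence.PersistentEntity.ReplyType

sealed trait MappingCommand
case class AddMapping(sharedId: String, entityId: String) extends MappingCommand with ReplyType[Done]
case class GetMappings(sharedId: String) extends MappingCommand with ReplyType[List[String]]

sealed trait MappingEvent
case class MappingAdded(sharedId: String, entityId: String) extends MappingEvent

// Holds the {_ID: List(id)} map as its state, so a read through this
// entity is immediately consistent with the writes it has accepted.
class MappingEntity extends PersistentEntity {
  override type Command = MappingCommand
  override type Event   = MappingEvent
  override type State   = Map[String, List[String]]

  override def initialState: State = Map.empty

  override def behavior: Behavior =
    Actions()
      .onCommand[AddMapping, Done] {
        case (AddMapping(sharedId, entityId), ctx, _) =>
          ctx.thenPersist(MappingAdded(sharedId, entityId))(_ => ctx.reply(Done))
      }
      .onEvent {
        case (MappingAdded(sharedId, entityId), state) =>
          state.updated(sharedId, entityId :: state.getOrElse(sharedId, Nil))
      }
      .onReadOnlyCommand[GetMappings, List[String]] {
        case (GetMappings(sharedId), ctx, state) =>
          ctx.reply(state.getOrElse(sharedId, Nil))
      }
}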

Question: Is this solution a common pattern? What are the problems with this approach? Does it really guarantee immediate consistency? What is a common pattern for this problem?

Thanks

In a way, that is kind of what cluster sharding does – there’s a persistent actor which decides on allocating other actors (and guarantees uniqueness); so yeah, I’d say it’s a common pattern

cc @TimMoore

Lagom already automatically uses cluster sharding with the persistence ID. It sounds like this is about something else: a secondary index on another field. Did I understand correctly, @omidb?

As far as I know, there is no way to do this in a strongly-consistent transaction with Akka Persistence, even if you use an underlying database that supports it in principle (Cassandra does not, for example, but most RDBMS could). The solution you suggest will not work reliably, because there is no way to process multiple commands to different persistent entities (or persistent actors in Akka) within a single transaction.

Yes, in this case, you can look at it as a secondary index. But more generally, it’s about retrieving an entity from the registry by some condition (e.g. all entities that have some particular state). Right now, we have to use a read-side processor that generates a table for that condition, which can be inconsistent.

That’s exactly the problem. I wish we could define strongly consistent relations between entities.

Let’s say:

val ref1 = persistentEntities.refFor[Post](id)
val ref2 = persistentEntities.refFor[Mappings]("postMappings")

forTransaction { // imaginary construct: both commands succeed or neither is persisted
  replyWithSomeId <- ref1.ask(request)
  success         <- ref2.ask(Request2(id))
} ...

What I’ve seen in most Lagom projects on GitHub is a somewhat wrong way of using persistent entities. They define ONE entity with an id like “my entities” and then keep all the data there, so they have strong consistency. That approach doesn’t give them the right scalability, but it makes the system much easier to reason about. I think we need something in between.

My understanding of DDD, which Lagom uses as part of CQRS (aggregate = entity), says that an entity is just a point for checking commands and generating events. It is not a complex business entity. A complex business entity may be implemented with many Lagom entities, each of which has its own state.
So I see your task as two aggregates, Post and Mapping, plus one ProcessManager that connects them:

  Post:           AddMappingToPostCommand => MappingToPostAddedEvent
  ProcessManager: MappingToPostAddedEvent => AddPostToMappingCommand
  Mapping:        AddPostToMappingCommand => PostAddedToMappingEvent

So you can ask Mapping for the needed info, but it will be eventually consistent with respect to the AddMappingToPostCommand.
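
A rough sketch of how the ProcessManager part could be wired with the registry’s event stream (MappingToPostAddedEvent, AddPostToMapping replying with Done, MappingEntity, and PostEvent.Tag are all assumed to exist on the write side):

import akka.Done
import akka.persistence.query.Offset
import akka.stream.Materializer
import akka.stream.scaladsl.Sink
import com.lightbend.lagom.scaladsl.persistence.PersistentEntityRegistry
import scala.concurrent.Future

// Consume Post events and translate MappingToPostAddedEvent into a
// command on the Mapping aggregate.
def runProcessManager(registry: PersistentEntityRegistry)
                     (implicit mat: Materializer): Future[Done] =
  registry
    .eventStream(PostEvent.Tag, Offset.noOffset)
    .mapAsync(parallelism = 1) { elem =>
      elem.event match {
        case MappingToPostAddedEvent(postId, mappingId) =>
          registry.refFor[MappingEntity](mappingId).ask(AddPostToMapping(postId))
        case _ =>
          Future.successful(Done) // ignore unrelated Post events
      }
    }
    // NOTE: no offset tracking here; a real process manager should persist
    // offsets so events are processed at-least-once across restarts.
    .runWith(Sink.ignore)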

There are two reasons why this is not possible.

The first is a more philosophical/principled reason. In CQRS/DDD, an aggregate (an Entity in Lagom) defines a consistency boundary. We can refer to it as a transactional boundary as well. The main reason behind it is that it offers a simplified way of dealing with mutations and transactions. One model, one aggregate, one transaction. That’s basically the aggregate mantra from DDD.

The second is a technical reason and touches one of the design principles in Akka. An actor is location transparent. That means that when you get an ActorRef and send a message to it, you may be sending a message over the network, reaching an actor living on another node. In the specific case of Lagom, a PersistentEntityRef will refer to a sharded persistent actor that can be alive on any node of your cluster. As a consequence, we can’t have a transaction wrapping two calls to two different persistent actors (or Lagom entities), because they may live on two different nodes.

Possible alternative solution

There is a common technique in the CQRS world that may help you in that context. It consists of having a “command-side support table”: you keep a DB table that is updated before and after handling commands, thus on the command side.

In your specific case, you could have one to manage the list of ids. Each entry in that table has a boolean associated with it to confirm that the id is effectively in use. In one transaction, you add the id to the table, but as unconfirmed. Then you pass the shared key to the entity, and when the command completes, you confirm that the shared key is in use by that new entity. You do that by flipping the boolean to true.

So, basically you have three transactions (sketched in code after this list):

  1. add entity id to the list of entities using the shared key, but unconfirmed.
  2. create the entity (can be a transaction on a different node)
  3. confirm that the entity was created and is using the shared key.
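
In code, the sunny-day flow could look roughly like this (a sketch only; MappingDao, PostEntity, and CreatePost are placeholders for whatever you have):

import akka.Done
import com.lightbend.lagom.scaladsl.persistence.PersistentEntityRegistry
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical DAO for the command-side support table
// (e.g. a JDBC table with columns shared_key, entity_id, confirmed).
trait MappingDao {
  def insertUnconfirmed(sharedKey: String, entityId: String): Future[Done]
  def confirm(sharedKey: String, entityId: String): Future[Done]
  def findConfirmed(sharedKey: String): Future[List[String]] // searches only see confirmed ids
}

def createPost(dao: MappingDao, registry: PersistentEntityRegistry)
              (sharedKey: String, postId: String, cmd: CreatePost)
              (implicit ec: ExecutionContext): Future[Done] =
  for {
    _ <- dao.insertUnconfirmed(sharedKey, postId)      // transaction #1: unconfirmed entry
    _ <- registry.refFor[PostEntity](postId).ask(cmd)  // transaction #2: may run on another node
    _ <- dao.confirm(sharedKey, postId)                // transaction #3: flip the boolean
  } yield Done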

There are three transactions involved and at least two points of failure that need special attention.

If transaction #2 fails, you have added an id to the list, but the entity was never created. Because you don’t have the confirmation, you don’t return it when searching. You may need to clean it up via a scheduler or just leave it there. You should not clean it up immediately (see below).

If transaction #3 fails, you will need a retry mechanism. The easiest way is to have a read-side processor that listens to the created events and confirms them. In that case, you have eventual consistency.

Note that there is also the case that transaction #2 succeeds but you never receive the confirmation or the Future times out. You may think it failed, but the entity was created. In that case, the read-side processor will ’see’ the event and confirm it.

This technique mitigates the eventual consistency for the sunny-day scenario. As soon as you get the confirmation that the entity was created, you confirm it on the list and it’s immediately retrievable. Only when transaction #3 fails will you be impacted by eventual consistency.


Thanks for the answer @octonato

I understand that the consistency boundary we get in an entity is very important and makes it easier to reason about the logic. But at the same time, if we don’t provide the right tools for interactions among entities in a safe/consistent way, I think we are sacrificing function for design. This is why, as I mentioned, most Lagom projects I’ve seen use one “large” entity that contains everything (which doesn’t scale well) instead of having smaller entities. Basically, the usage I’ve seen is to hold the COMPLETE state of a service in ONE entity, which is not a good design.

Can we introduce a “Transactional Command” that can target more than one entity but only gets persisted (emits events) if it succeeds in all the entities? I guess (not sure) that we could somehow do this with messaging and acks.

Your alternative is nice but, as you can guess, hard to deal with in a complicated environment. Is it possible to have this pattern designed into persistent actor/Lagom entity communication?