ReadSideHandler semantic (or using PersistentEntity as FSM)

datalchemist · July 12, 2019, 9:54am

Hi there,

I have some questions regarding the exact semantic of ReadSideHandler.

Use-case
Use PersistentEntity as a kind of persistent FSM (i.e. one FSM per AggregateRoot)
Roughly, the PersistentEntity State is the FSM State which is also reflected in the emitted Events. Then, a ReadSideHandler is responsible for doing FSM transition which includes performing async operation (calling another service) and computing next state which is then sent back as a Command to the PersistentEntity.

Questions
Now, my first point is ensuring that the handle() Flow in ReadSideHandler has an exactly-once semantic, as otherwise it would break my pseudo-FSM semantic. I guess it is the case but this seems to not be specified in the doc, so I would like to find a confirmation of this.

My second question concerns parallelization of ReadSide event processing. As the processing of my Event includes asynchronous operations, I would like to parallelize it among the AggregateRoot’s.
As far as I understand the only way to process events in a parrallel way is to rely on AggregateEventTag and to define the number of Tag according to the desired level of parallelism. Here I wonder whether I can define any number of AggregateEventTag without bothering, or whether this may have unintended effects on other aspects. To put it differently, are these AggregateEventTag used only for ReadSide processing or are they also used somewhere else ?

Last but not least, does the use-case presented above seem suitable, and do you see any potential problem or drawback that may occur with this approach?

Thanks in advance for any response!

aklikic · July 12, 2019, 3:29pm

Hi @datalchemist,

Exactly-once semantic is guaranteed in case of using Cassandra read-side where user provided statements in the event handler are executed in the batch with offset update. I’m not sure does this apply for JDBC also. I assume yes.
But for your use case, where external resource is called, exactly-once is NOT guaranteed. For example if call succeeds but offset update fails event will be re-handled. So at-least-once semantic applies.

There is one instance of ReadSideProcessor/TopicProducer per tag. One entity instance events are always tagged with the same tag to preserve event order.
You could have as much tags as you want to achieve higher parallelism but this comes with a cost. Every ReadSideProcessor/TopicProducer performs eventByTag query that is very “expensive” in sense of store resource consuming. By increasing number of tags, number of ReadSideProcessor/TopicProducer instances increase and by that number of eventByTag queries.
Tags are used only in ReadSideProcessor/TopicProducer.
One optimal solution would be to publish to Kafka with higher number of partitions allowing you to consume topic with higher parallelism. To publish to Kafka you could have lower number of tags because publish is less time consuming.

Hope this helps.

BR,
Alan

datalchemist · July 12, 2019, 10:34pm

Hi @aklikic,

Thanks alot for your answer! That made think further and better understand the whole.
The exact semantic of ReadSideHandler.handle() flow is not defined as it depends on how side-operations are performed relatively to the offset update managed by this same ReadSideHandler (whether there is some sort of atomicity) And that’s clear that I can have only at-least-once semantic if I call external service as it can’t be atomic with offset update.

Thanks for the considerations around the parallelism based on tag. I understand better (and I have already seen how resource-consuming these query are in live system) and so, I will clearly avoid to use too much tag. The kafka option is indeed interesting and I will consider it further.

But now, that makes me also think that there may be also some room for parallelization in the ReadSideHandler.handle Flow (given that we accept at-lest-once semantic). By using some mapAsync(N) stage there, we may process multiple events in parallel (and if we need some ordering, we could also use Flow.groupBy with a tag-like system). Is-there some constraint or limitation on this handle() Flow, such as single element flowing at a time, that would prevent such approach ?!

Best,
Marc-Antoine

aklikic · July 13, 2019, 6:10am

Hi,

That is a good question and I do not have an answer for that.
I would say that similar logic applies as for flow in topic consumer. Check this for reference.
@TimMoore @octonato @ignasi35 can you please comment on this?

Br,
Alan

TimMoore · July 19, 2019, 4:51am

The trouble with using mapAsync in a read-side handler flow is that you would lose the guarantee that the events for a particular entity ID will be handled in order. For example, if you have a create event and then an update event, and handle them concurrently, then the logic handling the update might run before the logic handling the create. Whether this is a problem or not depends on the specifics of your data and processing logic.

Topic		Replies	Views
How to expose entity state via read-side? Lagom	4	945	April 25, 2019
Persistent Entities and ReadSide Queries / Cassandra Session Lagom Persistence API	1	966	August 31, 2018
PersistentEntity replaying all persistent events Lagom	4	1057	November 7, 2018
Lagom Read Side implementation Lagom akka	3	1019	May 31, 2018
Cassandra Read Side Not Processing Events Lagom Persistence API	4	1279	April 30, 2020

ReadSideHandler semantic (or using PersistentEntity as FSM)

Related Topics