Managing data integrity for public events


(Michael Mangeng) #1

Hi all!

I’ve got a question about managing data integrity for public events in Lagom.

My case is an entity whose instances can be arranged in a tree. Each entity holds only the entityId of its parent, as an Optional. If an entity is moved, e.g., twice in succession, two instances of PEntityMovedEvent are created. If I want to send the absolute path in the PUBLIC event, I have to query the write side (or a read side) to get the path, so the outcome of this resolution is timing dependent. If I’m lucky and event processing is fast enough, the path is correct. If the first event is processed only after the second modification, the resolved path will be the current one rather than the path at the time of the event. Depending on the layout of the public event, I may emit the same event twice or, worse, the data in the event could be inconsistent.
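
To make the timing issue concrete, here is a minimal plain-Java sketch of it. It does not use any Lagom API: `PathRace`, `parents`, `currentPath` and `publishLate` are hypothetical names, and the in-memory map only stands in for the write-side state.

```java
import java.util.*;

// Hypothetical, plain-Java model of the race; no Lagom APIs are used.
class PathRace {
    // Stand-in for the write-side state: entityId -> parentId (null for a root).
    static final Map<String, String> parents = new HashMap<>();

    // Resolve the absolute path from the CURRENT state by walking up the parents.
    static String currentPath(String id) {
        Deque<String> parts = new ArrayDeque<>();
        for (String cur = id; cur != null; cur = parents.get(cur)) {
            parts.addFirst(cur);
        }
        return "/" + String.join("/", parts);
    }

    // Returns the paths a lagging processor would publish for two moves of "c".
    static List<String> publishLate() {
        parents.clear();
        parents.put("a", null);
        parents.put("b", null);
        parents.put("c", null);

        // Persisted events carry only the new parent id.
        List<String> log = new ArrayList<>();
        parents.put("c", "a"); log.add("a");   // move 1: c goes under a
        parents.put("c", "b"); log.add("b");   // move 2: c goes under b

        // The processor handles both events only now and queries current state.
        List<String> published = new ArrayList<>();
        for (String ignoredNewParent : log) {
            published.add(currentPath("c"));
        }
        return published;
    }
}
```

With this ordering both public events carry `/b/c`, although the first move actually produced `/a/c`.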

What are the possibilities for the developer to solve this?
a.) “Pollute” the persistent event with state data that is required for the public event. (“Pollute” because this data is not required to recreate the entity state internally.) In my case this would be the new absolute path.
b.) Implement some sort of synchronization mechanism, e.g. do not allow certain entity updates until the processor has processed the emitted event. This could be done by returning, e.g., a UUID as a transaction identifier that is also stored in the persistent event. In my case this is very tricky: the lock would have to affect not only the entity itself but also all entities in the absolute path.

Is this “the way to go” or are there better solutions?

greetings,
Michael

PS: This is also done in the online auction example:
e.g.: https://github.com/lagom/online-auction-java/blob/cebdee4d696ee56863ea5654bdf92896e3088695/item-impl/src/main/java/com/example/auction/item/impl/ItemServiceImpl.java#L154

What if the state of the auction is different from the state when the auction was created? We always emit the “current” state here.


(Alan Klikic) #2

Hi,

I think you should embrace eventual consistency as a fact :)
From the requester's perspective, a get on the service's write side is also eventually consistent, because the state can change immediately after you collect the data.
It is true that, from the requester's perspective, a get on the write side is more consistent than a get on the read side, but it is still eventually consistent.
My advice is to model functionality with eventual consistency in mind as much as possible.

Personally, I prefer not to read from the write side in the topic producer's event mapping or in a read-side processor.
I prefer having all the needed information in the event itself. An event should be autonomous with respect to the information owned by the entity that produced it.
So when modeling entity events I also think about which information owned by the entity will be required in the read side or topic producer, and I enrich the event so that it is autonomous. I will emphasize again that the enrichment must NOT include information that is NOT owned by the entity.
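
As a rough illustration of what I mean by an autonomous event, here is a plain-Java sketch, assuming the path is part of the entity's own state. It does not use Lagom's API; `EntityMovedPublic`, `PublicEventMapper` and the field names are hypothetical, and only `PEntityMovedEvent` comes from the question.

```java
// Hypothetical shapes; plain Java, no Lagom APIs.
final class PEntityMovedEvent {
    final String entityId;
    final String newParentId;  // sufficient to rebuild the entity state
    final String newPath;      // enrichment, captured when the command was handled

    PEntityMovedEvent(String entityId, String newParentId, String newPath) {
        this.entityId = entityId;
        this.newParentId = newParentId;
        this.newPath = newPath;
    }
}

// Hypothetical public (topic) event.
final class EntityMovedPublic {
    final String entityId;
    final String absolutePath;

    EntityMovedPublic(String entityId, String absolutePath) {
        this.entityId = entityId;
        this.absolutePath = absolutePath;
    }
}

final class PublicEventMapper {
    // Pure function of the persisted event: no write-side or read-side lookup,
    // so the result does not depend on when the event is processed.
    static EntityMovedPublic toPublic(PEntityMovedEvent e) {
        return new EntityMovedPublic(e.entityId, e.newPath);
    }
}
```

Because the mapping depends only on the event, replaying two successive moves yields two different public events, each with the path as it was at that point.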

I hope this helps.

Br,
Alan


(Michael Mangeng) #3

Hi Alan,

yes - thank you.

If it’s OK to embed “enrich information” into the entity, it makes life easier and I can produce consistent events with ease.

In my case I can store the fullPath in the entity, and by using a new event (PathUpdated) I can stay consistent within the public events.

Thank you!

greetings,
Michael


(Marc-Antoine Nüssli) #4

Hi,

@aklikic I am not completely comfortable with the idea of enriching events with any information that would be useful to the reader. As long as the information can be derived from previous events of the same entity, I would try not to put it in the event. Sometimes the enriched info can be quite large and would indeed pollute the event store.
I do not consider that a single event has to be autonomous, but rather that the event stream of a single entity is autonomous.

@mm I am not sure I understand your problem correctly, but it sounds like your goal is to provide additional state info to the requester of the write command. In that case, you can use the command Reply, which can contain as much information as you need (e.g. information that would only eventually be available in the read side), since it is not persisted.
More generally, I tend to think that when external callers/readers need information not conveyed directly in a single event, they must maintain their own read side built from the service's events.


(Alan Klikic) #5

Hi,

I completely agree. Thank you for pointing that out.
Enriching has to be modeled per use case.
For large data, enrichment should be avoided by using a read side generated from the preceding events. Consistency has to be ensured by handling the event stream with the same event processor/topic producer. Querying the write side should also be avoided in this case, because it could break consistency by “looking into the future”.
On the other hand, enriching is not an anti-pattern, and in some cases where the data is not large it can simplify/optimize the implementation.
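
A minimal plain-Java sketch of the read-side variant, assuming all move events pass through one processor in order. `PathProjection` and `onMoved` are hypothetical names, not Lagom API, and cycles in the tree are not handled.

```java
import java.util.*;

// Hypothetical read-side projection; plain Java, no Lagom APIs.
class PathProjection {
    // Read-side table built only from events seen so far: entityId -> parentId.
    private final Map<String, String> parents = new HashMap<>();

    // Walk up the parents known to this projection (assumes no cycles).
    private String pathOf(String id) {
        Deque<String> parts = new ArrayDeque<>();
        for (String cur = id; cur != null; cur = parents.get(cur)) {
            parts.addFirst(cur);
        }
        return "/" + String.join("/", parts);
    }

    // Apply one move event, then return the path as of that event.
    // newParentId is null when the entity becomes a root.
    String onMoved(String entityId, String newParentId) {
        parents.put(entityId, newParentId);
        return pathOf(entityId);
    }
}
```

Since the path is computed from the projection's own table right after each event is applied, a later move cannot leak into an earlier public event.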

Br,
Alan