Akka Persistence Cassandra: tag views TTL unclear


#1

I was surprised to find that writing tags for an event means the whole event is in fact stored (1 + #tags) times. That might be a performance optimization for queries, but it should be documented (and I would be happy to know why; the tradeoffs seem massive when storing bigger events).
What is even more unclear is the TTL setting for tags. Am I correct that you use Cassandra’s TTL, so that after the default of 2 days (if enabled) all tags are lost, even those for events that have not been deleted? It strikes me as odd that this is completely unrelated to the corresponding event’s lifecycle. It would seem more valid if tags were deleted only for deleted events, by an (eventually consistent) background job, rather than at some arbitrary time by Cassandra itself.

Bottom line: I think the documentation of the tags for events should be improved.

From the doc:

Cleanup of tag_views table

By default the tag_views table keeps tagged events indefinitely, even when the original events have been removed. Depending on the volume of events this may not be suitable for production.

Before going live decide a time to live (TTL) and, if small enough, consider using the Time Window Compaction Strategy. See events-by-tag.time-to-live in reference.conf for how to set this.


#2

Accidentally found


We should add a link from Readme.md.


(Christopher Batey) #3

PR welcome for the docs!

This isn’t really a performance optimisation; it is the only way to query the data in different ways in Cassandra. The other option would be materialized views, which also duplicate the data under the covers.
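
To put a rough number on the duplication discussed above, here is a minimal sketch in Scala. It assumes the payload is written once to the journal and once per tag to tag_views, as described in post #1; `storedBytes` is a hypothetical helper, not part of the plugin, and it ignores column overhead, compression and the replication factor.

```scala
// Hypothetical helper, not part of akka-persistence-cassandra.
// Assumes one copy of the payload in the journal table plus one copy
// per tag in tag_views, i.e. (1 + #tags) copies in total.
def storedBytes(payloadBytes: Long, tagCount: Int): Long = {
  require(payloadBytes >= 0, "payload size must not be negative")
  require(tagCount >= 0, "tag count must not be negative")
  payloadBytes * (1L + tagCount)
}

// A 10,000 byte event with 3 tags: 10000 * (1 + 3) = 40000 bytes
val withThreeTags: Long = storedBytes(10000L, 3)

// An untagged event is stored only once
val untagged: Long = storedBytes(10000L, 0)
```

So for large events, every extra tag costs roughly one more full copy of the payload.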


#4

Yep, I can add a PR to mention the space consumption when using tags. I would add a link to EventsByTag.md as well.

However, I still do not fully understand the TTL:

What is even more unclear is the TTL setting for tags. Am I correct that you use Cassandra’s TTL, so that after the default of 2 days (if enabled) all tags are lost, even those for events that have not been deleted? It strikes me as odd that this is completely unrelated to the corresponding event’s lifecycle. It would seem more valid if tags were deleted only for deleted events, by an (eventually consistent) background job, rather than at some arbitrary time by Cassandra itself.

I can see the benefit of using a TTL when you transform all events into a different datastore/representation using an EventsByTag query, and this is done within that TTL. (If that fails, people doing this would probably be glad that the tags are also stored redundantly in the messages table.) So for all other use cases you do not want to set that TTL, since you actually use your tags. Or did I get this completely wrong?


(Christopher Batey) #5

That’s right. You should only use TTL if you are using the tag to build a read side view that you won’t need to rebuild from the tags.
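
As a rough illustration of why that is, here is a small Scala model of the behaviour discussed in this thread. It is only a sketch: `TagRow` and `visibleInTagQuery` are hypothetical names, not plugin code. It assumes that a tag_views row expires a fixed time after it was written, whether or not the original event still exists in the journal.

```scala
import java.time.{Duration, Instant}

// Hypothetical model of a row in tag_views; not the plugin's schema.
final case class TagRow(tag: String, writtenAt: Instant)

// With no TTL the row stays visible indefinitely. With a TTL it
// disappears once writtenAt + ttl has passed, independent of the
// lifecycle of the event in the journal.
def visibleInTagQuery(row: TagRow, ttl: Option[Duration], now: Instant): Boolean =
  ttl.forall(d => now.isBefore(row.writtenAt.plus(d)))

val written = Instant.parse("2019-01-01T00:00:00Z")
val row = TagRow("orders", written)
val threeDaysLater = written.plus(Duration.ofDays(3))

// TTL of 2 days: the row is gone when a rebuild starts 3 days later
val withTtl = visibleInTagQuery(row, Some(Duration.ofDays(2)), threeDaysLater)

// No TTL: the row is still there for a later rebuild
val withoutTtl = visibleInTagQuery(row, None, threeDaysLater)
```

Under these assumptions, a read side that ever needs to be rebuilt from the tags after the TTL has elapsed would see only part of the history.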