I was struck by the unexpected fact that writing tags for an event means the whole event is in fact stored (1 + #tags) times. That might be a performance optimization for queries, but it should be documented. (I would also be happy to know why: the tradeoffs seem massive when storing bigger events.)
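To make the tradeoff concrete, here is a rough back-of-the-envelope sketch in Python. It assumes, as described above, that the full serialized event is written once to the journal and once more per tag; it ignores row metadata, compression and replication, and the function name and numbers are purely illustrative.

```python
def stored_bytes(event_size_bytes: int, num_tags: int) -> int:
    """Approximate payload bytes written for one event.

    Assumption: one full copy in the journal plus one full copy per tag.
    """
    if event_size_bytes < 0 or num_tags < 0:
        raise ValueError("sizes and tag counts must be non-negative")
    return event_size_bytes * (1 + num_tags)


if __name__ == "__main__":
    # A hypothetical 100 kB event with 3 tags is written 4 times in total.
    print(stored_bytes(100_000, 3))
```

Under these assumptions the overhead grows linearly with the number of tags, which is why it matters more for bigger events.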
What is even more unclear is the TTL setting for tags. Am I correct that you use Cassandra's TTL, so after the default of 2 days (if enabled) all tags are lost, even those for events that have not been deleted? It strikes me as odd that this is completely unrelated to the lifecycle of the corresponding events. It would seem more valid to delete tags only for deleted events, with an eventually consistent background job, rather than at some arbitrary time by Cassandra itself.
Bottom line: I think the documentation of the tags for events should be improved.
From the doc:
Cleanup of tag_views table
By default the tag_views table keeps tagged events indefinitely, even when the original events have been removed. Depending on the volume of events this may not be suitable for production.
Before going live decide a time to live (TTL) and, if small enough, consider using the Time Window Compaction Strategy. See events-by-tag.time-to-live in reference.conf for how to set this.