Akka Persistence Cassandra: events-by-tags bucket-size based on time vs burst load

Hi there

We use akka-persistence-cassandra as event persistence. We do create backups of our full cassandra database in a database independent format (read all persisted events and store as json files per peristenceId). This works very well and with akka streams super fast.
When restoring a backup (what we also do to load data from one environment into another) we persist all events again (also using akka-streams which works very well for that case).

The problem with this is that the backup restore only takes a few minutes for the entire dataset and restores (writes) all events and the corresponding tags. In the end this means that all events are stored in the same cassandra partition which therefore gets huge. This leads to cassandra failing (OOM etc) and the particular persistencequery by tag failing (looks like message size exceeded - not thoroughly investigated yet).

In such a case one could change the bucket-size to Minute which could help a bit but this would lead to lots of overhead when querying. Also the tags would still not be partitioned well.

Any hint on what we could do would be greatly appreciated! I guess other people will eventually also have this issue when they have any type of burst load (batch job etc)

What comes to my mind as an idea would be a configurable strategy of how to partition tags.
This would be nice since in our case we know the original writing timestamp when an event was written and we could write the tag to the corresponding bucket when the event was originally written instead of restored.

Do you have any idea how we could solve the immediate problem other than setting the bucket-size to a minute and adding a throttle to the restoring stream?

Interesting question. I don’t think we can have some kind of dynamic partitioning. Perhaps we could provide a way to pass in the timeuuid when persisting so it would be possible to keep the original time. Wrapping the event to iclude that timestamp. What do you think @chbatey ?

Sorry I missed the notification for this one. I think making the timeuuid is probably the only solution but it is dangerous. Creating timeuiuds from the past makes it possible to create duplicates. I think this would be a very power user API as if you got it wrong it would ruin your event journal.

We rely on sequence_nr before timeuuid for ordering so I don’t think they’d need to come in timeuuid order but you probably still want it to be monotonically increasing.

I created
https://github.com/akka/akka-persistence-cassandra/pull/362
which allows for timeuuid customization without explicitly supporting it in the plugin.
@chbatey what do you think?

1 Like