Strategy for deleting events and distributed data state of short-lived entities

I have a lot of quite short-lived (< 5-10 sec) saga actors/entities for which I use typed persistence and cluster sharding with durable storage and remember entities = true. From what I can see, neither the persisted events nor the durable data state in the DurableStore of cluster sharding are deleted when I stop my entity.
I understand that the persisted events are not deleted, but I would at least expect the relevant keys in the durable ddata store to be removed.

My question is: what's the preferred way to delete these two things, the persisted events and, more importantly, the durable ddata state? (As I am using remember entities = true, ALL entities are always loaded again after a cluster startup!)

Note: I implemented my own DurableStore which uses JDBC to store the state in Postgres.

Edit:
Deleting events: I could implement a cleanup task that runs on service startup (or shutdown), searches the event store for SagaFinished events to find all finished sagas, and then deletes their events.
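
A rough sketch of the query half of that task, assuming the journal plugin supports currentEventsByTag and that SagaFinished events are tagged with a hypothetical "saga-finished" tag (deleting the found events would then be a separate step):

import akka.actor.typed.ActorSystem
import akka.actor.typed.scaladsl.adapter._
import akka.persistence.query.{ NoOffset, PersistenceQuery }
import akka.persistence.query.scaladsl.CurrentEventsByTagQuery
import akka.stream.scaladsl.Sink

import scala.concurrent.Future

// Collects the persistence ids of all sagas that have emitted a (tagged)
// SagaFinished event. Tag name and read journal plugin id are assumptions.
class FinishedSagaQuery(readJournalPluginId: String)(implicit system: ActorSystem[_]) {

  private val readJournal =
    PersistenceQuery(system.toClassic).readJournalFor[CurrentEventsByTagQuery](readJournalPluginId)

  def finishedSagaIds(): Future[Seq[String]] =
    readJournal
      .currentEventsByTag("saga-finished", NoOffset) // hypothetical tag set via withTagger
      .map(_.persistenceId)
      .runWith(Sink.seq)
}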

Deleting ddata state: What I can think of here is to register my DurableStore with the Receptionist; when a saga finishes, I could look up the DurableStore actor and send it a message to delete the state for this entity/saga.
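
A minimal sketch of that Receptionist variant, where DeleteDurableState and the service key are hypothetical additions to my custom DurableStore's protocol (they are not part of Akka's DurableStore API):

import akka.actor.typed.Behavior
import akka.actor.typed.receptionist.{ Receptionist, ServiceKey }
import akka.actor.typed.scaladsl.Behaviors

// Hypothetical delete message understood by my JDBC-backed DurableStore.
final case class DeleteDurableState(entityKey: String)

object DurableStoreCleanup {
  val Key: ServiceKey[DeleteDurableState] = ServiceKey[DeleteDurableState]("saga-durable-store")

  // Inside the custom DurableStore (a classic actor) an adapted ref would be registered:
  //   import akka.actor.typed.scaladsl.adapter._
  //   context.system.toTyped.receptionist !
  //     Receptionist.Register(DurableStoreCleanup.Key, context.self.toTyped[DeleteDurableState])

  // Spawned by the saga just before it stops: look up the store and ask it to
  // delete the remembered state for this entity.
  def requestCleanup(entityKey: String): Behavior[Receptionist.Listing] =
    Behaviors.setup { ctx =>
      ctx.system.receptionist ! Receptionist.Find(Key, ctx.self)
      Behaviors.receiveMessagePartial {
        case Key.Listing(stores) =>
          stores.foreach(_ ! DeleteDurableState(entityKey))
          Behaviors.stopped
      }
    }
}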

Any other/better ideas?

I investigated the options for deleting the events from the journal, and the best I could come up with so far is obtaining an instance of the journal actor via reflection and using the existing JournalProtocol (also via reflection) to delete the messages.

import akka.actor.ActorRef
import akka.actor.typed.ActorSystem
import akka.actor.typed.scaladsl.adapter._
import akka.persistence.Persistence
import akka.persistence.typed.PersistenceId
import com.typesafe.config.ConfigFactory

class SagaLog(journalPluginId: String)(implicit system: ActorSystem[_]) { // ignoreRef only exists on the typed ActorSystem
  // Persistence.journalFor is not public API, so it is invoked through my own
  // reflection helper (PrivateInvocation) to obtain the classic journal actor.
  private val extension: Persistence = Persistence(system.toClassic)
  private val journal: ActorRef =
    PrivateInvocation.target(extension).call(Symbol("journalFor"))(journalPluginId, ConfigFactory.empty())
  // Delete confirmations are not interesting here, so replies go to the ignore ref.
  private val ignoreActorRef = system.ignoreRef[Any].toClassic

  def deleteAllEvents(typeKey: String, sagaId: SagaId): Unit = {
    val id = PersistenceId.of(typeKey, sagaId).id
    val deleteTo = Long.MaxValue
    // DeleteMessagesTo belongs to akka.persistence.JournalProtocol, which is
    // private[persistence] and therefore also constructed via reflection in the real code.
    journal ! DeleteMessagesTo(id, deleteTo, ignoreActorRef)
  }
}

I am calling this method from within the PostStop signal handler of my actor behaviour.

.receiveSignal {
  case (_: Saga.State.Finished, PostStop) => sagaLog.deleteAllEvents(typeKey, sagaId)
}

But I am not sure whether triggering the deletion in this way is a good idea. I can imagine cases where a message is delivered to the actor after its events were deleted, in which case the actor would start from its initial state again.

I haven't investigated how to delete the durable distributed data yet, or whether this is even required.

Any inputs about this topic are very welcome.

As you mention, in general you’d want to keep track of the fact that the entity has been “completed” or else you may end up re-using the same entity id. In some use cases maybe that is fine or such validation can be performed in some other way.
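
To make that a bit more concrete, here is a stripped-down sketch (all names are placeholders, not your actual saga protocol). While the events are still in the journal, a terminal state can guard against re-use; once they are deleted, a revived entity is back in its empty state and cannot tell it was ever used, so the "completed" fact has to be recorded somewhere outside the journal as well:

import akka.persistence.typed.PersistenceId
import akka.persistence.typed.scaladsl.{ Effect, EventSourcedBehavior }

object SagaSketch {
  sealed trait Command
  case object Start extends Command
  case object Finish extends Command

  sealed trait Event
  case object Started extends Event
  case object Finished extends Event

  sealed trait State
  case object Empty extends State
  case object Running extends State
  case object Done extends State

  def saga(persistenceId: PersistenceId): EventSourcedBehavior[Command, Event, State] =
    EventSourcedBehavior[Command, Event, State](
      persistenceId,
      emptyState = Empty,
      commandHandler = {
        case (Done, _)         => Effect.none // completed: ignore everything
        case (Empty, Start)    => Effect.persist(Started)
        case (Running, Finish) => Effect.persist(Finished)
        case _                 => Effect.unhandled
      },
      eventHandler = {
        case (_, Started)  => Running
        case (_, Finished) => Done
      })
}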

We have just released Akka 2.6.7, which separates the remember-entities store from the shard coordinator state store. With this you could potentially disable coordinator durability and still use a persistent store for remember entities. The new persistent store does event deletion out of the box.
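
Roughly, that combination is configured along these lines (see the 2.6.7 sharding docs for the details):

akka.cluster.sharding {
  # new in 2.6.7: keep remember-entities state in the event journal;
  # this store deletes its events when entities are stopped
  remember-entities-store = eventsourced

  # with remember entities no longer kept in distributed data, durable
  # storage for sharding's ddata can potentially be disabled
  distributed-data.durable.keys = []
}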

As for options for triggering deletes when an entity stops, if you still want to do that: the receptionist is fine; another option would be to publish an event to the event bus.
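
For the event bus variant, something along these lines would do, where SagaCompleted and the cleanup actor are placeholders for your own types:

import akka.actor.typed.{ ActorRef, ActorSystem }
import akka.actor.typed.eventstream.EventStream

object CompletionEvents {
  // Placeholder event published by a saga when it finishes.
  final case class SagaCompleted(typeKey: String, sagaId: String)

  // The saga publishes to the typed event stream instead of talking to the store directly:
  def publishCompletion(system: ActorSystem[_], typeKey: String, sagaId: String): Unit =
    system.eventStream ! EventStream.Publish(SagaCompleted(typeKey, sagaId))

  // A single cleanup actor subscribes at startup and performs the journal and
  // durable store deletes when it sees the event:
  def subscribeCleanup(system: ActorSystem[_], cleanup: ActorRef[SagaCompleted]): Unit =
    system.eventStream ! EventStream.Subscribe(cleanup)
}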

This is of great interest to me: I am looking at Akka as I would like to use event sourcing for some particular microservices in my system (we are migrating an older space-based architecture to an event-based microservice architecture). My challenge is that I have high volumes (always increasing, ~1.5bn entities a day) of fairly short-lived (1-10 days) entities, and I need to process those 1.5bn inside of maybe 4 hours (actually spread through the day, but with significant spikes). So I need to be able to 'archive' the events for 'retired' entities off to another longer-term store (most likely as 'history' documents in a Doc DB). I wouldn't be using 'remember entities = true' - my access will be quite random, with only one or two events per entity per day.

So my two major challenges looking at Akka are integrating nicely with Kafka with respect to transactions and at-least-once delivery, and this archiving question.