Question about event sourcing?


#1

I’ve been reading about Akka Persistence event sourcing and ran the example, and I have a couple of questions:

  1. Say it is June 1st, 10am, and my Akka cluster starts
  2. Say it is June 30th, 10am, and my Akka cluster crashes

Between June 1st and June 30th I received 100,000 messages.

a) When Akka starts back up and recovery begins, is the default behavior for all 100,000 messages to be replayed?
b) To avoid replaying all 100,000 messages, could I take snapshots instead for quicker recovery?


(Rob Crawford) #2

Are all 100,000 messages actually events (i.e., do they all cause state changes to your system)? Would they all apply to actors that are still relevant a month later? For example, if one sequence of events starts a “session”, manipulates that “session”, and then ends the “session”, what value is there in maintaining the actor and its events after it has ended?

But, yes, snapshots can help reduce the number of messages replayed.


#3

All 100,000 messages cause permanent state changes to the system. To keep it simple, for example:

10:00am --> 100 messages come in
10:05am --> 100 messages go into an internal work queue (an internal hashmap in the actor)
10:10am --> 50 messages are completed; 50 messages are still in the internal work queue

What I’m trying to persist is the state of the internal work queue in case Akka goes down.


(Rob Crawford) #4

How I would do it (wiser heads would likely have a better way):

The messages aren’t the events. The events are:

“WorkRequestArrived” – which holds the message and some unique identifier
“WorkRequestCompleted” – which holds just the unique identifier

What gets persisted are these two events – so you’d have 100 “WorkRequestArrived” events in the journal at 10:05am. As messages are completed, “WorkRequestCompleted” events are added to the journal. So at 10:10am, you’d have 100 “WorkRequestArrived” and 50 “WorkRequestCompleted” events in the journal.
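
As a sketch in Scala (plain collections, not the actual Akka Persistence API — the event names follow the post above, but the field names are my own), replaying the journal rebuilds the queue like this:

```scala
// Hypothetical event types for the work-queue example; field names are illustrative.
sealed trait Event
final case class WorkRequestArrived(id: String, payload: String) extends Event
final case class WorkRequestCompleted(id: String) extends Event

object Replay {
  // Rebuild the internal work queue by folding over the journal in order:
  // an arrival adds an entry, a completion removes it.
  def rebuild(events: Seq[Event]): Map[String, String] =
    events.foldLeft(Map.empty[String, String]) {
      case (queue, WorkRequestArrived(id, payload)) => queue + (id -> payload)
      case (queue, WorkRequestCompleted(id))        => queue - id
    }
}
```

Replaying 100 arrivals and 50 completions leaves a 50-entry map, matching the 10:10am state above.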

(If the messages are sizable, or there’s an issue serializing them, or even just for auditing, I’d store them somewhere else and persist only a reference to them.)

Snapshots would reduce the number of events replayed during recovery, yes. Each persisted event gets a sequence number; when you take a snapshot, it is stamped with the sequence number of the most recent persisted event. During recovery, the system gives you the snapshot, then each event with a higher sequence number.

So, if you don’t have snapshots and need to restart after 10:10am, you’d have 150 events played back.

Let’s say you take a snapshot every 100 events (but do nothing else). At 10:05am, your journal has 100 events and your snapshot store has a single serialized view of the queue. At 10:10am, the journal has 150 events, and the snapshot store holds that same serialized view. If you restart at this point, you get the snapshot offer, then the 50 events that occurred after the snapshot was taken. So, yes, the snapshot reduces the number of events replayed.
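
A sketch of that recovery flow in Scala (plain data stands in for the real journal and snapshot store; the event types are repeated here so the snippet compiles on its own):

```scala
// Hypothetical event types for the work-queue example.
sealed trait Event
final case class WorkRequestArrived(id: String, payload: String) extends Event
final case class WorkRequestCompleted(id: String) extends Event

// A snapshot carries the sequence number of the last event it covers.
final case class Snapshot(sequenceNr: Long, state: Map[String, String])

object Recovery {
  // journal: (sequenceNr, event) pairs in ascending order.
  // Start from the snapshot state (if any) and replay only the events
  // with a higher sequence number.
  def recover(journal: Seq[(Long, Event)], snapshot: Option[Snapshot]): Map[String, String] = {
    val (initial, fromNr) = snapshot
      .map(s => (s.state, s.sequenceNr))
      .getOrElse((Map.empty[String, String], 0L))
    journal
      .collect { case (nr, e) if nr > fromNr => e }
      .foldLeft(initial) {
        case (q, WorkRequestArrived(id, payload)) => q + (id -> payload)
        case (q, WorkRequestCompleted(id))        => q - id
      }
  }
}
```

With a snapshot at sequence number 100, only the 50 later events are replayed, but the resulting state is the same as a full replay from sequence number 1.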

To save space in the journal, you can treat SaveSnapshotSuccess as the signal that it’s OK to delete the journal entries older than the snapshot (e.g. by calling deleteMessages with the snapshot’s sequence number). With this, at 10:10am the journal would contain only 50 events instead of 150.
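
Modeled the same way in plain Scala collections (not the real deleteMessages API), that compaction is just dropping everything at or below the snapshot’s sequence number:

```scala
object Compaction {
  // Keep only journal entries strictly newer than the snapshot's
  // sequence number; the snapshot already covers everything older.
  def afterSnapshot[E](journal: Seq[(Long, E)], snapshotNr: Long): Seq[(Long, E)] =
    journal.filter { case (nr, _) => nr > snapshotNr }
}
```

For the example above: 150 journal entries compacted against a snapshot at sequence number 100 leaves the 50 entries numbered 101 through 150.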

Does this help? I hope I’m answering your questions.