Cassandra plugin: what happens when there are persisted events with the same seqNo?


(Heiko Seeberger) #1

Out of curiosity, let’s assume I have started persistent actors with the same persistenceId on several cluster members and “managed” to persist the same type of event – e.g. AccountCreated(username, passwordHash) – with similar but still different data – e.g. same username but different passwords – with the same sequence number.

Then I run an EventsByPersistenceId query. I expected either a failure or at least two EventEnvelopes with the same sequence number. Surprisingly, neither happens. So the Cassandra plugin seems to deduplicate events with the same sequence number. According to which logic? Latest timestamp?


(Patrik Nordwall) #2

I don’t think it’s deduplicated by Cassandra because the timestamp is part of the primary key.

I think it’s deduplicated by the ReplayFilter, which detects that there is a new writer but with the same seqNr as an event already received. It actually buffers events and discards those from the old writer.
You should see a log message from the ReplayFilter when this happens.

It’s also possible to configure the ReplayFilter.

That said, the ReplayFilter can’t detect all problems that can occur in those situations, so in general you are in trouble when there are multiple writers.
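
To illustrate the discard-old idea, here is a simplified Scala model. It is not the actual ReplayFilter code: the names `Replayed` and `discardOldWriters` are made up for this sketch, the writer ids and payloads are placeholders, and unlike the real filter it does not limit its buffer to a window.

```scala
// Simplified model of "discard events from old writers".
// Hypothetical names; this is NOT the Akka ReplayFilter source.
final case class Replayed(seqNr: Long, writerUuid: String, payload: String)

object ReplayFilterSketch {

  def discardOldWriters(events: Seq[Replayed]): Vector[Replayed] = {
    var kept = Vector.empty[Replayed]   // unbounded here; the real filter uses a window
    var currentWriter = ""
    var oldWriters = Set.empty[String]
    var lastSeqNr = 0L

    events.foreach { e =>
      if (e.writerUuid == currentWriter) {
        // Same writer: keep the event unless its seqNr goes backwards.
        if (e.seqNr >= lastSeqNr) {
          kept = kept :+ e
          lastSeqNr = e.seqNr
        }
      } else if (oldWriters.contains(e.writerUuid)) {
        () // Event from a writer that was already superseded: drop it.
      } else {
        // New writer: remember the previous one as old and drop any
        // buffered events whose seqNr is not below the new writer's seqNr.
        if (currentWriter.nonEmpty) oldWriters += currentWriter
        currentWriter = e.writerUuid
        lastSeqNr = e.seqNr
        kept = kept.filter(_.seqNr < e.seqNr) :+ e
      }
    }
    kept
  }

  def main(args: Array[String]): Unit = {
    // Two writers persisted seqNr 1 for the same persistenceId.
    val replayed = Seq(
      Replayed(1L, "writer-a", "AccountCreated(alex, hashA)"),
      Replayed(1L, "writer-b", "AccountCreated(alex, hashB)")
    )
    println(discardOldWriters(replayed))
  }
}
```

In this model the event from the writer seen last survives. A later event from `writer-a` would also be dropped, because that writer is by then recorded as old.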


(Heiko Seeberger) #3

Interesting! Thanks!

Actually I don’t get such a message:

chakka-iam 11:17:51 DEBUG CassandraJournal [CassandraJournal(akka://chakka-iam)] - Recovery is starting before the latest tag writes tag progress. Min progress for pid 1. From sequence nr of recovery: 1
chakka-iam 11:17:51 DEBUG CassandraJournal [CassandraJournal(akka://chakka-iam)] - Starting recovery with tag progress: Map(). From 1 to 1
chakka-iam 11:17:51 DEBUG CassandraReadJournal [CassandraReadJournal] - Creating EventByPersistentIdState graph
chakka-iam 11:17:51 DEBUG EventsByPersistenceIdStage [EventsByPersistenceIdStage(akka://chakka-iam)] - EventsByPersistenceId [accounts] Query from seqNr [1] in partition [0]
chakka-iam 11:17:51 DEBUG EventsByPersistenceIdStage [EventsByPersistenceIdStage(akka://chakka-iam)] - EventsByPersistenceId [accounts] Query took [5] ms
chakka-iam 11:17:51 DEBUG ReplayFilter [akka://chakka-iam/system/cassandra-journal/$b] - Replay: PersistentImpl(AccountCreated(alex,sha1:64000:18:RMImVX5LMGlAdsR0pvG72xuqb7XNZp7s:mSJTTF+rs7mBtQmmURBRBanL),1,accounts,,false,null,1e2560dc-36e8-47c3-a0b4-5c5379017a5d)
chakka-iam 11:17:51 DEBUG ReplayFilter [akka://chakka-iam/system/cassandra-journal/$b] - Replay completed: RecoverySuccess(1)

All I get is that a message has been replayed, but no mention of a dropped duplicate. The duplicate does exist, though, as you can see here:

cqlsh> select persistence_id, partition_nr, sequence_nr, timestamp from akka.messages;

 persistence_id | partition_nr | sequence_nr | timestamp
----------------+--------------+-------------+--------------------------------------
       accounts |            0 |           1 | efd23970-5432-11e8-bcad-7b01759c72f5
       accounts |            0 |           1 | f35fcd00-5432-11e8-989f-fd25adfa51f2

(2 rows)


(Patrik Nordwall) #4

Ah, I forgot that we added sequence number tracking in the EventsByPersistenceIdStage in akka-persistence-cassandra. It’s probably filtered out there.
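
As a rough sketch of how sequence number tracking could drop the second row, assuming the stage only emits a row whose sequence number equals the next expected one: the names `Row` and `emitInOrder` are hypothetical, and this is a model of the idea, not the plugin's code (which also has to deal with gaps).

```scala
// Hypothetical model of sequence number tracking in a by-persistence-id query.
// NOT the akka-persistence-cassandra implementation.
final case class Row(seqNr: Long, timestamp: String)

object SeqNrTrackingSketch {

  // Emits a row only if its seqNr is exactly the next expected one.
  def emitInOrder(rows: Seq[Row], fromSeqNr: Long): Vector[Row] = {
    val (emitted, _) = rows.foldLeft((Vector.empty[Row], fromSeqNr)) {
      case ((out, expected), row) if row.seqNr == expected =>
        (out :+ row, expected + 1)
      case (state, _) =>
        state // duplicate or gap: ignored in this simplified model
    }
    emitted
  }

  def main(args: Array[String]): Unit = {
    // The two rows from the cqlsh output above, in the order shown there.
    val rows = Seq(
      Row(1L, "efd23970-5432-11e8-bcad-7b01759c72f5"),
      Row(1L, "f35fcd00-5432-11e8-989f-fd25adfa51f2")
    )
    println(emitInOrder(rows, fromSeqNr = 1L))
  }
}
```

Under that assumption the first row read for a given sequence number wins and any later row with the same number is skipped silently, which would match seeing a single replayed event and no warning.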