Invalid replayed event from a new writer because of duplicate event

Hi,
We are running a Lagom API and seeing this warning printed in logs:

2018/10/07 23:07:55 WARN akka.persistence.journal.ReplayFilter [107.1.3] [sourceThread=application-akka.actor.default-dispatcher-3, akkaSource=akka.tcp://application@10.20.32.137:2552/system/jdbc-journal/$Jrc, sourceActorSystem=application, akkaTimestamp=23:07:55.160UTC] - Invalid replayed event [sequenceNr=1, writerUUID=f740e7da-3a5c-40f3-87ad-8de572c525b9] from a new writer. An older writer already sent an event [sequenceNr=1, writerUUID=7d4a9973-f764-4f78-9f8e-670a00ecad9b] whose sequence number was equal or greater for the same persistenceId [ProviderSchemaEntitybeb8439b-5466-467d-8dc0-15bb1ead0684]. Perhaps, the new writer journaled the event out of sequence, or duplicate persistentId for different entities?

It’s from https://github.com/akka/akka/blob/master/akka-persistence/src/main/scala/akka/persistence/journal/ReplayFilter.scala#L123-L125.

The warning seems to be logged because the same event was written to the journal twice. This is the first time we have seen it.

I’ve included akka.persistence.journal.mysql.replay-filter { mode = repair-by-discard-old } in the config hoping to fix this, but the warning is still printed.

Does anyone have any ideas about how to fix this?

This happens when you have multiple concurrent instances of the same entity/persistent actor, for example if you run both a test environment and a developer environment against the same database and have events written for the same entity. It can also happen if you are using sharding (which Lagom does) and have a “split brain”: a situation where the cluster is incorrectly split into two halves, each side believes the other has shut down, and you therefore end up with duplicate instances of the same entity.

You should be able to go into your database and inspect the entries for ProviderSchemaEntitybeb8439b-5466-467d-8dc0-15bb1ead0684. You’ll see that you have duplicate sequenceNr values with different writerUUIDs. From there you’ll have to make an informed decision about what to keep and what to delete.
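
As a rough sketch of that inspection, something like the query below could list the journal rows for the affected entity. It assumes the default akka-persistence-jdbc MySQL schema (a `journal` table with `ordering`, `persistence_id`, `sequence_number` and `deleted` columns); your table and column names may differ. As far as I know, in that schema the writer UUID is stored inside the serialized `message` payload rather than in its own column, so this only shows which sequence numbers exist and in what order they were inserted.

```sql
-- Read-only inspection; assumes the default akka-persistence-jdbc schema.
-- Table and column names are assumptions, adjust them to your setup.
SELECT ordering, persistence_id, sequence_number, deleted
FROM journal
WHERE persistence_id = 'ProviderSchemaEntitybeb8439b-5466-467d-8dc0-15bb1ead0684'
ORDER BY sequence_number, ordering;
```

Take a backup of the table before deleting anything based on what this shows.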

It is also important to figure out exactly what caused it, so that you can avoid it happening again.

Thank you for your response. We’re using sharding and suspected that this might be because of a split brain situation. So to fix this, should we delete the duplicate event from the journal with an SQL query?