Do I have to delete snapshots after event migration?

Hi,

I am getting the following errors after making some major event structure changes:

[SnapshotMetadata(PortfolioEntity|xxxxx-2aab-478a-xxxxxx-beb1e071f82a,1512,1524909743413)] (1 of 3), trying older one.
Caused by: [com.lightbend.lagom.scaladsl.playjson.JsonSerializationFailed: Failed to de-serialize bytes with manifest [com.stashaway.trading.portfolio.impl.portfolio.ces.PortfolioState]
errors:
    /allocationTargetRecords(1)/allocationTarget/strategyId: ValidationError(List(error.path.missing),WrappedArray())
    /allocationTargetRecords(0)/allocationTarget/strategyId: ValidationError(List(error.path.missing),WrappedArray())
    /portfolio/strategyId:

Can this be fixed by deleting the snapshots from the database? Does it mean we have to delete the snapshots table every time we make a structural event change?

Thanks,

Hi @lejoow,

You can delete snapshots whenever you want, as long as you keep all the events of the journal forever. Snapshots are just a performance optimisation.

This being said, you can use Lagom’s Schema Evolution so that old-format persisted data is updated in-flight.
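
For reference, a state migration along the lines of the Schema Evolution docs can be registered roughly like this. This is only a minimal sketch: PortfolioState and strategyId are taken from the error above, while the registry name and the "UNKNOWN" default are hypothetical. In the error the missing strategyId is nested under allocationTarget and portfolio, so a real transform would have to update those nested objects rather than the top level.

import com.lightbend.lagom.scaladsl.playjson.{JsonMigration, JsonSerializer, JsonSerializerRegistry}
import play.api.libs.json._

object PortfolioSerializerRegistry extends JsonSerializerRegistry {
  override def serializers = Vector(
    JsonSerializer[PortfolioState]
    // ... plus the serializers for your events
  )

  // Bump the state's current version to 2 and fill in the field that is
  // missing from version-1 JSON before Play JSON tries to read it.
  private val portfolioStateMigration = new JsonMigration(2) {
    override def transform(fromVersion: Int, json: JsObject): JsObject =
      if (fromVersion < 2) json + ("strategyId" -> JsString("UNKNOWN")) // hypothetical default
      else json
  }

  override def migrations = Map[String, JsonMigration](
    classOf[PortfolioState].getName -> portfolioStateMigration
  )
}

With something like that in place, both old events and old snapshots of PortfolioState should be upgraded during deserialization.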

Cheers,


Thanks @ignasi35
But I got that error even though I had created the Transformation in the JsonSerialRegistry class. The error disappeared when I deleted the snapshot of that entity and restarted the service without any code changes.

Hmm, that’s interesting. Transformations should work for both event and state serializers. If transformations are not applied on snapshot (state) deserialization, this could be a bug.

Interesting… We are still using the older version of Lagom (1.3.5). Are you aware of any bug fixes that were introduced to cover this?

What do you need from me to assess whether this is a bug or not?

Thanks

The only alternative I can think of is a reproducer on GitHub with two commits. The first commit uses a case class State with a certain shape, and the second commit adds a new field to case class State along with migrations. We would then be able to check out the first commit, runAll, emit some commands until a snapshot is produced, stop, check out the second commit, runAll over the existing Cassandra data, and see a failure/success.
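
For example, the two commits could differ only in the shape of the State case class, roughly like this (names are invented for illustration; commit 1's shape is shown as a comment since both versions cannot coexist in one file):

// Commit 1: the original shape, persisted into snapshots by runAll.
// final case class State(count: Int, items: List[String])

// Commit 2: the same class with a new field, plus a JsonMigration
// (as in the registry sketch above) supplying a default for old JSON.
final case class State(count: Int, items: List[String], label: String)

object State {
  import play.api.libs.json._
  implicit val format: Format[State] = Json.format[State]
}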

I think we could use sbt new lagom/lagom-scala.g8 and I’d also tune the number of events before a snapshot is taken to 3 or 5 instead of the default 100.
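
For tuning the snapshot frequency, the setting (assuming Lagom's standard persistence configuration) would go in application.conf:

# application.conf -- take a snapshot after (at least) this many persisted
# events instead of the default 100
lagom.persistence.snapshot-after = 5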

Makes sense?

Hi Ignasi,

While I was investigating, I found something else to be somewhat strange.

When I ran the following query on the snapshots table:

select * from portfolio.snapshots
where persistence_id = 'PortfolioEntity|7d0273b8-8xxxxxx071-6b33ef4afcf7'

I found that snapshots were NOT generated at exactly every 100th sequence number.

[screenshot of query results showing snapshot sequence_nr values]

As you can see, some of the sequence_nr values are at 601, 704, 805, and so on.

How is this possible when we configured snapshots to be taken every 100 events?

Also, what is quite interesting is this error message:

Failed to load snapshot [SnapshotMetadata(PortfolioEntity|xxxx-8070-xxxx-b071-6b33ef4afcf7,1405,1522220657955)] (3 of 3), last attempt.
Caused by: [com.lightbend.lagom.scaladsl.playjson.JsonSerializationFailed: Failed to de-serialize bytes with manifest [com.stashaway.trading.portfolio.impl.portfolio.ces.PortfolioState]

Do you notice how the error message says it failed at snapshot number 1405? Don’t you think it is quite strange that it did not have any issue processing snapshots until 1405? I checked snapshots 1305, 1205 and 1005. They have no structural differences from snapshot 1405. If something in snapshot 1405 is causing the failure, the process should have failed on snapshots 1305 and 1205 as well.

Or perhaps it is reading the snapshots in descending order, meaning that snapshots 1606 and 1506 are processed first, and then 1405…

This can happen if, for instance, you have 99 events saved and you get a command that emits two or more events at once. In that case your event count jumps to 101 or more and you get this effect.

There is no harm.

I’m surprised by the fact that it’s reading snapshot 1405, and also by the first error you posted here that says “trying older one”. That suggests it read 1606 and failed, probably read 1506 and failed as well, and is now trying to read 1405.

I didn’t know we had this cascading behaviour. It’s probably a feature of the Cassandra plugin I was not aware of.

(edit: what @octonato said :point_up: . You can stop reading, it’s duplicate. )

IIRC the 100 means that a snapshot is taken not before 100 events have accumulated. Sometimes a command causes many events; in that case, the events can cross the x % 100 boundary, but the snapshot happens after the last event in the Seq.

For example, if all your commands emitted 3 events, your snapshots would be at 102 (the first multiple of 3 past 100), 204 (the first multiple of 3 that is at least 100 events past 102), and so on.
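
To visualize that, here is a small, purely illustrative simulation (not Lagom's actual code) of the boundary-crossing behaviour, assuming every command emits 3 events:

object SnapshotBoundarySimulation extends App {
  // Illustrative only: a snapshot is taken when a batch of persisted events
  // crosses the snapshot-after boundary, and it is recorded at the sequence
  // number of the last event in that batch, not at the exact multiple.
  val snapshotAfter = 100
  val eventsPerCommand = 3

  var seqNr = 0L
  var lastSnapshotAt = 0L
  val snapshotSeqNrs = scala.collection.mutable.ListBuffer.empty[Long]

  for (_ <- 1 to 200) {              // 200 commands, 3 events each
    seqNr += eventsPerCommand
    if (seqNr - lastSnapshotAt >= snapshotAfter) {
      snapshotSeqNrs += seqNr        // snapshot at the last event of the batch
      lastSnapshotAt = seqNr
    }
  }

  println(snapshotSeqNrs.take(5).mkString(", "))
}

Running it prints 102, 204, 306, 408, 510, matching the example above; with commands that emit varying numbers of events you get irregular values like the 601, 704, 805 seen in the query results.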