Serious persistence issue when switching to Lagom 1.5.5

I’m encountering very strange behavior: no events are persisted (or even read), and there is no warning of any kind from Lagom or Akka. More specifically, commands are handled correctly (using the initial entity state) and return a correct reply, but the messages table in Cassandra remains completely empty (so no read-side processing occurs). As a result, if we send the same command twice (after the entity gets passivated), both are processed in exactly the same way, with no side effect. Note also that existing events in Cassandra are not taken into account either: entities always start in the initial state, ignoring any existing history. This happens in both dev and prod.

I struggled quite a lot to understand the cause, and it still remains rather obscure. But I realized that by switching back to 1.5.4, the problem disappears and everything works as expected. I have double-checked: just by changing the Lagom version number, I can make the problem appear or disappear.
I suspect that the issue is more complex, and I still need to reproduce the behavior in a simple setting to understand it better and possibly open an issue.

But in the meantime, I was hoping someone might have an idea of the possible origin of such strange behavior (no actual reads or writes to Cassandra, yet no complaints from Lagom/Akka).

Thanks for any help!

Hi @datalchemist,

That sounds very strange. Lots of people have done this upgrade without reporting any issue.

Apart from the misbehaving persistence layer, are you getting any exceptions in your logs?
It would be helpful if you could provide a reproducer.

Thanks,

Renato

Hi @renato,

Yes, that’s what I thought too: if this were specifically related to Lagom 1.5.5, it would have been discovered before.
I suspect some complex dependency issue, but for now I can’t find anything conclusive. And no, there are no exceptions at all in the logs, which is why this is so puzzling. It behaves as if everything were fine, except that the events are not written to Cassandra (nor read from it).

I’m currently trying to make a simple reproducer, and in doing so I realized that I can make it work with 1.5.5 by switching back to sbt 1.2.x. Are you aware of any limitation regarding the sbt version used?

Thanks

I would be surprised if this is being caused by sbt.

Are you running the embedded Cassandra server or you have it running independently?
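If it helps to separate the two cases, dev mode can also be pointed at an independently running Cassandra instead of the embedded one. A minimal sketch for build.sbt, based on the Lagom dev-mode settings as I understand them; the address 127.0.0.1:9042 is only a placeholder for wherever your Cassandra actually listens:

```scala
// build.sbt (affects dev mode only): disable the embedded Cassandra server...
lagomCassandraEnabled in ThisBuild := false

// ...and register the external instance under the "cas_native" service name.
// Placeholder address: replace with your own Cassandra contact point.
lagomUnmanagedServices in ThisBuild := Map("cas_native" -> "tcp://127.0.0.1:9042")
```

If the bug reproduces the same way against both the embedded and the external server in dev mode, the Cassandra server itself is unlikely to be the variable.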

I did a quick test using the hello world template.

You can create a new app from the 1.5.x branch of the template with the following command:
sbt new lagom/lagom-scala.g8 --branch 1.5.x

It will create a hello world scala app using version 1.5.5, Cassandra and sbt 1.2.8. That’s how the template is configured by default.

You can then downgrade to 1.5.4 and also upgrade sbt to 1.3.8.
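For anyone following along, those two switches amount to editing two files in the generated project. A sketch, assuming the layout produced by the template (Lagom sbt plugin declared in project/plugins.sbt, sbt version in project/build.properties):

```scala
// project/plugins.sbt
// Pin the Lagom sbt plugin; change "1.5.4" back to "1.5.5" to switch versions.
addSbtPlugin("com.lightbend.lagom" % "lagom-sbt-plugin" % "1.5.4")

// project/build.properties is a properties file, not Scala; it would contain:
// sbt.version=1.3.8
```

After editing either file, restart sbt (or run `reload`) so the new versions are picked up.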

I tried all combinations:

  • Lagom 1.5.4 with sbt 1.2.8 and 1.3.8
  • Lagom 1.5.5 with sbt 1.2.8 and 1.3.8

All working as expected. Each time I restart it I can read the events persisted on the previous run.

I guess there is something else in your application. Try starting from a fresh template and adding or removing things as you go.

Thanks for your tests and help.

Actually, the problem appears both in dev mode (with an embedded Cassandra server) and in prod mode (and thus with an independent Cassandra server). Given that, it is indeed particularly strange that this is affected by the sbt version: just by building my Docker image with a different sbt version, I can make it work or not.
So it is quite possible that it is not specifically related to sbt or Lagom. Currently, my best guess is that there is some complex dependency issue that, by chance, does not occur with certain combinations of sbt and Lagom versions…

I’ve started trying to reproduce the problem in progressively simpler settings, and I will let you know as soon as I find out more. But this problem has caused me a lot of trouble in my ongoing work, so I first need to get everything in my system working correctly again.

Another possible source that I need to investigate is my rather complex usage of Lagom. My service is built from two projects: a first, “generic” Lagom service that defines generic implementations of several base Lagom components (PersistentEntity, ServiceImpl, Service, …) by enclosing them in type-parametric traits (I’ve done that because it seems very hard, if not impossible, to define generic Lagom components with type parameters directly). Then there is a second, “concrete” project that extends those type-parametric traits to make a concrete implementation of a Lagom service. I have used this setup with no problem in the past, but now I am wondering whether it could be related to my current bug. Indeed, with this approach I may end up using some Lagom components in an unexpected way (for example, my PersistentEntity classes are inner classes rather than top-level classes).
Anyway, so far I have not identified any particular issue with this approach (and I have used it in production for a year), but maybe this rings a bell as a potential source of my problem.
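To make the layout concrete for other readers, here is a hypothetical plain-Scala sketch of the pattern, with no Lagom dependency: `GenericEntities`, `Entity` and `CounterEntities` are invented names, and the real components would extend Lagom's classes instead. The point it illustrates is that the entity class is nested inside a type-parametric trait, so its runtime class name is a nested binary name (containing `$`) rather than a plain top-level name. That may be worth keeping in mind if anything derives identifiers from class names; in Lagom, `PersistentEntity.entityTypeName` would be the place to compare.

```scala
// Hypothetical sketch of the "generic trait + concrete project" layout.
// None of these names come from Lagom; they only mirror the shape of the design.

// "Generic" project: components are parameterized by command, event and state
// types, and the entity class is declared inside the trait (an inner class).
trait GenericEntities[Cmd, Evt, St] {
  def initialState: St
  def applyEvent(state: St, event: Evt): St
  def handle(state: St, cmd: Cmd): Seq[Evt]

  // Inner class: each instance is tied to the enclosing GenericEntities instance.
  class Entity {
    private var state: St = initialState

    // Turn a command into events, fold them into the state, return the new state.
    def process(cmd: Cmd): St = {
      state = handle(state, cmd).foldLeft(state)(applyEvent)
      state
    }
  }
}

// "Concrete" project: fixes the type parameters for one service.
object CounterEntities extends GenericEntities[Int, Int, Int] {
  def initialState: Int = 0
  def applyEvent(state: Int, event: Int): Int = state + event
  // Positive commands produce one event; anything else produces none.
  def handle(state: Int, cmd: Int): Seq[Int] = if (cmd > 0) Seq(cmd) else Seq.empty
}
```

Usage: `val e = new CounterEntities.Entity` creates an entity bound to the concrete object, and `e.getClass.getName` shows the nested name the JVM actually sees.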

One possible thing to look at is the Cassandra version. Did you upgrade the driver?

There was an issue in sbt 1.3.x (I don’t remember exactly which one) that could affect the ordering of the classpath. So you may end up loading a Cassandra driver that causes the problem. Still, I would expect it to fail with at least some exception, whereas here it fails silently.
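One way to test the classpath-ordering hypothesis is to log, at service startup, which jar a driver class was actually loaded from, and compare the sbt 1.2.x and 1.3.x builds. A minimal sketch using only the JDK reflection API; `WhichJar` is a hypothetical helper name, and the example class name assumes the 3.x DataStax Java driver, which as far as I know is what Lagom 1.5 pulls in:

```scala
// Diagnostic sketch: report which jar (or directory) a class was loaded from.
object WhichJar {
  /** Code-source location of `className`, or None if the class cannot be found
    * or has no code source (e.g. JDK classes loaded by the bootstrap loader). */
  def locate(className: String): Option[String] =
    try {
      // initialize = false: load the class without running its static initializers
      val cls = Class.forName(className, false, getClass.getClassLoader)
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    } catch {
      case _: ClassNotFoundException => None
    }
}
```

For example, logging `WhichJar.locate("com.datastax.driver.core.Cluster")` in both builds would show whether a different driver jar wins on the classpath when only the sbt version changes.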