I’ve seen a report about similar errors here: Lagom errors on startup
My situation is different.
My lagom 1.4.11 application has about a dozen micro-services in Scala 2.12, each of which is fairly simple.
Unlike the other case, there is no code that creates tables in cassandra. There are also no read-side processors. It’s just vanilla service descriptors. All interactions w/ Cassandra are due to Lagom’s persistence.
I use external cassandra 3.11.4 and kafka services.
So, when I execute
sbt runAll, lagom starts the service locator + all my micro services; nothing else.
With a clean cassandra server,
sbt runAll results in several “Column family ID mismatch” errors in the cassandra log. From the various reports about this error, it seems to be related to vulnerabilities in concurrent schema operations in cassandra; an on-going topic here: https://issues.apache.org/jira/browse/CASSANDRA-9424
On my laptop (Dell 7530, Ubuntu 18.04 LTS, Xeon E2186M, 64 GB RAM), there’s enough horsepower that a dozen micro-services really do start at more or less the same time. Since each of them needs to initialize a schema in cassandra, this may indeed put some stress on a single-node cassandra deployment that is also running on that machine. At least, this is what my experience seems to suggest.
Recently, I wrote a simple sbt task to start all my micro-services sequentially with
Def.sequential(lagomRun in project1, lagomRun in project2, ....).
At least, this avoids stressing cassandra during schema initialization.
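For anyone who wants to try the same thing, here is a minimal sketch of such a task in build.sbt. The task name runServicesSequentially is something I made up for illustration, and project1 / project2 stand for Lagom service projects defined elsewhere in the build; only lagomRun and Def.sequential come from Lagom and sbt themselves.

```scala
// build.sbt (sketch, not a complete build)
// Assumption: project1 and project2 are Lagom service projects declared
// elsewhere in this build, e.g. with .enablePlugins(LagomScala).

// Hypothetical task key; the name is illustrative only.
lazy val runServicesSequentially =
  taskKey[Unit]("Start the Lagom services one after the other in dev mode")

runServicesSequentially := {
  // Def.sequential runs each task only after the previous one has completed,
  // so the services are started one at a time instead of all at once.
  Def.sequential(
    lagomRun in project1,
    lagomRun in project2
    // ... add the remaining service projects here
  ).value
  () // discard lagomRun's result so the task returns Unit
}
```

Note that this only serializes the start of the services. It does not by itself guarantee that one service has finished creating its schema before the next one starts, so it reduces the concurrency rather than eliminating it.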
What’s annoying though is that I can’t write:
Def.sequential(lagomServiceLocatorStart, lagomRun in project1, ....) as SBT says that
lagomServiceLocatorStart is undefined.
Indeed, looking in the LagomPlugin source code, this task key is set only in the private project,
lagom-internal-meta-project-service-locator. Ok, I have to manually issue
lagomServiceLocatorStart and then my sequential task. At least, this makes for a somewhat clean start in dev mode.
Surprisingly, I didn’t use to experience this problem. Is this a scale issue?
I’m not sure. One other factor that may be relevant: recently, I turned on logging for each micro-service.
Before, everything went to the sbt output; that might have indirectly forced some sequencing of the micro-services’ startup. With logging enabled, it seems that there is more concurrency at startup; perhaps enough to trip some vulnerabilities in cassandra.
I’m wondering whether others have experienced this kind of problem.
For production, I’m wondering about doing something similar, that is, forcing the micro-services to start one after the other just to prevent concurrent schema operations on cassandra. Once all schemas are created, I’m not worried about concurrent operations hitting cassandra, because that is well-known territory, whereas schema initialization is trickier, as indicated by the ongoing issue.