When Cassandra goes down and comes back up, services are not re-checking for Cassandra connectivity

The current behavior is:
When the Cassandra DB goes down for some reason and is restarting, any service request that tries to reach the DB during that downtime fails with "Cassandra DB not found" errors and the request exits.
We then have to restart the service manually, after which it is able to connect to Cassandra again.

(This issue is not observed during container startup, since we handle that case using the init container feature of Kubernetes.
We see it only when Cassandra goes down intermittently for some reason and then brings itself back up.)

Expected/preferred behavior:
If Cassandra is not available, the service should keep checking, or wait until Cassandra is up again, and then reconnect to it.
This would provide a graceful reconnection mechanism.

Could you please let us know whether there is any built-in Lagom feature that would enable this behavior,
or whether we should write our own retry mechanism.
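
In case we do end up writing it ourselves, this is roughly the kind of retry wrapper we have in mind. This is only a sketch: CassandraRetry, the attempt count, and the delay are placeholder names and values of our own, not a Lagom or Akka API.

import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Generic "retry with a fixed delay" helper for any async call that can fail
// while Cassandra is restarting. Purely illustrative, not a Lagom API.
public final class CassandraRetry {

    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor();

    private CassandraRetry() {
    }

    public static <T> CompletionStage<T> retry(Supplier<CompletionStage<T>> task,
                                               int maxAttempts,
                                               Duration delay) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(task, maxAttempts, delay, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletionStage<T>> task,
                                    int attemptsLeft,
                                    Duration delay,
                                    CompletableFuture<T> result) {
        task.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);              // call succeeded
            } else if (attemptsLeft <= 1) {
                result.completeExceptionally(error); // retries exhausted, give up
            } else {
                // Cassandra may still be coming back up: wait and try again.
                SCHEDULER.schedule(
                        () -> attempt(task, attemptsLeft - 1, delay, result),
                        delay.toMillis(), TimeUnit.MILLISECONDS);
            }
        });
    }
}

We kept a fixed delay to keep the sketch simple; exponential backoff would be a straightforward extension. We would still prefer a built-in mechanism if one exists.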

We observed that this reconnection mechanism already exists for Kafka: whenever Kafka goes down and comes back up, the services reconnect to it automatically.

Please provide guidance and help on this issue.

We are seeing the same behavior in k8s. If the Cassandra StatefulSet restarts a pod, the pod's IP address changes, but Lagom/akka-persistence continues to use the old addresses. Note that we are binding to the Cassandra service, but akka-persistence is caching the initial IP addresses from the DNS service lookup.

@vemisettipriyanka, @dpennell,

I'm not sure if you are experiencing the same issue, but Cassandra endpoint discovery and access mainly depend on the ServiceLocator implementation in use.
The ServiceLocator is responsible, depending on the implementation, for querying endpoints, caching, and load-balancing.

For reference, the Lagom Akka Persistence Cassandra session provider is ServiceLocatorSessionProvider.

Preferred ServiceLocator implementations:

  1. reactive-lib: Lagom 1.4.x (not supported by Lagom 1.5.x)
  2. Lagom Akka Discovery: Lagom 1.4 & Lagom 1.5

Implementations are improving with every new version, so check for updates.

Hope this helps.

Br,
Alan

I'm using reactive-lib. It appears that the service lookup for Cassandra is done once and only once.

@patriknw can you maybe comment on this?

The connection pool in the Cassandra driver should reconnect itself after the initial discovery and connect to the contact points. I can see that this could be a problem if the entire Cassandra cluster is restarted with new IP addresses. I think there is a recently added issue about this: https://github.com/akka/akka-persistence-cassandra/issues/445


We are using the following configuration for Cassandra persistence.
When Cassandra goes down, the service goes down as well, and it does not check for Cassandra connectivity again once Cassandra is back up and running.

--START
# Enable dependency injection
play.modules.enabled += com.lightbend.rp.servicediscovery.lagom.javadsl.ServiceLocatorModule

cassandra-keyspace = test

cassandra.default {
  session-provider = akka.persistence.cassandra.ConfigSessionProvider

  # list the contact points here
  contact-points = [${?CASSANDRA_HOST}]
  port = ${?CASSANDRA_PORT}
}

cassandra-journal {
  keyspace = ${cassandra-keyspace}
  contact-points = ${cassandra.default.contact-points}
  port = ${cassandra.default.port}
  first-time-bucket = "20160225T00:00"
  session-provider = ${cassandra.default.session-provider}
}

cassandra-snapshot-store {
  keyspace = ${cassandra-keyspace}
  contact-points = ${cassandra.default.contact-points}
  port = ${cassandra.default.port}
  session-provider = ${cassandra.default.session-provider}
}

lagom.persistence.read-side.cassandra {
  keyspace = ${cassandra-keyspace}
  contact-points = ${cassandra.default.contact-points}
  port = ${cassandra.default.port}
  session-provider = ${cassandra.default.session-provider}
}

# Enable new sharding state store mode by overriding Lagom's default
akka.cluster.sharding.state-store-mode = ddata

# Enable serializers provided in Akka 2.5.8+ to avoid the use of Java serialization.
akka.actor.serialization-bindings {
  "akka.Done" = akka-misc
  "akka.actor.Address" = akka-misc
  "akka.remote.UniqueAddress" = akka-misc
}

# get seed nodes from environment variables
akka.cluster.seed-nodes = [
  ${?SEED_NODES_0}
]

lagom.broker.kafka.service-name = ""
lagom.broker.kafka.brokers = ${?KAFKA_SERVICE_NAME} # this can be a comma-separated string if you have >1

--END
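
For completeness, this is roughly how we are considering guarding a read-side query with a retry wrapper like the one sketched earlier in this thread, so that a request made while Cassandra is restarting is retried instead of failing immediately. The repository class, table, and query below are made up for the example; only CassandraSession.selectOne is the actual Lagom javadsl API.

import com.datastax.driver.core.Row;
import com.lightbend.lagom.javadsl.persistence.cassandra.CassandraSession;

import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.CompletionStage;

// Illustrative only: wraps a Lagom read-side query in the retry helper so the
// request waits for Cassandra to come back instead of failing on the spot.
public class UserRepository {

    private final CassandraSession session;

    public UserRepository(CassandraSession session) {
        this.session = session;
    }

    public CompletionStage<Optional<String>> findUserName(String userId) {
        return CassandraRetry.retry(
                // Hypothetical table and query; any failing statement would be retried.
                () -> session.selectOne("SELECT name FROM users WHERE id = ?", userId),
                10,                        // roughly 50 seconds of grace in total
                Duration.ofSeconds(5))
            .thenApply(maybeRow -> maybeRow.map(row -> row.getString("name")));
    }
}

Is this the recommended approach, or is there a built-in reconnection mechanism we are missing?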