NoHostAvailableException

franz · September 7, 2018, 6:22pm

Hi,

Does anybody know what causes this and how to deal with it?

02:18:23.418 [error] com.lightbend.lagom.internal.javadsl.persistence.PersistentEntityActor [sourceThread=my-microservice-impl-application-akka.actor.default-dispatcher-16, akkaTimestamp=18:18:23.413UTC, akkaSource=akka.tcp://my-microservice-impl-application@127.0.0.1:50720/system/sharding/MyMicroserviceEntity/42/DUMMY, sourceActorSystem=my-microservice-service-impl-application] - Persistence failure when replaying events for persistenceId [MyMicroserviceEntityDUMMY]. Last known sequence number [0]
java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency QUORUM (2 required but only 1 alive)))
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:503)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:462)
        at akka.persistence.cassandra.package$$anon$1.$anonfun$run$1(package.scala:18)
        at scala.util.Try$.apply(Try.scala:209)
        at akka.persistence.cassandra.package$$anon$1.run(package.scala:18)
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency QUORUM (2 required but only 1 alive)))
        at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:213)
        at com.datastax.driver.core.RequestHandler.access$1000(RequestHandler.java:49)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.findNextHostAndQuery(RequestHandler.java:277)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.retry(RequestHandler.java:441)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.processRetryDecision(RequestHandler.java:419)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:635)
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1075)
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:998)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:647)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:582)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:461)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        ... 1 common frames omitted

Thanks,
Franz

aklikic · September 8, 2018, 7:58pm

Hi @franz,

At what scale are you running Cassandra?
From the log I would say 3 (the journal query consistency is set to QUORUM, which in your case needs 2 replicas).
Check this config:
reference.conf

Br,
Alan

aklikic · September 8, 2018, 9:09pm

The log is related to replaying events to rebuild the entity state, so you should check the write-side configuration.

franz · September 10, 2018, 2:48pm

Thanks @aklikic. I’ve figured it out. One of my teammates configured a Cassandra replication factor of 2, but I only had one node running locally.

To fix it, I’ve moved the replication factor of 2 to our application.prod.conf (so that the dev application.conf uses just 1).

Thanks!
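
For later readers, the arithmetic behind "2 required but only 1 alive" can be sketched as follows. QUORUM needs a majority of the replicas, i.e. floor(replication factor / 2) + 1, so a replication factor of 2 needs 2 live replicas and a single local node cannot satisfy it. The class and method names are made up for illustration and are not part of Lagom or the Cassandra driver.

```java
// Illustrative helper only: QuorumCheck and its methods are hypothetical names.
final class QuorumCheck {

    // Number of replicas that must respond at consistency QUORUM:
    // a majority of the replication factor, floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    // True if the number of live replicas is enough to serve a QUORUM query.
    static boolean canServeQuorum(int replicationFactor, int aliveReplicas) {
        return aliveReplicas >= quorum(replicationFactor);
    }

    public static void main(String[] args) {
        System.out.println(quorum(1));            // 1
        System.out.println(quorum(2));            // 2
        System.out.println(quorum(3));            // 2
        System.out.println(canServeQuorum(2, 1)); // false: the case in the log above
        System.out.println(canServeQuorum(1, 1)); // true: the dev setup after the fix
    }
}
```

This also shows why a replication factor of 3 tolerates one node being down at QUORUM, while a replication factor of 2 tolerates none.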

dsheth28 · April 15, 2019, 9:44am

@aklikic @franz: I have a question regarding journal writes. We are testing the behaviour of write-retries and read-retries in cassandra-journal, and the question is whether they work as intended in case of a NoHostAvailableException. We are following the test steps below and seeing journal writes fail (since the persistent actor dies).

Scenario:
1) Start the application (this causes the keyspaces to be created)
2) Send an event (which starts the persistent actor)
3) The event is processed and the data appears in the persistence table
4) Kill Cassandra
5) Send an event
6) Wait for a minute and start Cassandra (the write failed: there is no data in the persistence table for the new event, and we see a NoHostAvailableException)

Note: There is only one Cassandra node.

My understanding of write-retries is that it will attempt the write operation against Cassandra as many times as the value we set (e.g. write-retries = 12345).
Please correct me if my understanding is wrong and share your thoughts on this.
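
To make the mental model above concrete, here is a minimal, generic sketch of a bounded retry in Java. It is not the implementation used by akka-persistence-cassandra or the Cassandra driver; `BoundedRetry` and `withRetries` are hypothetical names, and it only illustrates the "try again up to N times" idea.

```java
import java.util.function.Supplier;

// Generic illustration of a bounded retry; not taken from any library.
final class BoundedRetry {

    // Runs the operation once, then retries up to maxRetries more times
    // if it throws. Rethrows the last failure when all attempts fail.
    static <T> T withRetries(int maxRetries, Supplier<T> operation) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last; // non-null here: the loop ran at least once and every attempt failed
    }
}
```

Note that in a sketch like this there is no delay between attempts, so even a large retry count can be used up long before a node that takes a minute to restart is reachable again. Whether the journal's write-retries setting applies to NoHostAvailableException at all is not established in this thread.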

[ Consider me a newbie in this area :) ]

TimMoore · April 15, 2019, 10:22pm

@dsheth28 when you restart the Cassandra node, is it running on the same or a different IP address and port? If the IP address changes, then it sounds like the same issue being discussed in this other topic:

and this issue:

github.com/akka/akka-persistence-cassandra

Reconnect to Cassandra after IP address changes

opened 02:38PM - 11 Apr 19 UTC

closed 09:43AM - 13 Feb 20 UTC

ennru

0 - new

### Short description

Especially when running Cassandra in Kubernetes a pod restart will change Cassandra's IP address. Could Akka Persistence Cassandra make a new service lookup when that happens?

### Details

_Added by @TimMoore 18 Apr 2019_

There are a few layers to the problem. Lagom interfaces with Akka Persistence Cassandra by implementing a custom `SessionProvider` that looks up the Cassandra contact points in the `ServiceLocator` (essentially a DNS SRV query in Kubernetes) and uses the results to build the Cassandra `Session`. That means that by the time the driver is initialized, it is getting a list of resolved `InetSocketAddress` instances (essentially IP:port pairs).

Kubernetes `StatefulSet` does not have the ability to reuse IPs of deleted pods when replacing them (https://github.com/kubernetes/kubernetes/issues/28969). This means that after a shutdown and restart of the Cassandra cluster, the IP addresses that the Cassandra `Session` was built with are stale. `StatefulSet` pods _do_ have stable host names, but using these doesn't help, either, because the Cassandra driver caches the resolved addresses internally ([JAVA-1522](https://datastax-oss.atlassian.net/browse/JAVA-1522)). Version 4.0 of the driver offers an option to store the unresolved addresses ([JAVA-1978](https://datastax-oss.atlassian.net/browse/JAVA-1978)) if the `Session` is created that way.

We are planning to refactor the session management in Akka Persistence Cassandra by extracting that to Alpakka Cassandra. As part of that work, we also want to integrate support for Akka Service Discovery directly, so that Lagom can use that in the future, and everyone else using Alpakka Cassandra or Akka Persistence Cassandra can also get automatic discovery. However, it will experience the same problem unless we make other changes.

In Akka Persistence Cassandra, once it builds a `Session`, it keeps it and reuses it forever, so AFAICT the only way to get it to re-resolve is to restart the `ActorSystem` (effectively requiring a full service restart in Lagom). Without doing a detailed investigation, I think the cleanest way to solve all of this would be to change our `CassandraSession` wrapper to (optionally?) detect `NoHostAvailableException` failures in the underlying session, then dispose and recreate the session from the `SessionProvider` (and retry executing the statement?)

The problem is that, when the Cassandra cluster completely shuts down and then comes back up on unknown IP addresses, the Cassandra driver is unable to reconnect without restarting.
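
The idea floated at the end of that issue (detect the failure, dispose of the session, recreate it from the provider, retry) could look roughly like the sketch below. Everything here is a hypothetical stand-in so the example is self-contained: `RecreatingSession`, the nested `NoHostAvailable` exception, the `Supplier` playing the role of `SessionProvider`, and the `Consumer` used to dispose of a session are not the real Akka Persistence Cassandra or driver types.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical wrapper; S stands in for the driver's Session type.
final class RecreatingSession<S> {

    // Stand-in for the driver's NoHostAvailableException.
    static final class NoHostAvailable extends RuntimeException {
        NoHostAvailable(String message) {
            super(message);
        }
    }

    private final Supplier<S> sessionProvider; // assumed to re-resolve contact points on each call
    private final Consumer<S> dispose;         // assumed to close the stale session
    private S session;

    RecreatingSession(Supplier<S> sessionProvider, Consumer<S> dispose) {
        this.sessionProvider = sessionProvider;
        this.dispose = dispose;
        this.session = sessionProvider.get();
    }

    // Executes the statement; on NoHostAvailable, disposes of the session,
    // builds a fresh one from the provider, and retries exactly once.
    synchronized <T> T execute(Function<S, T> statement) {
        try {
            return statement.apply(session);
        } catch (NoHostAvailable e) {
            dispose.accept(session);
            session = sessionProvider.get();
            return statement.apply(session);
        }
    }
}
```

A single retry keeps the sketch simple. If the cluster is still down, the second attempt fails and the exception propagates to the caller as before.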

dsheth28 · April 22, 2019, 8:13am

@TimMoore After restarting Cassandra, it is running on the same IP address and port.

TimMoore · May 1, 2019, 8:21am

I can’t explain why you would get a NoHostAvailableException if it is trying to connect to a host that is actually available. Maybe a timing issue while it’s still starting?
