Hello,
I recently switched to using Artery from netty.tcp.
I ran into this error : Insufficient usable storage for new log of length=50335744 in /dev/shm (tmpfs).
After searching for a solution, I understood that this is because I am using the defaults to start the media driver, which will be inside the same JVM of the actor. As described at AKKA documentation, it is better to start the driver externally and have it shared among actors.
I followed the instructions in the documentation, I even used the same example configuration.
Now the problem is that my actors start and then after a little time they all fail with an exception:
[ERROR] [04/24/2018 20:49:49.606] [aeron-client-conductor] [akka.remote.artery.aeron.ArteryAeronUdpTransport(akka://ActorClusterSystemName)] Fatal Aeron error ConductorServiceTimeoutException. Have to terminate ActorSystem because it lost contact with the external Aeron media driver. Possible configuration properties to mitigate the problem are 'client-liveness-timeout' or 'driver-timeout'. io.aeron.exceptions.ConductorServiceTimeoutException: Exceeded (ns): 5000000000
io.aeron.exceptions.ConductorServiceTimeoutException: Exceeded (ns): 5000000000
at io.aeron.ClientConductor.checkServiceInterval(ClientConductor.java:745)
at io.aeron.ClientConductor.onCheckTimeouts(ClientConductor.java:720)
at io.aeron.ClientConductor.service(ClientConductor.java:659)
at io.aeron.ClientConductor.doWork(ClientConductor.java:151)
at org.agrona.concurrent.AgentRunner.doDutyCycle(AgentRunner.java:233)
at org.agrona.concurrent.AgentRunner.run(AgentRunner.java:159)
at java.lang.Thread.run(Thread.java:748)
[ERROR] [04/24/2018 20:49:49.963] [aeron-client-conductor] [akka.remote.artery.aeron.ArteryAeronUdpTransport(akka://ActorClusterSystemName)] Aeron error, org.agrona.concurrent.AgentTerminationException
org.agrona.concurrent.AgentTerminationException
at io.aeron.ClientConductor.doWork(ClientConductor.java:148)
at org.agrona.concurrent.AgentRunner.doDutyCycle(AgentRunner.java:233)
at org.agrona.concurrent.AgentRunner.run(AgentRunner.java:159)
at java.lang.Thread.run(Thread.java:748)
and then after many undelivered messages:
[INFO] [04/24/2018 20:49:52.270] [ActorClusterSystemName-akka.remote.default-remote-dispatcher-8] [akka://ActorClusterSystemName@xx.xx.xx.xx:2559/system/remoting-terminator] Remote daemon shut down; proceeding with flushing remote transports.
Other actors in the cluster display this error message:
[ERROR] [04/24/2018 20:52:27.468] [ActorClusterSystemName-akka.actor.default-dispatcher-18] [akka://ActorClusterSystemName@xx.xx.xx.xx:2557/] swallowing exception during message send
java.lang.IllegalStateException: Aeron client conductor is closed
at io.aeron.ClientConductor.ensureOpen(ClientConductor.java:635)
at io.aeron.ClientConductor.addPublication(ClientConductor.java:367)
at io.aeron.Aeron.addPublication(Aeron.java:247)
at akka.remote.artery.aeron.AeronSink$$anon$1.<init>(AeronSink.scala:103)
at akka.remote.artery.aeron.AeronSink.createLogicAndMaterializedValue(AeronSink.scala:100)
at akka.stream.impl.GraphStageIsland.materializeAtomic(PhasedFusingActorMaterializer.scala:630)
at akka.stream.impl.PhasedFusingActorMaterializer.materialize(PhasedFusingActorMaterializer.scala:450)
at akka.stream.impl.PhasedFusingActorMaterializer.materialize(PhasedFusingActorMaterializer.scala:415)
at akka.stream.impl.PhasedFusingActorMaterializer.materialize(PhasedFusingActorMaterializer.scala:406)
at akka.stream.scaladsl.RunnableGraph.run(Flow.scala:588)
at akka.remote.artery.Association.runOutboundOrdinaryMessagesStream(Association.scala:710)
at akka.remote.artery.Association.$anonfun$runOutboundOrdinaryMessagesStream$3(Association.scala:720)
at akka.remote.artery.Association.$anonfun$attachOutboundStreamRestart$1(Association.scala:814)
at akka.remote.artery.Association$LazyQueueWrapper.runMaterialize(Association.scala:89)
at akka.remote.artery.Association$LazyQueueWrapper.offer(Association.scala:93)
at akka.remote.artery.Association$LazyQueueWrapper.offer(Association.scala:84)
at akka.remote.artery.Association.send(Association.scala:379)
at akka.remote.artery.ArteryTransport.send(ArteryTransport.scala:714)
at akka.remote.RemoteActorRef.$bang(RemoteActorRefProvider.scala:574)
at akka.actor.ActorRef.tell(ActorRef.scala:124)
at akka.actor.ActorSelection$.rec$1(ActorSelection.scala:265)
at akka.actor.ActorSelection$.deliverSelection(ActorSelection.scala:269)
at akka.actor.ActorSelection.tell(ActorSelection.scala:46)
at akka.actor.ScalaActorSelection.$bang(ActorSelection.scala:280)
at akka.actor.ScalaActorSelection.$bang$(ActorSelection.scala:280)
at akka.actor.ActorSelection$$anon$1.$bang(ActorSelection.scala:198)
at akka.cluster.ClusterCoreDaemon.gossipTo(ClusterDaemon.scala:1285)
at akka.cluster.ClusterCoreDaemon.gossip(ClusterDaemon.scala:1009)
at akka.cluster.ClusterCoreDaemon.gossipTick(ClusterDaemon.scala:972)
at akka.cluster.ClusterCoreDaemon$$anonfun$initialized$1.applyOrElse(ClusterDaemon.scala:484)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at akka.actor.Actor.aroundReceive(Actor.scala:517)
at akka.actor.Actor.aroundReceive$(Actor.scala:515)
at akka.cluster.ClusterCoreDaemon.aroundReceive(ClusterDaemon.scala:288)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
at akka.actor.ActorCell.invoke_aroundBody0(ActorCell.scala:557)
at akka.actor.ActorCell$AjcClosure1.run(ActorCell.scala:1)
at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
at akka.kamon.instrumentation.ActorMonitors$$anon$1.$anonfun$processMessage$1(ActorMonitor.scala:123)
at kamon.Kamon$.withContext(Kamon.scala:120)
at akka.kamon.instrumentation.ActorMonitors$$anon$1.processMessage(ActorMonitor.scala:123)
at akka.kamon.instrumentation.ActorCellInstrumentation.aroundBehaviourInvoke(ActorInstrumentation.scala:45)
at akka.actor.ActorCell.invoke(ActorCell.scala:550)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at kamon.executors.Executors$InstrumentedExecutorService$$anon$7.run(Executors.scala:270)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
The media driver seem to be working fine! I checked the log files in the shm directory and there are no errors in the loss-report.dat or other files.
I have increased both net.core.rmem_max and net.core.wmem_max to 4194304, I also set java Xms to 1024M. I am running 7 actors in the cluster.
This is my artery configuration:
akka.remote {
log-remote-lifecycle-events = off
maximum-payload-bytes = 15 MiB
artery {
enabled = on
transport = aeron-udp
canonical.hostname = "127.0.0.1"
canonical.hostname = ${?HOST}
canonical.port = ${PORT}
advanced {
maximum-large-frame-size = 15 MiB
send-buffer-size = 15 MiB
receive-buffer-size = 15 MiB
maximum-frame-size = 15 MiB
outbound-message-queue-size = 2480000
aeron-dir = /dev/shm/aeron
embedded-media-driver = off
}
}
}
Akka version: 2.5.12. Scala version: 2.12.5. Linux distribution: debian v9. I added aeron-driver-1.7.0.jar, aeron-client-1.7.0.jar, and agrona-0.9.12.jar to classpath when I start the external shared MediaDriver.
I would appreciate if you can point me to how I can fix these errors.
Thanks