What happens after system.terminate() and main completes?

I have a stand-alone integration written mostly in java+akka that uses akka.testkit.ExplicitlyTriggeredScheduler.

The following stops quickly as I expect:

public class Simulator {
  public static void main(String[] args) {
    // ...
    system.terminate();
    System.exit(0); // quickly exits
  }
}

But if I remove the System.exit(0) the system takes about 1 minute after the last INFO message that the scheduler is stopped. What is happening in this 1 minute? How can I trace any hooks akka might have after main is done?

I think I would start by enabling debug logging - in particular I think that should show you which phases of ‘coordinated shutdown’ are executed, and that might give you a hint as to which part might be holding things up.

This minimal example shows an unexpectedly long run time – on my system – about 43 seconds:

package example;

import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;
import akka.actor.typed.ActorSystem;
import akka.actor.typed.javadsl.Behaviors;
import akka.actor.testkit.typed.javadsl.ManualTime;
import java.time.Duration;

public class Simulator {

  public static void main(String[] args) {
    String configString = String.join("\n",
         "akka.scheduler.implementation = akka.testkit.ExplicitlyTriggeredScheduler"  // ManualTime.scala
        ,"akka.stdout-loglevel = \"INFO\""
        ,"akka.loglevel = \"INFO\""
    );

    Config config = ConfigFactory.parseString(configString);  // .withFallback(ConfigFactory.load()) )
    ActorSystem system = ActorSystem.create(Behaviors.setup(
      context -> {
        context.getLog().info("Setting up initial behavior");
        return Behaviors.empty();
      }
    ), "sim-system", config);

    ManualTime manualTime = ManualTime.get(system);
    // manualTime.timePasses(Duration.ofSeconds(12));
    system.log().info("Before end of main");
  }
}

Here’s what I have in my output:

$ mvn exec:java
[INFO] Scanning for projects...
[INFO] 
[INFO] ----------------------------< example:sim >-----------------------------
[INFO] Building Simulator 1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ sim ---
16:14:18.535 [sim-system-akka.actor.default-dispatcher-2] INFO  akka.event.slf4j.Slf4jLogger - Slf4jLogger started
SLF4J: A number (1) of logging calls during the initialization phase have been intercepted and are
SLF4J: now being replayed. These are subject to the filtering rules of the underlying logging system.
SLF4J: See also http://www.slf4j.org/codes.html#replay
16:14:18.556 [example.Simulator.main()] INFO  akka.actor.typed.ActorSystem - Before end of main
16:14:18.570 [sim-system-akka.actor.default-dispatcher-2] INFO  example.Simulator - Setting up initial behavior
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  43.448 s
[INFO] Finished at: 2021-07-06T16:15:00-04:00
[INFO] ------------------------------------------------------------------------

Most of the 43 seconds of runtime is spent after 16:14:18.570 ... Setting up initial behavior

How can I increase the logging level here to find out what is going on during that time? The version of scala is 2.13.6, akka is 2.6.15.

Edit: fix akka version

I added the following:

import java.lang.reflect.Field;
// ...
    system.terminate(); // try with and without this
    try {  // See:  https://stackoverflow.com/a/6865509/6296589
      Class clazz = Class.forName("java.lang.ApplicationShutdownHooks");
      Field field = clazz.getDeclaredField("hooks");
      field.setAccessible(true);
      Object hooks = field.get(null);
      System.out.println(hooks); //hooks is a Map<Thread, Thread>
    } catch (Exception e) {
      System.out.println("Exception");
    }

Edit: when run from Maven I see:

{Thread[Thread-0,5,main]=Thread[Thread-0,5,main], Thread[Thread-1,5,main]=Thread[Thread-1,5,main]}

Otherwise, with java -jar

{}

The following is my thread state during the shutdown wait (after calling system.terminate()):

10:34:29.044 [main] INFO  akka.actor.typed.ActorSystem - Before end of main
{Thread[sim-system-shutdown-hook-1,5,main]=Thread[sim-system-shutdown-hook-1,5,main]}
2021-07-07 10:34:49
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.161-b12 mixed mode):

"sim-system-akka.actor.internal-dispatcher-7" #19 prio=5 os_prio=0 tid=0x00007f56c814c000 nid=0x3cc9 waiting on condition [0x00007f571fdfe000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006c6c242b8> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1824)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1693)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"sim-system-akka.actor.internal-dispatcher-6" #18 prio=5 os_prio=0 tid=0x00007f56c0009800 nid=0x3cc8 waiting on condition [0x00007f57241f9000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006c6c242b8> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1824)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1693)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"DestroyJavaVM" #17 prio=5 os_prio=0 tid=0x00007f574400c800 nid=0x3c93 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"sim-system-akka.actor.default-dispatcher-5" #15 prio=5 os_prio=0 tid=0x00007f56bc003000 nid=0x3cc7 waiting on condition [0x00007f57244fa000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006c6c23eb8> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1824)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1693)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"sim-system-akka.actor.default-dispatcher-4" #14 prio=5 os_prio=0 tid=0x00007f56c4008800 nid=0x3cc6 waiting on condition [0x00007f57247fb000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006c6c23eb8> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1824)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1693)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"sim-system-akka.actor.internal-dispatcher-3" #13 prio=5 os_prio=0 tid=0x00007f574491e000 nid=0x3cc5 waiting on condition [0x00007f57248fc000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006c6c242b8> (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1824)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1693)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"Service Thread" #10 daemon prio=9 os_prio=0 tid=0x00007f5744368000 nid=0x3cc0 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread3" #9 daemon prio=9 os_prio=0 tid=0x00007f574433b000 nid=0x3cbf waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread2" #8 daemon prio=9 os_prio=0 tid=0x00007f5744339000 nid=0x3cbe waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" #7 daemon prio=9 os_prio=0 tid=0x00007f5744334800 nid=0x3cbd waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #6 daemon prio=9 os_prio=0 tid=0x00007f5744332800 nid=0x3cbc waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Monitor Ctrl-Break" #5 daemon prio=5 os_prio=0 tid=0x00007f5744329800 nid=0x3cbb runnable [0x00007f572cd17000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
	- locked <0x00000006c6c39a90> (a java.io.InputStreamReader)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.fill(BufferedReader.java:161)
	at java.io.BufferedReader.readLine(BufferedReader.java:324)
	- locked <0x00000006c6c39a90> (a java.io.InputStreamReader)
	at java.io.BufferedReader.readLine(BufferedReader.java:389)
	at com.intellij.rt.execution.application.AppMainV2$1.run(AppMainV2.java:64)

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f5744209000 nid=0x3cba waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f57441d6000 nid=0x3c9e in Object.wait() [0x00007f572d317000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000006c6c4b448> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
	- locked <0x00000006c6c4b448> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f57441d1000 nid=0x3c9d in Object.wait() [0x00007f572d418000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000006c6c3b800> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:502)
	at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
	- locked <0x00000006c6c3b800> (a java.lang.ref.Reference$Lock)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"VM Thread" os_prio=0 tid=0x00007f57441c9800 nid=0x3c9c runnable 

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f5744021800 nid=0x3c94 runnable 

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f5744023800 nid=0x3c95 runnable 

"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f5744025000 nid=0x3c96 runnable 

"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f5744027000 nid=0x3c97 runnable 

"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007f5744028800 nid=0x3c98 runnable 

"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007f574402a800 nid=0x3c99 runnable 

"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007f574402c000 nid=0x3c9a runnable 

"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007f574402e000 nid=0x3c9b runnable 

"VM Periodic Task Thread" os_prio=0 tid=0x00007f574436a800 nid=0x3cc1 waiting on condition 

JNI global references: 711

Heap
 PSYoungGen      total 74752K, used 12175K [0x000000076cf00000, 0x0000000772200000, 0x00000007c0000000)
  eden space 64512K, 18% used [0x000000076cf00000,0x000000076dae3c78,0x0000000770e00000)
  from space 10240K, 0% used [0x0000000771800000,0x0000000771800000,0x0000000772200000)
  to   space 10240K, 0% used [0x0000000770e00000,0x0000000770e00000,0x0000000771800000)
 ParOldGen       total 72192K, used 5762K [0x00000006c6c00000, 0x00000006cb280000, 0x000000076cf00000)
  object space 72192K, 7% used [0x00000006c6c00000,0x00000006c71a0830,0x00000006cb280000)
 Metaspace       used 21888K, capacity 22792K, committed 23040K, reserved 1069056K
  class space    used 2877K, capacity 3003K, committed 3072K, reserved 1048576K


Process finished with exit code 0

It seems that the long delay before termination is happening due to unexecuted tasks inExplicitlyTriggeredScheduler. The following is a another minimal example that seems to resolve the issue:

package example;

import java.lang.reflect.Field;
import java.lang.management.ManagementFactory;
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;
import akka.actor.typed.ActorSystem;
import akka.actor.typed.ActorRef;
import akka.actor.typed.javadsl.Behaviors;
import akka.actor.testkit.typed.javadsl.ManualTime;
import java.time.Duration;
import java.util.concurrent.TimeUnit;
import scala.concurrent.Await;
import scala.concurrent.duration.FiniteDuration;

public class Simulator {

  public static void main(String[] args) throws InterruptedException {
    // Create ActorSystem and top level supervisor
    String configString = String.join("\n",
         "akka.scheduler.implementation = akka.testkit.ExplicitlyTriggeredScheduler"  // ManualTime.scala
        ,"akka.stdout-loglevel = \"DEBUG\""
        ,"akka.loglevel = \"DEBUG\""
        // ,"akka.coordinated-shutdown.run-by-actor-system-terminate = \"off\"" // also works; no need for Await
    );
    final String runByActorSystemPath = "akka.coordinated-shutdown.run-by-actor-system-terminate";
    Config config = ConfigFactory.parseString(configString);  // .withFallback(ConfigFactory.load()) )
    ActorSystem<Void> system = ActorSystem.create(Behaviors.setup(
      context -> {
        context.getLog().info("Setting up initial behavior");
        return Behaviors.empty();
      }
    ), "sim-system", config);
    // system.logConfiguration();
    ManualTime manualTime = ManualTime.get(system);

    if ((!config.hasPath(runByActorSystemPath) || config.getBoolean(runByActorSystemPath))) {
      // Is there a race condition?  Here I call timePasses twice, but with testKit/ManualTime,
      // a single call seems sufficient, e.g.,
      //   testKit.shutdownTestKit(); // awaits termination
      //   manualTime.timePasses(Duration.ofSeconds(2));

      system.terminate(); // calls finalTerminate();
      manualTime.timePasses(Duration.ofSeconds(60));  // First call after terminate and before Await
      try {
        Await.ready(system.whenTerminated(), new FiniteDuration(5, TimeUnit.SECONDS));
      } catch (final Exception e) {
        system.log().info("Exception while waiting", e);
      }
      manualTime.timePasses(Duration.ofSeconds(60));  // Second call after Await
    } else {
      system.terminate();
      manualTime.timePasses(Duration.ofSeconds(60)); // 60 is arbitrary
    }
    system.log().info("Before end of main");
  }
}

I found that when using ActorTestKit.create(ManualTime.config()); only a single call to manualTime.timePasses appears needed to prevent the long (~40s) timeout.

Ah, I’m sorry, I missed this the first time around. This scheduler isn’t really intended for integration tests - and even for unit tests only for those tests where you really want specific control over the triggering of the scheduler.

I would recommend using a ‘regular’ scheduler for your integration tests. That said, you might see largely the same behavior with a regular scheduler:

That example doesn’t have a system.terminate() in there, so it wouldn’t terminate at all ;) - but indeed after adding system.terminate() it doesn’t terminate immediately, so let’s see what’s going on.

change INFO to DEBUG in your configString. However, I tried and it doesn’t reveal anything particularly interesting: the system seems to terminate fine. The next step for me was to print when Akka thinks it is terminated:

    system.getWhenTerminated().handle((a, b) -> {
	    System.out.println("Terminated.");
	    System.out.println("Successful: " + a);
	    System.out.println("Failure: " + b);
	    return null;
    });

This gave me the instant result:

Terminated.
Successful: Done
Failure: null

… in other words, Akka thinks it is done, but the JVM is still running for 15 seconds or so. It would be interesting to look into what’s going on in more detail, but if you’re confident that there shouldn’t be anything running after the actor system has shut down, you can add akka.coordinated-shutdown.exit-jvm = true to your configuration to make sure the JVM is ‘actively’ shut down.

I am aware of this. I’d like to be able to run perhaps ~1000 integration tests each taking wall-clock time of about hour before a release. If there are other known pitfalls in using an explicitly triggered scheduler with calling thread dispatcher, I would like to know about them.

At this time, I think what was happening is related to the handling of Dead Letters with the explicitly triggered scheduler.