Garbage collection and actors

Hello! First post. I have created a pattern where I spawn an actor to aggregate results from other actors. Once its task is completed, it sends the results back to the requesting actor and terminates. The system handles a large volume of these requests, which results in a large number of these actors being created and destroyed. I am concerned about the garbage collection of so many short-lived actors and the impact on performance from stop-the-world collections. Questions: 1. Is there an elegant way to recycle/reuse these actors and avoid generating so many of them? 2. When an actor is shut down, is it garbage collected, or are there conditions where it might not be?
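For context, here is roughly what one of these aggregator actors looks like (a simplified sketch in Akka classic Scala, with made-up message types):

```scala
import akka.actor.{Actor, ActorRef, Props}

// Hypothetical messages, for illustration only
final case class Part(value: Long)
final case class Result(total: Long)

// Spawned per request: collects `expected` parts, replies, then stops itself
class Aggregator(expected: Int, replyTo: ActorRef) extends Actor {
  private var total = 0L
  private var received = 0

  def receive: Receive = {
    case Part(value) =>
      total += value
      received += 1
      if (received == expected) {
        replyTo ! Result(total)
        context.stop(self) // actor terminates once its aggregation is done
      }
  }
}

object Aggregator {
  def props(expected: Int, replyTo: ActorRef): Props =
    Props(new Aggregator(expected, replyTo))
}
```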

If you can model an aggregation as an actor instance, you could likely also model it as a state machine inside a single actor and use a pool of them. Whether that gives much less GC overhead depends a lot on the type of aggregation and how many other objects are created during each one; actors are relatively lightweight, but if you could represent the state with, for example, two primitive fields, that would definitely be less GC intense than allocating and starting a new actor for each operation. Make sure to measure (JMH, for example, can give you a nice allocation rate metric).

It will essentially become a pool, so you will have to implement logic in a parent actor for starting n aggregators, distributing work over them, handling the case where all aggregators are busy, etc.
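A rough sketch of what that could look like, assuming the classic Scala API and a made-up Job/JobResult protocol (a SmallestMailboxPool router is one way to get the "send it to the least busy aggregator" behaviour, but you could also hand-roll the distribution in a parent actor):

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.SmallestMailboxPool

// Hypothetical protocol, for illustration only
final case class Job(values: Seq[Long])
final case class JobResult(total: Long)

// One long-lived aggregator; its state is a single primitive field that is
// reset per job, so nothing extra is allocated per request
class PooledAggregator extends Actor {
  private var total = 0L

  def receive: Receive = {
    case Job(values) =>
      total = 0L
      values.foreach(total += _)
      sender() ! JobResult(total)
  }
}

object AggregationPool extends App {
  val system = ActorSystem("aggregation")
  // Fixed pool of reusable aggregators; the router routes each job to the
  // routee with the smallest mailbox
  val pool = system.actorOf(
    SmallestMailboxPool(8).props(Props[PooledAggregator]()),
    "aggregators")
  pool ! Job(Seq(1L, 2L, 3L))
}
```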

Once the actor has completely stopped, the actor system no longer holds any references to it, so the same GC rules as for any other object apply. Since all other access to the actor should go through its ActorRef, which is decoupled from the actual instance, the only way other objects could keep it alive would be antipatterns such as a thread, future, etc. accessing the internals of the actor.
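For illustration, a sketch of that kind of antipattern (names are made up):

```scala
import akka.actor.Actor
import scala.concurrent.Future

class Leaky extends Actor {
  import context.dispatcher // ExecutionContext for the Future

  private var counter = 0

  def receive: Receive = {
    case "work" =>
      // Antipattern: the Future body closes over `this`, so another thread
      // races on `counter` and can keep the actor instance reachable (and
      // therefore not garbage collected) even after the actor has stopped
      Future {
        counter += 1
      }
  }
}
```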

Thanks, this is helpful. I have been trying to run two separate yet identical actor “clouds” on the same server, in an attempt to reduce the impact of GC, and using a remote router to distribute jobs to them. My thinking is to have several distinct clouds of actors working so that if any one of them goes into a GC stop-the-world cycle, the work will be redirected to the others, especially if I use shortest-queue logic on the remote router. The problem is that these separate groups appear to GC together. Is the Java memory model sharing memory between the JVMs running the clouds of actors? Or is this just a coincidence?

GC is completely JVM-local, so that would be coincidental, unless the heaps have the same size and the allocation patterns are identical, I guess.

Are you sure it is GC that is the problem? Sharing the same server could cause both (or all) processes to starve at the same time if they share CPU/IO and that becomes a bottleneck. If you haven’t already, you can enable GC logging with -XX:+PrintGCDetails on JDK 8, or -Xlog:gc*:file=/some/path on JDK 9+.

To me it sounds more like you are allocating too much memory to the JVM (Xmx), which causes the machine to start swapping. That would cause the stop-the-world behaviour you are describing. I’ve experienced this.

Just a guess, of course. But I think it’s unlikely that the actors or the garbage collector alone are “stopping the world”. Properly configured, they shouldn’t cause worries unless you are doing some low-level optimizations.

Thanks. I’m running a video analytics app. This particular instance is processing over a million frames an hour. I’ve got Xmx at 64G and Xms at 32G and a bunch of other settings recommended by a friend. I reduced Xmx to 32G, 24G, and 18G and am still seeing this issue; anything below 18G results in more pauses. Is there a good resource out there that provides tips on tuning JVMs based on GC logs? I’m running Ubuntu 18.04 server and Java 11 Corretto.


I’ve got Xmx at 64G and Xms at 32G

How much memory does the machine have?

256G


I ran the system with GC logging enabled and processed the resulting log at gceasy.io, and it showed no issue that would explain the long (30-60 sec) stop-the-world events I am seeing four times an hour. The analysis indicated that GC appears healthy and there are no major memory leaks, so it doesn’t look like GC is the cause. Do you have any other suggestions for how to see which system resources might be getting temporarily starved and causing this?
