How do I pass messages from actor A to actor B through local memory?

I have two actors, objManagerActor and objActor. I send messages to objManagerActor, which then dispatches them to objActor. I measure TPS in objActor and get about 2.4 million (the default-dispatcher has 2 threads). I want to improve the TPS.

It seems that Akka transports messages over TCP. How can I improve the throughput?

Hi 0x3E6

There is no networking involved in local actor messaging. In a cluster, TCP or UDP can be used, but Akka does not use RMI at all. What you see is likely caused by connecting a debugger or profiler to the JVM.

To get a baseline of what you could potentially reach on the hardware you are running your app on, you can run this JMH benchmark with sbt. First check out the Akka repo, then update the number of cores in the test to match your machine (and run jmh from the sbt shell with akka-bench-jmh/jmh:run -f 1 -w5 -i5 .*ActorBenchmark.*). Reading through the benchmark actors can also give you a hint about what kind of message flow you’d need to reach such numbers, and comparing it with your own flow may give you some insights.

For the record, on my current Apple M1 Max, with JDK 17, the bench does ~250 million messages per second. A real application that actually does something of business value, rather than just ping-ponging messages, is unlikely to reach such a high number though.

Ok, thank you! I’ll try the benchmark. It would be great if the documentation had suggestions on the number of actors, CPU, and memory configuration.

Since Akka is a very general-purpose library that can be used to implement any number of very different use cases, it is hard to give any general recommendation other than to think carefully about the message flows, which actors may be bottlenecks, and how heavy each type of actor will be.

I’d argue that it is relatively seldom that Akka actors or the message passing will be the problematic overhead when implementing systems that solve actual business problems. In cases where it is, the granularity may be too fine: each actor does so little actual work, and/or is so short-lived, that the time spent starting, stopping and passing messages dominates what the CPU cycles are spent on.

It sounds like you are saying that ActorRef.tell takes a lot of time because the granularity may be too fine. I created ten thousand actors on my 4-core computer.
When the number of actors far exceeds the number of machine cores, will Akka performance degrade quickly?

No. There is effectively no correlation between # of actors and performance. I’ve run lots of tests with millions of actors and 4 cores.

However, as I said, if each actor does very little work in response to every message, the logic to enqueue messages in a mailbox in a thread-safe fashion may very well dominate what the CPU cycles are used for. If you run a profiler against the message sending benchmark I mentioned earlier, you will likely see that most of the time is spent in tell, since all the actors do is ping-pong messages.

Note that profiling and benchmarking are hard, and it is easy to draw the wrong conclusions, so make sure to verify your findings carefully, for example by changing your logic and re-running your benchmark to see that the result changes in the way you expected. It is also important to warm up the JVM before running the actual benchmark, so that you get numbers for a system that has been running for a while (with the hot paths JIT-compiled).
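
To make the warmup point concrete, here is a minimal hand-rolled sketch in plain Java. It is not the Akka benchmark and not a replacement for JMH, which handles warmup and many other pitfalls for you. The class name, the `work` method and the iteration counts are all hypothetical placeholders for whatever you are actually measuring.

```java
class WarmupBench {
    // Hypothetical CPU-bound workload standing in for the code under test.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (i * 31L) ^ (acc >>> 3);
        }
        return acc;
    }

    // Runs `work` unmeasured first (warmup), then times only the measured runs.
    static double runsPerSecond(int warmupRuns, int measuredRuns, int n) {
        long sink = 0;
        for (int i = 0; i < warmupRuns; i++) {
            sink += work(n);
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            sink += work(n);
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        // Use the result so the JIT cannot discard the loops as dead code.
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink);
        }
        return measuredRuns * 1e9 / elapsedNanos;
    }

    public static void main(String[] args) {
        // The first call measures without warmup; the second warms up first.
        System.out.printf("no warmup:   %.0f runs/s%n", runsPerSecond(0, 2_000, 10_000));
        System.out.printf("with warmup: %.0f runs/s%n", runsPerSecond(20_000, 2_000, 10_000));
    }
}
```

The absolute numbers depend entirely on the machine and JVM. The only thing worth checking is whether a change to `work` moves the warmed-up number in the direction you predicted.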

Ok. I use Akka as a stream processing framework because it is more convenient to program custom complex logic than stream processing engines such as Flink. In each Actor, some incremental aggregation operations will be performed on every message, and the results may be output.

In this scenario, can the throughput be improved only by optimizing the mailbox?

Fusing multiple aggregation operations together into one actor, so that each actor does more work per message, would be one thing to consider. This is how Akka Streams works behind the scenes: operators are fused together into a single actor, which avoids passing messages between them unless strictly needed.
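
As a rough illustration of what fusing means, here is a sketch in plain Java rather than Akka code. The class `FusedAggregator` and its `onMessage` method are hypothetical and simply stand in for one actor's message handler. The three aggregates (count, sum, max) are arbitrary examples of incremental aggregations that could otherwise live in three separate actors.

```java
class FusedAggregator {
    private long count;
    private long sum;
    private long max = Long.MIN_VALUE;

    // One incoming message updates every aggregate in a single call,
    // instead of being forwarded through three separate stages.
    void onMessage(long value) {
        count++;
        sum += value;
        if (value > max) {
            max = value;
        }
    }

    long count() { return count; }
    long sum() { return sum; }
    long max() { return max; }

    public static void main(String[] args) {
        FusedAggregator agg = new FusedAggregator();
        for (long v : new long[] {3, 1, 4}) {
            agg.onMessage(v);
        }
        System.out.println(agg.count() + " " + agg.sum() + " " + agg.max());
    }
}
```

With three separate stages each value would cost three mailbox enqueues. Fused, it costs one, and the aggregation work itself is unchanged.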

improved only by optimizing the mailbox?

I’m not sure what you would even mean by this. Effectively, (excluding details such as the interfaces) the default mailbox is a queue. There are fancier options for mailboxes, such as bounded mailboxes and prioritized mailboxes, but the default mailbox is essentially a simple queue and as “optimized” as conceivable.

I’m in agreement with Johan: I think you really need to start by evaluating your profiling/benchmarking. I’m definitely concerned that you are spending more of your CPU on your profiling tools than on executing work.

What are the results of the ActorBenchmark on your hardware if you just run it without changes? (You give no indication of the size of your hardware.) That should give you an idea of the upper limit of your TPS in terms of actor messages on that hardware. That should be a pretty absurdly high number, but if it’s still not high enough your option would be to fuse work together so that it requires fewer messages. As Johan mentions, Akka Streams will do some of that for you automatically, but you could also do it manually in your application logic.

An actor only runs on one thread at a time, so in your case you may want to increase the number of actors to improve concurrency (depending on how many CPU cores you have).

The most important thing is to understand the bottleneck. What is the throughput of objManagerActor and objActor respectively? In your case, with only two actors, this is equivalent to a producer-consumer pattern with two threads. Akka has no performance magic for this case.
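
For comparison, the two-thread producer-consumer pattern can be sketched in plain Java as below. This is only an analogy: `ArrayBlockingQueue` is not what Akka uses for its mailbox, and the class name, the capacity of 1024 and the message count are arbitrary assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ProducerConsumerBench {
    // Sends the numbers 1..messages from the calling thread to one consumer
    // thread through a bounded queue and returns the sum the consumer computed.
    static long run(int messages) {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(1024);
        long[] result = new long[1];
        Thread consumer = new Thread(() -> {
            long sum = 0;
            try {
                for (int i = 0; i < messages; i++) {
                    sum += queue.take();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            result[0] = sum;
        });
        consumer.start();
        try {
            for (long i = 1; i <= messages; i++) {
                queue.put(i);
            }
            // join() makes the consumer's write to result[0] visible here.
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        int messages = 1_000_000;
        run(messages); // unmeasured warmup run
        long start = System.nanoTime();
        long sum = run(messages);
        double seconds = Math.max(1L, System.nanoTime() - start) / 1e9;
        System.out.printf("sum=%d, %.0f msgs/s%n", sum, messages / seconds);
    }
}
```

Timing the producer side and the consumer side separately in a setup like this is one way to see which of the two is the bottleneck.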