Why does the Akka version take more time than the multithreaded Java application?

I am interested to know how Akka compares with Multithreaded Java applications performance-wise.

So, I have developed a simple event processing application using both approaches: Akka and a multithreaded Java application. At present it just prints the events. Later it will be enhanced to write events to a database and to suppress, filter, and throttle events.

I found that the Akka version consumes more time than the multithreaded Java application.

Please clarify why the Akka version takes more time than the multithreaded Java application.

Does it have anything to do with whether the application is CPU-intensive or I/O-intensive?

By default, Akka sizes its parallelism based on the number of processor cores it detects in the system. This works well primarily for CPU-bound tasks, but less well for I/O-bound tasks. Additionally, there is message-passing overhead that can become significant.
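To make the CPU-bound versus I/O-bound point concrete, here is a minimal plain-Java sketch of a commonly cited rule of thumb for sizing a thread pool: threads = cores * (1 + wait time / compute time). It is not how Akka computes its own defaults. The class and method names (`PoolSizing`, `threadsFor`) are hypothetical, and the ratio of 9.0 is only an assumed example of an I/O-heavy task.

```java
/**
 * Sketch of a thread-pool sizing rule of thumb:
 * threads = cores * (1 + waitTime / computeTime).
 * PoolSizing and threadsFor are hypothetical names, not Akka APIs.
 */
final class PoolSizing {

    /**
     * @param cores              number of available processor cores (must be >= 1)
     * @param waitToComputeRatio time a task spends blocked (e.g. on I/O) divided by the
     *                           time it spends computing; 0.0 means purely CPU-bound
     * @return suggested number of threads, never less than 1
     */
    static int threadsFor(int cores, double waitToComputeRatio) {
        if (cores < 1 || waitToComputeRatio < 0) {
            throw new IllegalArgumentException("cores must be >= 1 and ratio must be >= 0");
        }
        return (int) Math.max(1, Math.round(cores * (1.0 + waitToComputeRatio)));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores detected:      " + cores);
        // CPU-bound work: one thread per core keeps every core busy.
        System.out.println("CPU-bound pool size: " + threadsFor(cores, 0.0));
        // Assumed example: tasks that wait on I/O nine times longer than they compute.
        System.out.println("I/O-heavy pool size: " + threadsFor(cores, 9.0));
    }
}
```

With a ratio of 0.0 the suggestion equals the core count, which is why a core-based default suits CPU-bound work. Tasks that spend most of their time blocked leave cores idle unless more threads are available.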

Are you able to share your test application?


Thank you Stephen. The test code is available on GitHub: murthy-nn/EventProcessing (Event Processing using two approaches, Akka and Java Multithreading).

I won’t claim to be an expert in Akka performance, and hopefully someone more experienced will contribute and maybe correct where I make a mistake or two here.

First, as mentioned before, Akka does have overhead. You’re paying a bit of a premium for Akka’s safety protections, such as supervision, as well as for the message-passing and mailbox model. In exchange you get those protections (actor restarting and the like), and you also gain the flexibility of things like Akka Cluster, which can help you distribute your workload across more nodes almost seamlessly. The shortest answer is that Akka takes longer because it’s doing more.

That said, at only 100 events your test is probably not large enough to make a meaningful benchmark. A run that short is unlikely to give the HotSpot JIT compiler enough time to start optimizing.
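As a rough illustration of the warm-up point, the sketch below runs a number of unmeasured rounds before timing anything, so the JIT has a chance to compile the hot path first. `processEvents` is a hypothetical stand-in for the real event handler, and the round counts are arbitrary assumptions. For serious measurements a purpose-built harness such as JMH is the safer choice.

```java
/** Minimal warm-up-then-measure sketch; all names here are hypothetical. */
final class WarmupBenchmark {

    /** Stand-in workload; returns a checksum so the JIT cannot discard the loop. */
    static long processEvents(int eventCount) {
        long checksum = 0;
        for (int i = 0; i < eventCount; i++) {
            checksum += Integer.toString(i).hashCode();
        }
        return checksum;
    }

    /** Runs unmeasured warm-up rounds, then returns the average nanoseconds per measured round. */
    static long averageNanos(int eventCount, int warmupRounds, int measuredRounds) {
        long sink = 0;
        for (int i = 0; i < warmupRounds; i++) {
            sink += processEvents(eventCount); // not timed: gives the JIT time to optimize
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRounds; i++) {
            sink += processEvents(eventCount);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println("unlikely"); // keeps the accumulated result observably used
        }
        return elapsed / measuredRounds;
    }

    public static void main(String[] args) {
        // Assumed sizes: far more than 100 events, with 200 warm-up rounds before measuring.
        long cold = averageNanos(10_000, 0, 1);
        long warm = averageNanos(10_000, 200, 50);
        System.out.println("first round, no warm-up (ns): " + cold);
        System.out.println("average after warm-up (ns):   " + warm);
    }
}
```

The two numbers usually differ noticeably, which is why timing a single short run tends to say more about JVM start-up than about either implementation.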

Finally, there are a few oddities in your Akka test code that might be making the comparison a bit unfair.

On line 59 of EmsManagerBehavior.java, you request a watch on the router inside the loop, once for every single alarm. (Incidentally, as you are the parent of the router, IIRC you’re automatically watching it anyway.) You also don’t seem to do anything with the watch.

I feel like you’re misusing the router concept a bit there, having it create 100 workers that will each serve only one message. As a general guideline, Akka thinks of a worker actor behind a router as an agent that can have work sent to it repeatedly; the role of the router is to distribute that work evenly (or by a different strategy if desired). Creating each of those worker actors is more expensive than just assembling an object, because each worker needs its own mailbox, and sending a message through a router requires extra logic to pick which worker receives it.
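The idea of a small, fixed set of workers that each handle many messages can be sketched in plain Java. This is only an analogy, not Akka's router API: each single-threaded executor stands in for a worker with its own mailbox, and `RoundRobinPool`, `route`, and `shutdownAndCount` are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Plain-Java analogy of a round-robin pool: few long-lived workers, many messages each. */
final class RoundRobinPool {
    private final List<ExecutorService> workers = new ArrayList<>();
    private final List<AtomicInteger> handled = new ArrayList<>();
    private int next = 0;

    RoundRobinPool(int workerCount) {
        for (int i = 0; i < workerCount; i++) {
            // One thread plus one queue per worker, created once and then reused.
            workers.add(Executors.newSingleThreadExecutor());
            handled.add(new AtomicInteger());
        }
    }

    /** Hands the alarm to the next worker in turn. Call from a single thread only. */
    void route(String alarm) {
        int index = next;
        next = (next + 1) % workers.size();
        workers.get(index).execute(() -> {
            // Real processing of `alarm` would go here.
            handled.get(index).incrementAndGet();
        });
    }

    /** Stops accepting work, waits for queued alarms to finish, returns per-worker counts. */
    int[] shutdownAndCount() {
        for (ExecutorService worker : workers) {
            worker.shutdown();
        }
        try {
            for (ExecutorService worker : workers) {
                if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("worker did not finish in time");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int[] counts = new int[handled.size()];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = handled.get(i).get();
        }
        return counts;
    }

    public static void main(String[] args) {
        RoundRobinPool pool = new RoundRobinPool(4); // 4 workers, not one per alarm
        for (int i = 0; i < 100; i++) {
            pool.route("alarm-" + i);
        }
        System.out.println(Arrays.toString(pool.shutdownAndCount())); // prints [25, 25, 25, 25]
    }
}
```

The set-up cost of a worker is paid four times here instead of a hundred, and that cost is amortized over every alarm the worker later handles.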

There are a few other places where you seem to be simply doing more work on the Akka side, but without a deeper dive than I have time for, I can’t dig into those much.

As a final piece of advice: for the kind of workload you seem to be aiming at, an incoming stream or batch of events to process, you may want to look at Akka Streams rather than Akka actors. Akka Streams focuses on throughput while managing memory through backpressure, and can still perform tasks in parallel. Still, nothing a framework does will ultimately execute faster than carefully assembled, purpose-built code; what a framework might be is faster to write, easier to understand, or safer.


Thank you Stephen for a detailed response. Appreciate it.

I had tested with different worker counts (10, 100, etc.) and different alarm counts (100, 1000, 10k).

Thanks for catching the unnecessary watch done in a loop inside EmsManagerBehavior. Let me remove it.

Let me retest and publish the result.

When I reported the issue, I had debug logs in all three for loops (generating the alarms, processing the alarms, and reporting the alarm times). So, for N alarms, the code was generating 3*N log lines.

I commented out all the logs inside the for loops, so the code no longer generates those 3*N log lines. Now the Akka version takes much less time than the multithreaded Java application for alarm loads of 10k and above. Is this due to fewer logs resulting in fewer disk operations? However, Akka still takes more time for smaller loads, mostly due to its overhead.
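An alternative to commenting the log statements out is to leave them in at a debug level and let the logger skip them when that level is disabled. The sketch below uses `java.util.logging` purely as a stand-in, since the test project may use a different logging backend. The logger name and the `LazyLogging` class are hypothetical. It counts how many log messages actually get built.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch: per-alarm debug logging that costs almost nothing when the level is disabled. */
final class LazyLogging {
    private static final Logger LOG = Logger.getLogger("alarms.demo"); // hypothetical name

    /** Logs each alarm at FINE and returns how many log messages were actually built. */
    static int process(int alarmCount, Level level) {
        LOG.setUseParentHandlers(false); // keep the demo quiet: nothing reaches the console
        LOG.setLevel(level);
        AtomicInteger built = new AtomicInteger();
        for (int i = 0; i < alarmCount; i++) {
            final int id = i;
            // Supplier form: the message is only constructed if FINE is enabled.
            LOG.fine(() -> {
                built.incrementAndGet();
                return "processed alarm " + id;
            });
        }
        return built.get();
    }

    public static void main(String[] args) {
        System.out.println("messages built at INFO: " + process(10_000, Level.INFO)); // 0
        System.out.println("messages built at FINE: " + process(10_000, Level.FINE)); // 10000
    }
}
```

At INFO the loop does no string building and no output at all, which mirrors the effect of commenting the logs out while keeping them available for debugging.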

Akka’s logger is also asynchronous in nature, though it’s ultimately pluggable and can be backed by several other engines. I’m not sure why its overhead is adding up so much. That said, I’m surprised any version of Akka would be faster than the hand-rolled threaded model.

Akka sets its number of threads automatically based on the number of CPU cores you have. It’s possible it’s running a more optimal number of threads for your system than the fixed 10 that the Java side uses.