On the positive impacts of Project Loom and Akka

I’m curious how the forthcoming Project Loom may positively impact Akka, and in particular its actors and the way they are developed.

In short, Project Loom encourages the use of blocking code because blocking becomes cheap, while making it easier to reason about what is happening. Within an actor’s receiver, we presently discourage blocking code because blocking is expensive and can lead to thread starvation, and we pay for that by making our code harder to reason about. So if blocking code becomes cheap given the use of virtual threads, then I’m presuming that it will become encouraged within an actor’s receiver implementation.

By way of an example, let’s take the ask pattern. I can presently make a request of another actor, map its future reply message to a message my actor can understand i.e. adapt, and then pipe the latter message to my actor. With Project Loom and the use of virtual threads for my actor, I can envisage surrounding my ask invocation’s future with an Await.result which will avoid thread starvation. In fact, any function/method that returns a Future could be a candidate for Await.result.
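
To make the two styles concrete, here is a minimal sketch in plain Java. It does not use Akka: `AskSketch`, `ask`, `askAndAdapt` and `askAndBlock` are hypothetical names, and a `CompletableFuture` stands in for the Future that Akka's ask returns. The first method adapts the reply without blocking; the second waits for it, which is roughly what wrapping the ask in Await.result would do.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the ask pattern; no Akka API is used here.
final class AskSketch {
    // Pretend to ask another actor; the reply arrives asynchronously.
    static CompletableFuture<Integer> ask(String request) {
        return CompletableFuture.supplyAsync(() -> request.length());
    }

    // Today's style: adapt the reply without blocking the calling thread.
    // In an actor, the adapted message would then be piped back to self.
    static CompletableFuture<String> askAndAdapt(String request) {
        return ask(request).thenApply(len -> "Adapted(" + len + ")");
    }

    // Blocking style: wait for the reply (with a timeout), then adapt it.
    // This is only cheap if the calling thread is a virtual thread.
    static String askAndBlock(String request) {
        int len = ask(request).orTimeout(3, TimeUnit.SECONDS).join();
        return "Adapted(" + len + ")";
    }
}
```

Both methods produce the same adapted message; the difference is only whether the caller's thread is parked while the reply is outstanding.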

I’m sure there are also opportunities to provide a blocking flavour of ask that avoids having to use Await.result, but that’s just a refinement.

I’m very interested to hear of this forum’s thoughts on the above. Thanks for the discussion.

There are many aspects of when and where Loom can be useful in Akka. I haven’t thought much about it yet.

Regarding blocking in the actor there is one important difference. While blocking, also with Loom, the actor will not be able to process any other messages. It becomes unresponsive. Incoming messages will be queued in the mailbox. That can be a bad or good feature depending on context.

Thanks Patrik! I should have included my thoughts on the fact that the actor is unable to process more messages while waiting on the reply. I acknowledge that there are use-cases where you would want to continue processing other messages while having a request/reply (ask) in-flight. For me though, I’ve often found that my actor will transition to some “reply-pending” state while waiting for an ask reply. If my actor receives more messages in this state, I tend to stash them thereby transferring messages from one queue to another. So perhaps my thinking here then is more toward the ask/stash scenario and the leveraging of blocking. Thanks again.
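
For readers less familiar with the ask/stash scenario, here is a rough toy model of it in plain Java. It is only a sketch: `StashingActor` is a hypothetical class, messages are plain strings, and nothing here is Akka's actual stash API. While a reply is pending, everything except the reply is moved to a second queue and replayed afterwards.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of the "reply-pending" state with a stash; not Akka's API.
final class StashingActor {
    private boolean replyPending = false;
    private final Deque<String> stash = new ArrayDeque<>();
    final List<String> handled = new ArrayList<>();

    void receive(String message) {
        if (message.startsWith("reply:")) {
            // The awaited reply arrived: leave the pending state.
            replyPending = false;
            handled.add(message);
            // Replay stashed messages in order, stopping if one of them
            // puts the actor back into the pending state.
            while (!replyPending && !stash.isEmpty()) {
                receive(stash.removeFirst());
            }
        } else if (replyPending) {
            // Any other message is moved from the mailbox to the stash.
            stash.addLast(message);
        } else if (message.equals("start-ask")) {
            replyPending = true;
            handled.add(message);
        } else {
            handled.add(message);
        }
    }
}
```

Sending "start-ask", "work-1", "work-2" and then "reply:ok" results in the two work messages being handled only after the reply, which is the same ordering a blocking receive would give.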

I agree there are many aspects to this. One dividing line is whether we expect any user-level programming model changes (such as the one you mention with blocking in an actor) vs. implementation-level benefits.

Looking just at the user-level changes, I don’t see the actor model as a good fit for allowing blocking. After all, an actor as we have it in Akka is basically mutable state that can be accessed by putting messages into a single mailbox for the actor. It shines when you need exactly that managed access to mutable state.

As Patrik said, blocking an actor is an actual problem. This is not just incidental: the performance/efficiency benefit of asynchronous execution comes directly from being able to react even if some other process is ongoing. If a use case doesn’t require that interleaving, another programming model is probably more suitable than actors.

(That said, I understand that sometimes you really have that problem of having to send requests to third parties and wait for results before progressing. It’s unfortunate indeed that you need to resort to stashing to be able to receive an answer from that third party at all. We could probably provide better APIs for this use case, as long as you allow unbounded mailboxes anyway.)

Viktor shared another view on that problem with just allowing blocking in this comment: https://twitter.com/viktorklang/status/1262000679360618496

how do I know whether a method-call will unschedule the current virtual thread?

For any kind of asynchronous process, it’s a problem if user code is always allowed to block/suspend because it might block this asynchronous process (a generalization from an actor) from making progress. One tempting “solution” might be to enable the process to capture that unscheduling and manage the continuation of that thread itself. However, that has potential downsides if the suspended stack acquired other resources, or read mutable state, etc. The problem is that if you just allow blocking, any piece of code must always expect that any function it calls might suspend a thread.

So, in very real terms this has similar implications as going back to single-threaded, blocking code which, of course, works, but has the well-known efficiency problems.

For that reason (and others), languages and frameworks usually require that potentially suspending functions are marked as such, e.g. go func in Go, async fn in Rust, or : Future[_] or : Task[] etc. in Scala. That’s also a lesson from typed functional programming in general: if you require marking the main operational properties with types, it will force programmers to more cleanly separate pure computational parts and parts with operational consequences (like talking to a database).

Coming back to the original problem of having to wait for an answer from a particular third party in an actor: the problem with the actor model here is that there’s only a single mailbox that receives all messages from all communication partners. This particular problem could be solved by having separate channels for talking to separate parties (like in Go). So, there’s a lot of potential for coming up with alternative programming models that could better fit certain use patterns.
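
As a rough illustration of the separate-channels idea, here is a sketch in plain Java where two `BlockingQueue`s stand in for channels. `ChannelSketch`, `requests`, `replies` and `awaitReply` are hypothetical names, not part of Akka or of any proposed API. The component waits on the reply channel only, so ordinary requests simply accumulate in their own queue and no stashing is needed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// One queue per communication partner instead of a single mailbox.
final class ChannelSketch {
    final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
    final BlockingQueue<String> replies = new LinkedBlockingQueue<>();

    // Wait for the third party's answer only; the requests queue is untouched.
    String awaitReply(long timeoutMillis) {
        try {
            String reply = replies.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (reply == null) {
                throw new IllegalStateException("no reply within timeout");
            }
            return reply;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

The blocking `poll` is the part that would be cheap on a virtual thread; on a platform thread it has the usual cost.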

Stack-based continuation models like threads or fibers are great when you have many parallel linear execution threads (that may be intermediately suspended) that share nothing. For those kinds of instances (like a stateless web server that talks to a DB), it’s a great and efficient model. All the tricky logic of multiplexing threads and connections and state etc. has been pushed to the infrastructure. It doesn’t help so much with the more complex cases where you want to manage mutable state efficiently yourself.

Note that if you suspend an actor inside receive, it will no longer be able to respond to SystemMessages.

You can see my experiment with adding Loom-support in Akka here: GitHub - akka/akka at wip-loom-√

This is a very interesting question, and conversation.

There seems to be a lot of interest in GraalVM these days. What if GraalVM doesn’t support Loom (I have no idea but suspect it wouldn’t)? I’d think that the blocking model would encounter the same performance problems we see with blocking code today. It just feels wrong to abandon what we’ve learned thus far.

Thanks to everyone that replied. This has been quite thought-provoking.

I’m now moving to the view that virtual threads do not have a place within an actor’s receiver. As highlighted in this conversation, as soon as a function/method blocks, an actor is no longer able to receive any other message, be they system messages, some state transition timeout message, or any message sent from any other actor. Actors can then be an alternative to using virtual threads, and the two can complement each other, e.g. virtual thread-based code outside of an actor can call upon ask and block on a reply.

Thanks again for helping me come to this view.

My remark is not specific to Loom, but since we are all here: is it really essential for the Actor Model that an actor react immediately? Or that computing a response over a lengthy time period can be interrupted?

In Akka, SystemMessage is used for lifecycle changes (effecting them as well as informing about them). The only lifecycle change we need to effect is killing the actor (as shown by Akka Typed), and the only one to observe is termination. The latter is a SystemMessage for technical reasons, but there is no need to enforce that it immediately be handled by the actor’s logic.

In essence my question is: can we not give the actor more control over how it wants to spend its time? If the actor is waiting for something else, it may in the meantime not be interested in new messages, even DeathWatch ones. The only remaining issue then is involuntary termination, but the JVM has a very strong opinion on not allowing that to be implemented — other runtimes may permit more aggressive actor implementations.


Viktor: I know that both of us had — and probably still have — a strong bias towards offering components that must always respond quickly. My question above relates to this viewpoint in two ways:

  • not all components have this requirement, some just benefit from the encapsulation
  • responsiveness cannot really be enforced by the framework or even runtime, the programmer can always ruin the day; hence I think that “responds quickly” is a feature of the business logic, independent of whether the receive function may block or be async

One aspect I see as rather critical is exemplified in Swift actor reentrancy: there is a huge difference in cognitive load between a single-threaded actor and one that multitasks between different sub-threads. This is another interesting discussion; I mention it only to say that I’d like to keep these topics separate.

From a programming paradigm perspective, it seems quite acceptable (and is also common) that an actor is “busy” and not making progress on its mailbox for some amount of time. How exactly the actor is busy isn’t observable from the outside anyway.

Looking at it from a feature perspective I’d say an actor is particularly well-equipped to write asynchronous components that respond quickly in any state because you can wait for multiple things happening at the same time. If you don’t need that particular feature you can often find better solutions than actors for a use case (though not always, and sometimes actors fluctuate between busy and reactive states, e.g. when a persistent actor needs to persist a message).

Whether or not actors should respond quickly in general is hard to answer. A slow component can always slow down a complete app, so striving for quick responses is a good purpose, but as you say it’s not guaranteed by any particular design.

But it seems Viktor might be pointing out a more general pattern here: even if an actor is busy on its “content” side of processing, it should be responsive on its administrative side of things.

I haven’t deeply looked into the current state of Loom affairs, but couldn’t it be possible to catch the thread suspension during user message handling inside of the actor, and then suspend user message processing until that suspended virtual thread has continued and finished but still process system messages in the meantime? If the actor is terminated before the virtual thread is continued, couldn’t that continuation then just be discarded?

I guess that could lead to similar problems to terminating regular threads, which usually works quite well but can also lead to hard-to-debug problems, e.g. if the ongoing thread had acquired some resources.
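
To make the idea a bit more tangible, here is a toy model in plain Java. It is an assumption-heavy sketch: `TwoLaneActor` is a hypothetical class, the suspended user handler is represented by a `CompletableFuture` rather than a real Loom continuation, and nothing here reflects how Akka's dispatcher actually works. System messages are drained on every pass, user messages wait until the in-flight handler has finished, and termination simply drops the pending handler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Toy model only: a future stands in for a suspended virtual thread.
final class TwoLaneActor {
    private final Function<String, CompletableFuture<String>> userHandler;
    private final Deque<String> systemMailbox = new ArrayDeque<>();
    private final Deque<String> userMailbox = new ArrayDeque<>();
    private CompletableFuture<String> suspended; // in-flight user handler, if any
    private boolean terminated = false;
    final List<String> log = new ArrayList<>();

    TwoLaneActor(Function<String, CompletableFuture<String>> userHandler) {
        this.userHandler = userHandler;
    }

    void tellSystem(String message) { systemMailbox.addLast(message); }
    void tellUser(String message) { userMailbox.addLast(message); }

    // One scheduling pass, as a dispatcher might run it.
    void runOnce() {
        if (terminated) return;
        // System messages are always handled, even while a handler is suspended.
        while (!systemMailbox.isEmpty()) {
            String message = systemMailbox.removeFirst();
            log.add("system:" + message);
            if (message.equals("terminate")) {
                terminated = true;
                suspended = null;    // discard the pending continuation
                userMailbox.clear();
                return;
            }
        }
        if (suspended != null) {
            if (!suspended.isDone()) return; // still suspended: no user messages
            log.add("user-done:" + suspended.join());
            suspended = null;
        }
        if (!userMailbox.isEmpty()) {
            suspended = userHandler.apply(userMailbox.removeFirst());
        }
    }
}
```

Note that the sketch does nothing about resources held by a discarded handler, which is exactly the hard-to-debug case mentioned above.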

I did an experiment with coroutines: making the inbox available as a ReceiveChannel and allowing suspend functions to be called. The actor stashes while waiting for the suspend method to return.
Timeouts are plain exceptions that can be caught.
Here’s my write-up: Akka and kotlin coroutines · GitHub. And here’s the code: GitHub - joost-de-vries/akka-kotlin: Some experiments with using Kotlin coroutines, channels and suspendable functions in working with Akka actors

Late to the discussion, but since Java Loom is getting near(??) to release, I had a thought to share and get some feedback on (unless this thread is completely dormant).

I have an application and was considering using Akka/Loom to build the following model:

  1. Plan on using a lot of Actors
  2. Have each Actor process a single “unit of work” (with a lot of blocking I/O, database access, etc)
  3. Have a Router (or other component) which manages a pool of worker Actors.
  4. The Router receives all requests for work, picks a free Actor from the pool, and sends it a message to do this one “unit of work” (or creates a new worker Actor if none available).
  5. That Actor does everything needed (which may well take several seconds/minutes) - when finished it sends a “Finished - Available for more work” type message back to the Router.
  6. While processing, that Actor is not sent any more messages from the Router. However, it can send itself internal messages if desired - e.g. move to step 2, delayed message to continue step N (if the Actor knows that it must wait for a long external operation to complete), or start child Actors to do sub-pieces of work and send Finished messages.
  7. The Router can still receive control messages (e.g. Shut down) and relay these to all the worker Actors if desired.
  8. The Router also manages the size of the Pool - create Actors up to N if needed, if many become idle shrink down to a min size.
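
The bookkeeping in steps 3 to 5, plus the upper bound from step 8, could look roughly like the following plain Java sketch. It is a toy under stated assumptions: `WorkRouter` is a hypothetical class, workers are just integer ids rather than actors, units of work are queued once the cap is reached (the list above does not say what should happen then), and shrinking the pool is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy router: worker "actors" are integer ids, and no Akka API is used.
final class WorkRouter {
    private final int maxWorkers;
    private int created = 0;
    private final Deque<Integer> idle = new ArrayDeque<>();
    private final Deque<String> pending = new ArrayDeque<>();
    private final Map<Integer, String> busy = new HashMap<>();

    WorkRouter(int maxWorkers) { this.maxWorkers = maxWorkers; }

    // Pick a free worker, or create one while under the cap.
    // Returns the worker id, or -1 if the unit had to be queued (an assumption).
    int submit(String unit) {
        Integer worker = idle.pollFirst();
        if (worker == null && created < maxWorkers) {
            worker = created++;
        }
        if (worker == null) {
            pending.addLast(unit);
            return -1;
        }
        busy.put(worker, unit);
        return worker;
    }

    // The worker reports "Finished - Available for more work".
    void finished(int worker) {
        busy.remove(worker);
        String next = pending.pollFirst();
        if (next != null) {
            busy.put(worker, next); // hand over queued work straight away
        } else {
            idle.addLast(worker);
        }
    }

    String unitOf(int worker) { return busy.get(worker); }
    int idleCount() { return idle.size(); }
}
```

Because a busy worker never receives anything from the router until it reports back, the worker itself is free to block for the whole unit of work.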

Certainly a change from the “standard” Actor model, but with Loom should be cheap and not monopolize heavyweight threads. The logic within the Actor could be very straightforward since Loom handles blocking calls effectively.

Since all within Akka, can also have a Singleton and distribute these “worker Actors” across multiple servers if required to handle the expected workload (as opposed to say using Futures but then have to deal with inter-server communication issues).

Thoughts?

Hi John

As mentioned earlier in the thread, if you effectively block within your Akka actor (Loom or no Loom) then you are unable to have it process any other event, including the internal ones that Akka may send.

In your distributed scenario, I’d probably spawn a new thread from the worker actor on an executor tuned for blocking. That worker would then be unblocked and responsible for tracking the state of the blocked task.
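
A minimal sketch of that arrangement in plain Java, under some assumptions: `OffloadingWorker` is a hypothetical class, a `BlockingQueue` stands in for the actor's mailbox, and the executor is whatever pool you have tuned for blocking (with Loom it could be a virtual-thread-per-task executor). The blocking task runs on the executor, its outcome comes back as an ordinary message, and the worker only tracks state.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Toy worker: the mailbox is a queue, and no Akka API is used.
final class OffloadingWorker {
    enum State { IDLE, RUNNING, DONE, FAILED }

    private final ExecutorService blockingPool;
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private State state = State.IDLE;

    OffloadingWorker(ExecutorService blockingPool) {
        this.blockingPool = blockingPool;
    }

    State state() { return state; }

    // Start the blocking task elsewhere and return immediately.
    void start(Supplier<String> blockingTask) {
        state = State.RUNNING;
        CompletableFuture.supplyAsync(blockingTask, blockingPool)
            .whenComplete((result, error) ->
                // The outcome is delivered back to the worker as a message.
                mailbox.add(error == null ? "done:" + result : "failed:" + error));
    }

    // Stand-in for the dispatcher delivering the next message to the worker.
    String processNext(long timeoutMillis) {
        try {
            String message = mailbox.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (message == null) return null;
            if (message.startsWith("done:")) state = State.DONE;
            else if (message.startsWith("failed:")) state = State.FAILED;
            return message;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Between `start` and the completion message the worker stays free to handle anything else that lands in its mailbox, including control messages.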

The benefits of an actor in this instance are the ability to track state and, given Akka, the ability to distribute processing via Akka Cluster.

HTH.

Thanks for the response, and you have a good point.

Was trying to avoid the complexity of spawning yet another thread - Loom will certainly be able to handle it, but then the worker actor itself becomes more complex. Willing to block the worker actor since it is dedicated to a specific unit of work and is designed to be isolated from the general flow of messages.

But your thought might gain all of the benefits with a modest increase in complexity - will re-think. Thanks

Looks like GraalVM will support Loom - Add Loom support (VirtualThreads) to GraalVM · Issue #4621 · oracle/graal · GitHub

That said, GraalVM means a couple of different things:

  • the OpenJDK fork with a different JIT compiler (which is what that issue seems to implicitly refer to): the heavy lifting for Loom support would mostly be done by OpenJDK (it’s possible some JIT enhancements would make it even better, but the box could be ticked)
  • native image: a whole other kettle of fish