Message delivery reliability of akka APIs

This is a post(set of questions really) regarding the implications of at-most-once(maybe-once) delivery reliability offered by akka on APIs and software based on akka. From my understanding, even messages sent by actors to self can be lost.

Till now, I have been in scenarios where delivery guarantee is not necessary, so life was good. But now, I need reliability from my service at least to the extent that the subscriber should know that a message has been lost. In trying to use akka streams and akka-http libraries, the following questions seemed important for me now.

  1. If messages can be lost in actor system, and akka-streams are based on actors, does that mean akka-streams are leaky? Thats pretty scary to think about. Specifically does a Flow(or GraphStage) ensure that stream entities are not lost?
    For example, is the below code(based on Source.single doc) guaranteed to print [hello]?
    import akka.stream.*;
    
    CompletionStage<List<String>> future = Source.single("hello").runWith(Sink.seq(), materializer);
    CompletableFuture<List<String>> completableFuture = future.toCompletableFuture();
    completableFuture.thenAccept(result -> System.out.println(result));
    
  2. If messages can be lost in actor system, and akka-http is based on actors, does that mean akka-http APIs, lets say websocket API randomly drops messages.
    For example, in:
    http.singleWebSocketRequest(WebSocketRequest.create("wss://api.example.com"), flow, mat)
    
    is it at least guaranteed that my flow will reliably receive everything on the websocket in order?
  3. In actor world, I would have a sequence number on each message and let the receiver actor take remedial measures on a lost(probably never because of at-most-once reliability issues) sequence number. In case we live in a dystopian world where the answers to 1. and 2. are yes, this wouldn’t work because the external web service I consume does not implement sequence numbers. Because any sequence number I generate within the application will only add sequence numbers to messages that are not already lost. Is this a correct understanding?

A local message will only be lost if the JVM is crashing, the computer being abruptly shut down or that actor stops before the message is processed. This is not very different from a method call completing successfully, or an operation scheduled to run on an executor potentially being “lost” if the JVM is terminated or someone shutting down the executor.

Guaranteed would mean that something happens even if there is such a fatal problem. If you need that guarantee you will need to implement measures to deal with that, you could also see it as different levels of guarantees, so for example some parts of a system may need a re-send in case the actor crashes but be fine with not dealing with complete JVM or server crash, while other parts will need that guarantee and has to pay the full cost of persisting every event to some other, distributed/replicated storage (because local storage could also be lost), re-sending, deduplicating on the receiving side.

This does not mean that local messages are randomly dropped.

Messages going across network are more likely to be lost however, this is inherent in the nature of networks. The problem with networking is that are scenarios where it is impossible to know wether the connection failed before, while or just after the request was processed. This is often dealt with by making sure a retry with the same data does not trigger another operation/change if the first one was received (idempotency) and retrying on failure. With actors detecting that there was a failure at all requires an Ack message of some form and a timeout, with HTTP you will likely see the request fail.

1 Like

Thanks. I noticed the differentiation your team makes between ‘reliability’ and ‘guarantee’ now, not just here but in other documentation/blogs.

This answer combined with the answer in google groups answers my questions. So in summary:

  1. Akka actor library offers reliability for local(within JVM) messages since the underlying mechanism is method calling.
  2. Akka streams give single-VM, ephemeral implementations of graphs. So the delivery reliability in 1. applies there as well.
  3. Since akka-http is a http client + akka streams, the same delivery reliability applies.
  1. Akka actor library offers reliability for local(within JVM) messages since the underlying mechanism is method calling

It is not method calling, it is passing messages through a queue, which the actor (potentially on a different thread) then will process. If the actor is busy with other messages, or the system is busy, the message will stay longer in the inbox, which means that the risk is somewhat higher that a system crash happens between posting the message and the actor processing it, than a method call, which happens immediately on the calling thread. It is however in absence of JVM crash unlikely.

  1. 
 So the delivery reliability in 1. applies there as well.

Correct

  1. Since akka-http is a http client + akka streams, the same delivery reliability applies.

Since HTTP is a network protocol it is way less reliable than in-JVM communication, if you want higher reliability for HTTP requests you would have to implement looking at the response and potentially retry failed requests. This is not specific to Akka HTTP. Once the bytes are in the JVM the risk of loosing data is the same as above.

1 Like