Heartbeat interval is growing too large when sending large payload between nodes

Hi,

My Akka cluster processes messages from a queue. For each message I run a calculation, which produces a large Update payload that is sent to its corresponding persistent actor (which may be on a different machine). Message processing is done within an Akka stream, which limits the number of messages processed in parallel. The Update payload can be quite large, up to 50 MB. I observed akka.remote.default-remote-dispatcher using a lot of CPU, and “Heartbeat interval is growing too large” messages started to appear. I was reading this article: https://petabridge.com/blog/large-messages-and-sockets-in-akkadotnet/. Since heartbeat messages are also sent through akka.remote.default-remote-dispatcher, I think this big Update payload is the problem. I need an Akka stream to control and limit the parallel message processing. Do you have a design recommendation for such a system? Thanks.

====Update
I am currently trying out Artery and its artery.advanced.idle-cpu-level = 1 setting (it seems this option no longer exists? Only artery.advanced.aeron.idle-cpu-level = 1 exists?). But I am still receiving “heartbeat interval is growing too large for address”.

akka.remote {
  artery {
    enabled = on
    transport = tcp
    large-message-destinations = [
      "/user/myActor1"
    ]
    advanced {
      aeron.idle-cpu-level = 1
      idle-cpu-level = 1
      maximum-large-frame-size = 50 MiB
      large-buffer-pool-size = 5
    }
  }
}

Cheng

When using transport = tcp, the Aeron settings, such as idle-cpu-level, are not used.
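
As a rough sketch of what that means for the configuration posted above, and assuming everything else in the setup stays the same, the idle-cpu-level entries can simply be dropped when the transport is tcp. The destination path and sizes are copied from the original post, not recommendations:

```hocon
akka.remote {
  artery {
    enabled = on
    transport = tcp
    # Messages to these actor paths go through the separate large-message channel.
    large-message-destinations = [
      "/user/myActor1"
    ]
    advanced {
      # No idle-cpu-level entries here: they only apply to the Aeron transport.
      maximum-large-frame-size = 50 MiB
      large-buffer-pool-size = 5
    }
  }
}
```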

I think the tcp transport is better for large payloads.

50 MB is very large for a message, even when using the large-message channel. With such large messages, things other than the transfer itself could be disturbing the heartbeats. It is difficult to guess without investigation.

We have plans to support bulk transfers with StreamRefs, but that is not ready yet.

The typical recommendation would be to send such large things over a side channel, e.g. HTTP, instead of with actor messages.
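
To illustrate the side-channel idea, here is a minimal sketch using only the JDK's built-in HTTP server and client (Java 11+). It is not an Akka API: the class `PayloadSideChannel`, its `offer` and `fetch` methods, and the `/payload/` path are all hypothetical names. The assumption is that the sending node keeps the large payload locally, the actor message carries only the small URI returned by `offer`, and the receiving actor pulls the bytes with `fetch`. It binds to 127.0.0.1 for the demo; on a real cluster it would have to bind to an address reachable from the other nodes.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: serves large payloads over HTTP so that actor messages
// only need to carry a small reference (a URI) instead of the payload itself.
final class PayloadSideChannel {
    private static final String PATH = "/payload/";

    // Payloads waiting to be fetched, keyed by a random id.
    private final Map<String, byte[]> pending = new ConcurrentHashMap<>();
    private final HttpServer server;

    // Port 0 lets the OS pick a free port.
    PayloadSideChannel(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext(PATH, exchange -> {
            String id = exchange.getRequestURI().getPath().substring(PATH.length());
            byte[] body = pending.remove(id); // each payload can be fetched once
            if (body == null) {
                exchange.sendResponseHeaders(404, -1); // unknown id, no body
            } else {
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            }
            exchange.close();
        });
        server.start();
    }

    // Stores the payload and returns the URI to put into the actor message.
    URI offer(byte[] payload) {
        String id = UUID.randomUUID().toString();
        pending.put(id, payload);
        int port = server.getAddress().getPort();
        return URI.create("http://127.0.0.1:" + port + PATH + id);
    }

    // Called on the receiving side to download the payload behind a reference.
    static byte[] fetch(URI ref) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(ref).GET().build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("Payload not available: HTTP " + response.statusCode());
        }
        return response.body();
    }

    void stop() {
        server.stop(0);
    }

    public static void main(String[] args) throws Exception {
        PayloadSideChannel channel = new PayloadSideChannel(0);
        URI ref = channel.offer(new byte[1024 * 1024]); // stand-in for a large Update
        byte[] received = fetch(ref);
        System.out.println("Fetched " + received.length + " bytes via " + ref);
        channel.stop();
    }
}
```

In this sketch the payload is removed once it has been fetched, so a retry after a failed download would need the sender to offer it again. Cleanup of payloads that are never fetched, authentication, and streaming instead of buffering whole byte arrays are all left out.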

Thanks for the reply @patriknw. If it goes through a side channel, e.g. HTTP, I will need to set up an extra HTTP server on each node to act as a forwarding proxy, which is not a clean solution (and the message itself carries the destination actorRef). I was thinking about sending a TriggeringCalculation message to the persistent actor and letting the persistent actor spread the calculation locally on its node. This would avoid the network transmission. But it has two disadvantages: it is hard to control the number of parallel calculations (which I can do with a stream), and it ties the calculation logic to the persistent actor (which is not ideal).

Thanks, Cheng