Akka Streams and the number of actors created

Is there any scaling up and down of actors in Akka Streams?

Suppose I have a stream with an infinite source, such as a Kafka topic, where each element is computed, processed, and finally sent to a database. No async methods are used in any of the flows.
Will the actors created behind the scenes by Akka Streams scale up and down depending on the load? For example, if there is a huge amount of data, are more actors created for the computation and processing, and when the data rate drops, are the actors scaled down again? Does something like this happen?

Or, however much data is coming in, is the entire thing processed by only one actor? That is, whether the throughput is 1000 tps or 200 tps, does the stream run on only one actor?

Yes, that is correct: there is no automatic scaling up in streams. Separate actors for a single stream can still be introduced if some operator needs to run on a different dispatcher, for example to do a blocking operation, but for a stream that doesn't use substreams or submaterialized streams, the number of actors is constant after the stream has started.
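To illustrate the point about separate actors, here is a minimal sketch of how an explicit `.async` boundary or a dispatcher attribute introduces additional interpreter actors for parts of a stream. The dispatcher name `"blocking-io-dispatcher"` and the helper functions are assumptions for illustration, not from the original post; a real dispatcher would have to be configured in `application.conf`.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorAttributes
import akka.stream.scaladsl.{Flow, Sink, Source}

object AsyncBoundarySketch extends App {
  implicit val system: ActorSystem = ActorSystem("demo")

  // Placeholder stages for illustration only.
  def expensiveComputation(i: Int): Int = i + 1
  def blockingCall(i: Int): Int = { Thread.sleep(1); i }

  Source(1 to 100)
    .map(_ * 2)                  // fused: runs in the same interpreter actor as the source
    .async                       // explicit boundary: stages after this run in a separate actor
    .map(expensiveComputation)
    // a blocking stage can be pushed onto its own dispatcher, which also
    // puts it in its own actor (hypothetical dispatcher name)
    .via(
      Flow[Int]
        .map(blockingCall)
        .withAttributes(ActorAttributes.dispatcher("blocking-io-dispatcher"))
    )
    .runWith(Sink.ignore)
}
```

Without the `.async` and the dispatcher attribute, all three stages would be fused into a single interpreter actor.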

For scaling up with Kafka, I think the usual way to achieve that is to just run more instances of the stream with the same consumer group and let Kafka partitioning take care of splitting the messages across the processing streams.
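A rough sketch of that idea using Alpakka Kafka (akka-stream-kafka): the topic name, group id, bootstrap servers, and `process` function below are all placeholders. Running several identical streams under one consumer group lets Kafka assign each stream a share of the partitions.

```scala
import akka.actor.ActorSystem
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer

object ParallelConsumersSketch extends App {
  implicit val system: ActorSystem = ActorSystem("demo")

  val settings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("processing-group") // same group id for every stream

  def process(v: String): String = v  // stand-in for the real pipeline

  // Start several identical streams; Kafka partitioning spreads the messages
  // across them. Scaling further means starting more copies (up to the
  // number of partitions on the topic).
  (1 to 4).foreach { _ =>
    Consumer
      .plainSource(settings, Subscriptions.topics("events"))
      .map(record => process(record.value))
      .runWith(Sink.ignore)
  }
}
```

Note that the parallelism you can get this way is capped by the number of partitions on the topic.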

Going from the top limit where a single stream of execution maxes out down to zero can be seen as a form of scaling down, though: if there are no elements, the actor will not be scheduled for execution and will only use memory, no CPU/threads.

So how does a stream handle infinite data arriving at a very high throughput?
As far as I understand, every element is processed sequentially (when no asyncs are introduced). So if the throughput is higher, say 500-1000 tps, the output may not be real time, because every element takes time to be fully processed.

    consumerSource ~> broadcast
    broadcast ~> dataExtractionFlow1 ~> formattingDataFlow ~>
    broadcast ~> dataExtractionFlow2 ~> formattingDataFlow ~> redisMatchFlow ~> sendToKafkaFlow ~> out

My whole application is a single GraphDSL graph, and the above are the flows of that graph. The source is an infinite Kafka source, where I have included mapAsync(2) so that it gets 2 elements from Kafka at a time.

The application works fine at normal rates like 30-40 tps, but is not real time when the rate is around 100 tps or more. So how does a stream handle data for real-time processing under a high-tps load?

If the stream cannot keep up, it will backpressure, which means the Kafka consumer will not pick up elements at a higher rate than you can get through the flow. If you have created a flow that can only do 40 tps, that's what it will do.
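A minimal sketch of that backpressure behaviour, with `throttle` standing in for a slow processing stage (the rate and element counts are arbitrary for illustration):

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._

object BackpressureSketch extends App {
  implicit val system: ActorSystem = ActorSystem("demo")

  Source(1 to 1000)                     // stands in for an "infinite" upstream
    .map { i => println(s"pulled $i"); i }
    .throttle(40, 1.second)             // stand-in for a stage that can only do 40 tps
    .runWith(Sink.foreach(i => println(s"processed $i")))
  // Upstream is only pulled a small internal buffer ahead of the 40 tps
  // stage; the source never races ahead of what downstream can consume.
}
```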

If you want to reach a higher per-stream throughput, the next step would be to figure out what slows it down and see if you can improve that. For IO-bound bottlenecks, batching can often help (that's also true for Kafka commits); for a CPU-bound bottleneck, parallelizing can help, but Kafka can complicate things there, since you must not re-order elements if you are using commits.
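A hedged sketch of both remedies mentioned above: `groupedWithin` to batch IO-bound work, and `mapAsync` to parallelize CPU-bound work. `mapAsync` (unlike `mapAsyncUnordered`) preserves element order, which is what keeps Kafka commit order safe. The `cpuHeavy` and `writeBatch` functions, batch sizes, and parallelism are placeholders.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future
import scala.concurrent.duration._

object BottleneckSketch extends App {
  implicit val system: ActorSystem = ActorSystem("demo")
  import system.dispatcher

  def cpuHeavy(i: Int): Int = i * i                        // stand-in CPU-bound work
  def writeBatch(batch: Seq[Int]): Future[Unit] =          // stand-in IO-bound write
    Future.successful(())

  Source(1 to 10000)
    .mapAsync(parallelism = 4)(i => Future(cpuHeavy(i)))   // parallel, but order-preserving
    .groupedWithin(100, 50.millis)                         // batch elements for IO
    .mapAsync(1)(batch => writeBatch(batch))               // one batched write at a time
    .runWith(Sink.ignore)
}
```

`groupedWithin(100, 50.millis)` emits whenever 100 elements have accumulated or 50 ms have passed, whichever comes first, so batching does not add unbounded latency at low rates.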

I also tried to integrate the stream with Kamon for analysis purposes. I wanted to see the number of actors created and the number of processed messages. I came across some interesting things.

akka_actor_errors_total{path="KafkaConsumer101/system/StreamSupervisor-0/flow-300-0-seqSink",class="akka.stream.impl.fusing.ActorGraphInterpreter",system="KafkaConsumer101",dispatcher="akka.actor.default-dispatcher"} 0.0
akka_actor_errors_total{path="KafkaConsumer101/system/StreamSupervisor-0/flow-427-0-seqSink",class="akka.stream.impl.fusing.ActorGraphInterpreter",system="KafkaConsumer101",dispatcher="akka.actor.default-dispatcher"} 0.0
akka_actor_errors_total{path="KafkaConsumer101/system/StreamSupervisor-0/flow-442-0-seqSink",class="akka.stream.impl.fusing.ActorGraphInterpreter",system="KafkaConsumer101",dispatcher="akka.actor.default-dispatcher"} 0.0
akka_actor_errors_total{path="KafkaConsumer101/system/StreamSupervisor-0/flow-378-0-seqSink",class="akka.stream.impl.fusing.ActorGraphInterpreter",system="KafkaConsumer101",dispatcher="akka.actor.default-dispatcher"} 0.0
akka_actor_errors_total{path="KafkaConsumer101/system/StreamSupervisor-0/flow-305-0-seqSink",class="akka.stream.impl.fusing.ActorGraphInterpreter",system="KafkaConsumer101",dispatcher="akka.actor.default-dispatcher"} 0.0
akka_actor_errors_total{path="KafkaConsumer101/system/StreamSupervisor-0/flow-26-0-seqSink",class="akka.stream.impl.fusing.ActorGraphInterpreter",system="KafkaConsumer101",dispatcher="akka.actor.default-dispatcher"} 0.0

What does the naming convention flow-300-0 imply? What if there is a flow-300-1? What is the difference?

Also, is there any standard tool for this kind of Akka monitoring other than Kamon? How do we analyze things like actor count and the number of processed messages? How do we get the number of processed messages efficiently? I tried various things but couldn't find any clear output.

The names are just unique names for each stream interpreter actor; each name consists of an index counter and a name taken from the stream running inside the actor. In this case you can deduce from the name that the flow ends with a Sink.seq, but not much else.

You can check out Lightbend Telemetry for more insights: https://developer.lightbend.com/docs/telemetry/current/home.html

Thanks for the information. It was really helpful.