Akka Streams and the number of actors created

Is there any scaling up and down of actors in Akka Streams?

Suppose I have a stream with an infinite source, such as a Kafka topic, where each element is computed, processed, and finally sent to a database. No async methods are used in any of the flows.
Will the actors created behind the scenes by Akka Streams scale up and down depending on the load? For example, if there is a huge amount of data, are more actors created for the computation and processing, and when the data rate drops, are the actors scaled down again? Does something like this happen?

Or, however much data is coming in, is the entire thing processed by only one actor? That is, whether the throughput is 1000 tps or 200 tps, does the stream run on only one actor?

Yes, that is correct: there is no automatic scaling up in streams. Separate actors for a single stream can still be introduced if some operator needs to run on a different dispatcher, for example to do a blocking operation, but for a stream that doesn't use substreams or submaterialized streams, the number of actors is constant after the stream has started.
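To illustrate the point about separate actors, here is a minimal sketch of how an explicit `.async` boundary or a dispatcher attribute introduces additional interpreter actors for parts of a stream. The dispatcher name `"blocking-io-dispatcher"` and the helper functions are assumptions for illustration, not from the original post; a real dispatcher would have to be configured in `application.conf`.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorAttributes
import akka.stream.scaladsl.{Flow, Sink, Source}

object AsyncBoundarySketch extends App {
  implicit val system: ActorSystem = ActorSystem("demo")

  // Placeholder stages for illustration only.
  def expensiveComputation(i: Int): Int = i + 1
  def blockingCall(i: Int): Int = { Thread.sleep(1); i }

  Source(1 to 100)
    .map(_ * 2)                  // fused: runs in the same interpreter actor as the source
    .async                       // explicit boundary: stages after this run in a separate actor
    .map(expensiveComputation)
    // a blocking stage can be pushed onto its own dispatcher, which also
    // puts it in its own actor (hypothetical dispatcher name)
    .via(
      Flow[Int]
        .map(blockingCall)
        .withAttributes(ActorAttributes.dispatcher("blocking-io-dispatcher"))
    )
    .runWith(Sink.ignore)
}
```

Without the `.async` and the dispatcher attribute, all three stages would be fused into a single interpreter actor.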

For scaling up with Kafka, I think the usual way to achieve that is to just run more instances of the stream with the same consumer group and let Kafka partitioning take care of splitting the messages across the processing streams.
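A rough sketch of that idea using Alpakka Kafka (akka-stream-kafka): the topic name, group id, bootstrap servers, and `process` function below are all placeholders. Running several identical streams under one consumer group lets Kafka assign each stream a share of the partitions.

```scala
import akka.actor.ActorSystem
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer

object ParallelConsumersSketch extends App {
  implicit val system: ActorSystem = ActorSystem("demo")

  val settings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("processing-group") // same group id for every stream

  def process(v: String): String = v  // stand-in for the real pipeline

  // Start several identical streams; Kafka partitioning spreads the messages
  // across them. Scaling further means starting more copies (up to the
  // number of partitions on the topic).
  (1 to 4).foreach { _ =>
    Consumer
      .plainSource(settings, Subscriptions.topics("events"))
      .map(record => process(record.value))
      .runWith(Sink.ignore)
  }
}
```

Note that the parallelism you can get this way is capped by the number of partitions on the topic.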

Going from the top limit where a single stream of execution maxes out down to zero can be seen as a form of scaling down, though: if there are no elements, the actor will not be scheduled for execution and will only use memory, no CPU/threads.

So how does a stream handle infinite data arriving at a very high throughput?
As far as I understand, every element is processed sequentially (when no asyncs are introduced). So if the throughput is higher, say 500-1000 tps, the output may not be real time, because every element takes time to be fully processed.

    consumerSource ~> broadcast
    broadcast ~> dataExtractionFlow1 ~> formattingDataFlow ~>
    broadcast ~> dataExtractionFlow2 ~> formattingDataFlow ~> redisMatchFlow ~> sendToKafkaFlow ~> out

My whole application is a single GraphDSL graph, and the above are the flows of that graph. The source is an infinite Kafka source, where I have included mapAsync(2) so that it gets 2 elements from Kafka at a time.

The application works fine at normal rates like 30-40 tps, but is not real time when the rate is around 100 tps or more. So how does a stream handle data for real-time processing under a high-tps load?

If the stream cannot keep up, it will backpressure, which means the Kafka consumer will not pick up elements at a higher rate than you can get through the flow. If you have created a flow that can only do 40 tps, that's what it will do.
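A minimal sketch of that backpressure behaviour, with `throttle` standing in for a slow processing stage (the rate and element counts are arbitrary for illustration):

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._

object BackpressureSketch extends App {
  implicit val system: ActorSystem = ActorSystem("demo")

  Source(1 to 1000)                     // stands in for an "infinite" upstream
    .map { i => println(s"pulled $i"); i }
    .throttle(40, 1.second)             // stand-in for a stage that can only do 40 tps
    .runWith(Sink.foreach(i => println(s"processed $i")))
  // Upstream is only pulled a small internal buffer ahead of the 40 tps
  // stage; the source never races ahead of what downstream can consume.
}
```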

If you want to reach a higher per-stream throughput, the next step would be to figure out what slows it down and see if you can improve that. For IO-bound bottlenecks, batching can often help (that's also true for Kafka commits); for a CPU-bound bottleneck, parallelizing can help, but Kafka can complicate things there, since you must not re-order elements if you are using commits.
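A hedged sketch of both remedies mentioned above: `groupedWithin` to batch IO-bound work, and `mapAsync` to parallelize CPU-bound work. `mapAsync` (unlike `mapAsyncUnordered`) preserves element order, which is what keeps Kafka commit order safe. The `cpuHeavy` and `writeBatch` functions, batch sizes, and parallelism are placeholders.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future
import scala.concurrent.duration._

object BottleneckSketch extends App {
  implicit val system: ActorSystem = ActorSystem("demo")
  import system.dispatcher

  def cpuHeavy(i: Int): Int = i * i                        // stand-in CPU-bound work
  def writeBatch(batch: Seq[Int]): Future[Unit] =          // stand-in IO-bound write
    Future.successful(())

  Source(1 to 10000)
    .mapAsync(parallelism = 4)(i => Future(cpuHeavy(i)))   // parallel, but order-preserving
    .groupedWithin(100, 50.millis)                         // batch elements for IO
    .mapAsync(1)(batch => writeBatch(batch))               // one batched write at a time
    .runWith(Sink.ignore)
}
```

`groupedWithin(100, 50.millis)` emits whenever 100 elements have accumulated or 50 ms have passed, whichever comes first, so batching does not add unbounded latency at low rates.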

I also tried to integrate the stream with Kamon for analysis purposes. I wanted to see the number of actors created and the number of processed messages. I came across some interesting things.

akka_actor_errors_total{path="KafkaConsumer101/system/StreamSupervisor-0/flow-300-0-seqSink",class="akka.stream.impl.fusing.ActorGraphInterpreter",system="KafkaConsumer101",dispatcher="akka.actor.default-dispatcher"} 0.0
akka_actor_errors_total{path="KafkaConsumer101/system/StreamSupervisor-0/flow-427-0-seqSink",class="akka.stream.impl.fusing.ActorGraphInterpreter",system="KafkaConsumer101",dispatcher="akka.actor.default-dispatcher"} 0.0
akka_actor_errors_total{path="KafkaConsumer101/system/StreamSupervisor-0/flow-442-0-seqSink",class="akka.stream.impl.fusing.ActorGraphInterpreter",system="KafkaConsumer101",dispatcher="akka.actor.default-dispatcher"} 0.0
akka_actor_errors_total{path="KafkaConsumer101/system/StreamSupervisor-0/flow-378-0-seqSink",class="akka.stream.impl.fusing.ActorGraphInterpreter",system="KafkaConsumer101",dispatcher="akka.actor.default-dispatcher"} 0.0
akka_actor_errors_total{path="KafkaConsumer101/system/StreamSupervisor-0/flow-305-0-seqSink",class="akka.stream.impl.fusing.ActorGraphInterpreter",system="KafkaConsumer101",dispatcher="akka.actor.default-dispatcher"} 0.0
akka_actor_errors_total{path="KafkaConsumer101/system/StreamSupervisor-0/flow-26-0-seqSink",class="akka.stream.impl.fusing.ActorGraphInterpreter",system="KafkaConsumer101",dispatcher="akka.actor.default-dispatcher"} 0.0

What does the naming convention flow-300-0 imply? What if there is a flow-300-1? What is the difference?

Also, is there any standard tool for this kind of Akka monitoring other than Kamon? How do we analyze things like actor count and the number of processed messages? How do we get the number of processed messages efficiently? I tried various things but couldn't find any clear output.

The names are just unique names for each stream interpreter actor; each name consists of an index counter and a name taken from the stream running inside the actor. In this case you can deduce from the name that the flow ends with a Sink.seq, but not much else.

You can check out Lightbend Telemetry for more insights: https://developer.lightbend.com/docs/telemetry/current/home.html

Thanks for the information. It was really helpful.