Alpakka not using more than one CPU core

We have created an Alpakka stream which consumes Kafka messages from a topic and then processes those messages. The messages are processed in parallel, using mapAsyncUnordered with a configured parallelism. The Kafka lag for the consumer keeps increasing, but the application uses only one CPU core. I have changed the consumer to use the default dispatcher, which uses a fork-join executor, expecting it to use more than one CPU core. My application runs on a 32-core machine.
Please find the configured settings below:

akka.kafka.consumer.use-dispatcher = ""

Consumer stream code:

Consumer.DrainingControl<Done> control =
        Consumer.committableSource(consumerSettings, Subscriptions.topics(topic))
                .buffer(500, OverflowStrategy.backpressure())
                // De-serialize the response from JSON to a Java object
                .mapAsyncUnordered(5, /* deserialize the output */)
                // Process it and perform some calculations
                .mapAsyncUnordered(5, /* process and calculate */)
                // Do something and return the consumer offset
                .mapAsyncUnordered(5, /* do something, return the offset */)
                // Commit the offsets in batches of up to 100
                .toMat(Committer.sink(committerSettings.withMaxBatch(100)), Consumer::createDrainingControl)
                .run(materializer);

The stream runs in an Akka cluster, load-balanced via a shared consumer group id. The application also has a typed actor system, used for triggering requests, with a group router that shares the load across the cluster. A triggered request is sent to a microservice as a Kafka message, and we get the response back as a Kafka message, which the stream processes. These messages do not need to be processed in order, hence the use of mapAsyncUnordered.

I tried increasing the parallelism, even up to 100, but didn't see any change.

Thanks in advance

Is the issue that you never observe more than 1 core being used, or that the parallelism value is not actually running operations in parallel?

I never observed more than one core being used, which I think answers the second point as well, or am I wrong in that understanding? If the operations ran in parallel, we would expect more than one core to be used, right?

There could be a number of reasons why additional cores are unused. Ultimately it’s up to the JVM and your OS to determine when a thread should be assigned to another core.

If you have 32 cores at your disposal and the transformations are mainly CPU bound then you could set the parallelism in mapAsyncUnordered to 32 (or a bit higher).
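To illustrate the sizing idea with plain JDK code (this is not Alpakka code; `slowSquare` is a made-up stand-in for a CPU-bound transformation): if the pool and the stage parallelism are sized to the core count, CPU-bound work can actually occupy every core.

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.stream.*;

public class SizeToCores {
    // Hypothetical stand-in for a CPU-bound transformation.
    static long slowSquare(long n) {
        long x = 0;
        for (int i = 0; i < 100_000; i++) x += n * n; // busy work
        return n * n;
    }

    public static void main(String[] args) {
        // Size the pool (and, analogously, mapAsyncUnordered's parallelism)
        // to the number of available cores for CPU-bound stages.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        List<CompletableFuture<Long>> futures =
                LongStream.rangeClosed(1, 8)
                        .mapToObj(n -> CompletableFuture.supplyAsync(() -> slowSquare(n), pool))
                        .collect(Collectors.toList());

        long sum = futures.stream().mapToLong(CompletableFuture::join).sum();
        System.out.println(sum); // 1 + 4 + 9 + ... + 64 = 204
        pool.shutdown();
    }
}
```

With parallelism well below the core count, a CPU-bound pipeline simply has no way to occupy the remaining cores, no matter how many are available.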

Can you share your changes to the dispatcher config?
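For reference, a dedicated fork-join dispatcher override could look like the following. This is only a sketch: the dispatcher name `alpakka-kafka-dispatcher` and the pool sizes are illustrative, not your actual config.

```
# Hypothetical dedicated dispatcher for the consumer stream
alpakka-kafka-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 8
    parallelism-factor = 1.0   # threads per available core
    parallelism-max = 32
  }
  throughput = 5
}

akka.kafka.consumer.use-dispatcher = "alpakka-kafka-dispatcher"
```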

Have you tried increasing the `throughput` setting of your dispatcher?

Is your application running within a Linux cgroup with any memory or CPU limits?

General note: it's not advisable to use mapAsyncUnordered if you rely on the ordering of your messages. Since you're using Committer.sink, offsets could be committed out of order, and you may skip the processing of some records when a rebalance or shutdown occurs.
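To make the out-of-order commit hazard concrete, here is a stdlib-only sketch (not Alpakka code; the offsets and completion order are contrived): an unordered stage emits results as processing futures complete, so a later offset can reach the committer before earlier ones.

```java
import java.util.*;
import java.util.concurrent.*;

public class UnorderedCommit {
    public static void main(String[] args) {
        // Offsets 1..3 each have a pending "processing" future.
        List<CompletableFuture<Long>> pending = new ArrayList<>();
        for (long offset = 1; offset <= 3; offset++) pending.add(new CompletableFuture<>());

        // Record offsets in the order processing *completes*,
        // which is the order an unordered stage would emit them.
        List<Long> committed = new ArrayList<>();
        pending.forEach(f -> f.thenAccept(committed::add));

        // Offset 3 finishes first (e.g. its record was cheap to process).
        pending.get(2).complete(3L);
        pending.get(0).complete(1L);
        pending.get(1).complete(2L);

        System.out.println(committed); // prints [3, 1, 2]
    }
}
```

If offset 3 is committed and the application then crashes before offsets 1 and 2 finish, those records are never reprocessed after a restart.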

How many partitions does the topic have? If the topic has only one partition, and you are committing offsets manually, then you will only be able to process one message at a time, regardless of the parallelism in the middle; Kafka will only hand you messages one by one.