I’ve been using Alpakka Kafka to stream data from Kafka topics. I’m using:
Consumer.committableSource(consumerSettings, Subscriptions.topics(topic))
Recently I tried to spawn more consumers, for example 3, on a topic that has 15 partitions. When I add more consumers with the same group id, the partitions are split evenly, 5 per consumer, but they do not seem to be consumed at the same time: it looks like partitions are read one by one, or one specific partition is read much faster than the others.
|Partition|LogSize|Consumer Offset|Lag|
|---|---|---|---|
|0|8,429,145|6,087,144|2,342,001|
|1|8,424,948|6,223,257|2,201,691|
|2|8,428,121|7,764,854|663,267|
|3|8,421,528|6,071,425|2,350,103|
|4|8,434,659|7,351,552|1,083,107|
|5|8,428,323|5,935,336|2,492,987|
|6|8,424,974|6,455,301|1,969,673|
|7|8,431,820|7,763,984|667,836|
|8|8,425,999|6,370,962|2,055,037|
|9|8,416,354|6,681,093|1,735,261|
|10|8,416,217|6,814,949|1,601,268|
|11|8,428,026|5,878,703|2,549,323|
|12|8,424,604|8,424,589|15|
|13|8,431,019|8,431,019|0|
|14|8,423,218|8,423,218|0|
Here is a real example of a production application I’m running. So I have some questions:
- Is it OK for some partitions to be read much faster than others? Please note that this behavior only happens when I start more than one consumer.
- Should I change the way I’m consuming? Should I use a source per partition, or is there a different option?
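For reference, this is roughly what I understand the source-per-partition variant to look like, using `Consumer.committablePartitionedSource` from the Scala DSL. It is only a sketch, assuming Alpakka Kafka 2.x on Akka 2.6+; the bootstrap server, group id, topic name, and the `process` function are placeholders, not my real settings. Each assigned partition gets its own substream, so partitions are processed in parallel instead of through one merged stream.

```scala
import akka.actor.ActorSystem
import akka.kafka.{CommitterSettings, ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.{Committer, Consumer}
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer

import scala.concurrent.Future

object PartitionedConsumerSketch extends App {
  // With Akka 2.6+ the implicit ActorSystem also provides the stream materializer.
  implicit val system: ActorSystem = ActorSystem("partitioned-consumer")
  import system.dispatcher

  // Placeholder connection settings (assumptions, not taken from the real application).
  val consumerSettings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("my-group")

  // Hypothetical business logic; replace with the real processing step.
  def process(value: String): Future[Unit] = Future.successful(())

  // Upper bound on the number of partition substreams running at once.
  val maxPartitions = 15

  val control: Consumer.Control =
    Consumer
      .committablePartitionedSource(consumerSettings, Subscriptions.topics("my-topic"))
      .mapAsyncUnordered(maxPartitions) { case (_, partitionSource) =>
        // One independent substream per assigned partition: process each record,
        // then commit its offset through the batching Committer sink.
        partitionSource
          .mapAsync(1)(msg => process(msg.record.value()).map(_ => msg.committableOffset))
          .runWith(Committer.sink(CommitterSettings(system)))
      }
      .to(Sink.ignore)
      .run()
}
```

The `mapAsync(1)` inside each substream keeps records of a single partition in order, while `mapAsyncUnordered(maxPartitions)` lets the partitions progress independently of each other.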
I appreciate any help, thanks!
I suspected this only happened when running more than one consumer (that is, more than one application instance), but it happened today with only one consumer; you can see this by looking at the consumer group, which is the same.
At the time it happened, I had 20 million messages still waiting to be processed (lag). The picture above was taken from the Kafka Manager we have at the company.
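To make the numbers easier to check: the Lag column is the log size minus the committed consumer offset for each partition. Here is a small self-contained sketch that recomputes it from the values in the table above. The object and value names are just for illustration.

```scala
// Values copied from the table above: (partition, log size, committed consumer offset).
object LagSummary {
  val partitions: Seq[(Int, Long, Long)] = Seq(
    (0, 8429145L, 6087144L),
    (1, 8424948L, 6223257L),
    (2, 8428121L, 7764854L),
    (3, 8421528L, 6071425L),
    (4, 8434659L, 7351552L),
    (5, 8428323L, 5935336L),
    (6, 8424974L, 6455301L),
    (7, 8431820L, 7763984L),
    (8, 8425999L, 6370962L),
    (9, 8416354L, 6681093L),
    (10, 8416217L, 6814949L),
    (11, 8428026L, 5878703L),
    (12, 8424604L, 8424589L),
    (13, 8431019L, 8431019L),
    (14, 8423218L, 8423218L)
  )

  // Lag per partition is the log end offset minus the committed offset.
  val lagByPartition: Seq[(Int, Long)] =
    partitions.map { case (partition, logSize, offset) => partition -> (logSize - offset) }

  // Total number of messages still waiting to be processed across all partitions.
  val totalLag: Long = lagByPartition.map(_._2).sum

  def main(args: Array[String]): Unit = {
    lagByPartition.foreach { case (partition, lag) => println(s"partition $partition lag $lag") }
    println(s"total lag $totalLag")
  }
}
```

Summed over all 15 partitions, the table gives a total lag of 21,711,569 messages, almost all of it on partitions 0 to 11, while partitions 12 to 14 are practically caught up.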