There will be no back pressure on the subscribe processing because you are not returning the result of ask but always returning Done.
Can you expound on this? ^ The code snippet that I posted is for the microservice that's experiencing the OOME (let's call this Microservice B). The reason I think there is back pressure is that the microservice producing the incoming messages (Microservice A) had been up for at least a day before Microservice B was deployed and started:
Microservice A - produces EventA. This was deployed and started up on Day 1
Microservice B - consumes EventA. This is the service that’s experiencing the OOME. This was deployed and started up on Day 2.
Which means Microservice A had been producing EventA for about a day, and when Microservice B started up, it received a day's worth of events all at once.
Given that you do not have back pressure and there were many messages in Kafka to process, that could be the main reason for the OOM.
That’s unfortunate, because this isn’t even the full load and we are already experiencing this. The messages are coming in one at a time, and the way we process them is pretty straightforward: incoming event -> impl command -> impl event. There is no additional I/O (database calls, REST calls, etc.) nor any substantial processing (just setters and getters).
However, even with that, at the time of the OOME we can see only 10 incoming messages in the heap (which suggests incoming messages are being handled fairly fast), 1000+ impl commands (which suggests they are not being handled in a timely manner), and only about 300+ impl events. That last number suggests that at most 300+ commands are handled at a time; most likely only 1 command is handled at a time, which means only 1 impl event is created at a time, and the other impl events seen in the heap have already been used and are just awaiting garbage collection. But since there was an OOME, the JVM couldn’t collect them yet because something else was holding on to them.
In an ideal situation, each impl command created would be processed immediately, before the next incoming message arrives. That way we wouldn’t end up with 1000+ impl commands in the queue. So IMHO, either (a) we are not processing the commands fast enough, or (b) there was a backlog of incoming messages and our microservice choked on 1000+ of them arriving all at the same time. I’m currently leaning towards (b), but I am curious why you think there is no back pressure.
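The "no back pressure" failure mode described in the earlier comment can be sketched in plain Java. This is an analogy, not the actual Akka/Lagom API: a single-threaded executor stands in for the slow command handler, and the loop stands in for the subscriber stage that acknowledges with Done immediately. Because acknowledgement does not wait for the actual work, upstream keeps emitting and the pending work piles up in memory, which matches the 1000+ queued impl commands seen in the heap.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class NoBackpressureDemo {
    // Accepts `messages` messages, handing each to a slow single-threaded
    // worker and acknowledging ("Done") immediately. Returns how many
    // messages were still pending right after the last one was accepted.
    static int pendingAfterAccepting(int messages) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < messages; i++) {
            worker.submit(() -> {
                try {
                    Thread.sleep(5); // the "real" processing is slow
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                completed.incrementAndGet();
            });
            // Returning Done here means upstream emits the next message
            // right away, no matter how far behind the worker is.
        }
        int pending = messages - completed.get();
        worker.shutdown();
        worker.awaitTermination(1, TimeUnit.MINUTES);
        return pending;
    }

    public static void main(String[] args) throws Exception {
        // All 1000 messages are accepted almost instantly, so nearly all
        // of them are still queued in memory - the OOME risk.
        System.out.println("pending = " + pendingAfterAccepting(1000));
    }
}
```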
To be able to get back pressure from the result of ask, you will need to use mapAsync.
I’m not sure I understand. Can you expound on this? ^
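What mapAsync changes can be sketched with a plain-Java analogy (again, not the actual Akka API): mapAsync(parallelism)(msg => ref.ask(...)) only demands a new element from upstream once one of its `parallelism` in-flight futures completes. Here a Semaphore plays that role, so the number of outstanding "asks" is bounded instead of growing without limit.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class MapAsyncAnalogy {
    // Processes `messages` messages with at most `parallelism` "asks" in
    // flight, the way mapAsync(parallelism) bounds outstanding futures.
    // Returns the largest number of simultaneous in-flight messages seen.
    static int maxInFlight(int parallelism, int messages) throws Exception {
        ExecutorService worker = Executors.newFixedThreadPool(parallelism);
        Semaphore slots = new Semaphore(parallelism);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        for (int i = 0; i < messages; i++) {
            slots.acquire(); // upstream blocks here: this is the back pressure
            int now = inFlight.incrementAndGet();
            maxSeen.accumulateAndGet(now, Math::max);
            CompletableFuture
                .runAsync(() -> {
                    try {
                        Thread.sleep(2); // the "ask" being awaited
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }, worker)
                .whenComplete((v, t) -> {
                    inFlight.decrementAndGet();
                    slots.release(); // a completed "ask" frees a slot
                });
        }
        slots.acquire(parallelism); // wait for the last in-flight asks to drain
        worker.shutdown();
        return maxSeen.get();
    }

    public static void main(String[] args) throws Exception {
        // With parallelism 4, no more than 4 messages are ever in flight,
        // so memory stays bounded regardless of how fast upstream produces.
        System.out.println("max in flight = " + maxInFlight(4, 200));
    }
}
```

The design point mirrors the advice above: returning the future from ask (instead of an immediate Done) is what lets the stream count outstanding work and stop pulling from Kafka when the bound is reached.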