Inconsistency with subscriber service and parallelism levels

Lagom’s Kafka subscriber assumes that you emit exactly one Done for each incoming element, and relies on this assumption to align and commit the Kafka offsets correctly. Using flatMapConcat is therefore unsafe: you’ll end up with more output elements than input elements, and you’ll commit offsets for elements that you haven’t successfully processed yet.
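To make the mismatch concrete, here’s a small plain-Scala sketch (no Akka or Lagom APIs; the `Record` type and the payloads are hypothetical). The subscriber effectively pairs the i-th completion with the i-th input offset, and a flatMapConcat-style expansion produces more completions than there are offsets:

```scala
// Hypothetical simulation of the subscriber's offset pairing (plain Scala).
case class Record(offset: Long, payload: String)

val incoming = Seq(Record(0, "a"), Record(1, "b"), Record(2, "c"))

// One output per input: the i-th "Done" matches the i-th offset.
val oneToOne = incoming.map(_ => "Done")

// flatMapConcat-style expansion: two outputs per input, so by the time the
// third "Done" arrives the subscriber would commit offset 2 even though
// only the second record is half-processed.
val expanded = incoming.flatMap(_ => Seq("Done", "Done"))

println(oneToOne.size) // 3 — matches the 3 offsets
println(expanded.size) // 6 — commits race ahead of real progress
```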

For a similar reason, mapAsyncUnordered is not safe to use: you could end up committing an offset for a later element before the processing of an earlier element has completed. Because Kafka treats the committed offset as a high-water mark (i.e., it assumes that you’ve completed processing everything prior to the committed offset), this could also result in missed elements if the stream is restarted for any reason.
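A quick sketch of the high-water-mark consequence (plain Scala, with made-up offsets for illustration): if an out-of-order completion commits offset 5 while offset 3 is still in flight, a restart resumes at 5 and the record at offset 3 is silently skipped.

```scala
// Out-of-order completion has committed offset 5...
val committed = 5L
// ...while an earlier element is still being processed.
val stillInFlight = Set(3L)

// On restart, consumption resumes at the committed offset,
// so any earlier, unfinished offsets are never reprocessed.
val resumeFrom = committed
val lost = stillInFlight.filter(_ < resumeFrom)

println(lost) // Set(3)
```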

Instead, you’ll need to make sure that you process each batch as a unit, emit exactly one Done when the whole batch is complete, and preserve the ordering. You can use mapAsync to preserve the ordering, and nest a substream inside the mapAsync to iterate through the grouped elements. Something like this:

val parallelism = 16

kafkaProxy.dataTopic.subscribe.atLeastOnce(
  Flow[Data]
    .mapAsync(parallelism) { data =>
      Source(groupBySession(data))
        .mapAsync(parallelism) { sd =>
          entityRefFor(sd.entityId).ask(RecordSessionData(sd.sessionId, sd))
        }
        .runWith(Sink.ignore)
    }
  )

(You might want to adjust this, for example by using different parallelism values for the inner vs. outer mapAsync.)
You might not get the same level of throughput, but that’s a necessary sacrifice to maintain ordering.
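As a side note on why mapAsync is safe here even with parallelism > 1: it may run futures concurrently, but it emits results in input order. A rough plain-Scala analogy (using Future.sequence, which gives the same order guarantee; the sleep durations are arbitrary):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// The futures complete in whatever order the sleeps allow, but the
// collected results follow the input order, not the completion order.
val inputs = Seq(30, 10, 20)
val results = Await.result(
  Future.sequence(inputs.map(ms => Future { Thread.sleep(ms); ms })),
  2.seconds
)

println(results) // List(30, 10, 20) — input order preserved
```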
