Transactional push with a Kafka Producer


(Alexandre Lebrun) #1


In my team we are actively using the Alpakka Kafka producer to push data read from files in batches.
It fits our needs well, but I run into a problem when I want to validate the data before pushing it.
Sometimes I want to push all the data only if no line is in error, and do nothing otherwise.
Since my file has ~900k lines, I can’t keep all the data in RAM and push it afterwards.

I heard of transactional messaging, implemented in Kafka 0.11 and available in Akka Streams Kafka through ‘Transactional.sink’. However, it seems to be designed to ensure exactly-once semantics through producer-consumer coordination.

Looking at the source code of ‘Transactional’, I’ve seen that the records are committed every ‘eosCommitInterval’ milliseconds and at the end of the stream.

Does Transactional.sink fit my use case? Is it possible to disable the automatic ‘eosCommitInterval’ commit?

Thanks in advance,

(Charlie) #2

Hello @xela85,

It might be best to use Producer.plainSink and pass in your own producer instance instead of Transactional.sink. Then you can begin a transaction before your file is opened and commit/abort it after the stream has finished/failed, directly from your producer instance. As you said, the current implementation of transactions in alpakka-kafka assumes a consume-process-produce flow, so your use case doesn’t quite fit.
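A minimal sketch of what this could look like, assuming an Alpakka Kafka version whose Producer.plainSink accepts an externally created producer instance (the topic name, transactional.id, and bootstrap servers below are placeholders):

```scala
import akka.actor.ActorSystem
import akka.kafka.ProducerSettings
import akka.kafka.scaladsl.Producer
import akka.stream.scaladsl.Source
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

object TransactionalFilePush extends App {
  implicit val system: ActorSystem = ActorSystem("transactional-push")

  val settings = ProducerSettings(system, new StringSerializer, new StringSerializer)
    .withBootstrapServers("localhost:9092")
    // transactional.id must be unique per producer instance
    .withProperty("transactional.id", "file-push-tx-1")

  // Create the producer ourselves so we control the transaction boundaries.
  val producer: KafkaProducer[String, String] = settings.createKafkaProducer()
  producer.initTransactions()
  producer.beginTransaction()

  // Stand-in for the ~900k lines read from the file; in the real case the
  // validation step would fail the stream when a line is in error.
  val lines = Source(List("line-1", "line-2", "line-3"))
    .map(line => new ProducerRecord[String, String]("my-topic", line))

  lines
    .runWith(Producer.plainSink(settings, producer)) // reuse our own producer
    .onComplete {
      case Success(_) =>
        producer.commitTransaction() // all lines valid: make them visible
        producer.close()
        system.terminate()
      case Failure(_) =>
        producer.abortTransaction()  // any failure: nothing becomes visible
        producer.close()
        system.terminate()
    }
}
```

Note that aborted records are only invisible to consumers configured with isolation.level=read_committed; with the default read_uncommitted isolation they would still be delivered.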

Hope this helps!

(Alexandre Lebrun) #3

Thank you, this is the tradeoff I was looking for :).