Is it possible to write a resilient Kafka producer using Alpakka?

In my use case, I need to produce data into Kafka. When Kafka is not available, I want the producing stream to backpressure and retry with exponential backoff, and not lose any produced data.

It looks like Akka Streams / Alpakka doesn’t offer a solution for that. I’d like to be proven wrong.

Here’s what I have (it’s Kotlin, but should be pretty straightforward; the generateSequence call produces an Iterable of all int values starting from 0: 0, 1, 2, …).

  import akka.NotUsed
  import akka.kafka.ProducerMessage
  import akka.kafka.javadsl.Producer
  import akka.stream.javadsl.RestartFlow
  import akka.stream.javadsl.Sink
  import akka.stream.javadsl.Source
  import org.apache.kafka.clients.producer.ProducerRecord
  import java.time.Duration
  // producerSettings: ProducerSettings<String, String> and mat: Materializer are set up elsewhere
  Source.from(generateSequence(0) { it + 1 }.asIterable())
    .throttle(5000, Duration.ofSeconds(1))
    .map {
      ProducerMessage.single(ProducerRecord<String, String>("increasing-topic-1", it.toString()))
    }
    .via(RestartFlow.onFailuresWithBackoff(Duration.ofMillis(100), Duration.ofSeconds(30), 0.5, -1) {
      Producer.flexiFlow<String, String, NotUsed>(producerSettings)
    })
    .runWith(Sink.ignore(), mat)

What I see (consistent with what RestartFlow’s docs say) is that when I run this program and make Kafka unavailable and then available again, some numbers are lost: for example, the topic contains 60k messages, but the highest one contains the number 72k.

Hi @baltiyskiy

The RestartFlow will always discard messages that were “in the stream” when it restarts.
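
For the example above, one way to avoid losing values is to restart the whole stream, source included, and resume the counter from the last value Kafka acknowledged. A rough sketch of that idea (the produceResiliently wrapper and the AtomicLong bookkeeping are illustrative, not an official Alpakka recipe; this gives at-least-once delivery, so values in flight during a restart may be written twice):

  import akka.kafka.ProducerMessage
  import akka.kafka.ProducerSettings
  import akka.kafka.javadsl.Producer
  import akka.stream.Materializer
  import akka.stream.javadsl.RestartSource
  import akka.stream.javadsl.Sink
  import akka.stream.javadsl.Source
  import org.apache.kafka.clients.producer.ProducerRecord
  import java.time.Duration
  import java.util.concurrent.atomic.AtomicLong

  // Illustrative workaround: restart source + producer together and resume from
  // the last acknowledged value (at-least-once: duplicates possible on restart).
  fun produceResiliently(producerSettings: ProducerSettings<String, String>, mat: Materializer) {
    val lastAcked = AtomicLong(-1)  // highest value confirmed written to Kafka

    RestartSource.onFailuresWithBackoff(Duration.ofMillis(100), Duration.ofSeconds(30), 0.5) {
      // The factory is re-evaluated on every restart, so the sequence resumes
      // from the value after the last acknowledged one.
      Source.from(generateSequence(lastAcked.get() + 1) { it + 1 }.asIterable())
        .throttle(5000, Duration.ofSeconds(1))
        .map { ProducerMessage.single(ProducerRecord<String, String>("increasing-topic-1", it.toString()), it) }
        .via(Producer.flexiFlow<String, String, Long>(producerSettings))
        .map { result -> lastAcked.set(result.passThrough()); result }  // advance only after the ack
    }.runWith(Sink.ignore(), mat)
  }

Restarting the source as well is what makes the difference here: RestartFlow alone cannot re-emit the elements it dropped, but a source that knows the last acknowledged value can.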

Alpakka Kafka producer flows do not support emitting success/error results downstream; errors trigger stream cancellation. This was recently discussed in “Kafka - Ability to react on message publication result” (a thread which ends with an idea to add such behaviour).

If we added that, a RetryFlow would easily achieve what you are looking for here.
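
To sketch what that could look like: assume a hypothetical produceAttempt flow that emits a per-message success/failure result instead of failing the stream (no such flow exists in Alpakka Kafka today). RetryFlow.withBackoff could then re-feed failed envelopes with exponential backoff:

  import akka.NotUsed
  import akka.kafka.ProducerMessage
  import akka.stream.javadsl.Flow
  import akka.stream.javadsl.RetryFlow
  import java.time.Duration
  import java.util.Optional

  // Hypothetical: a producer flow emitting one Result per envelope instead of
  // failing the whole stream. Alpakka Kafka does not offer this today.
  typealias StringEnvelope = ProducerMessage.Envelope<String, String, NotUsed>

  fun retryingProduce(
    produceAttempt: Flow<StringEnvelope, Result<StringEnvelope>, NotUsed>  // assumed, not a real API
  ): Flow<StringEnvelope, Result<StringEnvelope>, NotUsed> =
    RetryFlow.withBackoff(
      Duration.ofMillis(100),   // min backoff
      Duration.ofSeconds(30),   // max backoff
      0.5,                      // random factor
      10,                       // max retries per element
      produceAttempt
    ) { envelope, result ->
      // Re-feed the same envelope while the produce attempt keeps failing.
      if (result.isFailure) Optional.of(envelope) else Optional.empty()
    }

The retry decision sees both the input and the output, so an implementation could also inspect the error and only retry retriable exceptions.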

Cheers,
Enno.

I’ve gathered all the issues that mention a similar problem and created a feature request on GitHub: https://github.com/akka/alpakka-kafka/issues/1101
