Kafka Retention and Topic Subscriptions

Looking at the Kafka message broker implementation, I see that when a service starts it begins publishing messages for each of its topics to the broker, resuming from where it last left off. This gets all the messages into the broker independently of any particular client. It’s up to the broker implementation to manage each client’s consumption of messages (consumer groups, offsets, etc.).
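For concreteness, here’s a minimal sketch of that consumption side using the plain Kafka Java client. The broker address, topic name, and group id are made-up placeholders; the point is just that offsets are tracked per consumer group:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GreetingsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        // Offsets are committed per consumer group, so each service
        // resumes from wherever its group last left off.
        props.put("group.id", "greetings-service");  // hypothetical group id
        // A brand-new group starts from the earliest offset still retained.
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("greetings"));  // hypothetical topic name
            while (true) {
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n",
                        record.offset(), record.value());
                }
            }
        }
    }
}
```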

In the case of Kafka, if we don’t allow an infinite retention period, then messages will eventually be deleted. Is it the case that if a new service comes online after the retention period, it won’t be able to subscribe to a topic from the beginning because some messages will no longer be available? If so, are there any best practices on how to handle this case?

That’s correct. There are two general strategies:

  1. Set a long or unlimited retention period for the topic (see the sketch just after this list)
  2. Create a new topic that begins publishing from the start, and have new consumers read from that
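For the first option, retention is a per-topic configuration. Here’s a rough sketch using Kafka’s Java AdminClient to create a topic with time-based deletion disabled (`retention.ms=-1`); the topic name, partition count, and replication factor are placeholder values:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicWithUnlimitedRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            // retention.ms=-1 disables time-based deletion entirely;
            // a large finite value (e.g. one year) is a middle ground.
            NewTopic topic = new NewTopic("greetings", 3, (short) 1)
                .configs(Map.of("retention.ms", "-1"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```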

The tradeoff is between optimizing for the case where you don’t expect more consumers later and the case where you do. Some things to consider:

  • The first option obviously consumes more space for the logs, so if you don’t expect new consumers frequently, the second option might be a better tradeoff.
  • Repopulating an empty topic from a big event journal could take some time, so if you do expect new consumers frequently, the first option might allow them to catch up more quickly.
  • You can increase a topic’s retention period after creation (see the sketch following this list), but doing so won’t bring back messages that have already been deleted, which might force you into the second option if you’re already in production.
  • If the format of your data needs to change for new consumers, the second option might be better, as it allows you to decouple your old and new data formats.
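On the retention point above, here’s a hedged sketch of raising `retention.ms` on an existing topic with the AdminClient’s `incrementalAlterConfigs`; the topic name and the 30-day value are illustrative only:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class ExtendRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic =
                new ConfigResource(ConfigResource.Type.TOPIC, "greetings");
            // Raise retention to 30 days. This only protects messages that
            // are still present; anything already deleted stays deleted.
            AlterConfigOp op = new AlterConfigOp(
                new ConfigEntry("retention.ms",
                    String.valueOf(30L * 24 * 60 * 60 * 1000)),
                AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(op))).all().get();
        }
    }
}
```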

Thanks @TimMoore, that helps a lot.