Hello Community, I am new to Akka and looking for advice on the best approach to implement the following:
I have an event-driven system. The pipeline does the following: ingest events (Kafka) -> validate -> transform -> enrich (hydrate via an external service) -> publish downstream to Kafka.
I would like to use Akka Streams in an Akka Cluster (with Alpakka Kafka).
Since the number of events is large, I would like to distribute the load across multiple nodes in the cluster. Backpressure is important, as are resilience and processing time. Here is the high-level Akka stream flow:
```java
Consumer.plainSource(....)
    .groupBy(N, message -> message.getKey().getID(), true)
    .grouped(100) // send batches of 100 to an actor for processing
    .via(dataProcessingFlow)
    .to(publishToDownstreamSink); // publish to downstream Kafka
```
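To make the question concrete, here is a sketch of what I imagine `dataProcessingFlow` could look like if I dispatch each batch to an actor with `ActorFlow.ask` from `akka-stream-typed`. The `Message` type, the `ProcessBatch`/`BatchResult` protocol, and `workerRef` are placeholder names of mine, not real code:

```java
import java.time.Duration;
import java.util.List;
import akka.NotUsed;
import akka.actor.typed.ActorRef;
import akka.stream.javadsl.Flow;
import akka.stream.typed.javadsl.ActorFlow;

// Hypothetical message protocol between the stream and a worker actor.
record ProcessBatch(List<Message> batch, ActorRef<BatchResult> replyTo) {}
record BatchResult(List<Message> enriched) {}

// Ask the worker actor for each batch of 100. Backpressure is preserved
// because at most `parallelism` (here 4) asks are in flight at a time.
Flow<List<Message>, BatchResult, NotUsed> dataProcessingFlow =
    ActorFlow.ask(4, workerRef, Duration.ofSeconds(10), ProcessBatch::new);
```

This keeps everything inside one stream, but it pins each batch to the single `workerRef`, which is part of why I am looking at the receptionist-based approach below.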
I was not able to figure out how to use StreamRefs inside the substreams created by groupBy. The workaround is to create a group of worker actors (registered via the Receptionist), with workers running on each node. The stream uses the Receptionist to find the next available actor and sends it batches of 100 items for processing. Each actor runs an Akka stream internally, and the results are sent back to the originating stream for publishing.
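For context, the worker registration I have in mind looks roughly like this, using the Akka Typed Receptionist. The `ServiceKey` name, the `ProcessBatch` protocol, and the processing body are placeholders:

```java
import akka.actor.typed.Behavior;
import akka.actor.typed.javadsl.Behaviors;
import akka.actor.typed.receptionist.Receptionist;
import akka.actor.typed.receptionist.ServiceKey;

// Hypothetical key identifying batch-processing workers cluster-wide.
static final ServiceKey<ProcessBatch> WORKER_KEY =
    ServiceKey.create(ProcessBatch.class, "batch-worker");

// Each node spawns workers that register with the local Receptionist;
// in a cluster, registrations are propagated to all nodes, so the
// stream can discover workers anywhere via Receptionist.find/subscribe.
static Behavior<ProcessBatch> worker() {
    return Behaviors.setup(ctx -> {
        ctx.getSystem().receptionist()
           .tell(Receptionist.register(WORKER_KEY, ctx.getSelf()));
        return Behaviors.receiveMessage(msg -> {
            // run the internal Akka stream over msg.batch() here,
            // then reply to msg.replyTo() with the enriched result
            return Behaviors.same();
        });
    });
}
```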
A few questions:
- Does this look like a good design?
- Apart from managing a group of workers on each node, are there any other non-obvious issues I should be aware of?
- How do I enable manual offset commits in this case?
- Should I use StreamRef or continue using actors (what is the suggested approach)? P.S. I would choose StreamRef, but I am not sure how to use it with substreams.
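On the offset-commit question, my current understanding is that manual commits would mean switching from `Consumer.plainSource` to `Consumer.committableSource` and committing only after the result is published downstream, roughly like this (based on the Alpakka Kafka committable-source/Committer pattern; `consumerSettings`, the topic name, and `processAndPublish` are placeholders of mine):

```java
import akka.kafka.CommitterSettings;
import akka.kafka.Subscriptions;
import akka.kafka.javadsl.Committer;
import akka.kafka.javadsl.Consumer;

// Sketch: carry the committable offset alongside each record and commit
// only after the enriched result has been published downstream, so a
// crash mid-pipeline leads to redelivery rather than message loss.
Consumer.committableSource(consumerSettings, Subscriptions.topics("events"))
    .mapAsync(1, msg ->
        processAndPublish(msg.record())                 // hypothetical helper
            .thenApply(done -> msg.committableOffset()))
    .toMat(Committer.sink(CommitterSettings.create(system)),
           Consumer::createDrainingControl)
    .run(system);
```

What I cannot tell is how this interacts with the groupBy/actor hand-off: once batches leave the stream for an actor, keeping the committable offsets attached and committing them in order seems non-trivial, which is the part I would appreciate guidance on.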
Looking forward to your comments.