Should we use Akka Stream and actors vs Spark Streaming?

I’m working on a project where we are re-engineering the search system of a hotel chain platform.
The use case is that changes to entities saved in Couchbase are sent to a Kafka topic and processed by Spark Streaming. A change can be something like a room being booked, or a rate change for a room in a given hotel of the chain. Currently all events for a hotel are pushed to one Kafka partition.
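As background on the "one partition per hotel" detail: Kafka's default partitioner hashes the record key (murmur2 over the key bytes in the real client) and takes it modulo the partition count, so keying every event by hotel id is what keeps a hotel's events together and ordered on one partition. A minimal sketch of that idea in plain Scala (simplified hash instead of murmur2, hypothetical hotel ids — not the actual client code):

```scala
object PartitionSketch {
  // Simplified stand-in for Kafka's default partitioner: hash the record
  // key and take it modulo the partition count. Same key => same
  // partition, so all events for one hotel stay in order on one partition.
  // (The real client hashes the serialized key bytes with murmur2.)
  def partitionFor(hotelId: String, numPartitions: Int): Int =
    Math.floorMod(hotelId.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    val p1 = partitionFor("hotel-42", 12)
    val p2 = partitionFor("hotel-42", 12)
    assert(p1 == p2) // every event for hotel-42 lands on the same partition
    println(p1)
  }
}
```

The flip side of this design is that one busy hotel cannot be processed by more than one consumer, which is part of why the throughput questions below matter.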

In my view we should be using Akka Streams, with the change events processed by actors, instead of relying on stream processing alone.

Can someone please clarify?

Hard to clarify here, there’s no one true solution.

You could even use Kafka Streams to process the events, getting rid of Spark if it’s not needed.

Stream processing can be done in many different ways depending on various non-functional aspects: expected load, expected rate of events, consistency requirements, complexity of the processing stage, sources and sinks of the processing, deployment environment…

The simplest solution (fewest moving parts and sub-systems) that fulfils the requirements can usually be considered “the best” one, but that might not hold in your case?

A few questions come to mind immediately:

  • Why Spark, if you can process the events with Kafka Streams?
  • Why introduce Akka Streams, yet another component, if Spark is already doing the job?
  • Are there any problems with the solution you’ve outlined?

I think the real difference is that Spark is not a task-parallel framework but a data-parallel one, while the Akka actor model is a toolkit for building non-blocking, message-driven services.
So if you have large volumes of data that can be processed in parallel, go with Spark. But if you need a real-time, low-latency system and your actions can be processed independently, go with the Akka actor model.
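That contrast can be sketched without either framework, in plain Scala (hypothetical `batchReprice`/`handleEvent` functions, standing in for a Spark transformation and an actor's message handler respectively):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelismSketch {
  // Data-parallel style (Spark-like): one transformation applied across a
  // whole batch of records at once; throughput matters more than the
  // latency of any single record.
  def batchReprice(rates: Vector[Double], factor: Double): Vector[Double] =
    rates.map(_ * factor)

  // Task-parallel / actor-like style: each event is handled independently
  // as soon as it arrives; the latency of one event is what matters.
  def handleEvent(rate: Double, factor: Double): Future[Double] =
    Future(rate * factor)

  def main(args: Array[String]): Unit = {
    println(batchReprice(Vector(100.0, 200.0), 2.0)) // whole batch at once
    println(Await.result(handleEvent(100.0, 2.0), 1.second)) // one event
  }
}
```

This is only an illustration of the two shapes of parallelism, not of either API: Spark distributes the batch transformation across a cluster, and Akka actors add supervision, mailboxes, and per-entity state on top of the one-event-at-a-time style.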
