Sequential processing is faster than parallel processing - Why?

Hello there,

I am currently trying to get a better understanding of how parallelism works in Akka Streams. For that purpose, I built a simple stream in both a sequential and a parallel way to see the difference. However, the sequential processing is faster than the parallel processing, and I wonder why.

I used a Balance and a Merge stage, and my computer has 4 cores. I used a huge range of numbers and some mathematical operations to give the computer enough workload.

I added a warm-up phase, so the comparison is fair.

The result is always that my sequential code is a lot faster: it usually needs around 16 sec for the computation, whereas my parallel code needs somewhere between 70 and 100 sec.

Here is my code:

import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, FlowShape, Materializer}
import akka.stream.scaladsl.{Balance, Flow, GraphDSL, Merge, Source}

import scala.concurrent.Await
import scala.concurrent.duration._
import scala.math.{exp, log}

object Main extends App {
  implicit val system: ActorSystem = ActorSystem("QuickStart")
  implicit val materializer: Materializer = ActorMaterializer()

  val source: Source[Long, NotUsed] = Source(1L to 100000000L)

  // Cheap per-element work, folded into a single sum.
  val sequential: Flow[Long, Long, NotUsed] =
    Flow[Long]
      .filter(x => x % 2 == 0)
      .map(x => x.toDouble)
      .map(x => log(x))
      .map(x => exp(x))
      .map(x => x.toLong)
      .fold(0L)((accu, value) => accu + value)

  // Balance the elements over `parallelism` copies of `sequential`, each behind
  // its own async boundary, then merge the partial sums.
  def parallel(parallelism: Int): Flow[Long, Long, NotUsed] =
    Flow.fromGraph(GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._

      val balancer = builder.add(Balance[Long](parallelism))
      val merger = builder.add(Merge[Long](parallelism))

      for (i <- 0 until parallelism) {
        balancer.out(i) ~> sequential.async ~> merger.in(i)
      }

      FlowShape(balancer.in, merger.out)
    })

  // Runs the source through `flow` and prints the result and the elapsed time.
  def runTimed(label: String, flow: Flow[Long, Long, NotUsed]): Unit = {
    println(s"starting $label")
    val start = System.currentTimeMillis()
    val done = source
      .via(flow)
      .runForeach(i => println(s"$label: result = $i, time = ${System.currentTimeMillis() - start}"))
    Await.result(done, 100000.millisecond)
  }

  runTimed("warm up", sequential)
  runTimed("sequential", sequential)
  // Assumption: parallelism = 4, matching the 4 cores mentioned above.
  runTimed("parallel", parallel(4).fold(0L)((accu, value) => accu + value))

  system.terminate()
}


Do you have an idea why the sequential code is so much faster than the parallel code? Is the workload not large enough, or did I write the parallel code incorrectly?


Hi @IvoAdrian,

Your example app highlights the trouble with benchmarking parallelism using trivial (fast) operations. Using the async operator in Akka Streams incurs extra overhead when the stream is run. By default, Akka Streams operators are “fused” together and run by a single actor: output from one operator is fed directly as input to the next. When an async boundary is introduced, we create an “island” for that operator and everything before it. Concretely, everything before the async runs in one actor, and everything after it runs in another. Passing elements between the two actors adds overhead. In your graph, every element has to cross such a boundary on its way from the Balance stage into one of the branches, while the work done per element (a filter, a log and an exp) is tiny, and that’s what’s skewing the results of your benchmark.
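To make the fused versus async difference concrete, here is a minimal sketch that runs the same two cheap map stages once fused and once with an async boundary between them. It assumes Akka 2.5-style materializer setup, as in the question; the object name FusedVsAsync, the helper timedSum and the element count are made up for illustration. Timings will vary by machine, so treat it as something to try rather than as a reference benchmark.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, Materializer}
import akka.stream.scaladsl.{Flow, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

object FusedVsAsync {
  // Hypothetical helper: sends 1..n through `flow`, sums the output and
  // returns (sum, elapsed milliseconds).
  def timedSum(n: Int, flow: Flow[Int, Int, NotUsed])(implicit mat: Materializer): (Long, Long) = {
    val start = System.nanoTime()
    val sum = Await.result(Source(1 to n).via(flow).runFold(0L)(_ + _), 5.minutes)
    (sum, (System.nanoTime() - start) / 1000000)
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("fused-vs-async")
    implicit val materializer: Materializer = ActorMaterializer()

    val n = 5000000 // assumption: enough elements to make the difference visible

    // Both maps are fused and run inside a single actor.
    val fused = Flow[Int].map(_ + 1).map(_ * 2)
    // The async boundary puts the first map in its own island (its own actor),
    // so every element is handed from one actor to the other.
    val split = Flow[Int].map(_ + 1).async.map(_ * 2)

    timedSum(n, fused) // warm up
    val (fusedSum, fusedMs) = timedSum(n, fused)
    val (splitSum, splitMs) = timedSum(n, split)

    println(s"fused: result = $fusedSum, time = $fusedMs ms")
    println(s"async: result = $splitSum, time = $splitMs ms")

    system.terminate()
  }
}
```

Both runs compute the same sum; only the execution layout differs, so any gap in the printed times comes from the boundary rather than from the work itself.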

If your operation took a non-trivial amount of time (e.g. more than 1 ms per element), you would begin to see the benefits of parallelization, because the cost of the operation would outweigh the overhead.
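One way to check this yourself is to reuse the Balance/Merge pattern from the question with a deliberately slow stage. The sketch below is only an illustration under assumptions: the function expensive is a made-up CPU loop standing in for real work, the iteration count and element count are arbitrary and need tuning for your machine, and the names ExpensiveStage, parallel and timedSum are hypothetical.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, FlowShape, Materializer}
import akka.stream.scaladsl.{Balance, Flow, GraphDSL, Merge, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

object ExpensiveStage {
  // Hypothetical stand-in for a costly operation: a CPU loop whose length is
  // set by `iterations`. It returns its input unchanged.
  def expensive(iterations: Int)(x: Long): Long = {
    var acc = 0.0
    var i = 0
    while (i < iterations) { acc += math.sqrt(i.toDouble); i += 1 }
    if (acc < 0) -1L else x // reads acc so the loop is not trivially dead code
  }

  // Same Balance/Merge pattern as in the question, applied to any worker flow.
  def parallel(worker: Flow[Long, Long, NotUsed], parallelism: Int): Flow[Long, Long, NotUsed] =
    Flow.fromGraph(GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._
      val balancer = builder.add(Balance[Long](parallelism))
      val merger = builder.add(Merge[Long](parallelism))
      for (i <- 0 until parallelism) {
        balancer.out(i) ~> worker.async ~> merger.in(i)
      }
      FlowShape(balancer.in, merger.out)
    })

  // Sums 1..n after passing each element through `flow`; returns (sum, millis).
  def timedSum(n: Long, flow: Flow[Long, Long, NotUsed])(implicit mat: Materializer): (Long, Long) = {
    val start = System.nanoTime()
    val sum = Await.result(Source(1L to n).via(flow).runFold(0L)(_ + _), 10.minutes)
    (sum, (System.nanoTime() - start) / 1000000)
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("expensive-stage")
    implicit val materializer: Materializer = ActorMaterializer()

    // Assumption: 1,000,000 loop iterations is slow enough per element; tune it.
    val worker = Flow[Long].map(x => expensive(1000000)(x))
    val n = 2000L

    timedSum(n, worker) // warm up
    val (seqSum, seqMs) = timedSum(n, worker)
    val (parSum, parMs) = timedSum(n, parallel(worker, 4))

    println(s"sequential: result = $seqSum, time = $seqMs ms")
    println(s"parallel:   result = $parSum, time = $parMs ms")

    system.terminate()
  }
}
```

Lowering the iteration count toward zero turns the worker back into a trivial operation, which lets you watch where the boundary overhead starts to dominate on your own hardware.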

Colin Breck has a good blog post that describes the pros and cons of parallelism in Akka Streams. Lightbend also offers an Akka Streams course (the academy is free at this time) that delves into this topic, if you or your organization are interested in a more formal training option.

Hope that helps.



Hi Sean,

Thanks for your help! That was quite clarifying.

Best regards,

