Compute fibonacci 34 fails

Hi

For testing purposes I wrote a small program to compute Fibonacci numbers.

The program fails (not enough memory) for fibonacci 34.

fibonacci 34 creates 18,454,929 actors.

Using a Common Lisp implementation of a classical actor model, I could compute fibonacci 34 in around 22 seconds (SBCL, 8 GB, 24 cores). I want to understand why the Scala/Akka program fails to compute fibonacci 34 on the same computer.

Kind regards
Taoufik

The code:

import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout

import scala.concurrent.Await
import scala.concurrent.duration._

class FibonacciActor extends Actor {
  def receive: Receive = {
    case ("compute", n: Int) if n == 0 || n == 1 =>
      sender() ! ("result", n)
    case ("compute", n: Int) =>
      val worker1 = context.actorOf(Props[FibonacciActor])
      val worker2 = context.actorOf(Props[FibonacciActor])
      val originalSender = sender()

      worker1 ! ("compute", n - 1)
      worker2 ! ("compute", n - 2)

      context.become {
        case ("result", x: Int) =>
          context.become {
            case ("result", y: Int) =>
              originalSender ! ("result", x + y)
          }
      }
    case ("result", r: Int) =>
      sender() ! r
  }
}

class MainActor extends Actor {
  implicit val timeout: Timeout = Timeout(60.seconds)
  implicit val ec = context.system.dispatcher

  def receive: Receive = {
    case n: Int =>
      val worker = context.actorOf(Props[FibonacciActor], name = "FibonacciActor")
      val start = System.currentTimeMillis()

      (worker ? ("compute", n)).mapTo[(String, Int)].foreach {
        case ("result", x) =>
          println(x, System.currentTimeMillis() - start)
          context.system.terminate()
      }
    case _ => println("Error: message not recognized")
  }
}

object Main extends App {
  val system = ActorSystem("HelloSystem")
  println("start")
  implicit val timeout: Timeout = Timeout(60.seconds)

  val start = System.currentTimeMillis()

  val workers = for (i <- 1 to 1) yield {
    val worker = system.actorOf(Props[MainActor], name = "Main"+i)
    worker ! 34
    worker
  }

  Await.result(system.whenTerminated, Duration.Inf)

  println(System.currentTimeMillis() - start)
}

Given that the overhead of an Akka actor is a few hundred bytes on the Java heap, 18 million actors would consume about 5-7 GiB of heap… do you know how much heap you’re running this test on? The default JVM ergonomics typically set the maximum heap to a quarter of system RAM, so on a system with fewer than ~16 GB the heap would be exhausted before the 18+ million actors are spawned.
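For reference, the 18,454,929 figure can be reproduced: the program spawns one actor per fib call, so the actor count follows the same recurrence as the call count. A quick back-of-the-envelope sketch (the 400 bytes per actor is my assumption for "a few hundred bytes", not a measured number):

```scala
// One actor is spawned per fib() call, so the actor count follows
// calls(n) = calls(n-1) + calls(n-2) + 1, with calls(0) = calls(1) = 1.
def actorCount(n: Int): Long =
  if (n <= 1) 1L else actorCount(n - 1) + actorCount(n - 2) + 1L

val actors = actorCount(34)  // 18454929
val bytesPerActor = 400L     // rough assumption: "a few hundred bytes"
val gib = actors * bytesPerActor / math.pow(2, 30)
println(f"$actors%d actors, ~$gib%.1f GiB") // ~6.9 GiB at 400 B/actor
```

At 400 bytes that lands at roughly 6.9 GiB, consistent with the 5-7 GiB estimate above.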

It is possible that the CL implementation (especially if it’s local-only) is able to garbage collect actors as they go out of scope. Akka Actors must be manually stopped (or the ActorSystem terminated).

This hypothesis can be tested by modifying FibonacciActor to add context.stop(self) after originalSender ! ("result", x+y).
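A minimal sketch of that change against the code above (untested here; it assumes the classic Akka actor API the post already uses):

```scala
// Inside FibonacciActor's "compute" case: stop this actor once it has
// combined both child results and forwarded the sum.
context.become {
  case ("result", x: Int) =>
    context.become {
      case ("result", y: Int) =>
        originalSender ! ("result", x + y)
        context.stop(self) // release this actor instead of leaking it
    }
}
```

The base-case actors (n == 0 or 1) could stop themselves the same way after replying, since they are never messaged again either.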

Note that while this is perhaps an interesting exercise to learn the APIs, with a heap large enough to fit your application it should be fairly obvious that one actor per fibonacci operation is not a sensible design with Akka, and that a more coarse-grained division of work would be better. A single actor doing the entire calculation would likely be much, much faster, up to some point where it might (but probably not) be worth parallelising calculations over multiple actors.
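As a concrete point of comparison, the entire calculation a single coarse-grained actor would do on one "compute" message is just a loop (plain Scala sketch, no actors):

```scala
// Iterative Fibonacci: the whole job a single actor could run on one
// message, instead of spawning two children per recursion step.
def fibIterative(n: Int): Long = {
  var a = 0L
  var b = 1L
  for (_ <- 0 until n) {
    val next = a + b
    a = b
    b = next
  }
  a
}

println(fibIterative(33)) // 3524578, matching the result in the thread
```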

In addition to the memory overhead, with a very minimal CPU-bound operation like this per message, the run time will be entirely dominated by allocating messages, passing them through queues, and scheduling the actors. That is unlikely to be what you want to spend your CPU cycles on.

In general I’d say that while stateless work-scheduling with actors can sometimes make sense, actors are really useful where state is interesting. For calculation/CPU-bound work that could, for example, mean caching the most frequently occurring requests so they only have to be calculated once and the result is ready when a new request comes in (possibly also shared with other parallel workers through broadcast).
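To illustrate that idea, here is a sketch of the kind of cache meant here (plain Scala; in a real actor the map would be private actor state consulted on each incoming request):

```scala
import scala.collection.mutable

// Hypothetical per-actor cache: compute each requested value once,
// answer subsequent requests straight from the map.
val cache = mutable.Map.empty[Int, Long]

def cachedFib(n: Int): Long = cache.get(n) match {
  case Some(v) => v // already computed: answer immediately
  case None =>
    val v = if (n <= 1) n.toLong else cachedFib(n - 1) + cachedFib(n - 2)
    cache(n) = v
    v
}
```

Because every subproblem is memoised, even the naive recurrence becomes linear in n, and repeated requests cost a single map lookup.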


Indeed, all the tests run on a single node; the CL implementation uses 24 threads (the number of cores).

I tested with context.stop(self) instead of context.system.terminate() but it did not improve things.

I am implementing a classical actor model in Common Lisp (SBCL); it has no ask (?) pattern for retrieving a result as in Akka.

The objective is to compare my CL implementation against Akka (and Erlang)

I chose the Fibonacci test in order to create a lot of actors (the objective is not to implement fibonacci using actors).

I am puzzled to find that my implementation is faster than Akka (and Erlang): with my CL implementation I can compute fibonacci of 50 and even more, but with Akka I could not compute fibonacci 34.

So I want to understand why Akka is slower. Is there a configuration issue, or an optimization that I can use? (I should not touch the fibonacci algorithm; the purpose is to create a lot of actors.)

;;;; SBCL test

ACTOR-TEST> (time (let ((s (make-instance 'sync))) (compute (make-instance 'sfib) 33 s) (join s)))
Evaluation took:
  1.279 seconds of real time
  13.391236 seconds of total run time (11.024703 user, 2.366533 system)
  [ Real times consist of 0.036 seconds GC time, and 1.243 seconds non-GC time. ]
  [ Run times consist of 0.052 seconds GC time, and 13.340 seconds non-GC time. ]
  1046.99% CPU
  2,700,813,523 processor cycles
  5,517,496,752 bytes consed
  
3524578 (22 bits, #x35C7E2)
ACTOR-TEST>

;;;; Akka test (scala)

import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import scala.concurrent.Await
import scala.concurrent.duration._

class FibonacciActor extends Actor {
  var count = 2
  var value = 0
  var receiver: ActorRef = null
  def receive: Receive = {
    case ("compute", n: Int, rec: ActorRef) if n == 0 || n == 1 =>
      rec ! ("result", n)
    case ("compute", n: Int, rec: ActorRef) =>
      val worker1 = context.actorOf(Props[FibonacciActor])
      val worker2 = context.actorOf(Props[FibonacciActor])
      count = 2
      receiver = rec
      worker1 ! ("compute", n - 1, self)
      worker2 ! ("compute", n - 2, self)
    case ("result", r: Int) =>
      count = count - 1
      value += r
      if (count == 0) {
        receiver ! ("result", value)
      }
  }
}

class ReceiverActor extends Actor {
  implicit val ec = context.system.dispatcher
  var start: Long = 0

  def receive: Receive = {
    case n: Int =>
      start = System.currentTimeMillis()
      val worker = context.actorOf(Props[FibonacciActor], name = "FibonacciActor")
      worker ! ("compute", n, self)
    case ("result", r: Int) =>
      println("result",r, System.currentTimeMillis() - start)
      context.system.terminate()
      //context.stop(self)
    case _ => println("Error: message not recognized")
  }
}

object Main extends App {
  val system = ActorSystem("HelloSystem")
  println("start")

  val start = System.currentTimeMillis()

  for (i <- 1 to 1) {
    val worker = system.actorOf(Props[ReceiverActor], name = "Main" + i)
    worker ! 33
  }
  Await.result(system.whenTerminated, Duration.Inf)

  println(System.currentTimeMillis() - start)
}

;; result
start
(result,3524578,3427)
[INFO] [01/02/2024 09:38:44.345] [HelloSystem-akka.actor.default-dispatcher-25] [CoordinatedShutdown(akka://HelloSystem)] Running CoordinatedShutdown with reason [ActorSystemTerminateReason]
9493

The other issue: the Akka fibonacci 33 took 3.4 seconds, but the whole program took 9.4 seconds.
I would like to understand why it took 6 more seconds to find that all actors are terminated.

If you do less, it takes fewer CPU cycles. A simpler actor implementation that does not support behaviour switching, deathwatch, system messages, etc. will be faster, but at the cost of not being able to do the same things.

If you benchmark your Akka version, I expect you will see that things like maintaining a shared tree hierarchy with actor names, managing the actor lifecycle, and the allocations needed for each actor dominate what it is spending time on. There is also a small cost associated with the typed APIs, as they are built on top of the classic actor APIs, which will show up when benchmarking actor startup throughput; you could slim the memory and allocation overhead somewhat by using classic actors (at the cost of developer convenience and a higher risk of doing the wrong thing).

The specific benchmarking of how fast and how many actors you can start is mostly interesting if that is what you expect users to spend their app time doing. I’d argue that the focus in Akka is rather on actor (and stream) message throughput, and on being fast enough while providing advanced functionality like cluster and higher-level cluster components, rather than on performance alone.


I am trying to assess my CL implementation by comparing it against Akka (and Erlang). The fibonacci test shows that my implementation is faster; since the test is implemented the same way (same algorithm) in CL and in Akka, I think its outcome must imply a few things, and that is what I am trying to figure out: how my CL implementation compares to other systems (i.e. Akka, Erlang, …).

Do you have any other actor-based application that I can use to test my CL implementation? The objective is to correctly compare the implementations (e.g. matrix multiplication, …).

I do not have a lot of experience using actors in general, and Akka in particular, so I would appreciate it very much if you can help me find actor-based applications that I can implement and use for comparison.

We have quite a few benchmarks testing performance in Akka in the akka-bench-jmh module; you could run them locally to see numbers, and look at them for inspiration on what we think is important to benchmark with an actor system.

I’m not suggesting context.stop(context.self) where your context.system.terminate is; on the main actor that is more or less exactly equivalent to context.system.terminate.

I’m suggesting that you have the individual FibonacciActors stop themselves after sending the result. That will at least alleviate the heap consumption and likely allow fib(34) to terminate.

I note the make-instance 'sync in the CL test… if that indicates that actor spawn is a synchronous operation in your actor implementation, the extra overhead of Akka’s async spawn may well account for the difference (async in general means “more overhead, but that overhead buys you more predictable scaling”).

The sync actor is just used to get the final result; only the current thread is waiting, while all actors are asynchronous and run on the thread pool of the actor system.

Concerning the stop, I will try it as you said. Thanks!