Memory leak of RepointableActorRef due to race condition


(Sasidharan) #1

Hi,

We are using Play Framework 2.6.17 with Scala 2.11 for our service.

We noticed that heap usage increases gradually over time. A heap dump analysis showed
around 105K RepointableActorRef instances with a retained heap size of over 200MB.
This in turn causes full GC cycles to be triggered frequently.

The GC root of these instances points to the supervisor (StreamSupervisor-0) of the akka.stream.impl.PhasedFusingActorMaterializer instance for AkkaHttpServer.

We looked at the associated ActorGraphInterpreter instances, which have ‘interpreterCompleted’ set to ‘true’ and no subscribers pending. From our understanding of the code, the RepointableActorRef should be removed from the childrenContainer once the interpreter execution is completed. But in this case, we see the following sequence of events:

  • In its preStart() method, ActorGraphInterpreter checks ‘activeInterpreters’, finds it empty, and stops the actor.
  • At this point the actor still has an ‘UnstartedCell’, because the swap to the active ActorCell has not happened yet, so the stop message is received by the UnstartedCell.
  • The supervisor that added this actorRef as a child therefore never gets a DeathWatchNotification, since the stop message was handled by the UnstartedCell.

This looks like a race condition that causes the RepointableActorRef references to be retained in the supervisor's childrenContainer forever.
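To make the suspected sequence concrete, here is a toy model in plain Scala. It is only a sketch of the hypothesis described above, not Akka's implementation: the classes Supervisor, ChildRef, UnstartedCell and ActiveCell are hypothetical stand-ins, and the assumption that a stop handled by the placeholder cell never reaches the supervisor is exactly the behaviour we suspect, not something confirmed from the Akka sources.

```scala
import scala.collection.mutable

// Toy model only: hypothetical stand-ins, not Akka's real classes.
object RepointableLeakSketch {

  // Stand-in for the supervisor and its childrenContainer.
  final class Supervisor {
    val children = mutable.Map.empty[String, ChildRef]

    def spawn(name: String): ChildRef = {
      val ref = new ChildRef(name, this)
      children += (name -> ref)
      ref
    }

    // Models the supervisor receiving a death notification for a child.
    def childTerminated(name: String): Unit = children -= name
  }

  sealed trait Cell { def stop(): Unit }

  // Placeholder cell in use before the real cell is swapped in.
  // ASSUMPTION under test: a stop handled here never notifies the supervisor.
  final class UnstartedCell extends Cell {
    var stopped = false
    def stop(): Unit = { stopped = true }
  }

  // Real cell: stopping it notifies the supervisor, which drops the child.
  final class ActiveCell(name: String, parent: Supervisor) extends Cell {
    def stop(): Unit = parent.childTerminated(name)
  }

  // Stand-in for RepointableActorRef: starts unstarted, later repointed.
  final class ChildRef(val name: String, parent: Supervisor) {
    private var cell: Cell = new UnstartedCell
    def point(): Unit = { cell = new ActiveCell(name, parent) }
    def stop(): Unit = cell.stop()
  }

  // Returns how many children the supervisor still holds afterwards.
  def run(stopBeforeSwap: Boolean): Int = {
    val supervisor = new Supervisor
    val ref = supervisor.spawn("flow-1-0")
    if (stopBeforeSwap) { ref.stop(); ref.point() } // suspected racy ordering
    else { ref.point(); ref.stop() }                // expected ordering
    supervisor.children.size
  }

  def main(args: Array[String]): Unit = {
    println(s"stop after swap:  ${run(stopBeforeSwap = false)} child(ren) retained")
    println(s"stop before swap: ${run(stopBeforeSwap = true)} child(ren) retained")
  }
}
```

In this model, stopping after the swap leaves the supervisor with no children, while stopping before the swap leaves one child behind, which matches the retained RepointableActorRef instances we see in the heap dump.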

Could someone tell us whether this issue has already been fixed in a later version?

Thanks.


(Johan Andrén) #2

That is interesting, would it be possible to share a minimal reproducer?


(Sasidharan) #3

I will try to come up with a sample project that reproduces this issue.


(Sasidharan) #4

I modified the sample play-scala starter project to use the Accumulator.flatten API, which creates and runs the graph ‘Source.fromPublisher(publisher).toMat(accumulator.toSink)(Keep.right).run()’ as part of the futureToSink method call during FlattenedAccumulator initialization.

When hitting the endpoint ‘http://localhost:9000/count’ at 1000 requests per minute, I see the following logs:

./play-scala-starter-example -Dplay.http.secret.key=abcdefghijk | grep UnstartedCell
UnstartedCell stop method akka://application/system/StreamSupervisor-0/flow-463-0
UnstartedCell stop method akka://application/system/StreamSupervisor-0/flow-1423-0
UnstartedCell stop method akka://application/system/StreamSupervisor-0/flow-1961-0
UnstartedCell stop method akka://application/system/StreamSupervisor-0/flow-2327-0
UnstartedCell stop method akka://application/system/StreamSupervisor-0/flow-2328-0-lazySource
UnstartedCell stop method akka://application/system/StreamSupervisor-0/flow-2330-0
UnstartedCell stop method akka://application/system/StreamSupervisor-0/flow-2780-0
UnstartedCell stop method akka://application/system/StreamSupervisor-0/flow-2807-0
UnstartedCell stop method akka://application/system/StreamSupervisor-0/flow-2876-0
UnstartedCell stop method akka://application/system/StreamSupervisor-0/flow-2984-0

In my local setup, I placed a custom akka-actor.jar in the lib folder; it prints the above debug message from the stop method of the UnstartedCell class.

The modified code is in the following project:
https://github.com/sasiharan/play-scala-starter-example/tree/testing.

Please let me know if you need any additional information.