Hi
We use Akka Cluster with Akka Persistence, version 2.6.6 (Scala 2.13 artifacts), and our application runs on Raspberry Pi 3 Compute Modules.
With `-Xmx128m` set for the JVM, it takes about 26 hours until the heap is almost full. There is no OutOfMemoryError; instead, the max GC pause grows to 2 sec 339 ms and a Full GC happens every minute (analyzed with GCeasy).
This is not acceptable, because our actors use 1-second timeouts, so they always time out whenever the application is paused for more than 1 second.
Goal: the max GC pause should not exceed 700 ms.
Increasing the heap size from 128m to 256m makes it even worse, as expected (the max GC pause is then 16 sec 48 ms!).
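To follow heap growth and GC load from inside the application between offline GCeasy runs, the standard `java.lang.management` MXBeans can be sampled periodically. A minimal sketch (`GcSnapshot` is a hypothetical helper, not part of Akka or of our code):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: heap usage plus cumulative GC statistics from the platform MXBeans.
final class GcSnapshot {
    final long heapUsedBytes;
    final long heapMaxBytes;   // -1 if the maximum is undefined
    final long gcCount;        // collections since JVM start, summed over all collectors
    final long gcTimeMillis;   // approximate accumulated collection time, summed over all collectors

    private GcSnapshot(long heapUsedBytes, long heapMaxBytes, long gcCount, long gcTimeMillis) {
        this.heapUsedBytes = heapUsedBytes;
        this.heapMaxBytes = heapMaxBytes;
        this.gcCount = gcCount;
        this.gcTimeMillis = gcTimeMillis;
    }

    static GcSnapshot take() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long count = 0;
        long timeMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // both getters return -1 when the value is undefined for a collector
            count += Math.max(0L, gc.getCollectionCount());
            timeMillis += Math.max(0L, gc.getCollectionTime());
        }
        return new GcSnapshot(heap.getUsed(), heap.getMax(), count, timeMillis);
    }

    /** One log line describing what happened between an earlier snapshot and this one. */
    String describeSince(GcSnapshot earlier) {
        return String.format("heap used %d/%d KiB, %d collections taking %d ms in total",
                heapUsedBytes / 1024, heapMaxBytes / 1024,
                gcCount - earlier.gcCount, gcTimeMillis - earlier.gcTimeMillis);
    }
}
```

Taking a snapshot e.g. once a minute and logging `describeSince(previous)` would show whether the used heap and the GC time per minute rise steadily over the 26 hours. Note that this gives the total GC time per interval, not the longest single pause; per-pause durations still have to come from the GC log.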
A heap dump analysis with the Eclipse Memory Analyzer Tool (MAT) reports a leak suspect: an instance of the class `akka.actor.LocalActorRef`.
Somehow, one of our actors in the system, called “propertyHostChannel”, holds an `ActorRef` instance that occupies 42.22% of the heap, and this share keeps growing the longer the application runs. It is a `LocalActorRef`.
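Since the retained size grows over time, comparing two dumps taken some hours apart in MAT may show what accumulates below that `LocalActorRef`. A minimal sketch for triggering such dumps from inside the application, assuming a HotSpot-based JVM (`HeapDumper` is a hypothetical helper; `jcmd <pid> GC.heap_dump` does the same from outside):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

// Hypothetical, HotSpot-specific helper: write a heap dump from inside the running JVM.
final class HeapDumper {
    private HeapDumper() {}

    /**
     * Writes a dump of the live (reachable) objects to a new, timestamped .hprof file in
     * the given directory and returns its path. Dumping only live objects triggers a
     * full GC first, so expect a pause while it runs.
     */
    static Path dumpLiveHeap(Path directory, String baseName) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // the target file must not exist yet, and newer JDKs only accept the .hprof suffix
        Path file = directory.resolve(baseName + "-" + System.currentTimeMillis() + ".hprof");
        diagnostics.dumpHeap(file.toString(), true);
        return file;
    }
}
```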
Additionally, we see per-minute bursts of ~400 (!) debug log lines like:

```
[DEBUG] [a.a.L.Deserialization] [akka.actor.LocalActorRefProvider.Deserialization] [SELServer-akka.actor.default-dispatcher-28] - Resolve (deserialization) of path [user/SELServer/PropertyHostChannelRouter/VirtualPropertyHost/$Q7#65017950] doesn't match an active actor. It has probably been stopped, using deadLetters.
```
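The `$Q7` segment has the form Akka gives to children created without an explicit name, and the message suggests that an actor path received over the network (as recipient, sender, or an `ActorRef` inside a message) no longer resolves to a running actor. To check whether all bursts point at children of the same parent, the debug lines can be grouped by parent path. A minimal sketch (`UnresolvedPathStats` is a hypothetical helper; it assumes one log entry per line, in the format shown above):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts "doesn't match an active actor" lines per parent of the unresolved path.
final class UnresolvedPathStats {
    // captures the path without its "#uid" suffix (uids may be negative)
    private static final Pattern UNRESOLVED = Pattern.compile(
            "Resolve \\(deserialization\\) of path \\[([^\\]#]+)(?:#-?\\d+)?\\] doesn't match an active actor");

    private UnresolvedPathStats() {}

    static Map<String, Integer> countByParent(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = UNRESOLVED.matcher(line);
            if (!m.find()) {
                continue; // not one of the deserialization debug lines
            }
            String path = m.group(1);
            int lastSlash = path.lastIndexOf('/');
            // drop the last segment, i.e. the anonymous child name such as "$Q7"
            String parent = lastSlash >= 0 ? path.substring(0, lastSlash) : path;
            counts.merge(parent, 1, Integer::sum);
        }
        return counts;
    }
}
```

Fed with e.g. `Files.readAllLines(Paths.get("app.log"))` (hypothetical log file name), a single dominant parent would narrow down which actor creates and stops the children that remote nodes still address.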
Possibly relevant code excerpt from the `ChannelRouterActor` (the actor behind the “propertyHostChannel” instance):
```java
import akka.actor.ActorSelection;
import akka.cluster.ClusterEvent;
import akka.cluster.Member;
import org.pcollections.HashTreePSet;
import org.pcollections.PSet;

// Excerpt from ChannelRouterActor; the message classes, the onXxx handlers and `log` are defined elsewhere.

// selections of the routers with the same name on the other cluster nodes
private PSet<ActorSelection> clusterRoutersOfSameType = HashTreePSet.empty();

@Override
public Receive createReceive() {
    return receiveBuilder()
            .match(AddIdWithPropsToRegistry.class, this::onAddIdWithPropsToRegistry)
            .match(RemoveIdWithPropsFromRegistry.class, this::onRemoveIdWithPropsFromRegistry)
            .match(SendTo.class, this::onSendTo)
            .match(SendToCluster.class, this::onSendToCluster)
            // cluster events
            .match(ClusterEvent.CurrentClusterState.class, this::onCurrentClusterState)
            .match(ClusterEvent.MemberUp.class, mUp -> addToClusterRouters(mUp.member()))
            .match(ClusterEvent.ReachableMember.class, reachableMember -> addToClusterRouters(reachableMember.member()))
            .match(ClusterEvent.UnreachableMember.class, unreachableMember -> removeFromClusterRouters(unreachableMember.member()))
            .match(ClusterEvent.MemberLeft.class, memberLeft -> removeFromClusterRouters(memberLeft.member()))
            .match(ClusterEvent.MemberDowned.class, memberDowned -> removeFromClusterRouters(memberDowned.member()))
            .build();
}

private void addToClusterRouters(Member member) {
    final ActorSelection cousinRouter = getCousinRouter(member);
    // only selections anchored on another node have a host in their address; the selection
    // for this node's own address is anchored locally, so this node's router is not added
    if (cousinRouter.anchor().path().address().host().isDefined()) {
        clusterRoutersOfSameType = clusterRoutersOfSameType.plus(cousinRouter);
        log.debug("Added cousinRouter: {} to clusterRoutersOfSameType: {}", cousinRouter, clusterRoutersOfSameType);
    }
}

private void removeFromClusterRouters(Member member) {
    final ActorSelection cousinRouter = getCousinRouter(member);
    clusterRoutersOfSameType = clusterRoutersOfSameType.minus(cousinRouter);
    log.debug("Removed cousinRouter: {} from clusterRoutersOfSameType: {}", cousinRouter, clusterRoutersOfSameType);
}

private ActorSelection getCousinRouter(Member member) {
    final String name = getContext().getSelf().path().name();
    // each node has a ChannelRouterActor at "/user/SELServer/" + name, so select that path
    return getContext().actorSelection(member.address() + "/user/SELServer/" + name);
}
```
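Because `clusterRoutersOfSameType` is only changed by these cluster events, its size should stay bounded by the number of other nodes, even if a node keeps flapping between unreachable and reachable. A minimal, cluster-free sketch of the same bookkeeping can check this for a given event sequence; the class, the event enum, and the string addresses are hypothetical stand-ins for `Member`, the cluster events, and `ActorSelection`:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the add/remove logic above: addresses and selections are plain strings.
final class CousinRouterBookkeeping {
    enum Event { MEMBER_UP, REACHABLE, UNREACHABLE, LEFT, DOWNED }

    private final String selfAddress;
    private final String routerName;
    private final Set<String> cousinRouters = new HashSet<>();

    CousinRouterBookkeeping(String selfAddress, String routerName) {
        this.selfAddress = selfAddress;
        this.routerName = routerName;
    }

    void on(Event event, String memberAddress) {
        String cousin = memberAddress + "/user/SELServer/" + routerName;
        switch (event) {
            case MEMBER_UP:
            case REACHABLE:
                // stands in for the host().isDefined() check: this node's own router is skipped
                if (!memberAddress.equals(selfAddress)) {
                    cousinRouters.add(cousin);
                }
                break;
            case UNREACHABLE:
            case LEFT:
            case DOWNED:
                cousinRouters.remove(cousin);
                break;
        }
    }

    Set<String> cousinRouters() {
        return Collections.unmodifiableSet(cousinRouters);
    }
}
```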
Questions
- Why do we get bursts of the DEBUG logs mentioned above?
- Why is there one `LocalActorRef` instance that occupies so much heap?
- How can we get rid of the DEBUG log bursts, and how can we fix the memory issue so that GC pauses do not exceed 700 ms?
Please let us know if you need more information.
Thanks