Strange Behavior with Cluster Aware Group

Hi there,
I am experiencing strange behavior with a cluster-aware group.

‘akka-cluster-tools_2.12’, version: ‘2.5.14’

  1. I run a 3 node cluster in the same JVM.
  2. The cluster is up and stable (no nodes joining or leaving).
  3. I create an actor instance that creates a cluster-aware group router in its preStart method or constructor.
  4. I immediately send messages through the group to its routees without a problem.

However, when I create a second instance of the same actor, it seems to take much longer to set up the cluster-aware group. This happens even though the group should contain the same routees, the cluster membership has not changed, and the load on the system is modest (at most 100 messages per minute, plus a few hello-world streams that do nothing but pass data through a map). As a result, all my messages to the router end up in dead letters and my system gets into an inconsistent state.

This works for the first actor instance but not for the second one:

    Iterable<String> routeesPaths = Collections.singletonList("/user/" + NodeManagerActor.NAME);
    boolean allowLocalRoutees = false;
    Set<String> roles = new HashSet<String>();
    roles.add(ClusterNode.COMPUTE_ROLE_NAME);
    ActorRef nodeManagerGroupPool = context().actorOf(
        new ClusterRouterGroup(
            new RoundRobinGroup(routeesPaths),
            new ClusterRouterGroupSettings(totalInstances, routeesPaths, allowLocalRoutees, roles)
        ).props(),
        "streamManagerGroupRouter");
    for (int i = 0; i < totalInstances; i++) {
        nodeManagerGroupPool.tell(new CreateStreamRunner(streamName), self());
    }

This always works (the only difference is a 3-second sleep after creating the router):

    Iterable<String> routeesPaths = Collections.singletonList("/user/" + NodeManagerActor.NAME);
    boolean allowLocalRoutees = false;
    Set<String> roles = new HashSet<String>();
    roles.add(ClusterNode.COMPUTE_ROLE_NAME);
    ActorRef nodeManagerGroupPool = context().actorOf(
        new ClusterRouterGroup(
            new RoundRobinGroup(routeesPaths),
            new ClusterRouterGroupSettings(totalInstances, routeesPaths, allowLocalRoutees, roles)
        ).props(),
        "streamManagerGroupRouter");

    // Workaround: block this actor for 3 s before sending so the group router is ready.
    try {
        Thread.sleep(3000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
    }

    for (int i = 0; i < totalInstances; i++) {
        nodeManagerGroupPool.tell(new CreateStreamRunner(streamName), self());
    }

Is it expected behavior that setting up a cluster group randomly takes some time?
How can I work around this? Is there a callback, future, or state I can use to find out when the cluster group is 'ready for business'?
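One way to avoid a fixed sleep is to hold outgoing messages until the router reports at least one routee. In Akka, a router answers the management message `akka.routing.GetRoutees.getInstance()` with a `Routees` reply whose `getRoutees()` list can be checked; an actor could buffer or stash its work until that list is non-empty, re-asking periodically otherwise. The sketch below models only the buffering side in plain Java so it runs without Akka. `RouteeGate`, `send`, and `onRoutees` are hypothetical names, not Akka API, and the round-robin delivery is a simplification.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical helper (not part of Akka): buffers messages while no routees
// are known, then flushes them round-robin once a routee list arrives.
class RouteeGate<M> {
    private final Deque<M> pending = new ArrayDeque<>();
    private final List<String> routees = new ArrayList<>();
    private final BiConsumer<String, M> deliver; // (routeePath, message)
    private int next = 0;

    RouteeGate(BiConsumer<String, M> deliver) {
        this.deliver = deliver;
    }

    // Send now if routees are known; otherwise buffer instead of dropping.
    void send(M message) {
        if (routees.isEmpty()) {
            pending.addLast(message);
        } else {
            route(message);
        }
    }

    // Models the arrival of a GetRoutees-style reply with the current routees.
    void onRoutees(List<String> current) {
        routees.clear();
        routees.addAll(current);
        // Flush buffered messages only once there is somewhere to send them.
        while (!routees.isEmpty() && !pending.isEmpty()) {
            route(pending.removeFirst());
        }
    }

    int pendingCount() {
        return pending.size();
    }

    private void route(M message) {
        String target = routees.get(next % routees.size());
        next = (next + 1) % routees.size();
        deliver.accept(target, message);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        RouteeGate<String> gate = new RouteeGate<>((path, msg) -> log.add(path + ":" + msg));

        gate.send("a"); // no routees yet, so this is buffered
        gate.send("b");
        System.out.println("buffered=" + gate.pendingCount()); // buffered=2

        gate.onRoutees(Arrays.asList("node1/user/nodeManager", "node2/user/nodeManager"));
        gate.send("c");
        System.out.println(log);
        // [node1/user/nodeManager:a, node2/user/nodeManager:b, node1/user/nodeManager:c]
    }
}
```

Compared with `Thread.sleep`, this does not block the actor's thread and does not guess a delay; it simply waits for evidence that routees exist.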
Thanks for any help.

Stefan

Since the router group doesn't need to do any communication itself (it just looks at the set of nodes in the cluster, their states, and possibly their roles), it should never take long to start up, so that part sounds strange.
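To illustrate that the inputs are purely local membership data, here is a simplified plain-Java model of how a cluster-aware group could derive its routee paths: keep members that are Up and carry all required roles, skip the local node when local routees are disallowed, append each routee path to the member address, and cap the result at `totalInstances`. This is not Akka's implementation; the `RouteeSelection` class, its `Member` type, and the sorted-address ordering are hypothetical stand-ins chosen to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of cluster-aware group routee selection.
// It only uses locally known data (addresses, statuses, roles): no remote calls.
class RouteeSelection {

    enum Status { JOINING, UP, LEAVING }

    static final class Member {
        final String address;
        final Status status;
        final Set<String> roles;

        Member(String address, Status status, String... roles) {
            this.address = address;
            this.status = status;
            this.roles = new HashSet<>(Arrays.asList(roles));
        }
    }

    static List<String> routees(List<Member> members, String selfAddress,
                                int totalInstances, List<String> routeePaths,
                                boolean allowLocalRoutees, Set<String> useRoles) {
        List<Member> eligible = new ArrayList<>();
        for (Member m : members) {
            boolean up = m.status == Status.UP;
            boolean hasRoles = m.roles.containsAll(useRoles);
            boolean localOk = allowLocalRoutees || !m.address.equals(selfAddress);
            if (up && hasRoles && localOk) {
                eligible.add(m);
            }
        }
        // Deterministic order for the example only.
        eligible.sort((a, b) -> a.address.compareTo(b.address));

        List<String> result = new ArrayList<>();
        for (Member m : eligible) {
            for (String path : routeePaths) {
                if (result.size() >= totalInstances) {
                    return result; // respect the totalInstances cap
                }
                result.add(m.address + path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Member> members = Arrays.asList(
            new Member("akka://sys@host:2551", Status.UP, "compute"),
            new Member("akka://sys@host:2552", Status.UP, "compute"),
            new Member("akka://sys@host:2553", Status.JOINING, "compute"));
        List<String> r = routees(members, "akka://sys@host:2551", 10,
            Collections.singletonList("/user/nodeManager"), false,
            Collections.singleton("compute"));
        System.out.println(r); // [akka://sys@host:2552/user/nodeManager]
    }
}
```

In this model the self node is excluded because `allowLocalRoutees` is false, and the joining node is excluded because it is not yet Up, which mirrors the settings used in the question.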

Can you try to profile your app and see what it is spending time on?
Do you have a complete reproducer you can share?