ActorSystem scheduler with Swarm replicas

I’m having an issue with running schedulers in containers deployed as a Swarm service, with Play Framework as the backend. The problem: if I scale the service to many replicas, each replica’s actor system runs its own schedulers without talking to the others. So the obvious idea is to use Akka Cluster, isn’t it?

The first problem: scaling is dynamic, so I can’t really configure a port for each instance, and the IPs are dynamic too. I can use hostnames instead of IPs, but that still leaves me with the port.

The second problem: I’m not sure how to configure Play’s default actor system to use the cluster.
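
For reference, here is a minimal sketch of what joining Play’s default actor system to an Akka Cluster could look like in conf/application.conf, assuming Akka 2.6 with Artery remoting and Play’s default actor system name, application; the hostname variable and seed-node address are placeholders. Note that each Swarm replica gets its own IP on the overlay network, so every replica can bind the same port:

    akka {
      actor.provider = "cluster"          # enable clustering for the default actor system
      remote.artery.canonical {
        hostname = ${?HOSTNAME}           # per-container hostname injected by the environment
        port = 25520                      # same port on every replica; each has its own IP
      }
      cluster.seed-nodes = [
        "akka://application@seed-host:25520"  # placeholder seed address
      ]
    }

With the replicas clustered, a Cluster Singleton is the usual way to make a scheduler like the one below run on exactly one node; and since static seed nodes are awkward in a dynamically scaled environment, Akka Cluster Bootstrap (from akka-management) with a service-discovery mechanism is commonly used instead.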

This is a sample of the code that starts up the scheduler:

    private Cancellable startAlertsMaintainer(ActorSystem system, Duration initialDelay, Duration interval) {
        logger.info("Maintainer Scheduler starting, first iteration will start after {} minutes with intervals of {} minutes",
                initialDelay.toMinutes(), interval.toMinutes());
        // Schedules the maintenance task on this instance's actor system;
        // with several replicas, every instance runs its own copy of this.
        return system.scheduler().scheduleAtFixedRate(
                initialDelay,
                interval,
                () -> userAlertsMaintenanceService.maintainUserPreferences()
                        .thenAcceptAsync(aVoid -> logger.info("Users checked and subscriptions maintained"), system.dispatcher()),
                system.dispatcher()
        );
    }

Hi there,

What I usually do in that sort of case is have a Redis server that acts as an atomic lock for my actors. The problem with that is load distribution: you don’t want the same instance running all the actors just because it was launched first. That can be solved either by scheduling the actors at fixed hours and giving each instance a random delay before it tries to take the lock, or, more efficiently, if your task can be broken down into smaller pieces, by locking the “sub-tasks” instead (see the sketches below).
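
As a concrete illustration, here is a minimal sketch of that atomic lock, assuming a recent Jedis client; the class name, key, owner id, and TTL are made up for the example. It relies on Redis’s SET with NX and EX, so only one instance can take the lock, and a crashed holder releases it automatically when the TTL expires:

    import java.time.Duration;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.params.SetParams;

    public class RedisTaskLock {
        private final Jedis jedis;

        public RedisTaskLock(Jedis jedis) {
            this.jedis = jedis;
        }

        // Atomically SET key owner NX EX ttl: succeeds for exactly one caller,
        // and the lock expires on its own if the holder dies.
        public boolean tryLock(String key, String owner, Duration ttl) {
            String reply = jedis.set(key, owner,
                    SetParams.setParams().nx().ex(ttl.getSeconds()));
            return "OK".equals(reply);
        }
    }

Inside the scheduled task, each instance would call something like tryLock("locks:alerts-maintainer", instanceId, Duration.ofMinutes(10)) and only run the maintenance work when it returns true.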

For example, if you want to calculate daily statistics for each user, just lock the user you are currently processing. The first instance locks the first user and processes it; meanwhile the second instance tries to lock the first user, can’t, and skips to the second user.
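
Sketched with the same hypothetical tryLock helper as above, and with computeDailyStatistics standing in for the per-user work, that loop looks roughly like this:

    // Every instance iterates the same user list, but each user is processed
    // by whichever instance takes that user's lock first; the others skip it.
    for (String userId : userIds) {
        String lockKey = "locks:daily-stats:" + userId;
        if (taskLock.tryLock(lockKey, instanceId, Duration.ofMinutes(5))) {
            computeDailyStatistics(userId);
        }
        // else: another replica holds this user's lock, move on to the next.
    }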