Hi guys,
I’m distributing a fairly heavy computation with Akka Cluster. An orchestrating actor spawns subtasks, which send their results to a reducing actor that collects the partial results and produces the final result. Pretty simple.
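The orchestrate/reduce flow above can be pictured with a rough plain-Java analogue. It uses a thread pool and CompletableFuture instead of actors, and `subtask`, `run` and the summing workload are hypothetical placeholders, not the real computation. It only illustrates the fan-out/fan-in shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ScatterGatherSketch {

    // Hypothetical subtask: stands in for one slice of the heavy computation.
    static long subtask(int slice) {
        long sum = 0;
        for (int i = slice * 1000; i < (slice + 1) * 1000; i++) {
            sum += i;
        }
        return sum;
    }

    // Orchestrator: fans out one subtask per slice onto the worker pool,
    // then the reducer step combines the partial results once all are done.
    static long run(int slices, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<CompletableFuture<Long>> parts = new ArrayList<>();
            for (int s = 0; s < slices; s++) {
                final int slice = s;
                parts.add(CompletableFuture.supplyAsync(() -> subtask(slice), pool));
            }
            // Reducer: wait for every partial result and fold them into one value.
            return parts.stream().mapToLong(f -> f.join()).sum();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(10, 4)); // prints 49995000 (sum of 0..9999)
    }
}
```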
I use a round-robin-group router configured as follows:
deployment {
  /parent/child {
    router = round-robin-group
    routees.paths = ["/user/child"]
    cluster {
      enabled = on
      allow-local-routees = on
      use-role = compute
    }
  }
}
and on each node the routee at /user/child is created as a local pool:
system.actorOf(ChildWorker.props(...).withRouter(new RoundRobinPool(workerInstances)), "child");
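To see why a round-robin pool struggles with subtasks of very different durations, here is a small plain-Java model of round-robin assignment. It assumes all tasks are queued up front and each routee processes its own mailbox serially; the durations are hypothetical, and this models only the scheduling rule, not Akka's RoundRobinPool implementation.

```java
import java.util.Arrays;

public class RoundRobinSketch {

    // Round-robin routing sends task i to routee (i % routees), regardless of
    // how busy that routee already is. Returns the total busy time per routee.
    static long[] assign(long[] taskDurations, int routees) {
        long[] busy = new long[routees];
        for (int i = 0; i < taskDurations.length; i++) {
            busy[i % routees] += taskDurations[i];
        }
        return busy;
    }

    // Time until the last routee finishes, since each routee works serially.
    static long makespan(long[] taskDurations, int routees) {
        return Arrays.stream(assign(taskDurations, routees)).max().orElse(0);
    }

    public static void main(String[] args) {
        // Hypothetical durations: two long subtasks among many short ones.
        long[] durations = {10, 1, 1, 1, 10, 1, 1, 1};
        System.out.println(Arrays.toString(assign(durations, 4))); // [20, 2, 2, 2]
        System.out.println(makespan(durations, 4));                 // 20
    }
}
```

Because round-robin ignores how busy a routee is, both long tasks land on the same routee, while the other three go idle after 2 time units.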
It works as expected, but to gain more speed I thought about using a BalancingPool for the local ChildWorker instances, to handle subtasks of quite different durations more flexibly. The results were a bit surprising to me.
system.actorOf(ChildWorker.props(...).withRouter(new BalancingPool(workerInstances)), "child");
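For context, the reason a BalancingPool should help is that its routees share a single mailbox, so whichever routee becomes idle first picks up the next message. The plain-Java model below captures that idea by always handing the next task to the worker with the earliest "free at" time. The durations and names are hypothetical, and this is a model of the scheduling behavior only, not Akka's BalancingPool code.

```java
import java.util.PriorityQueue;

public class SharedQueueSketch {

    // Shared-queue scheduling: the earliest idle worker takes the next task.
    // Assumes workers >= 1. Returns the time at which the last task finishes.
    static long makespan(long[] taskDurations, int workers) {
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int w = 0; w < workers; w++) {
            freeAt.add(0L); // every worker starts idle
        }
        long finish = 0;
        for (long d : taskDurations) {
            long start = freeAt.poll(); // earliest idle worker pulls the task
            long end = start + d;
            freeAt.add(end);
            finish = Math.max(finish, end);
        }
        return finish;
    }

    public static void main(String[] args) {
        long[] durations = {10, 1, 1, 1, 10, 1, 1, 1};
        System.out.println(makespan(durations, 4)); // 11
    }
}
```

For this input on 4 workers, strict round-robin (task i to worker i % 4) would put both 10-unit tasks on worker 0 and finish at 20, whereas the shared queue finishes at 11.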
It turned out that balancing works fine on the machine that initiates the processing, but all the other (remote) machines always use a single actor to process their subtasks.
This is how it looks. The table below has 3 columns: the number of tasks processed, the hostname, and the UUID of the actor that processed them. Balancing clearly works nicely on tmp06, but every other machine uses only one actor, which handles all of that machine's tasks and makes the processing much slower.
7 tmp06 UUID[-1966118007]
8 tmp06 UUID[-1198559308]
8 tmp06 UUID[1380304217]
8 tmp06 UUID[-1393713980]
8 tmp06 UUID[-231652641]
8 tmp06 UUID[954268528]
9 tmp06 UUID[1575392504]
9 tmp06 UUID[241401640]
9 tmp06 UUID[253473252]
9 tmp06 UUID[662565581]
10 tmp06 UUID[-1021666439]
10 tmp06 UUID[-1740072156]
10 tmp06 UUID[-1944934763]
11 tmp06 UUID[1233386530]
11 tmp06 UUID[-1461303810]
11 tmp06 UUID[-358932747]
11 tmp06 UUID[-375831361]
11 tmp06 UUID[-381997833]
11 tmp06 UUID[973420023]
12 tmp06 UUID[-1595089979]
12 tmp06 UUID[1620412350]
12 tmp06 UUID[1882766899]
12 tmp06 UUID[1978863154]
12 tmp06 UUID[218773515]
12 tmp06 UUID[-29776901]
12 tmp06 UUID[695670443]
13 tmp06 UUID[-1682686364]
13 tmp06 UUID[-1899851456]
13 tmp06 UUID[307375540]
14 tmp06 UUID[1316354448]
14 tmp06 UUID[-1375544402]
14 tmp06 UUID[488103872]
14 tmp06 UUID[-768640047]
15 tmp06 UUID[1613489745]
15 tmp06 UUID[290922216]
18 tmp06 UUID[1049063036]
349 tmp01 UUID[-1177891195]
360 tmp03 UUID[-1159146073]
367 tmp04 UUID[-2058093493]
369 tmp02 UUID[982203959]
383 tmp05 UUID[-988775137]
407 lhch03 UUID[1661774721]
407 lhch04 UUID[1801962384]
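To summarize a table like this per host, a small hypothetical helper such as the one below can tally total tasks and distinct actors. It assumes every line has exactly the form `<count> <host> UUID[<id>]` shown above; it is only an illustration, not how these numbers were produced. It sticks to Java 8 APIs to match the JDK in use.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class TallySketch {

    // Parses "<count> <host> UUID[<id>]" lines and returns, per host (sorted by
    // host name), a pair {total tasks, number of distinct actors}.
    static Map<String, long[]> summarize(List<String> lines) {
        Map<String, Long> tasks = new TreeMap<>();
        Map<String, Set<String>> actors = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            long count = Long.parseLong(parts[0]);
            String host = parts[1];
            tasks.merge(host, count, Long::sum);
            actors.computeIfAbsent(host, h -> new HashSet<>()).add(parts[2]);
        }
        Map<String, long[]> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : tasks.entrySet()) {
            result.put(e.getKey(), new long[] {e.getValue(), actors.get(e.getKey()).size()});
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "7 tmp06 UUID[-1966118007]",
            "8 tmp06 UUID[-1198559308]",
            "349 tmp01 UUID[-1177891195]");
        // Prints "tmp01 tasks=349 actors=1" then "tmp06 tasks=15 actors=2".
        summarize(sample).forEach((host, s) ->
            System.out.println(host + " tasks=" + s[0] + " actors=" + s[1]));
    }
}
```

Applied to all rows above, this gives tmp06 406 tasks spread over 36 actors, while every other host has a single actor with 349 to 407 tasks. That suggests the cluster-level round-robin splits the work evenly across nodes, and only the spreading within each node differs.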
So my question is: are there any restrictions on using a local BalancingPool together with a cluster-aware RoundRobinGroup that could lead to this behavior?
(I’m using Akka 2.6.4 with Java and JDK 1.8)
Any help appreciated