Nodes not rejoining after cluster split

Hi,

We use Akka v2.4.10 and have 4 nodes in an Akka cluster (e.g. A, B, C, D), and we observe the cluster split scenario below.

  1. When the actor system on a node (e.g. D) is restarted, we receive an “Associate timed out after [15000 ms]” message when it tries to connect to the other nodes.
    Once the actor system has fully started, cluster.state().members() contains only node D.

Is there a way to get node D back into the cluster (where A, B and C are present)?

  2. We receive the “Associate timed out after [15000 ms]” message from akka.remote.ReliableDeliverySupervisor when association with a node fails.

    We guess this timeout value is an Akka default. Is there a setting to configure it manually (i.e. in the configuration file)?

Thanks,
Makesh

Do you mean that you see this when D tries to join ABC? That shouldn’t happen. It is difficult to guess what could be causing it without inspecting the logs. Could it be that your network is not configured for peer-to-peer, so that connections can only be opened in one direction (firewalls)?

BTW you are using a very old version that is end-of-life.

Hi Patrik,

Thanks for your response.

In our case, when we start node D, it tries to join the cluster (ABC). But ABC is unreachable because of a network outage, so D forms a separate cluster.
At this point, Cluster.state().members() on D contains only node D and none of the other nodes.

There are no firewalls at our end, and even when the network becomes stable later, D remains in a separate cluster.
To recover from this, we currently restart node D (its actor system) manually so that it joins the ABC cluster.

So the question is: is there any option in Akka to rejoin these cluster islands without restarting the nodes?

Regards,
Makesh

Then you have configured node D as the first element in the seed-nodes list; otherwise it wouldn’t join itself.
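
To illustrate the point about seed-node ordering, here is a minimal sketch of what node D's configuration could look like with classic remoting in Akka 2.4. The system name (ClusterSystem), the host names (host-a, host-b, host-d) and the port are placeholders, not values taken from this thread. Only the node listed first in seed-nodes is allowed to join itself and form a new cluster; any other node should keep retrying the listed seed nodes instead.

```
akka {
  actor.provider = "akka.cluster.ClusterActorRefProvider"

  remote.netty.tcp {
    hostname = "host-d"   # placeholder: this node's own address
    port = 2552           # placeholder port
  }

  cluster {
    # Use the same list, in the same order, on every node.
    # D is deliberately NOT the first element, so it should not
    # join itself when A and B cannot be reached at startup.
    seed-nodes = [
      "akka.tcp://ClusterSystem@host-a:2552",
      "akka.tcp://ClusterSystem@host-b:2552",
      "akka.tcp://ClusterSystem@host-d:2552"
    ]
  }
}
```

With an ordering like this, a restarted D would be expected to wait and retry until one of the other seed nodes answers, rather than ending up as a single-node island.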

Some reading: