I have the following cluster members:
- m1 : seeds = [m1, down_member]
- m2 : seeds = [m2, m1, down_member]
down_member is a member that just crashed.
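For illustration, a sketch of how the two seed lists above might look in configuration, assuming the seeds are set through `akka.cluster.seed-nodes`. The system name `ClusterSystem`, the host names, the port, and the `akka://` protocol prefix are placeholders, not the actual values:

```hocon
# application.conf on m1 (hypothetical system name, hosts and port)
akka.cluster.seed-nodes = [
  "akka://ClusterSystem@m1:2551",          # m1 lists itself first
  "akka://ClusterSystem@down-member:2551"  # the crashed member
]

# application.conf on m2 (separate file, same placeholder names)
akka.cluster.seed-nodes = [
  "akka://ClusterSystem@m2:2551",          # m2 also lists itself first
  "akka://ClusterSystem@m1:2551",
  "akka://ClusterSystem@down-member:2551"
]
```

Note that in this layout each member is the first entry in its own seed list.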
If I start m1 and m2 at nearly the same time, both members try to contact their seeds. Since no other seed is up, m1 eventually joins itself. In the meantime, m2 sends a Join request to m1, which replies with InitJoinNack because it hasn’t fully initialized yet. Upon receiving the InitJoinNack, m2 stops retrying to join m1 and eventually joins itself, creating a separate cluster.
I’d like to understand why m2 doesn’t retry after the InitJoinNack, and whether there’s any way I can force it to.
I’m not able to ensure the seeds are correct all the time because of the dynamic nature of the cluster (instances can go down or come up at any time, e.g. when autoscaling up or down).