We are planning to move our Lagom-backed service to DData mode for Akka Cluster Sharding. Per the Lagom migration guide (as well as the Akka documentation), this requires downtime, as nodes that use DData cannot join a cluster with nodes that do not use DData without risking corruption of the journal.
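For reference, the switch in question is (as described in the Akka Cluster Sharding documentation) a single setting in `application.conf`:

```hocon
# Store shard coordinator state with Distributed Data
# instead of persisting it to the journal.
akka.cluster.sharding.state-store-mode = ddata
```

All nodes in a cluster must agree on this setting, which is why mixing old and new nodes is unsafe.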
Obviously, one way to achieve this is to shut down our servers and then start them up again with the DData configuration. However, we are considering an alternative mechanism to reduce the downtime and keep our capacity high.
We are considering booting up a second environment with nearly the same configuration as our existing one, except for the seed nodes, and with the read-side processors disabled. This second environment would use its own nodes as seeds, effectively forming a separate cluster (a deliberate split brain). We would make sure no traffic is going to it when we start it up, so with no writers running (entities, read-side processors, and the shard coordinator writer, since that cluster uses DData) we would not expect any corrupt data to be written. Then we would cut traffic to the old environment and repoint our DNS entries to the new environment. Finally, we would shut down our old environment and enable the read-side processors in the new environment.
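A minimal sketch of what we have in mind for the second environment's overrides, assuming hostnames like `new-node-1` and a role name `read-side` that we would simply not assign to any node at first (both are placeholders, not our real values):

```hocon
# Second environment: seed itself, never contact the old cluster.
akka.cluster.seed-nodes = [
  "akka.tcp://application@new-node-1:2552",
  "akka.tcp://application@new-node-2:2552"
]

# Shard coordinator state via Distributed Data (per the migration guide).
akka.cluster.sharding.state-store-mode = ddata

# Keep read-side processors off initially by restricting them to a role
# that no node carries yet; adding the role later turns them on.
# (We believe lagom.persistence.read-side.run-on-role is the relevant
# setting, but please correct us if there is a better mechanism.)
lagom.persistence.read-side.run-on-role = "read-side"
```

The idea is that flipping the read-side processors on is then a matter of assigning the role and restarting, rather than a config rewrite.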
We’d have downtime between when we cut traffic to our old environment and when our clients refresh their DNS caches after we update our DNS records, but we believe that is the only window. We think this ensures that only one set of nodes is writing to any given piece of data at any point in time, so it shouldn’t corrupt any data.
Are we missing anything? Does this seem like a valid strategy?