Database evolutions - why you should avoid them!

During deployments, it is not unusual for both the old and new versions of a service to be running at the same time: the new nodes must start before the old nodes are terminated. This means your database schema MUST be compatible with both the previous and the next version of your service.

If you want to alter or rename a column - don't. Instead (see the SQL sketch after this list):

  1. add a new column (it must not have a NOT NULL constraint, or it won't be backward compatible)
  2. update the code to insert data into the new column while also maintaining the old column
  3. run a migration script to backfill any rows inserted before step 2 was deployed
  4. update the code to stop using the old column
  5. now it is safe to drop the old column
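
As a rough SQL sketch of steps 1, 3, and 5, assuming a hypothetical users table where name is being renamed to full_name:

    -- step 1: add the new column, nullable so old nodes can keep inserting
    ALTER TABLE users ADD COLUMN full_name VARCHAR(255) NULL;

    -- step 3: backfill rows written before the dual-write code (step 2) went live
    UPDATE users SET full_name = name WHERE full_name IS NULL;

    -- step 5: only once no running version reads or writes the old column
    ALTER TABLE users DROP COLUMN name;

Steps 2 and 4 are application-code changes rather than SQL, which is what makes each intermediate state compatible with two versions of the service.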

In theory, you should be able to follow this workflow with evolutions… however, it seems that evolutions require the database to be at exactly the version the service expects, or the service will not start.

For example, you deploy a new version of your service and the database is migrated forward. Then, for one reason or another, the deployment fails and you roll back; or perhaps one of your old nodes respawns during the deployment of the new service (or maybe your migration script locked the database for too long, the old nodes' health checks failed, and they respawned).

You may find yourself in a bad spot where your old nodes will no longer start, failing at startup with warnings like the following:

    WARN p.a.d.e.ApplicationEvolutions: Your production database [] needs evolutions, including downs!

Is there a way to prevent evolutions from blowing up when the database is newer, on the assumption that new database versions will be backward compatible, at least for as long as it matters?

The best workaround I can think of is to enable downs in production but use a no-op script such as SELECT 1.

However, this will also cause the up evolution to be rerun once the new version is redeployed, which may fail if your down script does not actually revert the change. Thus it is ALSO necessary that your migration script be idempotent, meaning it can safely be run multiple times. Rather than blindly adding a column on the assumption that it does not already exist, it should check whether the column exists before adding it.
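
As a sketch, here is what that workaround might look like as a Play evolution file (table and column names are hypothetical; ADD COLUMN IF NOT EXISTS is Postgres syntax, and other databases may need an explicit existence check):

    # --- !Ups
    ALTER TABLE users ADD COLUMN IF NOT EXISTS full_name VARCHAR(255) NULL;
    UPDATE users SET full_name = name WHERE full_name IS NULL;

    # --- !Downs
    SELECT 1;

If I understand the evolutions config correctly, you would also need play.evolutions.autoApplyDowns=true in production for the no-op downs to be applied automatically.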

So while it may still be possible to use evolutions under the above rules, is deployment time really the best time to be making database changes? You have multiple nodes starting simultaneously, many of them blocked waiting for the database lock to be released, possibly respawning if the lock is held too long…

Are you still using evolutions?
Do you have zero downtime deployments?
How have you mitigated these issues?
If you are not using evolutions, what else are you using?

Hi!

First rule: don't ever use downs!

We write code that is backward compatible by one version. So our high-level flow looks like this:

  • version 1 has a table
  • version 2 modifies the table in a backward-compatible way (new column, copy data, write both columns)
  • version 3 removes the write-both logic (only writes the new, renamed column)
  • version 4 drops the unused column

At any point, we can roll back one version (see the sketch below).
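
In evolution terms (with a hypothetical account table whose col is being renamed to new_col), only versions 2 and 4 touch the schema:

    -- version 2's evolution: add the new column and copy existing data
    ALTER TABLE account ADD COLUMN new_col VARCHAR(255) NULL;
    UPDATE account SET new_col = col;

    -- version 3 ships no schema change (only the write-both logic is removed)

    -- version 4's evolution: nothing reads or writes the old column anymore
    ALTER TABLE account DROP COLUMN col;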

We do this with both tables and APIs. I usually keep a backlog of cleanup tasks, but if you deploy often enough you can live without a list :) (I like to use the deprecated annotation, so you get warnings from the compiler and don't forget to clean up the older stuff.) I even use this method for bigger refactors.
Also, you can use this with feature toggles (put a fancy if into the code, deploy it, switch the if at runtime so you can see whether the new code works as expected; if not, switch it back so there is no real user impact; if so, you can delete the ifs and the old code in the next version).

EDIT: oh, and you should write idempotent evolutions, e.g. create table if not exists, and so on. You should be able to run any evolution twice.
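
For example (a sketch with a hypothetical table; CREATE TABLE IF NOT EXISTS is widely supported, while guards for adding columns vary by database):

    # --- !Ups
    CREATE TABLE IF NOT EXISTS audit_log (
      id BIGINT PRIMARY KEY,
      message VARCHAR(255)
    );
    -- Postgres: ADD COLUMN IF NOT EXISTS; elsewhere, check information_schema first
    ALTER TABLE audit_log ADD COLUMN IF NOT EXISTS created_at TIMESTAMP NULL;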