Evolutionary Database Design
Evolutionary design in databases occurs when developers can build and evolve the structure of the database as requirements change over time. Database schemas are abstractions, similar to class hierarchies. As the underlying real world changes, those changes must be reflected in the abstractions developers and DBAs build. Otherwise, the abstractions gradually fall out of synchronization with the real world.
How can architects build systems that support evolution but still use traditional tools like relational databases? The key to evolving database design lies in evolving schemas alongside code. Continuous Delivery addresses the problem of how to fit the traditional data silo into the continuous feedback loop of modern software projects. Developers must treat changes to database structure the same way they treat source code: tested, versioned, and incremental.
DBAs and developers should rigorously test changes to database schemas to ensure stability. If developers use a data mapping tool like an object-relational mapper (ORM), they should consider adding fitness functions to ensure the mappings stay in sync with the schemas.
Developers and DBAs should version database schemas alongside the code that utilizes it. Source code and database schemas are symbiotic — neither functions without the other. Engineering practices that artificially separate these two necessarily coupled things cause needless inefficiencies.
Changes to the database schemas should accrue just as source code changes build up: incrementally as the system evolves. Modern engineering practices eschew manual updates of database schemas, preferring automated migration tools instead.
Database migration tools are utilities that allow developers (or DBAs) to make small, incremental changes to a database that are automatically applied as part of a deployment pipeline. They exist along a wide spectrum of capabilities from simple command-line tools to sophisticated proto-IDEs. When developers need to make a change to a schema, they write small delta scripts, as illustrated in Example 5-1.
Example 5-1. A simple database migration_
CREATE TABLE customer (
id BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 1) PRIMARY
firstname VARCHAR(60), lastname VARCHAR(60)
The migration tool takes the SQL snippet shown in Example 5-1 and automatically applies it to the developer’s instance of the database. If the developer later realizes they forgot to add date of birth rather than change the original migration, they can create a new one that modifies the original structure, as shown in Example 5-2.
Example 5-2. Adding date of birth to existing table using a migration_
ALTER TABLE customer ADD COLUMN dateofbirth DATETIME;
ALTER TABLE customer DROP COLUMN dateofbirth;
In Example 5-2, the developer modifies the existing schema to add a new column. Some migration tools support undo capabilities as well. Supporting undo allows developers to easily move forward and backward through the schema versions. For example, suppose a project is on version 101 in the source code repository and needs to return to version 95. For the source code, developers merely check out version 95 from version control. But how can they ensure the database schema is correct for version 95 of the code? If they use migrations with undo capabilities, they can “undo” their way backwards to version 95 of the schema, applying each migration in turn to regress back to the desired version.
However, most teams have moved away from building undo capabilities for three reasons. First, if all the migrations exist, developers can build the database just up to the point they need without backing up to a previous version. In our example, developers would build from 1 to 95 to restore version 95. Second, why maintain two versions of correctness, both forward and backward? To confidently support undo, developers must test the code, sometimes doubling the testing burden. Third, building comprehensive undo sometimes presents daunting challenges. For example, imagine that the migration dropped a table — how would the migration script preserve all data in the case of an undo operation?
Once developers have run migrations, they are considered immutable — changes are modeled after double-entry bookkeeping. For example, suppose that Danielle the developer ran the migration in Example 5-2 as the 24th migration on the project. Later, she realizes dateofbirth isn’t needed after all. She could just remove the 24th migration, and the end result on the table is no column. However, any code written between the time Danielle ran the migration and now assumes the presence of the dateofbirth column, and will no longer work if for some reason the project needs to back up to an intermediate point (e.g., to fix a bug). Instead, to remove the no-longer needed column, she runs a new migration that removes the column.
Database migrations allow both database admins and developers to manage changes to schema and code incrementally, by treating each as parts of a whole. By incorporating database changes into the deployment pipeline feedback loop, developers have more opportunities to incorporate automation and earlier verification into the project’s build cadence.