Change management for databases?
If you are a software developer, you are most likely aware of (and using) source control software such as SVN, CVS, Clearcase, or one of the other version control systems. Version control of source code has been a given for many years and is a critical piece of every software project.
When a project requires a database, it is typical for the software developer to simply version the SQL for that database in the same way that the source code is versioned… namely, by placing it directly into the version control system and treating the SQL just like any other source code. But what about the data that may be in the database? I think we could all agree that there is less need to version production data, but the question still remains: how do you migrate the existing production data when changing the database structure?
If the schema changes, then it is very likely that a transform of the existing data will be required. A transform is simply an operation that takes a certain piece of data as input, applies an algorithm to it and then outputs the “transformed” data. As anyone who has done any database development can imagine, the more complex the database, the more difficult it is to manage a set of scripted transforms.
In addition to dealing with data transforms, what about the case where you have multiple product deployments in the field, each of which may be running a different version of the database schema? A simplistic way to deal with this is to create a meta-data table in the database which indicates what version the schema is at. Another approach is to ignore the schema version entirely and just make small incremental changes… granted, this is a pretty poor way to manage a database.
What is needed is a tool that integrates the migration of BOTH the schema and the data…
Liquibase to the rescue!
Fortunately, there is a tool that manages both schema and data migration. Liquibase has been publicly available for the last few years and is slowly making inroads into both open-source and commercial/industrial products and build processes. Liquibase can perform the following tasks:
- Schema and transform definitions stored in simple, human-readable XML files
- Refactoring (schema modifications & data transforms) can be grouped into “changesets” allowing modularization of one or more related refactorings
- Bi-directional schema and data transforms (supports roll-forward and roll-back)
- Version based conditional execution of transforms
- Integration with Hibernate (a persistence abstraction layer for Java)
- Support for many databases
- Human-readable refactoring log in a meta-data table
- Conditional execution of changesets based on database type
- Direct integration with the Ant build system
Schema Abstraction – Power that comes at a cost
One of the more powerful aspects of Liquibase is that by having the database developer define the schema and transforms in an abstracted (non-SQL) form, the design is more or less database-engine agnostic. This means that a persistence solution defined in Liquibase can be applied to one database vendor today (e.g. SQLite) and then, when the product’s requirements change, the database engine can change too (e.g. to MySQL).
Abstraction is neither a panacea nor a silver bullet. All abstraction schemes come at some cost, which depends on many things, including the depth of the abstraction and the run-time environment in which it will execute. Typically, abstraction costs manifest in several areas, including run-time performance, code complexity, ease of debugging and over-generalization. In the case of Liquibase, the cost of abstracting from SQL into the higher-level changeset meta-language is over-generalization.
For example, Liquibase only supports a very limited number of datatypes. In a recent trial of Liquibase on an existing MySQL database, I found that Liquibase was extracting the schema from the database perfectly except in the case of the TIMESTAMP datatype. Liquibase changed every TIMESTAMP column to a DATETIME column, because it only supports the very generic DATETIME datatype, which is common across most database engines… The crux is that in a MySQL database, DATETIME and TIMESTAMP are very different datatypes. With TIMESTAMP, one can configure the column to update whenever the row is updated; this ON UPDATE functionality does not work with a DATETIME column.
Fortunately, there is a way to solve this problem. By making use of the <modifySql> construct within a changelog, you can effectively customize the way that Liquibase generates SQL. If you are using Ant as your automated build tool, you can further customize the Liquibase-generated artifacts by making use of Ant’s regular-expression-based string replacement tasks.
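As a sketch, this is roughly how <modifySql> could be used to restore the MySQL TIMESTAMP behavior described above (the table and column names here are purely illustrative):

```xml
<changeSet id="3" author="dev">
  <createTable tableName="AuditLog">
    <column name="id" type="int" autoIncrement="true">
      <constraints primaryKey="true" nullable="false"/>
    </column>
    <!-- Liquibase's generic datatype; MySQL would normally receive DATETIME -->
    <column name="lastUpdated" type="DATETIME"/>
  </createTable>
  <!-- Rewrite the generated SQL, but only when running against MySQL -->
  <modifySql dbms="mysql">
    <replace replace="DATETIME"
             with="TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP"/>
  </modifySql>
</changeSet>
```

The <modifySql> block runs as a post-processing step on the SQL Liquibase generates for the changeset, so the abstract definition stays portable while the MySQL deployment keeps its ON UPDATE behavior.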
Of ChangeSets and ChangeLogs
The core concepts to understand when working with Liquibase are ChangeLogs and Changesets.
Changesets – change that really works
A changeset is the base unit in the Liquibase system. A changeset is an XML construct within a changelog XML file; a changelog file may contain one or more changesets. A changeset defines one or more changes that will be applied to the database. These changes may include schema refactorings, data transformations and even direct SQL statements. A changeset has a unique identifier which allows Liquibase to accurately determine whether to execute the changeset against a particular database.
A changeset can contain a context parameter which allows the user to define when a particular changeset will be executed. One could have a number of changesets with a “test” context and a number of changesets defined with a “production” context. This would allow the user to conditionally execute certain changesets in a production environment and other changesets in a test environment.
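As a sketch, a changeset tagged with a “test” context might look like this (the id, author and data are illustrative); running Liquibase with its contexts parameter set to “test” would execute it, while a “production” run would skip it:

```xml
<!-- Only executed when Liquibase is invoked with the "test" context -->
<changeSet id="9" author="dev" context="test">
  <insert tableName="Dogs">
    <column name="breed" value="Beagle"/>
  </insert>
</changeSet>
```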
For example, to create a table on your database called ‘Dogs’ with an auto-increment id column and a breed string column, you could create your changeset like this:
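A minimal changelog along those lines might look as follows (the author attribute and the schema version referenced in the header are illustrative):

```xml
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-1.9.xsd">

  <changeSet id="1" author="dev">
    <createTable tableName="Dogs">
      <!-- auto-increment id column -->
      <column name="id" type="int" autoIncrement="true">
        <constraints primaryKey="true" nullable="false"/>
      </column>
      <!-- breed string column -->
      <column name="breed" type="varchar(50)"/>
    </createTable>
  </changeSet>

</databaseChangeLog>
```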
In this example, we have created a single changeset which creates a table with two columns. Note that the changeset id is set to 1. If this example had a second changeset, we would have given it an id of 2. A changeset id must be unique, as it is used by Liquibase to determine whether the changeset has already been applied to the database.
When first working with Liquibase changesets, I had thought that the changeset id was an incremental value that would direct Liquibase regarding the order of execution of the changesets. Interestingly, this is not how it works; the id is simply that… an id.
The order of operation for changesets within a changelog file is based on the order of definition within the file.
Changelogs – a compilation of Changesets
A changelog is an XML file which contains one or more changesets. A changelog file can include other changelog files, which allows for a kind of componentization of the changesets. Preconditions can be set in the changelog file to indicate what conditions must be met in order for the changelog to be executed.
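A master changelog using includes and a precondition could be sketched like this (the file names are illustrative):

```xml
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-1.9.xsd">

  <!-- Only run this changelog against a MySQL database -->
  <preConditions>
    <dbms type="mysql"/>
  </preConditions>

  <!-- Componentize changesets into separate files -->
  <include file="baseline.changelog.xml"/>
  <include file="refactorings.changelog.xml"/>

</databaseChangeLog>
```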
Interacting with a database engine
Liquibase is implemented in pure Java and requires JDBC-compliant database engine drivers. JDBC driver packages are available for most of the databases in popular use today. Some database engines have multiple driver projects; it is up to the person using Liquibase to determine which driver to use.
Liquibase supports many, many database engines including a number of the most popular such as MySQL, SQLite, Oracle, MS SQL Server, PostgreSQL, HSQL, DB2 and others.
Liquibase can be executed from the command line, from Ant tasks or via direct Java calls.
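A rough sketch of the Ant integration (the JDBC URL, credentials, jar locations and file names are placeholders; the task definitions ship with the Liquibase jar):

```xml
<path id="liquibase.classpath">
  <fileset dir="lib">
    <include name="liquibase*.jar"/>
    <include name="mysql-connector*.jar"/>
  </fileset>
</path>

<!-- Pull in the Liquibase Ant tasks -->
<taskdef resource="liquibasetasks.properties" classpathref="liquibase.classpath"/>

<target name="update-db">
  <updateDatabase changeLogFile="changelog.xml"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/mydb"
                  username="${db.user}"
                  password="${db.pass}"
                  classpathref="liquibase.classpath"/>
</target>
```

Running the update-db target applies any pending changesets in changelog.xml to the configured database.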
Database configuration management workflows
There are a number of ways that Liquibase can be employed to solve real-world issues when managing a database throughout its life-cycle. Here are several “workflows” that utilize Liquibase as the central management framework:
Refactoring of an existing (pre-Liquibase) database
Let’s assume that you have a database which has been in use for some time and you need to make some major architectural changes. Without Liquibase, you might write a number of scripts using various technologies such as SQL and one or more scripting languages. These scripts would do everything from changing the schema to migrating the data from the old schema to the new one. It is very important that the scripts are executed in the correct order, or the data will likely be corrupted.
With Liquibase, the easiest approach is to have Liquibase extract the schema from the existing database into a set of Liquibase changesets, stored in a Liquibase changelog file. You can then use Liquibase to extract the data from the existing database. This process creates what I call a baseline Liquibase changelog.
Now that you have your baseline, you can begin creating new changesets that describe the various refactorings that you wish to apply to it. Remember that each changeset needs a unique identifier. Within your changesets you can describe the various roll-forward and roll-backward transforms. In addition to pure SQL transforms, Liquibase can directly call compiled Java classes if those classes implement the Liquibase interface.
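A sketch of a changeset with an explicit roll-backward transform (the schema-refactoring tags such as createTable can usually be rolled back automatically, but raw SQL needs an explicit rollback block; the statements here are illustrative):

```xml
<changeSet id="12" author="dev">
  <!-- roll-forward transform -->
  <sql>UPDATE Dogs SET breed = UPPER(breed)</sql>
  <!-- roll-backward transform, used when this changeset is rolled back -->
  <rollback>
    <sql>UPDATE Dogs SET breed = LOWER(breed)</sql>
  </rollback>
</changeSet>
```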
Once you have defined your refactorings, you can use Liquibase to apply them to the database. Obviously, you should always perform a full schema and data backup using your native database vendor tools before attempting this.
Creation of a new database using Liquibase
You can also make use of Liquibase if you are starting fresh and are not bound to an existing database schema. By defining your database schema within the Liquibase changelog rather than in SQL, you can make use of all of the benefits of the abstracted model that Liquibase provides. Additionally, you can utilize Liquibase throughout the full lifecycle of your database. In this scenario, your workflow would look like this:
1. Create changesets in the Liquibase changelog
2. Use Liquibase to execute the changelog on your database engine
3. Make changes to the database by creating new changesets in the changelog
4. Use Liquibase to execute the changelog on the database engine to apply the new changesets
5. As your database moves through its lifecycle, you may have multiple changes to the schema or to the default dataset. Since you are managing your database structure through Liquibase, you simply repeat steps 3 and 4 for existing databases.
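Sketching those repeated steps: a later revision simply appends a new changeset to the same changelog, and Liquibase consults its meta-data table to apply only the changesets a given database has not yet seen (names are illustrative):

```xml
<!-- already applied to existing databases; Liquibase will skip it -->
<changeSet id="1" author="dev">
  <createTable tableName="Dogs">
    <column name="id" type="int" autoIncrement="true">
      <constraints primaryKey="true" nullable="false"/>
    </column>
  </createTable>
</changeSet>

<!-- new changeset added in a later product revision -->
<changeSet id="2" author="dev">
  <addColumn tableName="Dogs">
    <column name="name" type="varchar(50)"/>
  </addColumn>
</changeSet>
```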
Changing database vendors with Liquibase
Liquibase can also be used to migrate a schema from one database vendor to another. For example, let’s say that you have a legacy database that was implemented in SQLite as an in-process, embedded database. In the next revision of the product, you determine that you would rather use a database server such as MySQL in order to gain the benefit of greater scalability and a single datastore for multiple application deployments.
In this case you can use Liquibase to extract the schema and data from the existing SQLite database into the higher-level changeset description model. Once extracted into a changelog, you can use Liquibase to apply the changes to the new database engine. It actually works pretty well, although you should double-check the datatypes in the new engine, since Liquibase defaults to a very small set of ultra-generic types.
Managing test data with Liquibase
Since Liquibase can execute arbitrary SQL against a database engine, it can also be used to insert and remove test data from a database.
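One possible sketch, combining arbitrary SQL with the context mechanism so the test data never reaches production (the table names and values are illustrative):

```xml
<changeSet id="30" author="dev" context="test">
  <!-- load test rows -->
  <sql>INSERT INTO Dogs (breed) VALUES ('Terrier')</sql>
  <!-- remove them again when the changeset is rolled back -->
  <rollback>
    <sql>DELETE FROM Dogs WHERE breed = 'Terrier'</sql>
  </rollback>
</changeSet>
```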
Migrating customer data in the field with Liquibase
Migrating data between versions of a schema can be an arduous task, full of pitfalls and prone to mistakes. The versioned changeset model implemented by Liquibase can help bring some structure and modularization to this process.
By making use of the changeset model, one can structure the migration process and break it into multiple smaller migrations, each of which can be defined by a single changeset.
Liquibase is a very useful tool for adding structure, repeatability and a common execution engine to the database portion of your product life-cycle. While it will not totally make up for a lack of good process in your database management approach, the act of integrating Liquibase into a product will necessitate a review of the current database management processes. Overall, I think that Liquibase is a good tool that allows the database to be managed in a manner very similar to that of source code. The tool has an easy-to-learn XML syntax and can be integrated into Ant or simple batch scripts, making it quite versatile. If you have been thinking that there is a better way to manage your database, there is… and its name is Liquibase.