Databases in any system under development will inevitably require modifications, maintaining these scripts and identifying which scripts need to be applied to databases in various environments (production, staging and development) can be challenging.  These challenges are further compounded if the development of core features is done in multiple branches.

Where I am working we are currently versioning files with the following naming scheme:

            nnn-a short script description.sql

where “nnn” is an incrementing version number.

In each database there is a database_updates table which is a simple record of which scripts have been applied and when (the scripts are expected to update this table – the actual script and the database_update are all done within a transaction).

It’s not at all uncommon for scripts to be committed with a particular version number into one working branch only to find when merging back down to the trunk that scripts exist with the same version numbers, despite being very different scripts.  The responsibility for this resides with the person doing the merge, as there is always the file name that can be used to distinguish scripts.  This is a less than perfect solution – and I for one would like be able to run scripts checked in and not have to worry about exceptions because a script with the same version number has already been applied (unless of course I am running a script that has been run before). Apart from being a little disconcerting there is the very real chance that a script will be ignored with the assumption that the script has already been applied – locally this isn’t such an issue but can cause delays in our production and staging environments.

Suggestions

Maybe I am making something out of nothing here and by simply using the same merge tools that are used for code; along with a little common sense these issues can be sorted.  Perhaps, but I can’t help but think there is a “nicer” solution. None of the following suggestions make sense unless you have “baselined” the database and can recreate from script, a known good staring point.

In no particular order:

  • Separate repository for database scripts – perhaps using the svn:externals property to map a local directory to the URL-and possibly a particular revision-of the versioned resource: (http://svnbook.red-bean.com/en/1.1/ch07s04.html)
  • Database scripts that are committed should be treated as immutable – there is no point of versioning a file and then modifying it – it may have already been executed and any subsequent changes may not get applied.
  • File name is just version number so multiple scripts of the same number can not be added without a conflict and seeing as scripts should not be updated an svn merge is out of the question (or a “resolve with mine” type of operation). If this seems too restrictive (the file naming) perhaps an svn script can be written to check and reject a file if one with the same version prefix/suffix already exists?
  • File names are named in such as way you know which version of the database it applies to

In Practice

Let’s say the database is baselined and versioned 1.000 and 15 change scripts are added (1.001 to 1.015).  At this point a release branch is created along with a version 2.000 schema script in the trunk  - this script will produce the same database as the original script plus the 15 change scripts.

New scripts added in the trunk are for the version 2 database, so if 4 scripts were added there would be scripts 2.001 to 2.004.  If a script is needed in the release branch – 1.016, the production system could simply receive this change as production is still running with version 1 – the problem is how to integrate 1.016 into the trunk.  One solution could be to create a new script in the trunk, 2.005, that has the 1.016 changes.  All version 1 databases can simply run all version 1 change scripts, version 2 databases start with the version 2 baseline script and apply any 2.x change scripts.

This works ok when branches are created for release as schema changes and code modification slow.  Hopefully only bug fixes are committed to the branch and not new features.  Modifications made in the branch are generally merged down to the trunk where work not scheduled for a particular release resumes.

This is all fine and dandy until we come to “feature” branches – this is where I get stuck :)

Suggestions most welcome!

Advertisement