Database Migrations Concepts

Managing Database Schema Evolution in Production Applications

Understanding Database Migrations

Database migrations are a structured way to evolve your database schema over time. They allow you to track, version, apply, and revert changes to your database structure in a controlled and systematic manner.

Think of migrations as a version control system specifically for your database schema. Just as Git tracks changes to your code, migrations track changes to your database structure and data.

Analogy: Home Renovation Blueprint

Database migrations are like detailed blueprints for renovating a house. Each blueprint shows exactly what changes to make (add a room, remove a wall, update the electrical system), the order to make them in, and includes a way to undo the changes if something goes wrong. With these blueprints, multiple contractors (developers) can work on the house at different times, and everyone understands exactly what changes have been made and what the house should look like at each stage.

graph TD A[Initial Database Schema] --> B[Migration 1] B --> C[Migration 2] C --> D[Migration 3] D --> E[Current Database Schema] style A fill:#f5f5f5,stroke:#333,stroke-width:2px style B fill:#ffcc99,stroke:#333,stroke-width:2px style C fill:#ffcc99,stroke:#333,stroke-width:2px style D fill:#ffcc99,stroke:#333,stroke-width:2px style E fill:#d9f7be,stroke:#333,stroke-width:2px

Why Database Migrations Are Essential

Key Benefits of Database Migrations

Challenges Without Migrations

Without using database migrations, teams often face several challenges:

Real-World Scenario: Startup Growth

Imagine a startup that began with a simple database model for their MVP. As they gained customers and added features, the database evolved significantly. Without migrations, each developer might make ad-hoc changes to their local database, leading to inconsistencies across environments. When deploying to production, they'd need to remember all changes and manually apply them without errors. With migrations, each change is documented, tested, and can be automatically applied to any environment, significantly reducing errors and downtime.

Key Concepts in Database Migrations

Migration Files

Migration files are scripts that define a specific change to your database. Each migration typically includes:

Migration State Table

Migration frameworks typically maintain a special table in your database that tracks which migrations have been applied. This table might contain:

Migration Lifecycle

flowchart LR A[Create Migration] --> B[Test Migration] B --> C[Apply Migration] C --> D[Commit Migration File] style A fill:#f5f5f5,stroke:#333,stroke-width:2px style B fill:#f5f5f5,stroke:#333,stroke-width:2px style C fill:#f5f5f5,stroke:#333,stroke-width:2px style D fill:#f5f5f5,stroke:#333,stroke-width:2px

Types of Database Changes

Migrations can handle various types of changes:

Add Users Table Migration 001 Add Email Field Migration 002 Add Profiles Table Migration 003 Add Foreign Key Migration 004 Add Indexes Migration 005 Current State

Migration Strategies

Migration Naming Conventions

Well-named migration files help understand the database evolution at a glance:

Zero-Downtime Migrations

For production applications, zero-downtime migrations are crucial. This involves multiple smaller migrations instead of a single large change:

sequenceDiagram participant DB as Database participant App as Application Note over DB,App: Step 1: Add new column (doesn't affect existing code) App->>DB: Create Migration: Add new_column DB-->>App: Column added (nullable/with default) Note over DB,App: Step 2: Update application to write to both columns App->>DB: Write to old_column AND new_column DB-->>App: Data saved in both places Note over DB,App: Step 3: Migrate existing data App->>DB: Create Migration: Copy data from old_column to new_column DB-->>App: All existing rows updated Note over DB,App: Step 4: Update application to only read from new column App->>DB: Read only from new_column DB-->>App: Return data from new_column Note over DB,App: Step 5: Remove old column App->>DB: Create Migration: Drop old_column DB-->>App: Column removed

Safe vs. Unsafe Migrations

Understanding which database changes are safe to perform in production is crucial:

Safe Migrations

  • Adding a new table
  • Adding a new column (nullable or with default)
  • Adding an index (with concurrent creation)
  • Adding a constraint that existing data satisfies

Unsafe Migrations (Require Special Handling)

  • Dropping a table
  • Dropping a column
  • Adding a NOT NULL column without default
  • Renaming a table or column
  • Changing column type

Migration Tools for Different Databases

SQL Database Migration Tools

Tool Language/Framework Supported Databases
Knex.js Migrations JavaScript/Node.js PostgreSQL, MySQL, SQLite, MSSQL
Sequelize Migrations JavaScript/Node.js PostgreSQL, MySQL, SQLite, MSSQL
Flyway Java PostgreSQL, MySQL, Oracle, SQL Server, etc.
Liquibase Java Most SQL databases
Django Migrations Python/Django PostgreSQL, MySQL, SQLite, Oracle
Rails Migrations Ruby on Rails PostgreSQL, MySQL, SQLite
Alembic Python/SQLAlchemy PostgreSQL, MySQL, SQLite, Oracle

NoSQL Database Migration Tools

NoSQL databases also have migration solutions, though they're often handled differently:

Real-World Example: MongoDB Migrations

While MongoDB is schema-less, migrations are still important. For example, if you decide to split a user's full name into first and last name, you need a migration to transform existing data. A tool like migrate-mongo can define this transformation and track which documents have been updated.

Working with Knex.js Migrations

Let's explore how to create and manage migrations using Knex.js, a popular SQL query builder for Node.js that includes a robust migrations system.

Setting Up Knex Migrations


// Install Knex and your database driver
npm install knex pg

// Initialize Knex configuration
npx knex init

// Create a migration file
npx knex migrate:make create_users_table
            

Writing a Migration


// migrations/20230315143227_create_users_table.js
exports.up = function(knex) {
  return knex.schema.createTable('users', function(table) {
    table.increments('id').primary();
    table.string('username').notNullable().unique();
    table.string('email').notNullable().unique();
    table.string('password_hash').notNullable();
    table.boolean('is_active').defaultTo(true);
    table.timestamp('created_at').defaultTo(knex.fn.now());
    table.timestamp('updated_at').defaultTo(knex.fn.now());
  });
};

exports.down = function(knex) {
  return knex.schema.dropTable('users');
};
            

Running Migrations


// Run all pending migrations
npx knex migrate:latest

// Rollback the last batch of migrations
npx knex migrate:rollback

// Rollback all migrations
npx knex migrate:rollback --all

// Check migration status
npx knex migrate:status
            

Common Knex Migration Operations


// Adding a column
exports.up = function(knex) {
  return knex.schema.table('users', function(table) {
    table.string('first_name');
    table.string('last_name');
  });
};

exports.down = function(knex) {
  return knex.schema.table('users', function(table) {
    table.dropColumn('first_name');
    table.dropColumn('last_name');
  });
};

// Creating an index
exports.up = function(knex) {
  return knex.schema.table('users', function(table) {
    table.index(['last_name', 'first_name'], 'idx_users_name');
  });
};

exports.down = function(knex) {
  return knex.schema.table('users', function(table) {
    table.dropIndex(['last_name', 'first_name'], 'idx_users_name');
  });
};

// Adding a foreign key
exports.up = function(knex) {
  return knex.schema.table('posts', function(table) {
    table.integer('user_id').unsigned();
    table.foreign('user_id').references('users.id');
  });
};

exports.down = function(knex) {
  return knex.schema.table('posts', function(table) {
    table.dropForeign('user_id');
    table.dropColumn('user_id');
  });
};
            

Data Migrations

Schema migrations modify the structure of your database, but data migrations transform the data itself. These are essential when changing how data is represented.

Common Data Migration Scenarios

Example: Splitting Name into First and Last Name


// migrations/20230316123045_split_user_name.js
exports.up = function(knex) {
  return knex.schema
    // First add the new columns
    .table('users', function(table) {
      table.string('first_name');
      table.string('last_name');
    })
    // Then migrate the data
    .then(() => {
      return knex('users').select('id', 'name');
    })
    .then((users) => {
      // Process each user
      const updates = users.map(user => {
        const nameParts = user.name.split(' ');
        const firstName = nameParts[0] || '';
        // Join all remaining parts as last name
        const lastName = nameParts.slice(1).join(' ') || '';
        
        return knex('users')
          .where('id', user.id)
          .update({
            first_name: firstName,
            last_name: lastName
          });
      });
      
      return Promise.all(updates);
    });
};

exports.down = function(knex) {
  return knex.schema
    // First combine the data back
    .then(() => {
      return knex('users').select('id', 'first_name', 'last_name');
    })
    .then((users) => {
      const updates = users.map(user => {
        const fullName = `${user.first_name || ''} ${user.last_name || ''}`.trim();
        
        return knex('users')
          .where('id', user.id)
          .update({ name: fullName });
      });
      
      return Promise.all(updates);
    })
    // Then remove the columns
    .then(() => {
      return knex.schema.table('users', function(table) {
        table.dropColumn('first_name');
        table.dropColumn('last_name');
      });
    });
};
            

Analogy: Moving House

A data migration is like moving to a new house. Schema migration is building the new house (creating new tables/columns), data migration is moving your belongings (the actual data), and cleanup is selling your old house (removing old tables/columns). Just like you wouldn't demolish your old house before moving your belongings, you shouldn't drop old database structures until you've safely migrated your data.

Migration Best Practices

General Best Practices

Production Migration Strategies

Real-World Example: GitHub's Database Architecture

GitHub runs one of the world's largest Ruby on Rails applications and has a sophisticated migration strategy. They perform zero-downtime migrations by breaking changes into small steps and using feature flags to toggle between old and new code paths. This allows them to deploy database changes safely without affecting millions of users.

Handling Migrations in Different Environments

Development Environment

In development, migrations should be easy to apply and revert for rapid iteration:


// Example development workflow with Knex
// Reset and set up the database
npx knex migrate:rollback --all
npx knex migrate:latest
npx knex seed:run
            

Staging/Test Environment

Staging environments should mimic production as closely as possible:


// Example CI/CD workflow
// In your CI/CD pipeline
echo "Running migrations on staging..."
npx knex migrate:latest --env staging

echo "Running tests..."
npm test

// If tests pass, continue to production deployment
            

Production Environment

Production migrations require careful planning and execution:


// Example production migration script
#!/bin/bash
# Backup the database
echo "Creating database backup..."
pg_dump -U $DB_USER -h $DB_HOST $DB_NAME > backup_$(date +%Y%m%d_%H%M%S).sql

# Run the migration
echo "Running migrations..."
npx knex migrate:latest --env production

# Verify the migration
echo "Verifying database state..."
# Run verification queries or checks

# If verification fails
if [ $? -ne 0 ]; then
  echo "Migration verification failed! Rolling back..."
  npx knex migrate:rollback --env production
fi
            

MongoDB Migration Example with Mongoose

Since MongoDB is schema-less, migrations work differently. However, with Mongoose, we can still benefit from a structured migration approach.

Setting Up mongoose-migrate


// Install mongoose-migrate
npm install mongoose-migrate

// Create a migration
npx mongoose-migrate create add-email-to-users
            

Writing a MongoDB Migration


// migrations/20230317145623-add-email-to-users.js
const mongoose = require('mongoose');
const User = mongoose.model('User');

module.exports = {
  async up() {
    // Get all users without an email field
    const users = await User.find({ email: { $exists: false } });
    
    // Update each user
    const updates = users.map(user => {
      // Generate a placeholder email based on username
      const email = `${user.username.toLowerCase()}@example.com`;
      
      return User.updateOne(
        { _id: user._id },
        { $set: { email: email } }
      );
    });
    
    // Execute all updates
    await Promise.all(updates);
    
    // Update the schema to validate email in future documents
    // (This assumes you're updating your Mongoose schema separately)
    console.log(`Updated ${updates.length} users with placeholder emails.`);
  },
  
  async down() {
    // Remove email field from all users
    await User.updateMany(
      {},
      { $unset: { email: "" } }
    );
    
    console.log('Removed email field from all users.');
  }
};
            

Managing MongoDB Schema Changes

With MongoDB, we need to handle both the data migration and schema changes:


// Update your Mongoose schema
const userSchema = new mongoose.Schema({
  username: { type: String, required: true },
  // Add the new email field
  email: { 
    type: String, 
    required: true, // Will only affect new documents
    validate: {
      validator: function(v) {
        return /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/.test(v);
      },
      message: props => `${props.value} is not a valid email address!`
    }
  },
  createdAt: { type: Date, default: Date.now }
});

// Update the model
const User = mongoose.model('User', userSchema);
            
flowchart TD A[Define Migration Strategy] --> B[Update Application Code] B --> C[Update Mongoose Schema] C --> D[Run Data Migration] D --> E[Test with Old and New Data] style A fill:#f5f5f5,stroke:#333,stroke-width:2px style B fill:#f5f5f5,stroke:#333,stroke-width:2px style C fill:#f5f5f5,stroke:#333,stroke-width:2px style D fill:#f5f5f5,stroke:#333,stroke-width:2px style E fill:#f5f5f5,stroke:#333,stroke-width:2px

Real-World Migration Scenarios

Scenario 1: Adding a Required Field

You need to add a new required field to a table with existing data. This requires multiple steps:

  1. Add the column as nullable
  2. Deploy code that populates the field for new records
  3. Run a data migration to populate the field for existing records
  4. Add a NOT NULL constraint after all data is populated

// Step 1: Add nullable column
exports.up = function(knex) {
  return knex.schema.table('users', function(table) {
    table.string('phone_number');
  });
};

// Step 2: Populate existing data
exports.up = function(knex) {
  return knex('users')
    .select('id', 'contact_info')
    .then(users => {
      const updates = users.map(user => {
        // Extract phone from contact_info JSON or use placeholder
        let phoneNumber = null;
        try {
          const contactInfo = JSON.parse(user.contact_info);
          phoneNumber = contactInfo.phone || '000-000-0000';
        } catch (e) {
          phoneNumber = '000-000-0000';
        }
        
        return knex('users')
          .where('id', user.id)
          .update({ phone_number: phoneNumber });
      });
      
      return Promise.all(updates);
    });
};

// Step 3: Add NOT NULL constraint
exports.up = function(knex) {
  return knex.schema.alterTable('users', function(table) {
    table.string('phone_number').notNullable().alter();
  });
};
            

Scenario 2: Table Splitting

You need to split a large table into two separate tables:

graph TD A[users
id
username
email
address
city
state
zip
country] --> B[users
id
username
email
address_id] A --> C[addresses
id
address
city
state
zip
country] B --> C style A fill:#ffcc99,stroke:#333,stroke-width:2px style B fill:#d9f7be,stroke:#333,stroke-width:2px style C fill:#d9f7be,stroke:#333,stroke-width:2px

// Step 1: Create new addresses table
exports.up = function(knex) {
  return knex.schema.createTable('addresses', function(table) {
    table.increments('id').primary();
    table.string('address').notNullable();
    table.string('city').notNullable();
    table.string('state').notNullable();
    table.string('zip').notNullable();
    table.string('country').notNullable();
    table.timestamp('created_at').defaultTo(knex.fn.now());
    table.timestamp('updated_at').defaultTo(knex.fn.now());
  });
};

// Step 2: Migrate address data
exports.up = function(knex) {
  return knex('users')
    .select('id', 'address', 'city', 'state', 'zip', 'country')
    .then(users => {
      // Insert addresses and track user_id -> address_id mapping
      const addressInserts = [];
      const userUpdates = [];
      
      users.forEach(user => {
        // Skip users with no address
        if (!user.address) return;
        
        // Insert address and then update user
        addressInserts.push(
          knex('addresses')
            .insert({
              address: user.address,
              city: user.city || '',
              state: user.state || '',
              zip: user.zip || '',
              country: user.country || 'Unknown'
            })
            .returning('id')
            .then(([addressId]) => {
              userUpdates.push(
                knex('users')
                  .where('id', user.id)
                  .update({ address_id: addressId })
              );
            })
        );
      });
      
      // Wait for all inserts to complete
      return Promise.all(addressInserts)
        .then(() => Promise.all(userUpdates));
    });
};

// Step 3: Add foreign key constraint
exports.up = function(knex) {
  return knex.schema.table('users', function(table) {
    table.foreign('address_id').references('addresses.id');
  });
};

// Step 4: Remove old columns
exports.up = function(knex) {
  return knex.schema.table('users', function(table) {
    table.dropColumn('address');
    table.dropColumn('city');
    table.dropColumn('state');
    table.dropColumn('zip');
    table.dropColumn('country');
  });
};
            

Practice Activities

Activity 1: Create a Migration Plan

Design a migration plan for adding a new feature to an e-commerce application:

  1. Add a customer loyalty program
  2. Each customer will have a points balance
  3. Each order will earn points based on total amount
  4. Points can be redeemed for discounts

Your plan should include:

Activity 2: Implement Knex Migrations

Create a series of Knex migration files for a blog platform with the following features:

  1. Users table (id, username, email, password_hash)
  2. Posts table (id, title, content, user_id, created_at, updated_at)
  3. Comments table (id, content, user_id, post_id, created_at)
  4. Add a status field to posts (draft, published, archived)
  5. Add tags to posts (many-to-many relationship)

Activity 3: Data Migration Challenge

Create a migration to transform data in the following scenario:

  1. You have a products table with a JSON 'metadata' column
  2. The metadata contains information about product dimensions
  3. You need to extract this data into dedicated columns
  4. After extraction, update the application to use the new columns

Original structure:


// Products table
{
  id: 1,
  name: 'Desk Lamp',
  price: 29.99,
  metadata: '{"dimensions": {"height": 45, "width": 15, "depth": 15}, "unit": "cm"}'
}
            

New structure should have:


// Products table
{
  id: 1,
  name: 'Desk Lamp',
  price: 29.99,
  height: 45,
  width: 15,
  depth: 15,
  dimension_unit: 'cm',
  metadata: '{}' // Other metadata preserved
}
            

Further Reading