Data Seeding Strategies

Populating Databases with Initial and Test Data

Understanding Data Seeding

Data seeding is the process of populating a database with initial data. This data can serve various purposes, from providing necessary reference data for an application to function, to creating realistic test data for development and testing environments.

While migrations focus on evolving the database structure, seeding focuses on the initial content that populates that structure. Together, they ensure your application has both the right database schema and the right data to work with.

Analogy: Building a New Library

Think of database migrations and seeding like building a new public library. Migrations are like constructing the building—designing the floor plan, building the shelves, and setting up the classification system. Seeding is like stocking the library with books. Without books (data), the library (database) has the right structure but isn't useful yet. Different types of seeding are like different strategies for acquiring books: reference seeding is like ordering essential reference books that every library needs, development seeding is like getting a diverse sample collection for the staff to test the systems, and test seeding is like creating temporary book records to verify the cataloging system works properly.

graph TD
    A[Database Setup] --> B[Migrations]
    A --> C[Seeding]
    B --> B1[Create Tables]
    B --> B2[Modify Structure]
    B --> B3[Add Constraints]
    C --> C1[Reference Data]
    C --> C2[Test Data]
    C --> C3[Development Data]
    C --> C4[Demo Data]
    style A fill:#f5f5f5,stroke:#333,stroke-width:2px
    style B fill:#ffcc99,stroke:#333,stroke-width:2px
    style C fill:#d9f7be,stroke:#333,stroke-width:2px

Types of Seed Data

Reference Data

Reference data (sometimes called lookup data or static data) is essential, relatively unchanging data that your application needs to function correctly. This data is often shared across environments (development, staging, production).

Examples: countries and their states or provinces, order statuses, and user roles.

Test Data

Test data is specifically designed to support automated testing. It should be consistent, predictable, and cover a wide range of scenarios to ensure comprehensive test coverage.

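Examples: an admin account and a regular account with known credentials, plus a small, fixed set of products and orders whose values the assertions can rely on.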

Development Data

Development data helps developers work on the application locally by providing a realistic working environment. It should be comprehensive enough to exercise all parts of the application.

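Examples: dozens of users, a hundred or more products, and a few hundred orders, typically generated with a library such as Faker.js.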

Demo Data

Demo data is designed to showcase the application to stakeholders, potential customers, or during presentations. It's often polished and carefully curated to highlight key features.

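Examples: a small catalog of featured products with polished names, descriptions, and images, plus sample accounts whose order histories tell a clear story.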

Real-World Example: E-commerce Platform

An e-commerce platform might have these different types of seed data:

  • Reference Data: Product categories, shipping methods, payment types, countries for shipping, tax codes
  • Test Data: Test users with different roles, products with specific prices to test discount calculation, orders in various states (pending, shipped, refunded)
  • Development Data: Hundreds of realistic products with images, sample customer accounts, order histories
  • Demo Data: Carefully selected premium products with professional images, sample customer journeys, featured promotions

Seeding Strategies

Environment-Specific Seeding

Different environments often need different amounts and types of data:

Declarative vs. Programmatic Seeding

Declarative Seeding

  • Data defined in structured files (JSON, YAML, CSV)
  • Easy to review and maintain
  • Good for static, reference data
  • Limited flexibility for complex relationships

Programmatic Seeding

  • Data created through code (JavaScript, Ruby, etc.)
  • Dynamic generation of complex data sets
  • Can incorporate logic and relationships
  • Good for generating large volumes of realistic data
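To make the contrast concrete, here is a minimal sketch of both styles producing seed rows. The JSON string stands in for the contents of a data file (for example a hypothetical seeds/data/order_statuses.json), and generateUsers is an illustrative helper, not a library API.

```javascript
// Declarative: the data itself is the source of truth, usually read from a
// JSON/YAML/CSV file. A JSON string stands in for the file contents here.
const orderStatusesJson = `[
  { "id": 1, "code": "pending", "name": "Pending" },
  { "id": 2, "code": "shipped", "name": "Shipped" }
]`;
const orderStatuses = JSON.parse(orderStatusesJson);

// Programmatic: code computes the rows, so volume and relationships can be
// derived with logic (here, every tenth user is an admin).
function generateUsers(count) {
  const users = [];
  for (let i = 1; i <= count; i++) {
    users.push({
      username: `user${i}`,
      email: `user${i}@example.com`,
      is_admin: i % 10 === 0,
    });
  }
  return users;
}

const users = generateUsers(25);
console.log(orderStatuses.length, users.length); // 2 25
console.log(users.filter((u) => u.is_admin).length); // 2
```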

Manual vs. Automated Seeding

Manual Seeding

  • Data inserted through scripts or commands run by developers
  • Good for one-time setup or occasional updates
  • Can lead to environment inconsistencies

Automated Seeding

  • Data inserted as part of deployment or CI/CD pipeline
  • Ensures consistent environments
  • Can be integrated with migrations
  • Repeatable and reliable

Data Factories and Generators

For complex or large data sets, you can use data factories and generators to create realistic, varied data:

flowchart TD
    A[Seeding Strategy] --> B[Declarative]
    A --> C[Programmatic]
    B --> B1[Static JSON/YAML Files]
    B --> B2[CSV Import]
    B --> B3[SQL Insert Scripts]
    C --> C1[Data Factories]
    C --> C2[Faker Libraries]
    C --> C3[Custom Generators]
    style A fill:#f5f5f5,stroke:#333,stroke-width:2px
    style B fill:#ffcc99,stroke:#333,stroke-width:2px
    style C fill:#d9f7be,stroke:#333,stroke-width:2px
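A custom generator does not need a library. The sketch below wraps a seeded pseudo-random generator (a mulberry32-style algorithm) in a few helpers; because the seed fixes the sequence, every run produces the same rows, which keeps test fixtures predictable. The names createRng, intBetween, pick, and generateProducts are illustrative. Faker.js offers the same idea through faker.seed().

```javascript
// Seeded PRNG (mulberry32-style): same seed -> same sequence of floats in [0, 1).
function createRng(seed) {
  let state = seed | 0;
  return function next() {
    state = (state + 0x6d2b79f5) | 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Integer in [min, max], inclusive.
function intBetween(rng, min, max) {
  return min + Math.floor(rng() * (max - min + 1));
}

// Random element of a non-empty array.
function pick(rng, items) {
  return items[Math.floor(rng() * items.length)];
}

// Generate reproducible product rows from a seed.
function generateProducts(seed, count) {
  const rng = createRng(seed);
  const adjectives = ['Compact', 'Ergonomic', 'Wireless', 'Durable'];
  const nouns = ['Keyboard', 'Lamp', 'Backpack', 'Kettle'];
  const products = [];
  for (let i = 0; i < count; i++) {
    products.push({
      name: `${pick(rng, adjectives)} ${pick(rng, nouns)}`,
      priceCents: intBetween(rng, 1000, 100000),
      categoryId: intBetween(rng, 1, 5),
    });
  }
  return products;
}

const runA = generateProducts(42, 3);
const runB = generateProducts(42, 3);
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // true
```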

Seeding with Knex.js

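Knex.js includes a seed runner alongside its migration tooling. Seed files live in a directory configured in your knexfile (./seeds by default) and run against the same database connection as your migrations.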

Setting Up Seed Files


# Create a seed file
npx knex seed:make 01_users

# This creates a file in the seeds directory:
# seeds/01_users.js

Basic Seed File Structure


// seeds/01_users.js
exports.seed = function(knex) {
  // Deletes ALL existing entries
  return knex('users').del()
    .then(function () {
      // Inserts seed entries
      return knex('users').insert([
        {
          id: 1, 
          username: 'admin',
          email: 'admin@example.com',
          password_hash: '$2b$10$X1ZhMDKvNK5Gwh.Wc8A3O.6EBGveuQjS3X1qww8kspSLiGCIu3.f2', // precomputed bcrypt hash (cost 10); generate your own with bcrypt.hash('password', 10)
          is_admin: true
        },
        {
          id: 2, 
          username: 'user1',
          email: 'user1@example.com',
          password_hash: '$2b$10$X1ZhMDKvNK5Gwh.Wc8A3O.6EBGveuQjS3X1qww8kspSLiGCIu3.f2',
          is_admin: false
        },
        {
          id: 3, 
          username: 'user2',
          email: 'user2@example.com',
          password_hash: '$2b$10$X1ZhMDKvNK5Gwh.Wc8A3O.6EBGveuQjS3X1qww8kspSLiGCIu3.f2',
          is_admin: false
        }
      ]);
    });
};
            

Running Seeds


# Run all seed files
npx knex seed:run

# Run a specific seed file
npx knex seed:run --specific=01_users.js

Ordering Seed Files

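Knex runs seed files in alphabetical order by filename. Unlike migrations, seeds are not recorded as already run, so every seed file executes each time you run the command and should clear or upsert its own data. To control the order, prefix filenames with zero-padded numbers so that tables referenced by foreign keys are seeded first:

  • 01_users.js
  • 02_categories.js
  • 03_products.js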
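Alphabetical order means filenames are compared character by character, not numerically, which is why the numeric prefixes should be zero-padded. This short sketch uses JavaScript's default string sort to show the difference, assuming the seed runner sorts names lexicographically.

```javascript
// Unpadded prefixes: '10_' sorts before '1_' because '0' comes before '_'.
const unpadded = ['1_users.js', '2_products.js', '10_orders.js'].sort();
console.log(unpadded); // [ '10_orders.js', '1_users.js', '2_products.js' ]

// Zero-padded prefixes sort in the intended numeric order.
const padded = ['01_users.js', '02_products.js', '10_orders.js'].sort();
console.log(padded); // [ '01_users.js', '02_products.js', '10_orders.js' ]
```

With the unpadded names, orders would be seeded before the users and products they reference.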

Managing Relationships in Seeds


// seeds/03_products.js
exports.seed = function(knex) {
  // Deletes ALL existing entries
  return knex('products').del()
    .then(function () {
      // Inserts seed entries
      return knex('products').insert([
        {
          id: 1, 
          name: 'Laptop',
          description: 'Powerful laptop for developers',
          price: 1299.99,
          category_id: 1, // References a category from 02_categories.js
          created_by: 1 // References the admin user
        },
        {
          id: 2, 
          name: 'Smartphone',
          description: 'Latest smartphone with advanced features',
          price: 899.99,
          category_id: 1,
          created_by: 1
        },
        {
          id: 3, 
          name: 'Coffee Maker',
          description: 'Automatic coffee maker with timer',
          price: 79.99,
          category_id: 2,
          created_by: 2
        }
      ]);
    });
};
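The product seed above hard-codes category_id: 1 and created_by: 1, which silently breaks if the referenced rows end up with different IDs. One alternative, sketched here with in-memory arrays standing in for rows read back from the database, is to describe relationships by a natural key (a category slug, a username) and resolve them to IDs just before inserting. The indexBy and resolveForeignKeys helpers and the category_slug and created_by_username fields are hypothetical.

```javascript
// Build a lookup from a natural key (e.g. slug) to the row's id.
function indexBy(rows, key) {
  const map = new Map();
  for (const row of rows) map.set(row[key], row.id);
  return map;
}

// Replace natural-key references with ids; fail loudly on a dangling reference.
function resolveForeignKeys(products, categoryIds, userIds) {
  return products.map(({ category_slug, created_by_username, ...rest }) => {
    const categoryId = categoryIds.get(category_slug);
    const userId = userIds.get(created_by_username);
    if (categoryId === undefined || userId === undefined) {
      throw new Error(`Unresolved reference in product "${rest.name}"`);
    }
    return { ...rest, category_id: categoryId, created_by: userId };
  });
}

// Rows as they might come back from the database after earlier seed files ran.
const categories = [{ id: 7, slug: 'electronics' }, { id: 9, slug: 'home-kitchen' }];
const users = [{ id: 1, username: 'admin' }, { id: 2, username: 'user1' }];

const productSeeds = [
  { name: 'Laptop', price: 1299.99, category_slug: 'electronics', created_by_username: 'admin' },
  { name: 'Coffee Maker', price: 79.99, category_slug: 'home-kitchen', created_by_username: 'user1' },
];

const rows = resolveForeignKeys(
  productSeeds,
  indexBy(categories, 'slug'),
  indexBy(users, 'username')
);
console.log(rows);
```

In a Knex seed file, categories and users would come from queries such as await knex('categories').select('id', 'slug'), and rows would then be passed to knex('products').insert().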
            

Advanced Seeding with Knex


// seeds/01_reference_data.js
// Reference data is also seeded in production, where other tables already
// reference these rows, so upsert by primary key instead of deleting.
// onConflict().merge() is supported on PostgreSQL, MySQL, and SQLite.
exports.seed = async function(knex) {
  // Seed countries first, since states reference them
  await knex('countries')
    .insert([
      { id: 1, code: 'US', name: 'United States' },
      { id: 2, code: 'CA', name: 'Canada' },
      { id: 3, code: 'MX', name: 'Mexico' }
    ])
    .onConflict('id')
    .merge();
  
  // Seed states and provinces for the US and Canada
  await knex('states')
    .insert([
      { id: 1, code: 'NY', name: 'New York', country_id: 1 },
      { id: 2, code: 'CA', name: 'California', country_id: 1 },
      { id: 3, code: 'TX', name: 'Texas', country_id: 1 },
      { id: 4, code: 'ON', name: 'Ontario', country_id: 2 },
      { id: 5, code: 'BC', name: 'British Columbia', country_id: 2 }
    ])
    .onConflict('id')
    .merge();
  
  // Seed order statuses
  await knex('order_statuses')
    .insert([
      { id: 1, code: 'pending', name: 'Pending' },
      { id: 2, code: 'processing', name: 'Processing' },
      { id: 3, code: 'shipped', name: 'Shipped' },
      { id: 4, code: 'delivered', name: 'Delivered' },
      { id: 5, code: 'cancelled', name: 'Cancelled' }
    ])
    .onConflict('id')
    .merge();
};

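// seeds/02_development_users.js
const bcrypt = require('bcrypt');

exports.seed = async function(knex) {
  // Only seed development data outside production. An unset NODE_ENV is
  // treated as 'development', so set it explicitly on production servers.
  const environment = process.env.NODE_ENV || 'development';
  if (environment === 'production') {
    console.log('Skipping development data seeding in production');
    return;
  }
  
  // bcrypt is deliberately slow, so hash each distinct password once
  // instead of once per user inside the loop
  const adminHash = await bcrypt.hash('admin123', 10);
  const userHash = await bcrypt.hash('password123', 10);
  
  // Delete existing entries (rows in other tables that reference users,
  // such as products.created_by, must be deleted first)
  await knex('users').del();
  
  // Insert admin user with a fixed id so other seeds can reference it
  await knex('users').insert({
    id: 1,
    username: 'admin',
    email: 'admin@example.com',
    password_hash: adminHash,
    is_admin: true
  });
  
  // PostgreSQL only: move the id sequence past the explicit id so the
  // inserts below do not also try to use id 1 (omit for SQLite/MySQL)
  await knex.raw(
    "SELECT setval(pg_get_serial_sequence('users', 'id'), (SELECT MAX(id) FROM users))"
  );
  
  // Insert regular users
  const regularUsers = [];
  for (let i = 1; i <= 50; i++) {
    regularUsers.push({
      username: `user${i}`,
      email: `user${i}@example.com`,
      password_hash: userHash,
      is_admin: false
    });
  }
  
  await knex('users').insert(regularUsers);
};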

Generating Dynamic Seed Data with Faker.js

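Faker.js (the @faker-js/faker package) generates realistic fake values such as names, email addresses, product names, and dates, which makes it well suited to producing large volumes of varied development and test data. The examples below use the v8 API (faker.person, faker.location, and so on); some methods may be deprecated or renamed in later major versions, so check the changelog for the version you install.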

Setting Up Faker.js


# Install Faker.js
npm install @faker-js/faker

Basic Faker.js Usage


// seeds/03_faker_products.js
const { faker } = require('@faker-js/faker');

exports.seed = async function(knex) {
  // Only run in development
  if (process.env.NODE_ENV === 'production') return;
  
  // Clear the products table
  await knex('products').del();
  
  // Generate 100 random products
  const products = [];
  
  for (let i = 0; i < 100; i++) {
    // Keep timestamps consistent: updated_at is never earlier than created_at
    const createdAt = faker.date.past();
    
    products.push({
      name: faker.commerce.productName(),
      description: faker.commerce.productDescription(),
      price: parseFloat(faker.commerce.price({ min: 10, max: 1000 })),
      category_id: faker.number.int({ min: 1, max: 5 }), // Assuming 5 categories
      image_url: faker.image.urlLoremFlickr({ category: 'product' }),
      created_at: createdAt,
      updated_at: faker.date.between({ from: createdAt, to: new Date() })
    });
  }
  
  // Insert the products in batches
  const chunkSize = 25;
  await knex.batchInsert('products', products, chunkSize);
};
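knex.batchInsert splits a large array into chunks and inserts them one chunk at a time inside a transaction, which keeps each statement under database limits on size and bound parameters. The chunking step on its own looks like this; chunk is an illustrative helper, not Knex's internal code.

```javascript
// Split rows into arrays of at most `size` elements.
function chunk(rows, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size));
  }
  return batches;
}

// 100 generated products in chunks of 25 -> 4 INSERT statements instead of 100.
const products = Array.from({ length: 100 }, (_, i) => ({ name: `Product ${i + 1}` }));
const batches = chunk(products, 25);
console.log(batches.length, batches[0].length); // 4 25

// Each statement binds rows * columns parameters: 25 rows * 7 columns = 175.
```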
            

Creating Related Data with Faker


// seeds/04_orders_and_items.js
const { faker } = require('@faker-js/faker');

exports.seed = async function(knex) {
  // Only run in development
  if (process.env.NODE_ENV === 'production') return;
  
  // Clear the tables
  await knex('order_items').del();
  await knex('orders').del();
  
  // Get all user IDs
  const users = await knex('users').select('id');
  
  // Get all product IDs
  const products = await knex('products').select('id', 'price');
  
  // Generate 200 orders
  const orders = [];
  const orderItems = [];
  
  for (let i = 0; i < 200; i++) {
    // Create an order
    const orderDate = faker.date.past();
    const user = faker.helpers.arrayElement(users);
    
    // Explicit IDs let order_items reference orders without reading them back
    const orderId = i + 1;
    
    orders.push({
      id: orderId,
      user_id: user.id,
      status_id: faker.number.int({ min: 1, max: 5 }), // Assuming 5 statuses
      total_amount: 0, // Will calculate after items
      shipping_address: faker.location.streetAddress(),
      shipping_city: faker.location.city(),
      shipping_state: faker.location.state({ abbreviated: true }),
      shipping_zip: faker.location.zipCode(),
      shipping_country: 'US',
      created_at: orderDate,
      updated_at: orderDate
    });
    
    // Create 1-5 order items
    const itemCount = faker.number.int({ min: 1, max: 5 });
    let orderTotal = 0;
    
    // Randomly select products without duplicates
    const selectedProducts = faker.helpers.arrayElements(products, itemCount);
    
    selectedProducts.forEach((product) => {
      const quantity = faker.number.int({ min: 1, max: 3 });
      const price = parseFloat(product.price);
      // Round each line to cents so the items add up to the order total
      const itemTotal = parseFloat((price * quantity).toFixed(2));
      orderTotal += itemTotal;
      
      orderItems.push({
        order_id: orderId,
        product_id: product.id,
        quantity: quantity,
        price: price,
        total: itemTotal
      });
    });
    
    // Update the order total
    orders[i].total_amount = parseFloat(orderTotal.toFixed(2));
  }
  
  // Insert in chunks to stay under per-statement parameter limits
  // (SQLite builds older than 3.32 allow only 999 bound parameters)
  await knex.batchInsert('orders', orders, 50);
  await knex.batchInsert('order_items', orderItems, 50);
  
  // PostgreSQL only: advance the id sequence past the explicit order IDs
  // so later inserts without an id do not collide (omit for SQLite/MySQL)
  await knex.raw(
    "SELECT setval(pg_get_serial_sequence('orders', 'id'), (SELECT MAX(id) FROM orders))"
  );
};
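Adding prices as floating-point numbers can drift by fractions of a cent (in JavaScript, 0.1 + 0.2 is 0.30000000000000004), so totals computed in a seed may not match what the application computes. A common alternative, sketched below, is to do the arithmetic in integer cents and convert back only for storage or display. The toCents and orderTotals helpers are illustrative.

```javascript
// Convert a decimal price (e.g. 19.99) to integer cents.
const toCents = (amount) => Math.round(amount * 100);

// Compute line totals and the order total in integer cents, then convert back.
function orderTotals(items) {
  const lines = items.map(({ price, quantity }) => ({
    price,
    quantity,
    totalCents: toCents(price) * quantity,
  }));
  const orderCents = lines.reduce((sum, line) => sum + line.totalCents, 0);
  return {
    lines: lines.map(({ totalCents, ...line }) => ({ ...line, total: totalCents / 100 })),
    total: orderCents / 100,
  };
}

const { lines, total } = orderTotals([
  { price: 0.1, quantity: 3 },
  { price: 19.99, quantity: 2 },
]);
console.log(lines.map((l) => l.total), total); // [ 0.3, 39.98 ] 40.28
```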
            

Useful Faker.js Generators for Web Applications

| Category | Faker Method | Example Output |
| --- | --- | --- |
| Person | faker.person.firstName() | "John" |
| Person | faker.person.lastName() | "Smith" |
| Person | faker.person.fullName() | "Jane Doe" |
| Person | faker.person.jobTitle() | "Senior Developer" |
| Internet | faker.internet.email() | "john.smith@example.com" |
| Internet | faker.internet.userName() | "jsmith42" |
| Internet | faker.internet.password() | "a6B7c8D9e0" |
| Internet | faker.internet.url() | "https://example.com/" |
| Commerce | faker.commerce.productName() | "Ergonomic Steel Keyboard" |
| Commerce | faker.commerce.price() | "129.99" (a string) |
| Commerce | faker.commerce.department() | "Electronics" |
| Date/Time | faker.date.past() | a Date within the past year |
| Date/Time | faker.date.future() | a Date within the next year |
| Date/Time | faker.date.recent() | a Date within the past day |
| Location | faker.location.streetAddress() | "123 Main St" |
| Location | faker.location.city() | "New York" |
| Location | faker.location.country() | "United States" |
| Images | faker.image.url() | "https://loremflickr.com/640/480" |
| Images | faker.image.avatar() | "https://cloudflare-ipfs.com/ipfs/Qm..." |
| Lorem | faker.lorem.sentence() | "Lorem ipsum dolor sit amet." |
| Lorem | faker.lorem.paragraph() | a paragraph of lorem ipsum text |
| Lorem | faker.lorem.paragraphs() | several paragraphs separated by newlines |

Outputs are random; the examples show typical shapes, not fixed values.

Analogy: Cooking Show Prep

Using Faker.js is like being a prep chef for a cooking show. Instead of manually chopping every vegetable and measuring every ingredient, you use specialized tools and pre-prepped ingredients to quickly assemble a realistic-looking kitchen setup. The audience (your application) sees what looks like a complete, varied, and realistic set of ingredients, but you've created it in a fraction of the time it would take to gather everything for real. And just like how cooking show food doesn't need to be edible long-term (it just needs to look good on camera), Faker data doesn't need to be production-quality—it just needs to provide a realistic development environment.

Seeding MongoDB with Mongoose

For MongoDB databases, you can create seed scripts using Mongoose models.

Basic MongoDB Seeding Script


// scripts/seed.js
const mongoose = require('mongoose');
const User = require('../models/User');
const Category = require('../models/Category');
const Product = require('../models/Product');

// Connect to the database
mongoose.connect('mongodb://localhost/my_app_dev')
  .then(() => console.log('Connected to MongoDB...'))
  .catch(err => console.error('Could not connect to MongoDB...', err));

// Seed users
async function seedUsers() {
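  // First, clear the collection
  await User.deleteMany({});
  
  // Plaintext passwords keep the example short. insertMany() does not run
  // 'save' middleware, so a pre('save') hashing hook on the User schema
  // will not hash these values; hash them here before inserting.
  const users = [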
    {
      username: 'admin',
      email: 'admin@example.com',
      password: 'admin123',
      isAdmin: true
    },
    {
      username: 'user1',
      email: 'user1@example.com',
      password: 'password123',
      isAdmin: false
    }
  ];
  
  return User.insertMany(users);
}

// Seed categories
async function seedCategories() {
  await Category.deleteMany({});
  
  const categories = [
    { name: 'Electronics', slug: 'electronics' },
    { name: 'Clothing', slug: 'clothing' },
    { name: 'Books', slug: 'books' },
    { name: 'Home & Kitchen', slug: 'home-kitchen' }
  ];
  
  return Category.insertMany(categories);
}

// Seed products
async function seedProducts(categories) {
  await Product.deleteMany({});
  
  const products = [
    {
      name: 'Laptop',
      slug: 'laptop',
      description: 'Powerful laptop for developers',
      price: 1299.99,
      category: categories[0]._id,
      countInStock: 10
    },
    {
      name: 'T-Shirt',
      slug: 't-shirt',
      description: 'Comfortable cotton t-shirt',
      price: 19.99,
      category: categories[1]._id,
      countInStock: 50
    },
    {
      name: 'JavaScript Book',
      slug: 'javascript-book',
      description: 'Comprehensive guide to JavaScript',
      price: 39.99,
      category: categories[2]._id,
      countInStock: 20
    }
  ];
  
  return Product.insertMany(products);
}

// Run the seeding operations
async function seed() {
  try {
    const users = await seedUsers();
    const categories = await seedCategories();
    const products = await seedProducts(categories);
    
    console.log(`Seeded ${users.length} users`);
    console.log(`Seeded ${categories.length} categories`);
    console.log(`Seeded ${products.length} products`);
    
    mongoose.disconnect();
  } catch (error) {
    console.error('Seeding error:', error);
    mongoose.disconnect();
    process.exit(1);
  }
}

seed();
            

Using Faker with Mongoose


// scripts/seed-large.js
const mongoose = require('mongoose');
const { faker } = require('@faker-js/faker');
const User = require('../models/User');
const Category = require('../models/Category');
const Product = require('../models/Product');
const Order = require('../models/Order');

mongoose.connect('mongodb://localhost/my_app_dev')
  .then(() => console.log('Connected to MongoDB...'))
  .catch(err => console.error('Could not connect to MongoDB...', err));

// Seed a large number of users
async function seedUsers(count = 50) {
  await User.deleteMany({});
  
  // Create one admin user
  const users = [{
    username: 'admin',
    email: 'admin@example.com',
    password: 'admin123', // in a real app, hash this
    isAdmin: true
  }];
  
  // Create regular users
  for (let i = 0; i < count - 1; i++) {
    const firstName = faker.person.firstName();
    const lastName = faker.person.lastName();
    
    users.push({
      username: faker.internet.userName({ firstName, lastName }),
      email: faker.internet.email({ firstName, lastName }),
      password: 'password123', // in a real app, hash this
      isAdmin: false,
      name: `${firstName} ${lastName}`,
      address: {
        street: faker.location.streetAddress(),
        city: faker.location.city(),
        state: faker.location.state(),
        zipCode: faker.location.zipCode(),
        country: 'USA'
      }
    });
  }
  
  return User.insertMany(users);
}

// Seed categories
async function seedCategories() {
  await Category.deleteMany({});
  
  const categories = [
    { name: 'Electronics', slug: 'electronics' },
    { name: 'Clothing', slug: 'clothing' },
    { name: 'Books', slug: 'books' },
    { name: 'Home & Kitchen', slug: 'home-kitchen' },
    { name: 'Toys & Games', slug: 'toys-games' }
  ];
  
  return Category.insertMany(categories);
}

// Seed a large number of products
async function seedProducts(categories, count = 100) {
  await Product.deleteMany({});
  
  const products = [];
  
  for (let i = 0; i < count; i++) {
    const name = faker.commerce.productName();
    const category = faker.helpers.arrayElement(categories);
    
    products.push({
      name,
      slug: name.toLowerCase().replace(/[^a-z0-9]+/g, '-'),
      description: faker.commerce.productDescription(),
      price: parseFloat(faker.commerce.price({ min: 5, max: 2000 })),
      category: category._id,
      countInStock: faker.number.int({ min: 0, max: 100 }),
      rating: faker.number.float({ min: 1, max: 5, precision: 0.1 }),
      numReviews: faker.number.int({ min: 0, max: 100 }),
      image: faker.image.urlLoremFlickr({ category: 'product' }),
      isFeatured: faker.datatype.boolean({ probability: 0.2 })
    });
  }
  
  return Product.insertMany(products);
}

// Seed orders
async function seedOrders(users, products, count = 200) {
  await Order.deleteMany({});
  
  const orders = [];
  
  for (let i = 0; i < count; i++) {
    const user = faker.helpers.arrayElement(users);
    const orderItems = [];
    const itemCount = faker.number.int({ min: 1, max: 5 });
    
    // Select random products
    const selectedProducts = faker.helpers.arrayElements(products, itemCount);
    
    let totalPrice = 0;
    
    selectedProducts.forEach(product => {
      const quantity = faker.number.int({ min: 1, max: 5 });
      const itemPrice = quantity * product.price;
      totalPrice += itemPrice;
      
      orderItems.push({
        product: product._id,
        name: product.name,
        quantity,
        image: product.image,
        price: product.price
      });
    });
    
    // Items subtotal, then shipping and tax
    const itemsPrice = parseFloat(totalPrice.toFixed(2));
    const shippingPrice = itemsPrice > 100 ? 0 : 10;
    const taxPrice = parseFloat((itemsPrice * 0.15).toFixed(2));
    totalPrice = itemsPrice + shippingPrice + taxPrice;
    
    // Keep the order lifecycle consistent: createdAt <= paidAt <= deliveredAt,
    // and only paid orders are marked as delivered
    const createdAt = faker.date.past();
    const isPaid = faker.datatype.boolean({ probability: 0.7 });
    const paidAt = isPaid
      ? faker.date.between({ from: createdAt, to: new Date() })
      : undefined;
    const isDelivered = isPaid && faker.datatype.boolean({ probability: 0.5 });
    const deliveredAt = isDelivered
      ? faker.date.between({ from: paidAt, to: new Date() })
      : undefined;
    
    orders.push({
      user: user._id,
      orderItems,
      shippingAddress: {
        fullName: user.name || user.username,
        address: faker.location.streetAddress(),
        city: faker.location.city(),
        postalCode: faker.location.zipCode(),
        country: 'USA'
      },
      paymentMethod: faker.helpers.arrayElement(['PayPal', 'Credit Card', 'Cash']),
      // Only paid orders carry a payment result
      paymentResult: isPaid
        ? { id: faker.string.uuid(), status: 'succeeded', email_address: user.email }
        : undefined,
      itemsPrice,
      shippingPrice,
      taxPrice,
      totalPrice: parseFloat(totalPrice.toFixed(2)),
      isPaid,
      paidAt,
      isDelivered,
      deliveredAt,
      createdAt
    });
  }
  
  return Order.insertMany(orders);
}

// Run the complete seeding process
async function seedDatabase() {
  try {
    const users = await seedUsers(50);
    console.log(`Seeded ${users.length} users`);
    
    const categories = await seedCategories();
    console.log(`Seeded ${categories.length} categories`);
    
    const products = await seedProducts(categories, 100);
    console.log(`Seeded ${products.length} products`);
    
    const orders = await seedOrders(users, products, 200);
    console.log(`Seeded ${orders.length} orders`);
    
    console.log('Database seeding completed successfully');
    
    mongoose.disconnect();
  } catch (error) {
    console.error('Seeding error:', error);
    mongoose.disconnect();
    process.exit(1);
  }
}

seedDatabase();
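seedDatabase() above hard-codes the order users, categories, products, orders. As the number of seeders grows, it can help to declare each seeder's dependencies and derive the order instead. Below is a minimal sketch using a depth-first topological sort; the seeder names and the dependsOn shape are hypothetical, and a real runner would call the matching seed functions in the returned order.

```javascript
// Each seeder lists the seeders whose data it needs.
const seeders = {
  orders: { dependsOn: ['users', 'products'] },
  products: { dependsOn: ['categories'] },
  users: { dependsOn: [] },
  categories: { dependsOn: [] },
};

// Depth-first topological sort: dependencies are emitted before dependents.
function seedOrder(graph) {
  const order = [];
  const state = new Map(); // name -> 'visiting' | 'done'

  function visit(name, path) {
    if (state.get(name) === 'done') return;
    if (state.get(name) === 'visiting') {
      throw new Error(`Cycle in seed dependencies: ${[...path, name].join(' -> ')}`);
    }
    if (!graph[name]) throw new Error(`Unknown seeder "${name}"`);
    state.set(name, 'visiting');
    for (const dep of graph[name].dependsOn) visit(dep, [...path, name]);
    state.set(name, 'done');
    order.push(name);
  }

  for (const name of Object.keys(graph)) visit(name, []);
  return order;
}

console.log(seedOrder(seeders)); // [ 'users', 'categories', 'products', 'orders' ]
```

Reversing the same order gives a safe order for clearing collections or tables: children before parents.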
            

Factory Pattern for Test Data

The factory pattern is a powerful approach for generating test data in a consistent, reusable way. Instead of hard-coding test data, you define factories that produce objects with default values that can be overridden as needed.
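The core of the pattern fits in a few lines. The sketch below is not factory-girl; it is a hypothetical defineFactory helper that works with plain objects rather than database models, and it shows the three ideas such libraries build on: defaults, per-instance sequence numbers, and caller overrides.

```javascript
// defineFactory takes a function returning default attributes for the n-th
// object built; build() merges caller overrides on top of those defaults.
function defineFactory(defaults) {
  let sequence = 0;
  return {
    build(overrides = {}) {
      sequence += 1;
      return { ...defaults(sequence), ...overrides };
    },
    buildMany(count, overrides = {}) {
      return Array.from({ length: count }, () => this.build(overrides));
    },
  };
}

const userFactory = defineFactory((n) => ({
  username: `user${n}`,
  email: `user${n}@example.com`,
  isAdmin: false,
}));

const admin = userFactory.build({ username: 'admin', isAdmin: true });
const others = userFactory.buildMany(2);
console.log(admin);
console.log(others.map((u) => u.username)); // [ 'user2', 'user3' ]
```

Libraries such as factory-girl add persistence and associations on top of this, but a test still only states the attributes it cares about.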

Setting Up Factories with Factory-Girl


# Install factory-girl
npm install factory-girl

Defining Factories


// tests/factories/index.js
const { factory, MongooseAdapter } = require('factory-girl');
const { faker } = require('@faker-js/faker');
const User = require('../../models/User');
const Category = require('../../models/Category');
const Product = require('../../models/Product');
const Order = require('../../models/Order');

// Save models through factory-girl's Mongoose adapter
factory.setAdapter(new MongooseAdapter());

// User factory
factory.define('User', User, {
  username: factory.sequence('User.username', (n) => `user${n}`),
  email: factory.sequence('User.email', (n) => `user${n}@example.com`),
  password: 'password123',
  isAdmin: false,
  name: () => `${faker.person.firstName()} ${faker.person.lastName()}`
});

// Admin user factory
factory.define('Admin', User, {
  username: factory.sequence('Admin.username', (n) => `admin${n}`),
  email: factory.sequence('Admin.email', (n) => `admin${n}@example.com`),
  password: 'admin123',
  isAdmin: true,
  name: () => `${faker.person.firstName()} ${faker.person.lastName()}`
});

// Category factory (needed by factory.assoc('Category', '_id') below and by
// the seeding script that calls factory.create('Category', ...))
factory.define('Category', Category, {
  name: factory.sequence('Category.name', (n) => `Category ${n}`),
  slug: factory.sequence('Category.slug', (n) => `category-${n}`)
});

// Product factory
factory.define('Product', Product, {
  name: () => faker.commerce.productName(),
  slug: factory.sequence('Product.slug', (n) => `product-${n}`),
  description: () => faker.commerce.productDescription(),
  price: () => parseFloat(faker.commerce.price({ min: 10, max: 1000 })),
  category: factory.assoc('Category', '_id'),
  countInStock: () => faker.number.int({ min: 0, max: 100 }),
  rating: () => faker.number.float({ min: 1, max: 5, precision: 0.1 }),
  numReviews: () => faker.number.int({ min: 0, max: 100 })
});

// Order factory
factory.define('Order', Order, {
  user: factory.assoc('User', '_id'),
  orderItems: [],
  shippingAddress: {
    fullName: () => faker.person.fullName(),
    address: () => faker.location.streetAddress(),
    city: () => faker.location.city(),
    postalCode: () => faker.location.zipCode(),
    country: 'USA'
  },
  paymentMethod: 'PayPal',
  itemsPrice: 0,
  shippingPrice: 10,
  taxPrice: 0,
  totalPrice: 0,
  isPaid: false,
  isDelivered: false
});

module.exports = factory;
            

Using Factories in Tests


// tests/integration/product.test.js
const mongoose = require('mongoose');
const request = require('supertest');
const app = require('../../app');
const factory = require('../factories');

describe('Products API', () => {
  beforeAll(async () => {
    await mongoose.connect('mongodb://localhost/test_db');
  });

  afterAll(async () => {
    await mongoose.connection.dropDatabase();
    await mongoose.connection.close();
  });

  beforeEach(async () => {
    await mongoose.connection.db.dropDatabase();
  });

  describe('GET /api/products', () => {
    it('should return all products', async () => {
      // Create 3 products using the factory
      await factory.createMany('Product', 3);
      
      const res = await request(app).get('/api/products');
      
      expect(res.status).toBe(200);
      expect(res.body.length).toBe(3);
    });
  });

  describe('GET /api/products/:id', () => {
    it('should return a product if valid id is passed', async () => {
      const product = await factory.create('Product');
      
      const res = await request(app).get(`/api/products/${product._id}`);
      
      expect(res.status).toBe(200);
      expect(res.body.name).toBe(product.name);
    });
  });

  describe('POST /api/products', () => {
    it('should create a product if authenticated as admin', async () => {
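      // Create an admin user
      const admin = await factory.create('Admin');
      // generateAuthToken is a placeholder for your app's own token helper
      // (implementation depends on your auth system); it is not defined here
      const token = generateAuthToken(admin);
      
      // attrs() resolves asynchronously to a plain object of attributes
      const productData = await factory.attrs('Product');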
      
      const res = await request(app)
        .post('/api/products')
        .set('Authorization', `Bearer ${token}`)
        .send(productData);
      
      expect(res.status).toBe(201);
      expect(res.body.name).toBe(productData.name);
    });
  });
});
            

Using Factories for Application Seeding


// scripts/seed-with-factories.js
const mongoose = require('mongoose');
const factory = require('../tests/factories');

mongoose.connect('mongodb://localhost/my_app_dev')
  .then(() => console.log('Connected to MongoDB...'))
  .catch(err => console.error('Could not connect to MongoDB...', err));

async function clearDatabase() {
  const collections = mongoose.connection.collections;
  
  for (const key in collections) {
    await collections[key].deleteMany();
  }
}

async function seedDatabase() {
  try {
    await clearDatabase();
    
    // Create admin user
    const admin = await factory.create('Admin', {
      username: 'admin',
      email: 'admin@example.com'
    });
    
    // Create regular users
    const users = await factory.createMany('User', 20);
    console.log(`Created ${users.length + 1} users`);
    
    // Create categories
    const categories = await Promise.all([
      factory.create('Category', { name: 'Electronics', slug: 'electronics' }),
      factory.create('Category', { name: 'Clothing', slug: 'clothing' }),
      factory.create('Category', { name: 'Books', slug: 'books' })
    ]);
    console.log(`Created ${categories.length} categories`);
    
    // Create products for each category
    const products = [];
    
    for (const category of categories) {
      const categoryProducts = await factory.createMany('Product', 10, {
        category: category._id
      });
      products.push(...categoryProducts);
    }
    console.log(`Created ${products.length} products`);
    
    // Create orders
    const orders = [];
    
    for (const user of [admin, ...users]) {
      // Each user gets 1-3 orders
      const orderCount = Math.floor(Math.random() * 3) + 1;
      
      for (let i = 0; i < orderCount; i++) {
        // Select 1-5 random products
        const orderProducts = [];
        const productCount = Math.floor(Math.random() * 5) + 1;
        
        for (let j = 0; j < productCount; j++) {
          const product = products[Math.floor(Math.random() * products.length)];
          const quantity = Math.floor(Math.random() * 3) + 1;
          
          orderProducts.push({
            product: product._id,
            name: product.name,
            quantity,
            image: product.image || 'placeholder.jpg',
            price: product.price
          });
        }
        
        // Calculate totals
        const itemsPrice = orderProducts.reduce(
          (sum, item) => sum + item.price * item.quantity, 
          0
        );
        const shippingPrice = itemsPrice > 100 ? 0 : 10;
        const taxPrice = parseFloat((itemsPrice * 0.15).toFixed(2));
        const totalPrice = itemsPrice + shippingPrice + taxPrice;
        
        // Create the order
        const order = await factory.create('Order', {
          user: user._id,
          orderItems: orderProducts,
          itemsPrice,
          shippingPrice,
          taxPrice,
          totalPrice
        });
        
        orders.push(order);
      }
    }
    
    console.log(`Created ${orders.length} orders`);
    console.log('Database seeding completed successfully');
    
    mongoose.disconnect();
  } catch (error) {
    console.error('Seeding error:', error);
    mongoose.disconnect();
    process.exit(1);
  }
}

seedDatabase();
            

Real-World Example: Test-Driven Development Workflow

In a test-driven development workflow, factories become essential. For example, a team developing an e-commerce platform would create factories for users, products, orders, and reviews. Developers write tests that use these factories to create consistent test scenarios. When they need to add a feature like "wishlist functionality," they first create a wishlist factory, then write tests using that factory and existing user/product factories. This consistent approach speeds up development and ensures reliable testing.

Integration with CI/CD Pipelines

Automating database seeding as part of your continuous integration and deployment pipeline ensures consistent testing and deployment environments.

Example GitHub Actions Workflow


# .github/workflows/test.yml
name: Test

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main, develop ]

jobs:
  test:
    runs-on: ubuntu-latest
    
    # Job-level env so migrations and seeds use the same database as the tests
    env:
      NODE_ENV: test
      DB_HOST: localhost
      DB_USER: postgres
      DB_PASSWORD: postgres
      DB_NAME: test_db
    
    services:
      postgres:
        image: postgres:13
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: test_db
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Set up Node.js
      uses: actions/setup-node@v4
      with:
        node-version: '20'
    
    - name: Install dependencies
      run: npm ci
    
    - name: Run migrations
      run: npm run migrate:test
    
    - name: Seed test database
      run: npm run seed:test
    
    - name: Run tests
      run: npm test

Environment-Specific Seeding Scripts


// package.json
{
  "scripts": {
    "migrate:dev": "knex migrate:latest --env development",
    "migrate:test": "knex migrate:latest --env test",
    "migrate:prod": "knex migrate:latest --env production",
    
    "seed:dev": "knex seed:run --env development",
    "seed:test": "knex seed:run --env test",
    "seed:prod": "knex seed:run --env production --specific=01_reference_data.js",
    
    "setup:dev": "npm run migrate:dev && npm run seed:dev",
    "setup:test": "npm run migrate:test && npm run seed:test",
    "reset:dev": "knex migrate:rollback --all --env development && npm run setup:dev",
    "reset:test": "knex migrate:rollback --all --env test && npm run setup:test"
  }
}
            

Database Reset vs. Incremental Updates

Complete Reset Approach

  • Drop all tables and recreate from scratch
  • Ensures a clean state for testing
  • Good for development and testing environments
  • Slower for large datasets

Incremental Update Approach

  • Only seed missing or changed data
  • Faster for large datasets
  • Preserves existing data
  • Can lead to inconsistencies if not carefully managed
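The incremental approach comes down to comparing what should exist with what does exist and touching only the difference. A minimal in-memory sketch follows, keyed on a natural key (code); planReferenceSync is a hypothetical helper. In practice the existing rows would come from a query, and the inserts and updates could be applied together with Knex's insert(...).onConflict(...).merge().

```javascript
// Compare desired reference rows with existing rows by a natural key and
// return the minimal set of changes. Rows missing from `desired` are reported
// as orphaned rather than deleted, since other tables may still reference them.
function planReferenceSync(desired, existing, key = 'code') {
  const current = new Map(existing.map((row) => [row[key], row]));
  const plan = { insert: [], update: [], unchanged: [], orphaned: [] };

  for (const row of desired) {
    const found = current.get(row[key]);
    if (!found) {
      plan.insert.push(row);
    } else if (Object.keys(row).some((field) => row[field] !== found[field])) {
      plan.update.push(row);
    } else {
      plan.unchanged.push(row);
    }
    current.delete(row[key]);
  }
  plan.orphaned = [...current.values()];
  return plan;
}

const desired = [
  { code: 'pending', name: 'Pending' },
  { code: 'shipped', name: 'Shipped' },
  { code: 'refunded', name: 'Refunded' },
];
const existing = [
  { code: 'pending', name: 'Pending' },
  { code: 'shipped', name: 'Dispatched' },
  { code: 'legacy', name: 'Legacy' },
];

const plan = planReferenceSync(desired, existing);
console.log(plan.insert.length, plan.update.length, plan.unchanged.length, plan.orphaned.length); // 1 1 1 1
```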

Production Seeding Considerations

flowchart LR
    A[CI Pipeline] --> B[Run Tests]
    B --> C[Build Application]
    C --> D[Deploy to Staging]
    D --> E[Run Migrations]
    E --> F[Seed Reference Data]
    F --> G[Integration Tests]
    G --> H[Deploy to Production]
    H --> I[Run Migrations]
    I --> J[Seed Reference Data ONLY]
    style A fill:#f5f5f5,stroke:#333,stroke-width:2px
    style J fill:#ffcc99,stroke:#333,stroke-width:2px

Best Practices and Common Pitfalls

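Seeding Best Practices

  • Keep reference data in its own seed file that is safe to run in every environment, including production
  • Guard development, test, and demo seeds with an explicit environment check
  • Make seeds repeatable: clear tables in child-to-parent order, or upsert reference rows instead of deleting them
  • Hash seeded passwords the same way the application does, and never seed real credentials
  • Insert large data sets in batches rather than one query per row
  • Use factories for test data so each test creates exactly the records it needs

Common Seeding Pitfalls

  • Deleting or inserting tables in the wrong order and violating foreign-key constraints
  • Hard-coding IDs in tables with auto-incrementing keys; in PostgreSQL this leaves the sequence behind and causes duplicate-key errors on later inserts
  • Running development seeds in production because NODE_ENV was unset and defaulted to development
  • Generating inconsistent data, such as a delivery date before the order date or an order total that does not match its items
  • Letting data created by one test leak into the next instead of resetting between tests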

Analogy: Restaurant Opening

Think of database seeding like preparing a restaurant for its opening day. Migrations are like building the restaurant—the kitchen, dining room, bar, etc. Reference data seeding is like stocking the essentials—plates, glasses, cooking equipment, and basic ingredients that every restaurant needs. Development seeding is like preparing a variety of sample dishes for the staff to practice with. Test seeding is like setting up controlled scenarios to ensure each station operates correctly. And in production, you only want the essential ingredients ready—you don't want random test dishes sitting in the kitchen when real customers arrive.

Practice Activities

Activity 1: Create a Complete Seeding System

Develop a seeding system for a blog application with the following requirements:

  1. Create reference seeds for user roles, post categories, and post statuses
  2. Create development seeds for users, posts, and comments
  3. Use Faker.js to generate realistic content
  4. Implement environment-specific seeding (different data for development, testing, production)
  5. Create NPM scripts to run different types of seeds

Activity 2: Factory Pattern Implementation

Implement the factory pattern for test data generation:

  1. Set up factory-girl or a similar library
  2. Define factories for a simple e-commerce system (users, products, orders)
  3. Create relationships between factories
  4. Write test cases that use factories to create test scenarios

Activity 3: Performance Optimization Challenge

Optimize a seeding script for performance:

  1. Start with a seed script that generates 10,000 users and 50,000 posts
  2. Identify performance bottlenecks
  3. Implement batch inserts
  4. Use parallel processing where appropriate
  5. Measure and report performance improvements

Further Reading