Understanding Data Seeding
Data seeding is the process of populating a database with initial data. This data can serve various purposes, from providing necessary reference data for an application to function, to creating realistic test data for development and testing environments.
While migrations focus on evolving the database structure, seeding focuses on the initial content that populates that structure. Together, they ensure your application has both the right database schema and the right data to work with.
Analogy: Building a New Library
Think of database migrations and seeding like building a new public library. Migrations are like constructing the building—designing the floor plan, building the shelves, and setting up the classification system. Seeding is like stocking the library with books. Without books (data), the library (database) has the right structure but isn't useful yet. Different types of seeding are like different strategies for acquiring books: reference seeding is like ordering essential reference books that every library needs, development seeding is like getting a diverse sample collection for the staff to test the systems, and test seeding is like creating temporary book records to verify the cataloging system works properly.
Types of Seed Data
Reference Data
Reference data (sometimes called lookup data or static data) is essential, relatively unchanging data that your application needs to function correctly. This data is often shared across environments (development, staging, production).
Examples:
- Country and state/province lists
- Currency codes
- Time zones
- Category lists
- Role definitions
- Status values
- Configuration settings
Test Data
Test data is specifically designed to support automated testing. It should be consistent, predictable, and cover a wide range of scenarios to ensure comprehensive test coverage.
Examples:
- User accounts with specific attributes for testing different permission levels
- Edge case data for testing validation and business rules
- Data sets that test performance with various volumes
- Data designed to test specific application features
Development Data
Development data helps developers work on the application locally by providing a realistic working environment. It should be comprehensive enough to exercise all parts of the application.
Examples:
- Sample user accounts
- Realistic product catalogs
- Placeholder content for media
- Mock transaction histories
Demo Data
Demo data is designed to showcase the application to stakeholders, potential customers, or during presentations. It's often polished and carefully curated to highlight key features.
Examples:
- Visually appealing product examples
- Sample workflows that demonstrate value
- Data that tells a story or demonstrates a use case
- Non-confidential, representative customer-like data
Real-World Example: E-commerce Platform
An e-commerce platform might have these different types of seed data:
- Reference Data: Product categories, shipping methods, payment types, countries for shipping, tax codes
- Test Data: Test users with different roles, products with specific prices to test discount calculation, orders in various states (pending, shipped, refunded)
- Development Data: Hundreds of realistic products with images, sample customer accounts, order histories
- Demo Data: Carefully selected premium products with professional images, sample customer journeys, featured promotions
Seeding Strategies
Environment-Specific Seeding
Different environments often need different amounts and types of data:
- Development: Rich, varied data that covers most application scenarios
- Testing: Consistent, predictable data for automated tests
- Staging: Data that mirrors production in structure but may be anonymized
- Production: Only essential reference data
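These differences can be captured in a small lookup table that maps each environment to the seed groups it is allowed to run. The following is a minimal sketch in plain JavaScript. The group names and the `seedGroupsFor` helper are hypothetical. Refusing unknown environment names (failing closed) is a design choice, not a feature of Knex or Mongoose.

```javascript
// Hypothetical mapping from environment name to the seed groups it may run.
const SEED_PLAN = Object.freeze({
  development: ['reference', 'development'],
  test: ['reference', 'test'],
  staging: ['reference', 'anonymizedSnapshot'],
  production: ['reference'], // only essential reference data
});

// Returns the seed groups for an environment and refuses unknown names,
// so a typo such as "prod" cannot silently run development seeds.
function seedGroupsFor(env) {
  if (!Object.prototype.hasOwnProperty.call(SEED_PLAN, env)) {
    throw new Error(`Unknown environment "${env}"; refusing to seed`);
  }
  return [...SEED_PLAN[env]];
}

console.log(seedGroupsFor('production')); // [ 'reference' ]
```

A seed runner could call `seedGroupsFor(process.env.NODE_ENV)` and skip any seed file whose group is not in the returned list.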
Declarative vs. Programmatic Seeding
Declarative Seeding
- Data defined in structured files (JSON, YAML, CSV)
- Easy to review and maintain
- Good for static, reference data
- Limited flexibility for complex relationships
Programmatic Seeding
- Data created through code (JavaScript, Ruby, etc.)
- Dynamic generation of complex data sets
- Can incorporate logic and relationships
- Good for generating large volumes of realistic data
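In declarative seeding, the data file is the source of truth, and the seed code only parses it. The sketch below assumes a deliberately simple CSV format: a header row and no quoted fields, so commas inside values are not supported. A real project would use a proper CSV parser. The `parseSeedCsv` name is hypothetical.

```javascript
// Parses simple CSV text (header row + data rows, no quoted fields)
// into an array of plain objects ready to pass to an insert call.
function parseSeedCsv(text) {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0); // ignore blank lines
  if (lines.length === 0) return [];

  const headers = lines[0].split(',').map((h) => h.trim());
  return lines.slice(1).map((line, index) => {
    const values = line.split(',').map((v) => v.trim());
    if (values.length !== headers.length) {
      // Fail loudly: a malformed row should stop the seed, not insert partial data
      throw new Error(
        `Data row ${index + 1} has ${values.length} fields, expected ${headers.length}`
      );
    }
    return Object.fromEntries(headers.map((h, i) => [h, values[i]]));
  });
}

const countriesCsv = `code,name
US,United States
CA,Canada
MX,Mexico`;

console.log(parseSeedCsv(countriesCsv)[1]); // { code: 'CA', name: 'Canada' }
```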
Manual vs. Automated Seeding
Manual Seeding
- Data inserted through scripts or commands run by developers
- Good for one-time setup or occasional updates
- Can lead to environment inconsistencies
Automated Seeding
- Data inserted as part of deployment or CI/CD pipeline
- Ensures consistent environments
- Can be integrated with migrations
- Repeatable and reliable
Data Factories and Generators
For complex or large data sets, you can use data factories and generators to create realistic, varied data:
- Factories define blueprints for creating different types of records
- Generators produce random but realistic values
- Libraries like Faker.js generate realistic-looking values such as names, email addresses, and street addresses
- Allows creation of large data sets with realistic variations
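Test data should be predictable, so generators are usually seeded: the same seed produces the same "random" values on every run. Faker supports this idea through `faker.seed(123)`. The self-contained sketch below shows the principle with a small seeded PRNG (the well-known mulberry32 algorithm) driving a hypothetical `makeUser` generator. The name lists are placeholder assumptions.

```javascript
// mulberry32: a tiny 32-bit seeded PRNG. Same seed => same sequence.
function mulberry32(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

const FIRST_NAMES = ['Ana', 'Ben', 'Chen', 'Dara', 'Eli'];
const LAST_NAMES = ['Garcia', 'Okafor', 'Smith', 'Tanaka'];

// Picks a random element using the supplied generator
const pick = (rand, list) => list[Math.floor(rand() * list.length)];

// Hypothetical generator: builds one user record. The index suffix keeps
// usernames and emails unique even when name combinations repeat.
function makeUser(rand, index) {
  const first = pick(rand, FIRST_NAMES);
  const last = pick(rand, LAST_NAMES);
  const username = `${first}.${last}${index}`.toLowerCase();
  return { username, email: `${username}@example.com`, isAdmin: false };
}

function makeUsers(seed, count) {
  const rand = mulberry32(seed);
  return Array.from({ length: count }, (_, i) => makeUser(rand, i + 1));
}
```

Calling `makeUsers(42, 50)` twice yields identical arrays, which keeps test assertions stable. A different seed gives a different but equally reproducible data set.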
Seeding with Knex.js
Knex.js provides a robust seeding framework that integrates well with its migration system.
Setting Up Seed Files
# Create a seed file
npx knex seed:make 01_users

# This creates a file in the seeds directory:
# seeds/01_users.js
Basic Seed File Structure
// seeds/01_users.js
exports.seed = function (knex) {
  // Deletes ALL existing entries. Tables that reference users (e.g. products.created_by)
  // must be emptied first, or this fails with a foreign-key error when seeds are re-run
  // (unless those foreign keys use ON DELETE CASCADE).
  return knex('users')
    .del()
    .then(function () {
      // Inserts seed entries
      return knex('users').insert([
        {
          id: 1,
          username: 'admin',
          email: 'admin@example.com',
          password_hash: '$2b$10$X1ZhMDKvNK5Gwh.Wc8A3O.6EBGveuQjS3X1qww8kspSLiGCIu3.f2', // bcrypt hash of 'password' (development only)
          is_admin: true
        },
        {
          id: 2,
          username: 'user1',
          email: 'user1@example.com',
          password_hash: '$2b$10$X1ZhMDKvNK5Gwh.Wc8A3O.6EBGveuQjS3X1qww8kspSLiGCIu3.f2',
          is_admin: false
        },
        {
          id: 3,
          username: 'user2',
          email: 'user2@example.com',
          password_hash: '$2b$10$X1ZhMDKvNK5Gwh.Wc8A3O.6EBGveuQjS3X1qww8kspSLiGCIu3.f2',
          is_admin: false
        }
      ]);
    });
};
Running Seeds
# Run all seed files
npx knex seed:run

# Run a specific seed file
npx knex seed:run --specific=01_users.js
Ordering Seed Files
Knex runs seed files in alphabetical order. To control the order, you can prefix filenames with numbers:
- 01_users.js
- 02_categories.js
- 03_products.js
- 04_orders.js
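Zero-padding matters because the ordering compares strings, not numbers. Assuming plain lexicographic sorting like JavaScript's default `Array.prototype.sort`, an unpadded `10_` prefix sorts before `2_`:

```javascript
// Default sort compares strings character by character ("1" < "2", and "0" < "_")
const unpadded = ['1_users.js', '2_categories.js', '10_orders.js'].sort();
console.log(unpadded); // [ '10_orders.js', '1_users.js', '2_categories.js' ]

// With zero-padded prefixes, string order matches the intended numeric order
const padded = ['01_users.js', '02_categories.js', '10_orders.js'].sort();
console.log(padded); // [ '01_users.js', '02_categories.js', '10_orders.js' ]
```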
Managing Relationships in Seeds
// seeds/03_products.js
exports.seed = function (knex) {
  // Deletes ALL existing entries
  return knex('products')
    .del()
    .then(function () {
      // Inserts seed entries
      return knex('products').insert([
        {
          id: 1,
          name: 'Laptop',
          description: 'Powerful laptop for developers',
          price: 1299.99,
          category_id: 1, // References a category from 02_categories.js
          created_by: 1 // References the admin user
        },
        {
          id: 2,
          name: 'Smartphone',
          description: 'Latest smartphone with advanced features',
          price: 899.99,
          category_id: 1,
          created_by: 1
        },
        {
          id: 3,
          name: 'Coffee Maker',
          description: 'Automatic coffee maker with timer',
          price: 79.99,
          category_id: 2,
          created_by: 2
        }
      ]);
    });
};
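Numbered file names encode the dependency order by hand. As the schema grows, the order can instead be derived from the foreign-key relationships: insert parents before children, and delete in the reverse order. The sketch below is a standard topological sort (Kahn's algorithm) over a hypothetical `dependsOn` map whose table names mirror the examples in this section.

```javascript
// Each table lists the tables its foreign keys point to (hypothetical schema)
const dependsOn = {
  users: [],
  categories: [],
  products: ['categories', 'users'],
  orders: ['users'],
  order_items: ['orders', 'products'],
};

// Kahn's algorithm: repeatedly emit the tables whose dependencies are all emitted
function insertOrder(graph) {
  for (const [table, deps] of Object.entries(graph)) {
    for (const dep of deps) {
      if (!Object.prototype.hasOwnProperty.call(graph, dep)) {
        throw new Error(`${table} depends on unknown table ${dep}`);
      }
    }
  }
  const remaining = new Map(
    Object.entries(graph).map(([table, deps]) => [table, new Set(deps)])
  );
  const order = [];
  while (remaining.size > 0) {
    const ready = [...remaining.keys()].filter((t) => remaining.get(t).size === 0);
    if (ready.length === 0) {
      throw new Error(`Circular dependency among: ${[...remaining.keys()].join(', ')}`);
    }
    ready.sort(); // deterministic order among independent tables
    for (const table of ready) {
      order.push(table);
      remaining.delete(table);
      for (const deps of remaining.values()) deps.delete(table);
    }
  }
  return order;
}

const inserts = insertOrder(dependsOn);
const deletes = [...inserts].reverse(); // children first, so FKs never block a delete
console.log(inserts); // [ 'categories', 'users', 'orders', 'products', 'order_items' ]
```

Self-referencing tables, such as a category tree with a `parent_id`, count as cycles here. They need separate handling, for example inserting rows with a null parent first and filling in the parent afterwards.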
Advanced Seeding with Knex
// seeds/01_reference_data.js
// Reference data also runs in production (see the "seed:prod" script below), so it
// upserts by primary key instead of deleting rows that other tables may reference.
exports.seed = async function (knex) {
  // One transaction: if any statement fails, none of the changes are kept
  await knex.transaction(async (trx) => {
    // Seed countries (parents first: states reference countries.id)
    await trx('countries')
      .insert([
        { id: 1, code: 'US', name: 'United States' },
        { id: 2, code: 'CA', name: 'Canada' },
        { id: 3, code: 'MX', name: 'Mexico' }
      ])
      .onConflict('id')
      .merge();

    // Seed states and provinces (country_id references countries.id)
    await trx('states')
      .insert([
        { id: 1, code: 'NY', name: 'New York', country_id: 1 },
        { id: 2, code: 'CA', name: 'California', country_id: 1 },
        { id: 3, code: 'TX', name: 'Texas', country_id: 1 },
        { id: 4, code: 'ON', name: 'Ontario', country_id: 2 },
        { id: 5, code: 'BC', name: 'British Columbia', country_id: 2 }
      ])
      .onConflict('id')
      .merge();

    // Seed order statuses
    await trx('order_statuses')
      .insert([
        { id: 1, code: 'pending', name: 'Pending' },
        { id: 2, code: 'processing', name: 'Processing' },
        { id: 3, code: 'shipped', name: 'Shipped' },
        { id: 4, code: 'delivered', name: 'Delivered' },
        { id: 5, code: 'cancelled', name: 'Cancelled' }
      ])
      .onConflict('id')
      .merge();
  });
};
// seeds/02_development_users.js
const bcrypt = require('bcrypt');

exports.seed = async function (knex) {
  // Only seed development data if not in production
  const environment = process.env.NODE_ENV || 'development';
  if (environment === 'production') {
    console.log('Skipping development data seeding in production');
    return;
  }

  // Hash a password with a freshly generated salt (cost factor 10)
  const hashPassword = async (password) => {
    const salt = await bcrypt.genSalt(10);
    return bcrypt.hash(password, salt);
  };

  // Delete existing entries (tables that reference users must be cleared first)
  await knex('users').del();

  // Insert admin user. No hard-coded id: on PostgreSQL an explicit id does not
  // advance the id sequence, so the next generated id would collide with it.
  await knex('users').insert({
    username: 'admin',
    email: 'admin@example.com',
    password_hash: await hashPassword('admin123'),
    is_admin: true
  });

  // bcrypt is deliberately slow, so hash the shared password once instead of 50 times
  const regularPasswordHash = await hashPassword('password123');

  // Insert regular users
  const regularUsers = [];
  for (let i = 1; i <= 50; i++) {
    regularUsers.push({
      username: `user${i}`,
      email: `user${i}@example.com`,
      password_hash: regularPasswordHash,
      is_admin: false
    });
  }
  await knex('users').insert(regularUsers);
};
Generating Dynamic Seed Data with Faker.js
Faker.js is a popular library for generating realistic fake data. It's ideal for creating large volumes of varied test data.
Setting Up Faker.js
# Install Faker.js
npm install @faker-js/faker
Basic Faker.js Usage
// seeds/03_faker_products.js
const { faker } = require('@faker-js/faker');

exports.seed = async function (knex) {
  // Skip in production
  if (process.env.NODE_ENV === 'production') return;

  // Clear order_items first (it references products), then the products table
  await knex('order_items').del();
  await knex('products').del();

  // Generate 100 random products
  const products = [];
  for (let i = 0; i < 100; i++) {
    const createdAt = faker.date.past();
    products.push({
      name: faker.commerce.productName(),
      description: faker.commerce.productDescription(),
      price: parseFloat(faker.commerce.price({ min: 10, max: 1000 })),
      category_id: faker.number.int({ min: 1, max: 5 }), // Assuming 5 categories
      image_url: faker.image.urlLoremFlickr({ category: 'product' }),
      created_at: createdAt,
      // Keep updated_at on or after created_at
      updated_at: faker.date.between({ from: createdAt, to: new Date() })
    });
  }

  // Insert the products in batches of 25 rows
  const chunkSize = 25;
  await knex.batchInsert('products', products, chunkSize);
};
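`knex.batchInsert` inserts the rows in chunks of `chunkSize`, wrapped in a transaction that Knex creates unless you pass your own with `transacting`. Keeping each statement small avoids per-statement limits such as the maximum number of bound parameters. The chunking step itself is simple. This standalone sketch (the `chunk` helper is hypothetical, not part of Knex) shows how 100 products become four inserts of 25 rows:

```javascript
// Splits an array into consecutive slices of at most `size` elements
function chunk(rows, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < rows.length; i += size) {
    chunks.push(rows.slice(i, i + size));
  }
  return chunks;
}

const rows = Array.from({ length: 100 }, (_, i) => ({ name: `Product ${i + 1}` }));
const batches = chunk(rows, 25);
console.log(batches.length); // 4
console.log(batches.map((b) => b.length)); // [ 25, 25, 25, 25 ]
```

Smaller chunks mean more round trips. Larger chunks risk hitting the database's parameter limit, which varies by engine.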
Creating Related Data with Faker
// seeds/04_orders_and_items.js
const { faker } = require('@faker-js/faker');

exports.seed = async function (knex) {
  // Skip in production
  if (process.env.NODE_ENV === 'production') return;

  // Clear the tables (child table first)
  await knex('order_items').del();
  await knex('orders').del();

  // Load the rows that orders and items will reference
  const users = await knex('users').select('id');
  const products = await knex('products').select('id', 'price');
  if (users.length === 0 || products.length === 0) {
    throw new Error('Seed users and products before orders');
  }

  // Generate 200 orders
  const orders = [];
  const orderItems = [];
  for (let i = 0; i < 200; i++) {
    const orderDate = faker.date.past();
    const user = faker.helpers.arrayElement(users);
    // Simplification: explicit ids let items reference their order before insert.
    // On PostgreSQL these do not advance the id sequence, so reset it if the app inserts orders.
    const orderId = i + 1;

    // Create 1-5 order items from distinct, randomly selected products
    const itemCount = faker.number.int({ min: 1, max: 5 });
    const selectedProducts = faker.helpers.arrayElements(products, itemCount);
    let orderTotalCents = 0;
    for (const product of selectedProducts) {
      const quantity = faker.number.int({ min: 1, max: 3 });
      const price = parseFloat(product.price);
      // Work in integer cents to avoid floating-point drift in totals
      const itemTotalCents = Math.round(price * 100) * quantity;
      orderTotalCents += itemTotalCents;
      orderItems.push({
        order_id: orderId,
        product_id: product.id,
        quantity,
        price,
        total: itemTotalCents / 100
      });
    }

    orders.push({
      id: orderId,
      user_id: user.id,
      status_id: faker.number.int({ min: 1, max: 5 }), // Assuming 5 statuses
      total_amount: orderTotalCents / 100,
      shipping_address: faker.location.streetAddress(),
      shipping_city: faker.location.city(),
      shipping_state: faker.location.state({ abbreviated: true }),
      shipping_zip: faker.location.zipCode(),
      shipping_country: 'US',
      created_at: orderDate,
      updated_at: orderDate
    });
  }

  // Insert orders before items (items reference orders), in batches
  await knex.batchInsert('orders', orders, 50);
  await knex.batchInsert('order_items', orderItems, 100);
};
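Totals built from floating-point prices can drift. For example, in JavaScript `0.1 + 0.2` is `0.30000000000000004`. A common remedy is to do all arithmetic in integer cents and convert back once. The following is a minimal sketch. `toCents` and `orderTotals` are hypothetical helpers. The 15% tax and the free-shipping-over-$100 rule are illustrative assumptions that mirror the Mongoose seed scripts later in this guide.

```javascript
// Converts a decimal amount such as 19.99 to integer cents (1999)
const toCents = (amount) => Math.round(amount * 100);

const FREE_SHIPPING_THRESHOLD_CENTS = 10000; // $100.00
const FLAT_SHIPPING_CENTS = 1000; // $10.00
const TAX_RATE = 0.15; // illustrative rate

// Sums line items in integer cents and converts back to dollars only at the end
function orderTotals(items) {
  const itemsCents = items.reduce(
    (sum, { price, quantity }) => sum + toCents(price) * quantity,
    0
  );
  const shippingCents = itemsCents > FREE_SHIPPING_THRESHOLD_CENTS ? 0 : FLAT_SHIPPING_CENTS;
  const taxCents = Math.round(itemsCents * TAX_RATE); // round tax once, to the cent
  const totalCents = itemsCents + shippingCents + taxCents;
  return {
    itemsPrice: itemsCents / 100,
    shippingPrice: shippingCents / 100,
    taxPrice: taxCents / 100,
    totalPrice: totalCents / 100,
  };
}

console.log(orderTotals([{ price: 19.99, quantity: 3 }, { price: 0.1, quantity: 2 }]));
```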
Useful Faker.js Generators for Web Applications
| Category | Faker method | Example output |
|---|---|---|
| Person | faker.person.firstName() | "John" |
| Person | faker.person.lastName() | "Smith" |
| Person | faker.person.fullName() | "Jane Doe" |
| Person | faker.person.jobTitle() | "Senior Developer" |
| Internet | faker.internet.email() | "john.smith@example.com" |
| Internet | faker.internet.userName() | "jsmith42" |
| Internet | faker.internet.password() | "a6B7c8D9e0" |
| Internet | faker.internet.url() | "https://example.com/user/1" |
| Commerce | faker.commerce.productName() | "Ergonomic Steel Keyboard" |
| Commerce | faker.commerce.price() | "129.99" (returned as a string) |
| Commerce | faker.commerce.department() | "Electronics" |
| Date/Time | faker.date.past() | a Date object in the past |
| Date/Time | faker.date.future() | a Date object in the future |
| Date/Time | faker.date.recent() | a Date object from the recent past |
| Location | faker.location.streetAddress() | "123 Main St" |
| Location | faker.location.city() | "New York" |
| Location | faker.location.country() | "United States" |
| Images | faker.image.url() | "https://loremflickr.com/640/480" |
| Images | faker.image.avatar() | "https://cloudflare-ipfs.com/ipfs/Qm..." |
| Lorem | faker.lorem.sentence() | "Lorem ipsum dolor sit amet." |
| Lorem | faker.lorem.paragraph() | a paragraph of placeholder text |
| Lorem | faker.lorem.paragraphs() | several paragraphs of placeholder text |
Analogy: Cooking Show Prep
Using Faker.js is like being a prep chef for a cooking show. Instead of manually chopping every vegetable and measuring every ingredient, you use specialized tools and pre-prepped ingredients to quickly assemble a realistic-looking kitchen setup. The audience (your application) sees what looks like a complete, varied, and realistic set of ingredients, but you've created it in a fraction of the time it would take to gather everything for real. And just like how cooking show food doesn't need to be edible long-term (it just needs to look good on camera), Faker data doesn't need to be production-quality—it just needs to provide a realistic development environment.
Seeding MongoDB with Mongoose
For MongoDB databases, you can create seed scripts using Mongoose models.
Basic MongoDB Seeding Script
// scripts/seed.js
const mongoose = require('mongoose');
const User = require('../models/User');
const Category = require('../models/Category');
const Product = require('../models/Product');

// 127.0.0.1 rather than localhost: newer Node versions may resolve localhost to IPv6 (::1)
const MONGODB_URI = 'mongodb://127.0.0.1:27017/my_app_dev';

// Seed users
async function seedUsers() {
  // First, clear the collection
  await User.deleteMany({});
  const users = [
    {
      username: 'admin',
      email: 'admin@example.com',
      password: 'admin123',
      isAdmin: true
    },
    {
      username: 'user1',
      email: 'user1@example.com',
      password: 'password123',
      isAdmin: false
    }
  ];
  // create() runs save middleware (such as a pre('save') password-hashing hook, if the
  // User model defines one); insertMany() skips it and would store plain-text passwords
  return User.create(users);
}

// Seed categories
async function seedCategories() {
  await Category.deleteMany({});
  const categories = [
    { name: 'Electronics', slug: 'electronics' },
    { name: 'Clothing', slug: 'clothing' },
    { name: 'Books', slug: 'books' },
    { name: 'Home & Kitchen', slug: 'home-kitchen' }
  ];
  return Category.insertMany(categories);
}

// Seed products (uses the inserted categories for their _id values)
async function seedProducts(categories) {
  await Product.deleteMany({});
  const products = [
    {
      name: 'Laptop',
      slug: 'laptop',
      description: 'Powerful laptop for developers',
      price: 1299.99,
      category: categories[0]._id,
      countInStock: 10
    },
    {
      name: 'T-Shirt',
      slug: 't-shirt',
      description: 'Comfortable cotton t-shirt',
      price: 19.99,
      category: categories[1]._id,
      countInStock: 50
    },
    {
      name: 'JavaScript Book',
      slug: 'javascript-book',
      description: 'Comprehensive guide to JavaScript',
      price: 39.99,
      category: categories[2]._id,
      countInStock: 20
    }
  ];
  return Product.insertMany(products);
}

// Connect, run the seeding operations in dependency order, then always disconnect
async function seed() {
  try {
    await mongoose.connect(MONGODB_URI);
    console.log('Connected to MongoDB...');
    const users = await seedUsers();
    const categories = await seedCategories();
    const products = await seedProducts(categories);
    console.log(`Seeded ${users.length} users`);
    console.log(`Seeded ${categories.length} categories`);
    console.log(`Seeded ${products.length} products`);
  } catch (error) {
    console.error('Seeding error:', error);
    process.exitCode = 1; // let the process exit after disconnecting
  } finally {
    await mongoose.disconnect();
  }
}

seed();
Using Faker with Mongoose
// scripts/seed-large.js
const mongoose = require('mongoose');
const { faker } = require('@faker-js/faker');
const User = require('../models/User');
const Category = require('../models/Category');
const Product = require('../models/Product');
const Order = require('../models/Order');

// 127.0.0.1 rather than localhost: newer Node versions may resolve localhost to IPv6 (::1)
const MONGODB_URI = 'mongodb://127.0.0.1:27017/my_app_dev';

// Rounds a currency amount to 2 decimal places
const round2 = (value) => Math.round(value * 100) / 100;

// Seed a large number of users
async function seedUsers(count = 50) {
  await User.deleteMany({});

  // Create one admin user
  const users = [{
    username: 'admin',
    email: 'admin@example.com',
    password: 'admin123', // in a real app, hash this
    isAdmin: true
  }];

  // Create regular users
  for (let i = 0; i < count - 1; i++) {
    const firstName = faker.person.firstName();
    const lastName = faker.person.lastName();
    users.push({
      username: faker.internet.userName({ firstName, lastName }),
      email: faker.internet.email({ firstName, lastName }),
      password: 'password123', // in a real app, hash this
      isAdmin: false,
      name: `${firstName} ${lastName}`,
      address: {
        street: faker.location.streetAddress(),
        city: faker.location.city(),
        state: faker.location.state(),
        zipCode: faker.location.zipCode(),
        country: 'USA'
      }
    });
  }

  // create() runs save middleware (e.g. a password-hashing pre('save') hook); insertMany() skips it
  return User.create(users);
}

// Seed categories
async function seedCategories() {
  await Category.deleteMany({});
  const categories = [
    { name: 'Electronics', slug: 'electronics' },
    { name: 'Clothing', slug: 'clothing' },
    { name: 'Books', slug: 'books' },
    { name: 'Home & Kitchen', slug: 'home-kitchen' },
    { name: 'Toys & Games', slug: 'toys-games' }
  ];
  return Category.insertMany(categories);
}

// Seed a large number of products
async function seedProducts(categories, count = 100) {
  await Product.deleteMany({});
  const products = [];
  for (let i = 0; i < count; i++) {
    const name = faker.commerce.productName();
    const category = faker.helpers.arrayElement(categories);
    products.push({
      name,
      // Append the index so repeated product names still get unique slugs
      slug: `${name.toLowerCase().replace(/[^a-z0-9]+/g, '-')}-${i + 1}`,
      description: faker.commerce.productDescription(),
      price: parseFloat(faker.commerce.price({ min: 5, max: 2000 })),
      category: category._id,
      countInStock: faker.number.int({ min: 0, max: 100 }),
      rating: faker.number.float({ min: 1, max: 5, precision: 0.1 }),
      numReviews: faker.number.int({ min: 0, max: 100 }),
      image: faker.image.urlLoremFlickr({ category: 'product' }),
      isFeatured: faker.datatype.boolean({ probability: 0.2 })
    });
  }
  return Product.insertMany(products);
}

// Seed orders
async function seedOrders(users, products, count = 200) {
  await Order.deleteMany({});
  const orders = [];
  for (let i = 0; i < count; i++) {
    const user = faker.helpers.arrayElement(users);

    // Select 1-5 distinct random products
    const itemCount = faker.number.int({ min: 1, max: 5 });
    const selectedProducts = faker.helpers.arrayElements(products, itemCount);
    const orderItems = selectedProducts.map((product) => ({
      product: product._id,
      name: product.name,
      quantity: faker.number.int({ min: 1, max: 5 }),
      image: product.image,
      price: product.price
    }));

    // Compute the items price first, then shipping and 15% tax on top of it
    const itemsPrice = round2(
      orderItems.reduce((sum, item) => sum + item.price * item.quantity, 0)
    );
    const shippingPrice = itemsPrice > 100 ? 0 : 10;
    const taxPrice = round2(itemsPrice * 0.15);
    const totalPrice = round2(itemsPrice + shippingPrice + taxPrice);

    // Keep the lifecycle consistent: created <= paid <= delivered; only paid orders ship
    const createdAt = faker.date.past();
    const isPaid = faker.datatype.boolean({ probability: 0.7 });
    const paidAt = isPaid ? faker.date.between({ from: createdAt, to: new Date() }) : undefined;
    const isDelivered = isPaid && faker.datatype.boolean({ probability: 0.5 });
    const deliveredAt = isDelivered
      ? faker.date.between({ from: paidAt, to: new Date() })
      : undefined;

    orders.push({
      user: user._id,
      orderItems,
      shippingAddress: {
        fullName: user.name || user.username,
        address: faker.location.streetAddress(),
        city: faker.location.city(),
        postalCode: faker.location.zipCode(),
        country: 'USA'
      },
      paymentMethod: faker.helpers.arrayElement(['PayPal', 'Credit Card', 'Cash']),
      // Only paid orders have a payment result
      paymentResult: isPaid
        ? { id: faker.string.uuid(), status: 'succeeded', email_address: user.email }
        : undefined,
      itemsPrice,
      shippingPrice,
      taxPrice,
      totalPrice,
      isPaid,
      paidAt,
      isDelivered,
      deliveredAt,
      createdAt
    });
  }
  return Order.insertMany(orders);
}

// Run the complete seeding process, always disconnecting at the end
async function seedDatabase() {
  try {
    await mongoose.connect(MONGODB_URI);
    console.log('Connected to MongoDB...');
    const users = await seedUsers(50);
    console.log(`Seeded ${users.length} users`);
    const categories = await seedCategories();
    console.log(`Seeded ${categories.length} categories`);
    const products = await seedProducts(categories, 100);
    console.log(`Seeded ${products.length} products`);
    const orders = await seedOrders(users, products, 200);
    console.log(`Seeded ${orders.length} orders`);
    console.log('Database seeding completed successfully');
  } catch (error) {
    console.error('Seeding error:', error);
    process.exitCode = 1;
  } finally {
    await mongoose.disconnect();
  }
}

seedDatabase();
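Dates generated independently can contradict each other, for example an order paid before it was created, or delivered but never paid. One way to keep the lifecycle consistent is to derive each timestamp from the previous one. The sketch below takes an injected `rand` function (returning values in [0, 1)) instead of Faker so that it runs standalone. `orderTimeline`, `dateBetween`, and the probabilities are illustrative assumptions. With Faker, `faker.date.between({ from, to })` plays the role of `dateBetween`.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns a Date in [from, to) chosen with the supplied generator
function dateBetween(rand, from, to) {
  return new Date(from.getTime() + Math.floor(rand() * (to.getTime() - from.getTime())));
}

// Builds timestamps that always satisfy createdAt <= paidAt <= deliveredAt <= now.
// Delivery is only possible for paid orders (an assumed business rule).
function orderTimeline(rand, now = new Date()) {
  const createdAt = new Date(now.getTime() - Math.floor(rand() * 365 * DAY_MS));
  const isPaid = rand() < 0.7;
  const paidAt = isPaid ? dateBetween(rand, createdAt, now) : null;
  const isDelivered = isPaid && rand() < 0.5;
  const deliveredAt = isDelivered ? dateBetween(rand, paidAt, now) : null;
  return { createdAt, isPaid, paidAt, isDelivered, deliveredAt };
}

const timeline = orderTimeline(Math.random);
console.log(timeline.createdAt <= (timeline.paidAt ?? timeline.createdAt)); // true
```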
Factory Pattern for Test Data
The factory pattern is a powerful approach for generating test data in a consistent, reusable way. Instead of hard-coding test data, you define factories that produce objects with default values that can be overridden as needed.
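To show what a factory library does at its core, here is a minimal, dependency-free sketch. A factory holds default attributes (static values or functions) and a per-factory sequence counter for unique fields, and it merges caller overrides last. `defineFactory` is a hypothetical helper, not factory-girl's API.

```javascript
// Creates a factory from default attributes. Function-valued defaults are
// called with the sequence number so each object can get unique values.
function defineFactory(defaults) {
  let sequence = 0;
  return {
    build(overrides = {}) {
      sequence += 1;
      const attrs = {};
      for (const [key, value] of Object.entries(defaults)) {
        attrs[key] = typeof value === 'function' ? value(sequence) : value;
      }
      return { ...attrs, ...overrides }; // overrides always win
    },
    buildMany(count, overrides = {}) {
      return Array.from({ length: count }, () => this.build(overrides));
    },
  };
}

const userFactory = defineFactory({
  username: (n) => `user${n}`,
  email: (n) => `user${n}@example.com`,
  isAdmin: false,
});

console.log(userFactory.build()); // { username: 'user1', email: 'user1@example.com', isAdmin: false }
console.log(userFactory.build({ isAdmin: true }).username); // user2
```

Real libraries add persistence (`create`), associations between factories, and asynchronous attribute resolution on top of this core.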
Setting Up Factories with Factory-Girl
# Install factory-girl
npm install factory-girl
Defining Factories
// tests/factories/index.js
const factory = require('factory-girl').factory;
const { faker } = require('@faker-js/faker');
const User = require('../../models/User');
const Category = require('../../models/Category');
const Product = require('../../models/Product');
const Order = require('../../models/Order');

// User factory
factory.define('User', User, {
  username: factory.sequence('User.username', (n) => `user${n}`),
  email: factory.sequence('User.email', (n) => `user${n}@example.com`),
  password: 'password123',
  isAdmin: false,
  name: () => `${faker.person.firstName()} ${faker.person.lastName()}`
});

// Admin user factory
factory.define('Admin', User, {
  username: factory.sequence('Admin.username', (n) => `admin${n}`),
  email: factory.sequence('Admin.email', (n) => `admin${n}@example.com`),
  password: 'admin123',
  isAdmin: true,
  name: () => `${faker.person.firstName()} ${faker.person.lastName()}`
});

// Category factory (required by the Product factory's association below)
factory.define('Category', Category, {
  name: factory.sequence('Category.name', (n) => `Category ${n}`),
  slug: factory.sequence('Category.slug', (n) => `category-${n}`)
});

// Product factory
factory.define('Product', Product, {
  name: () => faker.commerce.productName(),
  slug: factory.sequence('Product.slug', (n) => `product-${n}`),
  description: () => faker.commerce.productDescription(),
  price: () => parseFloat(faker.commerce.price({ min: 10, max: 1000 })),
  category: factory.assoc('Category', '_id'), // creates a Category and uses its _id
  countInStock: () => faker.number.int({ min: 0, max: 100 }),
  rating: () => faker.number.float({ min: 1, max: 5, precision: 0.1 }),
  numReviews: () => faker.number.int({ min: 0, max: 100 })
});

// Order factory
factory.define('Order', Order, {
  user: factory.assoc('User', '_id'),
  orderItems: [],
  shippingAddress: {
    fullName: () => faker.person.fullName(),
    address: () => faker.location.streetAddress(),
    city: () => faker.location.city(),
    postalCode: () => faker.location.zipCode(),
    country: 'USA'
  },
  paymentMethod: 'PayPal',
  itemsPrice: 0,
  shippingPrice: 10,
  taxPrice: 0,
  totalPrice: 0,
  isPaid: false,
  isDelivered: false
});

module.exports = factory;
Using Factories in Tests
// tests/integration/product.test.js
const mongoose = require('mongoose');
const request = require('supertest');
const app = require('../../app');
const factory = require('../factories');
// Hypothetical helper: replace with however your app issues auth tokens
const { generateAuthToken } = require('../helpers/auth');

describe('Products API', () => {
  beforeAll(async () => {
    await mongoose.connect('mongodb://127.0.0.1:27017/test_db');
  });

  afterAll(async () => {
    await mongoose.connection.dropDatabase();
    await mongoose.connection.close();
  });

  // Start every test from an empty database
  beforeEach(async () => {
    await mongoose.connection.db.dropDatabase();
  });

  describe('GET /api/products', () => {
    it('should return all products', async () => {
      // Create 3 products using the factory
      await factory.createMany('Product', 3);
      const res = await request(app).get('/api/products');
      expect(res.status).toBe(200);
      expect(res.body.length).toBe(3);
    });
  });

  describe('GET /api/products/:id', () => {
    it('should return a product if valid id is passed', async () => {
      const product = await factory.create('Product');
      const res = await request(app).get(`/api/products/${product._id}`);
      expect(res.status).toBe(200);
      expect(res.body.name).toBe(product.name);
    });
  });

  describe('POST /api/products', () => {
    it('should create a product if authenticated as admin', async () => {
      // Create an admin user
      const admin = await factory.create('Admin');
      // Get an auth token (implementation depends on your auth system)
      const token = generateAuthToken(admin);
      // attrs() resolves to a plain attribute object; build() returns a promise of an unsaved model
      const productData = await factory.attrs('Product');
      const res = await request(app)
        .post('/api/products')
        .set('Authorization', `Bearer ${token}`)
        .send(productData);
      expect(res.status).toBe(201);
      expect(res.body.name).toBe(productData.name);
    });
  });
});
Using Factories for Application Seeding
// scripts/seed-with-factories.js
const mongoose = require('mongoose');
// Assumes the factories module also defines a 'Category' factory
const factory = require('../tests/factories');

// 127.0.0.1 rather than localhost: newer Node versions may resolve localhost to IPv6 (::1)
const MONGODB_URI = 'mongodb://127.0.0.1:27017/my_app_dev';

// Rounds a currency amount to 2 decimal places
const round2 = (value) => Math.round(value * 100) / 100;

// Empties every collection registered on the connection
async function clearDatabase() {
  const collections = mongoose.connection.collections;
  for (const key in collections) {
    await collections[key].deleteMany({});
  }
}

async function seedDatabase() {
  try {
    await mongoose.connect(MONGODB_URI);
    console.log('Connected to MongoDB...');
    await clearDatabase();

    // Create admin user
    const admin = await factory.create('Admin', {
      username: 'admin',
      email: 'admin@example.com'
    });

    // Create regular users
    const users = await factory.createMany('User', 20);
    console.log(`Created ${users.length + 1} users`);

    // Create categories
    const categories = await Promise.all([
      factory.create('Category', { name: 'Electronics', slug: 'electronics' }),
      factory.create('Category', { name: 'Clothing', slug: 'clothing' }),
      factory.create('Category', { name: 'Books', slug: 'books' })
    ]);
    console.log(`Created ${categories.length} categories`);

    // Create 10 products for each category
    const products = [];
    for (const category of categories) {
      const categoryProducts = await factory.createMany('Product', 10, {
        category: category._id
      });
      products.push(...categoryProducts);
    }
    console.log(`Created ${products.length} products`);

    // Create orders: each user gets 1-3 orders
    const orders = [];
    for (const user of [admin, ...users]) {
      const orderCount = Math.floor(Math.random() * 3) + 1;
      for (let i = 0; i < orderCount; i++) {
        // Select 1-5 random products (the same product may appear twice)
        const orderProducts = [];
        const productCount = Math.floor(Math.random() * 5) + 1;
        for (let j = 0; j < productCount; j++) {
          const product = products[Math.floor(Math.random() * products.length)];
          const quantity = Math.floor(Math.random() * 3) + 1;
          orderProducts.push({
            product: product._id,
            name: product.name,
            quantity,
            image: product.image || 'placeholder.jpg',
            price: product.price
          });
        }

        // Calculate totals, rounded to cents
        const itemsPrice = round2(
          orderProducts.reduce((sum, item) => sum + item.price * item.quantity, 0)
        );
        const shippingPrice = itemsPrice > 100 ? 0 : 10;
        const taxPrice = round2(itemsPrice * 0.15);
        const totalPrice = round2(itemsPrice + shippingPrice + taxPrice);

        // Create the order, overriding the factory defaults
        const order = await factory.create('Order', {
          user: user._id,
          orderItems: orderProducts,
          itemsPrice,
          shippingPrice,
          taxPrice,
          totalPrice
        });
        orders.push(order);
      }
    }
    console.log(`Created ${orders.length} orders`);
    console.log('Database seeding completed successfully');
  } catch (error) {
    console.error('Seeding error:', error);
    process.exitCode = 1;
  } finally {
    await mongoose.disconnect();
  }
}

seedDatabase();
Real-World Example: Test-Driven Development Workflow
In a test-driven development workflow, factories become essential. For example, a team developing an e-commerce platform would create factories for users, products, orders, and reviews. Developers write tests that use these factories to create consistent test scenarios. When they need to add a feature like "wishlist functionality," they first create a wishlist factory, then write tests using that factory and existing user/product factories. This consistent approach speeds up development and ensures reliable testing.
Integration with CI/CD Pipelines
Automating database seeding as part of your continuous integration and deployment pipeline ensures consistent testing and deployment environments.
Example GitHub Actions Workflow
# .github/workflows/test.yml
name: Test

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main, develop ]

jobs:
  test:
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgres:13
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: test_db
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    # Job-level env so the migration and seed steps can reach the database too
    env:
      NODE_ENV: test
      DB_HOST: localhost
      DB_USER: postgres
      DB_PASSWORD: postgres
      DB_NAME: test_db

    steps:
      - uses: actions/checkout@v4
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Run migrations
        run: npm run migrate:test
      - name: Seed test database
        run: npm run seed:test
      - name: Run tests
        run: npm test
Environment-Specific Seeding Scripts
// package.json
{
  "scripts": {
    "migrate:dev": "knex migrate:latest --env development",
    "migrate:test": "knex migrate:latest --env test",
    "migrate:prod": "knex migrate:latest --env production",
    "seed:dev": "knex seed:run --env development",
    "seed:test": "knex seed:run --env test",
    "seed:prod": "knex seed:run --env production --specific=01_reference_data.js",
    "setup:dev": "npm run migrate:dev && npm run seed:dev",
    "setup:test": "npm run migrate:test && npm run seed:test",
    "reset:dev": "knex migrate:rollback --all --env development && npm run setup:dev",
    "reset:test": "knex migrate:rollback --all --env test && npm run setup:test"
  }
}
Database Reset vs. Incremental Updates
Complete Reset Approach
- Drop all tables and recreate from scratch
- Ensures a clean state for testing
- Good for development and testing environments
- Slower for large datasets
Incremental Update Approach
- Only seed missing or changed data
- Faster for large datasets
- Preserves existing data
- Can lead to inconsistencies if not carefully managed
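The incremental approach comes down to comparing the desired seed rows with what is already stored. The comparison should use a stable natural key, such as a country `code`, rather than an auto-generated id. The sketch below covers only that planning step and assumes both sides fit in memory. `planIncrementalSeed` is hypothetical. Applying the plan is left to your database layer.

```javascript
// Compares desired rows with existing rows by a natural key. Only fields present
// in the desired row are compared, so extra columns (ids, timestamps) are ignored,
// and rows that exist only in the database are left alone.
function planIncrementalSeed(desired, existing, key) {
  const existingByKey = new Map(existing.map((row) => [row[key], row]));
  const toInsert = [];
  const toUpdate = [];
  for (const row of desired) {
    const current = existingByKey.get(row[key]);
    if (!current) {
      toInsert.push(row);
    } else if (Object.keys(row).some((field) => row[field] !== current[field])) {
      toUpdate.push(row);
    }
  }
  return { toInsert, toUpdate };
}

const desired = [
  { code: 'US', name: 'United States' },
  { code: 'CA', name: 'Canada' },
  { code: 'MX', name: 'Mexico' },
];
const existing = [
  { code: 'US', name: 'United States' },
  { code: 'CA', name: 'Canda' }, // a typo the next seed run should correct
];

const plan = planIncrementalSeed(desired, existing, 'code');
console.log(plan.toInsert.map((r) => r.code), plan.toUpdate.map((r) => r.code)); // [ 'MX' ] [ 'CA' ]
```

Once the plan is applied, planning again yields empty lists, which is what makes the seed safe to re-run. With Knex, the apply step could be `knex('countries').insert(rows).onConflict('code').merge()`, which requires a unique constraint on `code`.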
Production Seeding Considerations
- Only seed essential reference data (countries, roles, etc.)
- Use environment checks to prevent test data in production
- Back up production data before running any seeds
- Consider using migrations instead of seeds for data that must exist
- Use transactions to ensure atomic operations
Best Practices and Common Pitfalls
Seeding Best Practices
- Version Control Your Seeds: Keep seed files in version control with your application code
- Maintain Idempotency: Seeds should be safely re-runnable without causing duplicates
- Handle Dependencies: Seed files should respect data relationships and foreign keys
- Use Transactions: Wrap seeds in transactions to ensure atomic operations
- Separate Reference and Test Data: Keep essential reference data separate from test/development data
- Parameterize Seed Volumes: Make it easy to generate different amounts of data
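- Include Clean-up Logic: Development and test seeds should clear existing data (child tables first) before inserting new data; reference seeds that also run in production should upsert instead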
- Document Seed Data: Include comments about the purpose and structure of seed data
Common Seeding Pitfalls
- Seeding Too Much Data: Generating excessive test data can slow down development
- Inconsistent Foreign Keys: Failing to maintain referential integrity in seed data
- Hard-Coded IDs: Using explicit IDs that may conflict across environments or leave auto-increment sequences out of sync (notably on PostgreSQL)
- Missing Dependencies: Not accounting for data dependencies when ordering seed operations
- Slow Seeds: Inefficiently generating or inserting large volumes of data
- Environment Confusion: Accidentally running development seeds in production
- Unrealistic Data: Using obviously fake data that doesn't exercise real business logic
Analogy: Restaurant Opening
Think of database seeding like preparing a restaurant for its opening day. Migrations are like building the restaurant—the kitchen, dining room, bar, etc. Reference data seeding is like stocking the essentials—plates, glasses, cooking equipment, and basic ingredients that every restaurant needs. Development seeding is like preparing a variety of sample dishes for the staff to practice with. Test seeding is like setting up controlled scenarios to ensure each station operates correctly. And in production, you only want the essential ingredients ready—you don't want random test dishes sitting in the kitchen when real customers arrive.
Practice Activities
Activity 1: Create a Complete Seeding System
Develop a seeding system for a blog application with the following requirements:
- Create reference seeds for user roles, post categories, and post statuses
- Create development seeds for users, posts, and comments
- Use Faker.js to generate realistic content
- Implement environment-specific seeding (different data for development, testing, production)
- Create NPM scripts to run different types of seeds
Activity 2: Factory Pattern Implementation
Implement the factory pattern for test data generation:
- Set up factory-girl or a similar library
- Define factories for a simple e-commerce system (users, products, orders)
- Create relationships between factories
- Write test cases that use factories to create test scenarios
Activity 3: Performance Optimization Challenge
Optimize a seeding script for performance:
- Start with a seed script that generates 10,000 users and 50,000 posts
- Identify performance bottlenecks
- Implement batch inserts
- Use parallel processing where appropriate
- Measure and report performance improvements