Database Design Principles

Building Efficient and Maintainable Database Structures

Why Database Design Matters

Database design is to a software application what architecture is to a building. Just as poor architectural decisions can lead to structural problems, inefficient space usage, and maintenance nightmares, poor database design can result in slow performance, data anomalies, and difficult-to-maintain systems.

As full-stack developers, we often find ourselves making critical database design decisions that will affect our applications for years to come. A well-designed database can:

Today, we'll explore the principles that guide efficient database design, focusing primarily on relational databases while touching on NoSQL considerations.

The Database Design Process

Effective database design follows a structured approach that moves from requirements to implementation:

flowchart LR A[Requirement Analysis] --> B[Conceptual Design] B --> C[Logical Design] C --> D[Normalization] D --> E[Physical Design] E --> F[Implementation] F --> G[Maintenance & Optimization]

Step 1: Requirement Analysis

Begin by understanding what data your application needs to store and how it will be used. Ask questions like:

Example: E-commerce Requirements

For an e-commerce application, you might identify:

  • Entities: Products, Customers, Orders, Reviews, Categories
  • Attributes: Product (name, description, price, inventory), Customer (name, email, address)
  • Relationships: Customers place Orders, Orders contain Products, Products belong to Categories
  • Constraints: Each order must have a customer, products must have non-negative inventory
  • Common queries: List orders by customer, search products by category, calculate sales by period

Step 2: Conceptual Design - Entity-Relationship Modeling

Entity-Relationship (ER) modeling is a technique for visualizing the entities in your system and how they relate to each other. It helps translate business requirements into a database structure.

Key Components of ER Diagrams:
erDiagram CUSTOMER ||--o{ ORDER : places CUSTOMER { int customer_id string name string email string address } ORDER ||--|{ ORDER_ITEM : contains ORDER { int order_id int customer_id date order_date string status float total_amount } PRODUCT ||--o{ ORDER_ITEM : "ordered as" PRODUCT { int product_id string name string description float price int inventory } PRODUCT }|--|| CATEGORY : "belongs to" CATEGORY { int category_id string name string description } ORDER_ITEM { int order_id int product_id int quantity float unit_price } CUSTOMER ||--o{ REVIEW : writes PRODUCT ||--o{ REVIEW : "receives" REVIEW { int review_id int customer_id int product_id int rating string comment date review_date }
Relationship Types:
One-to-One (1:1)

Each entity instance relates to exactly one instance of another entity.

Example: Each User has exactly one User Profile.

User User Profile 1:1
One-to-Many (1:N)

One entity instance can relate to multiple instances of another entity.

Example: One Customer can place many Orders.

Customer Order 1:N
Many-to-Many (M:N)

Multiple instances of an entity relate to multiple instances of another entity.

Example: Students can enroll in multiple Courses, and each Course can have multiple Students.

Student Course M:N

Step 3: Logical Design

In this phase, we transform the conceptual model into a logical database structure (tables, columns, relationships) without considering physical storage details.

For relational databases, this involves:

Example: Converting a Many-to-Many Relationship

A many-to-many relationship between Students and Courses would be implemented with three tables:


-- Students table
CREATE TABLE students (
    student_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL
);

-- Courses table
CREATE TABLE courses (
    course_id SERIAL PRIMARY KEY,
    course_code VARCHAR(20) UNIQUE NOT NULL,
    title VARCHAR(100) NOT NULL,
    description TEXT,
    credits INTEGER NOT NULL
);

-- Junction table for enrollments
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(student_id),
    course_id INTEGER REFERENCES courses(course_id),
    enrollment_date DATE NOT NULL,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id)
);
                

Step 4: Normalization - Organizing Data Efficiently

Normalization is a systematic process of organizing data to minimize redundancy and dependency. It involves decomposing larger tables into smaller, more focused ones, and establishing relationships between them.

Benefits of Normalization:

The Normal Forms:

There are several levels of normalization, each building upon the previous one. We'll focus on the first three, which are the most commonly used:

First Normal Form (1NF)

Rule: Each table cell should contain a single value, and each record should be unique.

Not in 1NF (Unnormalized):
Student Courses
John Smith Math, Physics, Chemistry
Jane Doe Biology, Chemistry
In 1NF:
Student Course
John Smith Math
John Smith Physics
John Smith Chemistry
Jane Doe Biology
Jane Doe Chemistry
Second Normal Form (2NF)

Rule: The table must be in 1NF AND all non-key attributes must depend on the entire primary key.

This mainly applies to tables with composite primary keys (keys composed of multiple columns).

Not in 2NF:
Student_ID Course_ID Student_Name Course_Title Grade
1 101 John Smith Algebra A
1 102 John Smith Physics B+

Problem: Student_Name depends only on Student_ID, not the composite key (Student_ID, Course_ID).

In 2NF:

Students Table:

Student_ID Student_Name
1 John Smith

Courses Table:

Course_ID Course_Title
101 Algebra
102 Physics

Enrollments Table:

Student_ID Course_ID Grade
1 101 A
1 102 B+
Third Normal Form (3NF)

Rule: The table must be in 2NF AND all non-key attributes must depend directly on the primary key (and not on other non-key attributes).

Not in 3NF:
Order_ID Product_ID Product_Name Category_ID Category_Name Quantity
1001 5 iPhone 3 Electronics 1

Problem: Category_Name depends on Category_ID, not directly on the primary key (Order_ID, Product_ID).

In 3NF:

Orders Table:

Order_ID Product_ID Quantity
1001 5 1

Products Table:

Product_ID Product_Name Category_ID
5 iPhone 3

Categories Table:

Category_ID Category_Name
3 Electronics

Practical Example: Normalizing an E-commerce Order

Let's walk through normalizing an e-commerce order table that initially contains redundant data:

Unnormalized Table (Customer Orders)

CREATE TABLE customer_orders (
    order_id INT,
    order_date DATE,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100),
    customer_address TEXT,
    product_id INT,
    product_name VARCHAR(100),
    product_description TEXT,
    product_price DECIMAL(10,2),
    quantity INT,
    total_price DECIMAL(10,2)
);
            
First Normal Form (1NF)

Already in 1NF as it has atomic values and a unique identifier (composite key of order_id and product_id).

Second Normal Form (2NF)

-- Orders table
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    order_date DATE,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100),
    customer_address TEXT
);

-- Order Items table
CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    product_name VARCHAR(100),
    product_description TEXT,
    product_price DECIMAL(10,2),
    quantity INT,
    total_price DECIMAL(10,2),
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);
            
Third Normal Form (3NF)

-- Customers table
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100) UNIQUE,
    address TEXT
);

-- Orders table
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    order_date DATE,
    customer_id INT,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Products table
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name VARCHAR(100),
    description TEXT,
    price DECIMAL(10,2)
);

-- Order Items table
CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    unit_price DECIMAL(10,2),
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);
            
Final Result

Notice how we've eliminated redundancy. Customer information is stored only once, regardless of how many orders they place. Likewise, product information is stored only once, regardless of how many orders include that product.

Real-World Normalization Considerations

While normalization is a powerful technique for organizing data, there are important practical considerations:

Step 5: Physical Design

Physical design involves making decisions about how the database will be physically implemented in a specific DBMS. This includes:

Indexes

Indexes are data structures that improve the speed of data retrieval operations. They work similarly to a book's index, allowing the database to find rows quickly without scanning the entire table.

When to Create Indexes:
Index Types:
Example: Creating Indexes

-- Simple index on a single column
CREATE INDEX idx_products_name ON products(name);

-- Composite index on multiple columns
CREATE INDEX idx_users_location ON users(country, city);

-- Unique index
CREATE UNIQUE INDEX idx_users_email ON users(email);

-- Partial index (PostgreSQL)
CREATE INDEX idx_orders_recent ON orders(order_date) 
WHERE order_date > CURRENT_DATE - INTERVAL '30 days';
            
⚠️ Index Considerations

While indexes speed up reads, they slow down writes because the index must be updated. Don't over-index—only create indexes that will actually be used. Monitor performance and adjust as needed.

Constraints Revisited

Constraints enforce data integrity rules. The most common types are:

Example: Adding Constraints

-- Adding constraints to existing tables
ALTER TABLE products 
ADD CONSTRAINT chk_positive_price CHECK (price > 0);

ALTER TABLE orders
ADD CONSTRAINT fk_orders_customers FOREIGN KEY (customer_id) 
REFERENCES customers(customer_id) ON DELETE RESTRICT;

-- Setting a default value
ALTER TABLE user_sessions 
ALTER COLUMN created_at SET DEFAULT CURRENT_TIMESTAMP;
            

Denormalization: When to Break the Rules

While normalization reduces redundancy, sometimes introducing controlled redundancy can improve performance—a process called denormalization.

When to Consider Denormalization:

Common Denormalization Techniques:

Example: Denormalization for an E-commerce Dashboard

-- Denormalized order summary table for reporting
CREATE TABLE order_summaries (
    order_id INT PRIMARY KEY,
    order_date DATE,
    customer_id INT,
    customer_name VARCHAR(100),
    total_items INT,
    total_amount DECIMAL(10,2),
    shipping_address TEXT,
    status VARCHAR(50),
    payment_method VARCHAR(50),
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
            
⚠️ Denormalization Trade-offs

Denormalization introduces data redundancy, which means:

  • Updates must maintain consistency across all redundant copies
  • Data anomalies become possible if not carefully managed
  • Extra storage is required

Always consider whether better indexing or query optimization could solve performance issues before denormalizing.

NoSQL Database Design Considerations

NoSQL database design follows different principles than relational design, often prioritizing performance and specific access patterns over normalization.

Key NoSQL Design Principles:

Document Database Design (e.g., MongoDB)

Document databases excel at storing semi-structured data in flexible, JSON-like documents.

Example: E-commerce Data in MongoDB

// User document
{
  "_id": ObjectId("5f8e7d..."),
  "username": "johndoe",
  "email": "john@example.com",
  "address": {
    "street": "123 Main St",
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  },
  "payment_methods": [
    {
      "type": "credit_card",
      "last_four": "1234",
      "expiry": "05/25"
    }
  ]
}

// Order document (with embedded items)
{
  "_id": ObjectId("5f8e8b..."),
  "user_id": ObjectId("5f8e7d..."),
  "order_date": ISODate("2023-04-15T..."),
  "status": "shipped",
  "shipping_address": {
    "street": "123 Main St",
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  },
  "items": [
    {
      "product_id": ObjectId("5f8e9c..."),
      "name": "Smartphone X",
      "price": 699.99,
      "quantity": 1
    },
    {
      "product_id": ObjectId("5f8e9d..."),
      "name": "Wireless Headphones",
      "price": 149.99,
      "quantity": 1
    }
  ],
  "total": 849.98
}
            
Key Design Patterns for Document Databases:

Key-Value Store Design (e.g., Redis, DynamoDB)

Key-value stores are optimized for high-throughput, low-latency access to data by key.

Design Considerations:
Example: DynamoDB Keys for an E-commerce Application

// Using composite keys for efficient access patterns
{
  "PK": "USER#123",       // Partition key
  "SK": "PROFILE",        // Sort key
  "username": "johndoe",
  "email": "john@example.com"
}

{
  "PK": "USER#123",       // Same user
  "SK": "ORDER#456",      // Different entity type
  "order_date": "2023-04-15",
  "status": "shipped"
}

{
  "PK": "ORDER#456",      // Order as partition key
  "SK": "ITEM#789",       // Order item
  "product": "Smartphone X",
  "price": 699.99,
  "quantity": 1
}
            

Practical Activities

Activity 1: ER Modeling for a Library System

Design an ER diagram for a library management system with the following requirements:

  • The library lends books, DVDs, and other materials to members
  • Each item can have multiple copies available for lending
  • Members can place holds on items that are currently checked out
  • The system tracks overdue items and calculates fines
  • Books have authors, publishers, and genres

Create a complete ER diagram identifying entities, attributes, and relationships with their cardinality.

Activity 2: Normalization Exercise

Normalize the following table to 3NF:


CREATE TABLE university_courses (
    course_id VARCHAR(10),
    course_name VARCHAR(100),
    department_code VARCHAR(5),
    department_name VARCHAR(50),
    instructor_id INTEGER,
    instructor_name VARCHAR(100),
    instructor_email VARCHAR(100),
    semester VARCHAR(20),
    year INTEGER,
    room_number VARCHAR(10),
    building_name VARCHAR(50),
    credits INTEGER,
    max_students INTEGER
);
                

Create the necessary tables with appropriate primary and foreign keys.

Activity 3: Designing for Different Database Types

For a social media application with posts, comments, likes, and user profiles:

  1. Design a normalized relational schema
  2. Design a document-based schema using MongoDB's approach
  3. Compare the strengths and weaknesses of each approach for typical social media queries and operations

Database Design Best Practices

Further Reading and Resources

Coming Up: PostgreSQL with Node.js

In our next session, we'll move from theory to practice by setting up PostgreSQL and connecting it to a Node.js application. We'll explore how to create database connections, execute queries, and implement a data access layer.