Introduction to Databases

Understanding Relational and NoSQL Database Systems

What are Databases?

At their core, databases are organized collections of structured information or data, typically stored electronically in a computer system. They serve as the backbone of modern applications, allowing for efficient storage, retrieval, manipulation, and management of data.

Imagine a library: without a systematic way to organize books (like the Dewey Decimal System), finding a specific book would be incredibly time-consuming. Databases provide this organization for our digital data, making it accessible and manageable at scale.

graph TD A[Application] -->|Requests Data| B[Database System] B -->|Returns Data| A B --- C[(Stored Data)] D[Users] -->|Interact with| A

Two Major Database Paradigms

In the world of databases, there are two predominant paradigms: Relational (SQL) databases and Non-Relational (NoSQL) databases. Each has its own philosophy, strengths, and ideal use cases.

Relational Databases (SQL)

Users Table id: 1 name: "John Doe" email: "john@ex..." role_id: 2 Roles Table id: 2 name: "Admin" permissions: "..."

Non-Relational Databases (NoSQL)

User Document id: 1 name: "John Doe" email: "john@example.com" role: { name: "Admin", permissions: ["create", "read", "update", "delete"] }

Relational Databases (SQL)

Relational databases organize data into tables (relations) with rows and columns. They follow a strict schema and use Structured Query Language (SQL) for defining and manipulating the data.

Key Characteristics

Popular Relational Database Management Systems (RDBMS)

Real-World Examples

Banking Systems: Banks rely heavily on relational databases to maintain transaction records, account information, and ensure data integrity. When you transfer money, the ACID properties ensure that the money is properly debited from one account and credited to another.

E-commerce Platforms: Online stores like Amazon use relational databases to manage product catalogs, customer information, and order processing where data consistency is crucial.

Sample SQL Code


-- Creating tables with relationships
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    title VARCHAR(200) NOT NULL,
    content TEXT,
    published_at TIMESTAMP
);

-- Querying related data
SELECT users.username, posts.title 
FROM users 
JOIN posts ON users.id = posts.user_id
WHERE posts.published_at > '2023-01-01';
            

NoSQL Databases

NoSQL ("Not Only SQL") databases emerged as a response to the limitations of relational databases, especially for handling large volumes of unstructured data, high traffic loads, and the need for horizontal scalability.

Key Characteristics

graph TD A[NoSQL Databases] --> B[Document Stores] A --> C[Key-Value Stores] A --> D[Column-Family Stores] A --> E[Graph Databases] B --> B1[MongoDB] B --> B2[CouchDB] C --> C1[Redis] C --> C2[DynamoDB] D --> D1[Cassandra] D --> D2[HBase] E --> E1[Neo4j] E --> E2[Amazon Neptune]

Types of NoSQL Databases

1. Document Stores

Store data in flexible, JSON-like documents. Each document can have a different structure.

Examples: MongoDB, Couchbase, Firebase Firestore


// MongoDB document example
{
  "_id": ObjectId("5f8d5f"),
  "username": "johndoe",
  "email": "john@example.com",
  "profile": {
    "age": 28,
    "interests": ["coding", "hiking", "reading"]
  },
  "posts": [
    { "title": "My first post", "content": "Hello world!" },
    { "title": "Learning NoSQL", "content": "It's fascinating!" }
  ]
}
            
2. Key-Value Stores

The simplest NoSQL databases, storing data as key-value pairs. Optimized for high-speed reads and writes.

Examples: Redis, Amazon DynamoDB, Riak

Real-World Example: Session storage in web applications, caching layers, real-time leaderboards


// Redis commands example
SET user:1000 '{"username":"johndoe","email":"john@example.com"}'
GET user:1000
            
3. Column-Family Stores

Store data in column families, optimized for queries over large datasets.

Examples: Apache Cassandra, HBase, ScyllaDB

Real-World Example: Time-series data, weather data, IoT sensor readings

4. Graph Databases

Specialized for data with complex relationships, storing data in nodes and edges.

Examples: Neo4j, Amazon Neptune, ArangoDB

Real-World Example: Social networks, recommendation engines, fraud detection systems


// Neo4j Cypher query example
MATCH (user:Person {name: 'John'})-[:FRIENDS_WITH]->(friend)
RETURN friend.name
            

Choosing Between Relational and NoSQL

Factor Relational (SQL) NoSQL
Data Structure Structured (tables with rows and columns) Varies (documents, key-value pairs, graphs, etc.)
Schema Fixed schema (defined in advance) Dynamic schema (can evolve as needed)
Scalability Vertical (scale up with more powerful hardware) Horizontal (scale out across multiple servers)
Transactions ACID-compliant (strong consistency) Often BASE model (eventual consistency)
Query Language SQL (standardized) Varies by database (often proprietary)
Relationships Handled through foreign keys and joins Often denormalized or handled through references

When to Choose Relational Databases

When to Choose NoSQL Databases

Real-World Selection Examples

Netflix: Multiple Database Types

Netflix uses a combination of database technologies. They use Amazon DynamoDB (NoSQL) for handling user session information and viewing history due to its high scalability and performance. For billing and payment processing, they rely on relational databases to ensure transaction integrity.

Airbnb: PostgreSQL + NoSQL

Airbnb primarily uses PostgreSQL (relational) for their core booking data, but they incorporate NoSQL solutions like Apache Hive for analytics and Elasticsearch for their search capabilities.

Modern Hybrid Approaches

Many modern applications use a polyglot persistence approach—using different database types for different aspects of the application:

flowchart TD A[E-commerce Application] --> B[Product Catalog] A --> C[User Sessions] A --> D[Order Processing] A --> E[Recommendations] B --> B1[MongoDB
Document Store] C --> C1[Redis
Key-Value Store] D --> D1[PostgreSQL
Relational Database] E --> E1[Neo4j
Graph Database]

Practical Activities

Activity 1: Database Selection Scenario

Consider the following application scenarios and determine which type of database would be most appropriate. Justify your choices.

  1. A banking application that processes financial transactions
  2. A social media platform that needs to store user profiles and relationships
  3. A content management system for a blog with thousands of articles
  4. A real-time analytics platform processing millions of IoT sensor readings
  5. An e-commerce product catalog with frequent schema changes

Activity 2: Data Modeling Exercise

For a simple blog application with users, posts, and comments:

  1. Create a relational data model with tables, primary keys, and foreign keys
  2. Design an equivalent NoSQL document structure
  3. Compare the pros and cons of each approach for this specific use case

Activity 3: Query Comparison

Given a dataset of products and categories:

  1. Write a SQL query to find all products in a specific category with their prices
  2. Write the equivalent query in MongoDB syntax
  3. Discuss which is more intuitive and why

Further Reading and Resources

Coming Up: SQL Basics

In our next session, we'll dive deeper into SQL fundamentals, learning how to create tables, insert data, and write queries to retrieve and manipulate information in relational databases.