SQL Basics

Introduction to Structured Query Language

What is SQL?

SQL (Structured Query Language) is the standard language for interacting with relational databases. It allows you to create, read, update, and delete data in a structured and predictable way. Since its development in the 1970s at IBM, SQL has become the foundation for most relational database management systems (RDBMS).

Think of SQL as the language you use to communicate with your database—similar to how you might use English to ask a librarian to help you find a book. SQL provides a standardized way to "ask questions" about your data and to manipulate that data.

graph LR A[Developer] -->|SQL Queries| B[Database Server] B -->|Results| A B --- C[(Database)]

Why Learn SQL?

SQL remains one of the most sought-after skills in the tech industry for several compelling reasons:

Real-World Application

Imagine you're building an e-commerce platform. With SQL, you can:

  • Track inventory levels and automatically trigger reorders
  • Calculate revenue across different time periods and product categories
  • Identify your most valuable customers based on purchase history
  • Monitor shipping times and identify potential logistical issues

All of these operations rely on SQL's ability to filter, join, and aggregate data efficiently.

Core Components of SQL

SQL consists of several components that serve different purposes:

Data Definition Language (DDL)

DDL commands are used to define the database structure or schema.

Data Manipulation Language (DML)

DML commands are used to manipulate data within the database.

Data Control Language (DCL)

DCL commands are used to control access to data within the database.

Transaction Control Commands

These commands manage the changes made by DML commands.

flowchart TD A[SQL] --> B[DDL
Data Definition Language] A --> C[DML
Data Manipulation Language] A --> D[DCL
Data Control Language] A --> E[TCL
Transaction Control Language] B --> B1[CREATE] B --> B2[ALTER] B --> B3[DROP] B --> B4[TRUNCATE] C --> C1[SELECT] C --> C2[INSERT] C --> C3[UPDATE] C --> C4[DELETE] D --> D1[GRANT] D --> D2[REVOKE] E --> E1[COMMIT] E --> E2[ROLLBACK] E --> E3[SAVEPOINT]

Creating Database Tables

Tables are the fundamental structures in a relational database. They organize data into rows and columns, much like a spreadsheet. Each table should represent a single entity or concept in your application domain.

Basic CREATE TABLE Syntax


CREATE TABLE table_name (
    column1_name data_type constraints,
    column2_name data_type constraints,
    ...
    table_constraints
);
            

Common Data Types

Numeric Types
  • INTEGER - Whole numbers
  • SMALLINT - Small-range integer
  • BIGINT - Large-range integer
  • DECIMAL(p,s) - Exact decimal with precision and scale
  • NUMERIC(p,s) - Equivalent to DECIMAL
  • REAL - Single precision floating-point
  • DOUBLE PRECISION - Double precision floating-point
Character Types
  • CHAR(n) - Fixed-length character string
  • VARCHAR(n) - Variable-length character string
  • TEXT - Variable unlimited length string
Date and Time Types
  • DATE - Calendar date (year, month, day)
  • TIME - Time of day
  • TIMESTAMP - Date and time
  • INTERVAL - Period of time
Boolean Type
  • BOOLEAN - true/false
Other Types
  • JSON/JSONB - JSON data (PostgreSQL)
  • ARRAY - Array of values (PostgreSQL)
  • UUID - Universally unique identifiers
  • BLOB - Binary Large Object

Constraints

Constraints are rules enforced on the data columns in a table. They maintain the accuracy and reliability of the data.

Example: Creating a Users Table


CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    date_of_birth DATE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active BOOLEAN DEFAULT TRUE,
    CHECK (date_of_birth > '1900-01-01')
);
            

Real-world analogy: Creating a table is like designing a form. You decide what information you're going to collect (columns) and what rules each piece of information must follow (constraints). The PRIMARY KEY is like a Social Security number or customer ID—it uniquely identifies each person who fills out the form.

Basic SQL Queries

Now that we understand how to create tables, let's learn how to interact with the data using SQL queries. The SELECT statement is the foundation of data retrieval in SQL.

SELECT Statement Basics


SELECT column1, column2, ... 
FROM table_name
WHERE condition
ORDER BY column1 [ASC|DESC];
            

Example: Querying a Users Table

Let's assume we have inserted some data into our users table:


-- Retrieve all users
SELECT * FROM users;

-- Retrieve specific columns
SELECT username, email, created_at FROM users;

-- Retrieve with a condition
SELECT * FROM users WHERE is_active = TRUE;

-- Retrieve with multiple conditions
SELECT * FROM users 
WHERE date_of_birth > '1990-01-01' AND is_active = TRUE;

-- Retrieve with sorting
SELECT * FROM users ORDER BY created_at DESC;

-- Retrieve with limiting results
SELECT * FROM users ORDER BY created_at DESC LIMIT 10;
            

Real-world analogy: Think of a SELECT query as asking a librarian to retrieve books for you. You might say, "Find me all mystery books published after 2020 and arrange them by author's last name." This translates directly to SQL: SELECT * FROM books WHERE genre = 'Mystery' AND publication_year > 2020 ORDER BY author_last_name;

Filtering Data with WHERE

The WHERE clause filters records based on specific conditions. It supports a variety of comparison operators:

Examples of WHERE Conditions


-- Equal operator
SELECT * FROM products WHERE price = 19.99;

-- Greater than
SELECT * FROM orders WHERE order_date > '2023-01-01';

-- Between a range
SELECT * FROM users 
WHERE date_of_birth BETWEEN '1980-01-01' AND '1999-12-31';

-- Pattern matching
SELECT * FROM products 
WHERE product_name LIKE 'Smart%'; -- Anything starting with "Smart"

-- In a list
SELECT * FROM orders 
WHERE status IN ('Shipped', 'Delivered');

-- NULL check
SELECT * FROM users WHERE last_name IS NULL;

-- Combined conditions
SELECT * FROM products 
WHERE category = 'Electronics' 
AND price < 500 
AND stock_quantity > 0;
            

Inserting Data into Tables

The INSERT statement adds new records to a table.

Basic INSERT Syntax


INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);
            

Example: Adding Users


-- Insert a single user
INSERT INTO users (username, email, password_hash, first_name, last_name, date_of_birth)
VALUES ('johndoe', 'john@example.com', 'hashed_password_here', 'John', 'Doe', '1985-07-15');

-- Insert multiple users
INSERT INTO users (username, email, password_hash, first_name, last_name, date_of_birth)
VALUES 
    ('janedoe', 'jane@example.com', 'hashed_password_here', 'Jane', 'Doe', '1990-03-20'),
    ('bobsmith', 'bob@example.com', 'hashed_password_here', 'Bob', 'Smith', '1988-11-07');
            

Note: You don't need to specify values for columns with DEFAULT constraints or for auto-incrementing columns like SERIAL (PostgreSQL) or AUTO_INCREMENT (MySQL).

Updating and Deleting Data

The UPDATE statement modifies existing records, while the DELETE statement removes records.

UPDATE Statement


UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
            

Example: Updating User Information


-- Update a single field for one user
UPDATE users
SET is_active = FALSE
WHERE user_id = 5;

-- Update multiple fields
UPDATE users
SET email = 'newemail@example.com', last_name = 'Johnson'
WHERE username = 'johndoe';

-- Update with calculations
UPDATE products
SET price = price * 1.1 -- Increase price by 10%
WHERE category = 'Electronics';
            
⚠️ Important Warning

Always use a WHERE clause with UPDATE statements, unless you genuinely want to update all records in the table. Without a WHERE clause, the UPDATE will affect all records!

DELETE Statement


DELETE FROM table_name
WHERE condition;
            

Example: Removing User Records


-- Delete a specific user
DELETE FROM users
WHERE user_id = 7;

-- Delete users based on a condition
DELETE FROM users
WHERE created_at < '2020-01-01' AND is_active = FALSE;
            
⚠️ Important Warning

Like UPDATE, always use a WHERE clause with DELETE statements unless you intend to delete all records. Additionally, consider using transactions when performing DELETE operations to allow for rollbacks if needed.

SQL Functions and Aggregation

SQL offers a variety of built-in functions that allow you to perform calculations and manipulate data.

Aggregate Functions

Aggregate functions perform calculations on multiple rows and return a single value.

Examples of Aggregate Functions


-- Count total number of users
SELECT COUNT(*) FROM users;

-- Count active users
SELECT COUNT(*) FROM users WHERE is_active = TRUE;

-- Calculate total order value
SELECT SUM(amount) FROM orders WHERE status = 'Completed';

-- Find average product price
SELECT AVG(price) FROM products WHERE category = 'Electronics';

-- Find most expensive and least expensive products
SELECT MIN(price) AS cheapest, MAX(price) AS most_expensive 
FROM products;

-- Statistics by category
SELECT category, 
    COUNT(*) AS product_count, 
    AVG(price) AS avg_price, 
    MIN(price) AS min_price, 
    MAX(price) AS max_price
FROM products
GROUP BY category;
            

Real-world analogy: Aggregate functions are like asking for a summary of your bank account activity. Instead of seeing each transaction, you might want to know the total amount spent, the average daily expenses, or the largest single purchase.

GROUP BY Clause

The GROUP BY clause groups rows with the same values in specified columns, typically used with aggregate functions.


-- Count users by signup month
SELECT 
    EXTRACT(YEAR FROM created_at) AS year,
    EXTRACT(MONTH FROM created_at) AS month,
    COUNT(*) AS new_users
FROM users
GROUP BY EXTRACT(YEAR FROM created_at), EXTRACT(MONTH FROM created_at)
ORDER BY year, month;

-- Calculate average order value by customer
SELECT 
    customer_id,
    COUNT(*) AS order_count,
    AVG(total_amount) AS avg_order_value
FROM orders
GROUP BY customer_id
ORDER BY avg_order_value DESC;
            

HAVING Clause

The HAVING clause filters the results of a GROUP BY based on an aggregate function condition. It's like a WHERE clause but for grouped results.


-- Find customers who've placed more than 5 orders
SELECT 
    customer_id,
    COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5
ORDER BY order_count DESC;

-- Find product categories with average price above $100
SELECT 
    category,
    AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING AVG(price) > 100
ORDER BY avg_price DESC;
            

Working with Multiple Tables

One of the most powerful features of relational databases is the ability to establish relationships between tables and query related data. This is achieved using JOINs.

erDiagram USERS ||--o{ ORDERS : places USERS { int user_id string username string email } ORDERS ||--|{ ORDER_ITEMS : contains ORDERS { int order_id int user_id date order_date string status } PRODUCTS ||--o{ ORDER_ITEMS : includes PRODUCTS { int product_id string name float price string category } ORDER_ITEMS { int order_id int product_id int quantity float price }

Types of JOINs

Visual Representation of JOINs

INNER INNER JOIN LEFT LEFT JOIN RIGHT RIGHT JOIN FULL FULL JOIN

JOIN Examples


-- INNER JOIN: Find all orders with user information
SELECT o.order_id, o.order_date, u.username, u.email
FROM orders o
INNER JOIN users u ON o.user_id = u.user_id;

-- LEFT JOIN: Find all users and their orders (including users with no orders)
SELECT u.username, u.email, o.order_id, o.order_date
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id;

-- Complex JOIN: Find detailed order information including products
SELECT 
    o.order_id, 
    o.order_date, 
    u.username AS customer, 
    p.name AS product_name, 
    oi.quantity, 
    oi.price AS unit_price, 
    (oi.quantity * oi.price) AS subtotal
FROM orders o
INNER JOIN users u ON o.user_id = u.user_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id
ORDER BY o.order_date DESC, o.order_id;

-- Finding users who have never placed an order
SELECT u.username, u.email
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.order_id IS NULL;
            

Real-world analogy: JOINs are like connecting the dots between related information in separate files. Imagine you have a spreadsheet of customer details and another of orders. A JOIN is like creating a new spreadsheet that combines information from both sources, showing which customers placed which orders.

Practical Activities

Activity 1: Database Schema Design

Design a database schema for a simple blog platform with the following requirements:

  • Users can create accounts, write posts, and comment on posts
  • Posts can have categories and tags
  • The system should track when posts and comments are created and updated

Create SQL statements to implement your schema, including tables for users, posts, comments, categories, and tags, with appropriate relationships.

Activity 2: SQL Query Exercises

Assuming a simple e-commerce database with tables for users, products, and orders, write SQL queries to:

  1. Find the 10 most expensive products
  2. Calculate the total revenue for each product category
  3. Identify the top 5 customers by total spending
  4. Find products that haven't been ordered in the last 30 days
  5. Calculate the average time between a user's registration and their first order

Activity 3: Data Manipulation Practice

For the blog database designed in Activity 1:

  1. Write INSERT statements to add sample data to your tables
  2. Update a user's email address and username
  3. Delete all comments older than 1 year from a specific user
  4. Update all posts in a specific category to add a tag

Common SQL Gotchas and Best Practices

Potential Pitfalls

Best Practices

Further Reading and Resources

Coming Up: Database Design Principles

In our next session, we'll dive deeper into database design principles, including normalization, entity-relationship modeling, and optimizing database structure for performance and maintainability.