Introduction to Dockerfiles
In our previous lectures, we explored container concepts, Docker's architecture, and basic Docker commands. Now, we'll focus on perhaps the most important aspect of Docker for developers: creating custom Docker images with Dockerfiles.
A Dockerfile is essentially a text file containing a series of instructions that Docker uses to build an image. It's like a recipe that tells Docker exactly how to prepare and package your application. Just as a cake recipe specifies ingredients, quantities, and baking instructions, a Dockerfile specifies base images, files to include, commands to run, and configuration settings.
Understanding Dockerfile syntax is crucial for several reasons:
- It allows you to create custom images tailored to your application's specific needs
- It enables reproducible builds, ensuring consistent deployment across environments
- It provides a documented, version-controlled way to describe your application's runtime environment
- It facilitates optimization for reduced image size, improved security, and better performance
Dockerfile Basics
Anatomy of a Dockerfile
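A Dockerfile consists of a series of instructions that Docker executes in order. Instructions that change the filesystem (RUN, COPY, ADD) add a layer to the image; the others record metadata. Here's a simple example: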
FROM node:14-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
This Dockerfile creates an image for a Node.js application. Let's break down what each instruction does:
- FROM: Specifies the base image (Node.js with Alpine Linux)
- WORKDIR: Sets the working directory inside the container
- COPY: Copies files from the host to the container
- RUN: Executes commands during the build process
- EXPOSE: Documents which ports the container listens on
- CMD: Specifies the command to run when the container starts
Think of each instruction as a step in your recipe. Just as a recipe might say "preheat oven to 350°F" before "mix ingredients," your Dockerfile instructions follow a logical sequence for preparing your application environment.
Building an Image from a Dockerfile
Once you've created a Dockerfile, you build an image from it using the docker build command:
# Basic build command
docker build -t myapp:1.0 .
# Build with a specific Dockerfile
docker build -f Dockerfile.prod -t myapp:prod .
# Build with build arguments
docker build --build-arg VERSION=1.0 -t myapp:1.0 .
The -t flag tags the image with a name and version. The dot (".") at the end specifies the build context: the directory whose files are available to COPY and ADD. Unless you pass -f, Docker looks for a file named Dockerfile at the root of that directory.
Dockerfile Instructions in Detail
FROM
Every Dockerfile must start with a FROM instruction, which specifies the base image to build upon (only comments, parser directives, and ARG instructions may come before it). This is like choosing the foundation for your house.
# Using the latest Node.js image
FROM node:latest
# Using a specific version
FROM node:14.17.0
# Using a minimal Alpine Linux variant
FROM node:14-alpine
# Using scratch (empty image)
FROM scratch
Best practices for FROM:
- Use specific versions instead of latest for reproducible builds
- Choose lightweight base images (like Alpine) when possible to reduce image size
- Consider security-focused base images for production applications
- Use multi-stage builds to separate build and runtime environments (more on this later)
WORKDIR
WORKDIR sets the working directory for any subsequent instructions. It's like setting up your workspace before starting a project.
# Set working directory
WORKDIR /app
# Subsequent instructions will run in /app
RUN echo "Hello" > hello.txt
# Creates /app/hello.txt
Best practices for WORKDIR:
- Always use absolute paths
- Create a dedicated directory for your application instead of using system directories
- Use WORKDIR instead of frequent cd commands
- Keep paths consistent throughout the Dockerfile for clarity
COPY and ADD
COPY and ADD both transfer files from the build context to the image, but ADD does more: it can also fetch remote URLs and automatically extracts local tar archives.
# Copy a file
COPY package.json .
# Copy a directory
COPY src/ ./src/
# Copy with pattern matching
COPY *.js .
# Copy and set permissions
COPY --chown=node:node . .
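# ADD can fetch remote URLs (a downloaded archive is not extracted)
ADD http://example.com/file.tar.gz /tmp/
# ADD automatically extracts a local tar archive (archive.tar.gz is a placeholder name)
ADD archive.tar.gz /opt/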
Best practices for COPY and ADD:
- Prefer COPY for simple file copying (most cases)
- Use ADD only when you need automatic extraction or remote URL support
- Copy only what's needed, not entire directories that include unnecessary files
- Use a .dockerignore file to exclude files from the build context
- Copy dependency files first, then source code, to optimize caching
RUN
RUN executes commands during the build process. Each RUN instruction creates a new layer in the image.
# Basic usage
RUN npm install
# Multiple commands with shell syntax
RUN apt-get update && \
apt-get install -y curl && \
rm -rf /var/lib/apt/lists/*
# Using a different shell
RUN ["/bin/bash", "-c", "echo hello > /tmp/hello.txt"]
Best practices for RUN:
- Combine related commands in a single RUN instruction to reduce layers
- Clean up temporary files in the same RUN instruction where they're created
- Use version pinning for packages to ensure reproducible builds
- Consider using --no-install-recommends with apt-get to minimize installed packages
- Sort multi-line arguments alphabetically to improve readability and avoid duplicates
ENV
ENV sets environment variables in the image, which persist when a container runs.
# Set a single environment variable
ENV NODE_ENV=production
# Set multiple variables
ENV PORT=3000 \
DEBUG=false \
LOG_LEVEL=info
# Using variables in other instructions
ENV APP_HOME=/app
WORKDIR $APP_HOME
Best practices for ENV:
- Use environment variables for configuration that might change between environments
- Group related environment variables in a single ENV instruction
- Provide sensible default values that work for most cases
- Use environment variables consistently throughout the Dockerfile
- Don't store secrets in ENV instructions (they're visible in the image history)
EXPOSE
EXPOSE informs Docker that the container listens on specified network ports at runtime.
# Expose a single port
EXPOSE 3000
# Expose multiple ports
EXPOSE 80 443
# Expose UDP ports
EXPOSE 53/udp
# Expose both TCP and UDP
EXPOSE 53/tcp 53/udp
Best practices for EXPOSE:
- Document all ports your application uses
- EXPOSE is documentation only; you still need -p or -P when running containers
- Use standard ports for standard services when possible
- Include protocol (TCP/UDP) when exposing non-TCP ports
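Because EXPOSE only documents a port, publishing happens at run time. The following is a minimal sketch, assuming the myapp:1.0 image from the earlier build example listens on port 3000; the host port 8080 is an arbitrary choice. The script only calls Docker when the image is actually present and otherwise prints the command it would run.

```shell
# Assumptions: myapp:1.0 was built earlier and its application listens on 3000.
image="myapp:1.0"
host_port=8080
container_port=3000
publish="${host_port}:${container_port}"   # format is host:container

if command -v docker >/dev/null 2>&1 && docker image inspect "$image" >/dev/null 2>&1; then
  # -p publishes one container port on one explicit host port
  docker run -d --rm -p "$publish" "$image"
else
  echo "Would run: docker run -d --rm -p $publish $image"
fi
```

Using -P instead of -p publishes every port listed with EXPOSE on a host port chosen by Docker.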
CMD and ENTRYPOINT
CMD and ENTRYPOINT specify what command to run when the container starts, but they work differently and can be used together.
# CMD examples
CMD ["npm", "start"]
CMD ["node", "server.js"]
CMD echo "Hello World"
# ENTRYPOINT examples
ENTRYPOINT ["node", "server.js"]
ENTRYPOINT ["docker-entrypoint.sh"]
# ENTRYPOINT with CMD as default arguments
ENTRYPOINT ["npm"]
CMD ["start"]
Best practices for CMD and ENTRYPOINT:
- Prefer exec form ["command", "arg1", "arg2"] over shell form command arg1 arg2
- Use ENTRYPOINT for the main executable and CMD for default arguments
- Include at most one CMD and one ENTRYPOINT instruction (if several are present, only the last of each takes effect)
- Consider creating an entrypoint script for complex startup logic
- Make your containers behave like executables when appropriate
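How ENTRYPOINT and CMD combine can be easier to see with a small model. The sketch below is a simplified simulation of the exec-form rule, not Docker code: resolve_command is a hypothetical helper, and it assumes an ENTRYPOINT is set. Arguments passed after the image name in docker run replace CMD, while ENTRYPOINT stays in place.

```shell
# resolve_command ENTRYPOINT CMD RUN_ARGS
# Hypothetical helper modelling the exec-form rule:
# final command = ENTRYPOINT + (docker run arguments if given, otherwise CMD)
resolve_command() {
  entrypoint="$1"    # e.g. "npm"   (from ENTRYPOINT ["npm"])
  default_cmd="$2"   # e.g. "start" (from CMD ["start"])
  run_args="$3"      # arguments given after the image name in docker run
  if [ -n "$run_args" ]; then
    echo "$entrypoint $run_args"
  else
    echo "$entrypoint $default_cmd"
  fi
}

resolve_command "npm" "start" ""           # like: docker run myapp           -> npm start
resolve_command "npm" "start" "run test"   # like: docker run myapp run test  -> npm run test
```

To replace the ENTRYPOINT itself for a single run, docker run provides the --entrypoint flag.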
Advanced Dockerfile Features
ARG
ARG defines variables that users can pass at build time with --build-arg. This allows for parameterized builds.
# Define a build argument with a default value
ARG VERSION=latest
# Use the argument
FROM node:${VERSION}
# Define arguments after FROM (limited scope)
ARG NODE_ENV
ENV NODE_ENV=${NODE_ENV:-production}
To use the argument when building:
docker build --build-arg VERSION=14 -t myapp:14 .
Best practices for ARG:
- Provide sensible default values for optional arguments
- Use ARG for build-time variability, ENV for runtime configuration
- Don't use ARG for secrets (they're visible in the image history)
- Be aware that ARG values declared before the FROM instruction are only available to FROM instructions; to use one inside a build stage, redeclare it with ARG after FROM
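The ${NODE_ENV:-production} form used in the ENV line above follows the same fallback rule as POSIX shell parameter expansion, so its behavior can be tried locally without building anything. This is only an illustration; effective_node_env is a hypothetical helper name.

```shell
# Mirrors: ENV NODE_ENV=${NODE_ENV:-production}
effective_node_env() {
  echo "${NODE_ENV:-production}"
}

unset NODE_ENV
effective_node_env    # prints: production (no value supplied, default used)

NODE_ENV=staging
effective_node_env    # prints: staging (as if a build argument had been passed)
```

In a build, the second case corresponds to docker build --build-arg NODE_ENV=staging -t myapp:staging .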
VOLUME
VOLUME creates a mount point for persistent or shared data.
# Create a volume at /data
VOLUME /data
# Create multiple volumes
VOLUME ["/data", "/logs", "/config"]
Best practices for VOLUME:
- Use volumes for persistent data that shouldn't be included in the image
- Document in comments what the volume is for
- Don't create volume mount points in locations that could conflict with the base image
- Remember that changes build steps make to a volume's path after the VOLUME instruction are discarded, so populate the directory before declaring the volume
USER
USER sets the user (and optionally the group) to use when running the image.
# Set user by name
USER nginx
# Set user by ID
USER 1000
# Set user and group
USER node:node
Best practices for USER:
- Avoid running containers as root whenever possible
- Create users and groups before referencing them
- Use numeric IDs for consistency across systems
- Consider permissions when changing users
- Switch to a non-root user as soon as possible
HEALTHCHECK
HEALTHCHECK tells Docker how to test if the container is still functioning properly.
# Simple HTTP check
HEALTHCHECK --interval=30s --timeout=3s \
CMD curl -f http://localhost/ || exit 1
# Disable health check from base image
HEALTHCHECK NONE
Best practices for HEALTHCHECK:
- Keep health checks lightweight to avoid resource contention
- Set appropriate intervals and timeouts for your application
- Make health checks verify actual functionality, not just that a process is running
- Set a reasonable --retries value so that a single transient failure does not mark the container unhealthy
- Exit with code 0 for healthy and 1 for unhealthy (exit code 2 is reserved and should not be used)
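One way to keep exit codes predictable is to wrap the probe in a small script. The sketch below assumes a hypothetical helper named probe that runs any command and maps its result to the two codes HEALTHCHECK expects; the true and false commands stand in for a real check such as curl -f http://localhost/.

```shell
# probe COMMAND [ARGS...]
# Hypothetical helper: returns 0 (healthy) if the command succeeds and
# 1 (unhealthy) for any failure, so the reserved exit code 2 is never returned.
probe() {
  if "$@" >/dev/null 2>&1; then
    return 0
  fi
  return 1
}

# Stand-in probes for demonstration
probe true  && echo "healthy"
probe false || echo "unhealthy"
```

A script built this way could be copied into the image and referenced from the CMD part of a HEALTHCHECK instruction.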
SHELL
SHELL changes the default shell used for the shell form of commands.
# Change to PowerShell on Windows
SHELL ["powershell", "-command"]
# Now shell form uses PowerShell (a line ending in a backslash would be read as a continuation, so use a forward slash)
RUN Get-ChildItem C:/
# Change back to cmd
SHELL ["cmd", "/S", "/C"]
Best practices for SHELL:
- Use the exec form of RUN, CMD, and ENTRYPOINT when possible to avoid shell quirks
- Be aware of the differences between shells when switching
- Document why you're changing the default shell
Multi-stage Builds
Multi-stage builds allow you to use multiple FROM statements in your Dockerfile. Each FROM instruction starts a new build stage, and you can selectively copy artifacts from one stage to another, leaving behind everything you don't need in the final image.
This is like having a workshop for building components and then only taking the finished components to your assembly area, leaving all the tools, scraps, and intermediate pieces behind.
Basic Multi-stage Build
# Build stage
FROM node:14 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
# Runtime stage
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
In this example, we use Node.js to build a frontend application in the first stage, then copy only the build artifacts to an Nginx image in the second stage. The final image only contains Nginx and the built application, not Node.js or any of the build tools.
Advanced Multi-stage Techniques
# Base stage with common dependencies
FROM node:14-alpine AS base
WORKDIR /app
COPY package*.json ./
RUN npm config set registry https://registry.npmjs.org/
RUN npm install
# Development stage
FROM base AS development
ENV NODE_ENV=development
CMD ["npm", "run", "dev"]
# Build stage
FROM base AS build
COPY . .
RUN npm run build
# Production stage
FROM nginx:alpine AS production
COPY --from=build /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
This more advanced example shows:
- A common base stage with shared dependencies
- A development stage for local development
- A build stage for creating production artifacts
- A production stage with only what's needed to run the application
You can choose which stage to build with the --target flag:
# Build the development image
docker build --target development -t myapp:dev .
# Build the production image
docker build --target production -t myapp:prod .
Best practices for multi-stage builds:
- Use multi-stage builds to create minimal production images
- Name your stages with AS for clarity
- Order instructions and stages from least-changing to most-changing for optimal caching
- Consider separate Dockerfiles for very different environments if multi-stage becomes too complex
- Use build arguments to control which stages to include or skip
Optimizing Dockerfiles
Leveraging Docker's Build Cache
Docker caches individual layers to speed up subsequent builds. Understanding and optimizing for this caching mechanism can dramatically reduce build times.
The build cache works like this:
- Docker checks if it can reuse a cached layer for each instruction
- For COPY and ADD instructions, the contents of the files are examined
- For other instructions, the instruction string itself is compared
- Once a cache miss occurs, all subsequent layers are built from scratch
# Bad caching (entire project rebuilt when any file changes)
FROM node:14
WORKDIR /app
COPY . .
RUN npm install
CMD ["npm", "start"]
# Better caching (dependencies cached unless package.json changes)
FROM node:14
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["npm", "start"]
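The effect of the two orderings can be imitated without Docker. The sketch below is only a rough stand-in for the real cache check: it uses a checksum of the copied files in place of Docker's internal cache key, and every file name and content in it is made up for the demonstration.

```shell
set -eu
ctx=$(mktemp -d)   # throwaway directory playing the role of the build context
printf '{"name":"demo","version":"1.0.0"}\n' > "$ctx/package.json"
printf 'console.log("v1");\n' > "$ctx/server.js"

# Checksum of only the files that COPY package*.json ./ would copy
deps_key() { cat "$ctx"/package*.json | cksum; }
# Checksum of everything that COPY . . would copy
all_key() { cat "$ctx"/* | cksum; }

deps_before=$(deps_key)
all_before=$(all_key)
printf 'console.log("v2");\n' > "$ctx/server.js"   # edit source code only
deps_after=$(deps_key)
all_after=$(all_key)

if [ "$deps_before" = "$deps_after" ]; then
  echo "package*.json unchanged: the npm install layer can be reused"
fi
if [ "$all_before" != "$all_after" ]; then
  echo "source changed: COPY . . and every later layer is rebuilt"
fi
rm -rf "$ctx"
```

In the first Dockerfile above, the changed checksum sits before npm install, so dependencies are reinstalled on every source edit; in the second, it sits after.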
Minimizing Image Size
Smaller images download faster, start faster, and have a smaller attack surface for security vulnerabilities.
# Size-inefficient Dockerfile
FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
RUN apt-get install -y curl
COPY . /app
CMD ["python3", "/app/app.py"]
# Size-optimized Dockerfile
FROM python:3.9-alpine
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]
Techniques for minimizing image size:
- Use smaller base images (Alpine, Slim, Distroless)
- Combine related RUN instructions to reduce layers
- Clean up package caches and temporary files in the same RUN instruction
- Use multi-stage builds to exclude build tools from the final image
- Include only necessary files with careful COPY instructions and .dockerignore
Using .dockerignore
A .dockerignore file specifies which files and directories should be excluded from the build context, similar to .gitignore for Git.
# Example .dockerignore file
node_modules
npm-debug.log
Dockerfile
.dockerignore
.git
.github
.gitignore
README.md
docker-compose.yml
coverage
.env
dist
build
*.log
Benefits of using .dockerignore:
- Reduces build context size for faster uploads to the Docker daemon
- Prevents sensitive files from being inadvertently included in the image
- Prevents local development files from interfering with the build
- Ensures cleaner, more deterministic builds
Practical Dockerfile Examples
Node.js Application
FROM node:16-alpine AS base
WORKDIR /app
ENV NODE_ENV=production
# Dependencies stage
FROM base AS dependencies
COPY package*.json ./
RUN npm ci
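# Build stage (with NODE_ENV=production, npm ci skips devDependencies, so reinstall them here in case the build tools are devDependencies)
FROM dependencies AS build
RUN npm ci --include=dev
COPY . .
RUN npm run build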
# Production stage
FROM base AS production
COPY --from=dependencies /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY package.json .
USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]
Python Web Application
FROM python:3.9-slim AS base
WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1
# Dependencies stage
FROM base AS dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Production stage
FROM dependencies AS production
COPY . .
RUN useradd -m appuser
USER appuser
EXPOSE 5000
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]
Java Spring Boot Application
FROM maven:3.8.4-openjdk-17-slim AS build
WORKDIR /app
COPY pom.xml .
# Download dependencies separately for better caching
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests
# Runtime stage
FROM openjdk:17-jdk-slim
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
Go Application
FROM golang:1.18-alpine AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server .
# Final stage
FROM alpine:3.15
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=build /app/server .
COPY --from=build /app/config ./config
EXPOSE 8080
CMD ["./server"]
Best Practices and Common Patterns
Security Best Practices
- Use specific versions of base images to avoid unexpected changes
- Run containers as non-root users with USER
- Remove unnecessary tools and packages
- Scan images for vulnerabilities using tools like Trivy, Clair, or Snyk
- Don't include secrets in your images
- Keep base images updated to get security patches
- Use multi-stage builds to reduce attack surface
Development vs. Production Dockerfiles
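Development and production environments often have different requirements. Development images typically include development dependencies, enable hot-reloading, and work with source code mounted as a volume, while production images include only production dependencies, run as a non-root user, and are kept as small as possible.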
Approaches to handle these differences:
- Use multi-stage builds with different targets
- Maintain separate Dockerfiles (e.g., Dockerfile.dev and Dockerfile.prod)
- Use build arguments to control behavior
- Override CMD and mount volumes for development
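The last approach can be sketched as a single docker run command. This is a minimal example under stated assumptions: it reuses the myapp:dev tag and the npm run dev script from the multi-stage example, assumes the image's WORKDIR is /app, and assumes the dev server listens on port 3000. The script only calls Docker when that image exists and otherwise prints the command.

```shell
# Assumptions: image myapp:dev exists, its WORKDIR is /app, dev server uses port 3000.
image="myapp:dev"
src_mount="$(pwd):/app"   # host directory : container directory

if command -v docker >/dev/null 2>&1 && docker image inspect "$image" >/dev/null 2>&1; then
  # The bind mount makes local edits visible inside the container; the anonymous
  # volume on /app/node_modules keeps the image's installed dependencies from
  # being hidden by that mount. The trailing "npm run dev" overrides CMD.
  docker run --rm -it -p 3000:3000 -v "$src_mount" -v /app/node_modules "$image" npm run dev
else
  echo "Would run: docker run --rm -it -p 3000:3000 -v $src_mount -v /app/node_modules $image npm run dev"
fi
```

Because the source is mounted rather than copied, the same image can be reused across code changes without rebuilding.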
Debugging Dockerfile Issues
When things go wrong during a build, these techniques can help:
- Use docker build --progress=plain for more verbose output
- Add intermediate RUN commands to debug the state (e.g., RUN ls -la)
- Build up to the stage before the failing instruction and run a container from it interactively:
# If an instruction in a later stage fails, build only the preceding stage
# (--target takes a stage name declared with AS; step4 is a placeholder)
docker build --target step4 -t debug-image .
docker run -it debug-image sh
# Then manually run the failing command
- Check for common issues:
- File permissions
- Path errors
- Network access during build
- Case sensitivity
- Platform-specific commands
Dockerfile Linting and Best Practices Tools
Several tools can help ensure your Dockerfiles follow best practices:
hadolint
A popular Dockerfile linter that checks for common mistakes and best practices:
# Install hadolint
brew install hadolint # macOS
docker pull hadolint/hadolint # Docker
# Run on a Dockerfile
hadolint Dockerfile
# Using Docker
docker run --rm -i hadolint/hadolint < Dockerfile
Hadolint checks for issues like:
- Using latest tags
- Missing version in package installations
- Using ADD instead of COPY
- Not cleaning up package caches
- Running apt-get upgrade in containers
Docker Scout
Docker's integrated tool for scanning images for vulnerabilities:
# Analyze an image
docker scout cves myimage:latest
# Compare myimage:2.0 against myimage:1.0
docker scout compare --to myimage:1.0 myimage:2.0
Dive
A tool for exploring layers in Docker images:
# Install dive
brew install dive # macOS
docker pull wagoodman/dive # Docker
# Analyze an image
dive myimage:latest
# Using Docker
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive myimage:latest
Dive helps you:
- Visualize what files are in each layer
- Identify large or redundant files
- Find inefficiencies in your image structure
- See the commands that created each layer
Conclusion
Dockerfiles are the blueprint for your container images. By understanding the syntax, best practices, and optimization techniques, you can create images that are:
- Smaller and more efficient
- More secure
- Faster to build
- More maintainable
- Consistent across environments
Remember that a well-crafted Dockerfile is an investment that pays dividends throughout your application's lifecycle. Take the time to optimize your Dockerfiles using the techniques we've covered, and your containerized applications will be more reliable, secure, and efficient.
In our next lecture, we'll explore Docker Compose, which allows you to define and run multi-container Docker applications.
Practice Activities
Activity 1: Create a Basic Dockerfile
Create a Dockerfile for a simple HTML website:
- Create an index.html file with some basic content
- Write a Dockerfile that uses the nginx:alpine base image
- Copy your HTML file to the correct location in the Nginx container
- Build and run your container, mapping port 8080 on your host to port 80 in the container
- Visit http://localhost:8080 to see your website
Activity 2: Optimize a Node.js Dockerfile
Optimize this inefficient Dockerfile for a Node.js application:
FROM node:latest
COPY . /app
WORKDIR /app
RUN npm install
RUN npm run build
EXPOSE 3000
CMD node server.js
Improve it by:
- Using a specific version of Node.js with a smaller base image
- Optimizing for the build cache
- Creating a .dockerignore file
- Using a non-root user
- Implementing a multi-stage build if appropriate
Activity 3: Multi-stage Build for a React Application
Create a multi-stage Dockerfile for a React application:
- Use create-react-app to generate a new React application
- Create a multi-stage Dockerfile that:
- Uses Node.js to build the application
- Uses Nginx to serve the production build
- Results in a minimal final image
- Build and run the container
- Use docker history to examine the layers and sizes
Activity 4: Environment-specific Dockerfiles
Create development and production Dockerfiles for the same application:
- Create a simple Express.js application
- Create Dockerfile.dev that:
- Includes development dependencies
- Enables hot-reloading
- Mounts your source code as a volume
- Create Dockerfile.prod that:
- Only includes production dependencies
- Builds the application for production
- Uses a non-root user
- Implements best security practices
- Build and run both versions, noting the differences
Challenge: Create a Comprehensive Dockerfile
Create a Dockerfile for a real-world application that implements all the best practices we've discussed:
- Choose a real application you're familiar with (or use an open-source project)
- Create a Dockerfile that:
- Uses multi-stage builds for minimal image size
- Optimizes for build cache performance
- Implements proper security practices
- Includes appropriate health checks
- Handles environment variables correctly
- Uses proper base images and version pinning
- Use hadolint to verify your Dockerfile follows best practices
- Use dive to analyze your built image and look for optimization opportunities
- Document your Dockerfile with comments explaining key decisions
Additional Resources
Articles and Tutorials
- Intro Guide to Dockerfile Best Practices
- Docker Security Best Practices
- Advanced Multi-stage Build Patterns
- Dockerizing Python Applications
Books
- "Docker Deep Dive" by Nigel Poulton
- "Docker in Practice" by Ian Miell and Aidan Hobson Sayers
- "Container Security" by Liz Rice