Dockerfile Syntax

Introduction to Dockerfiles

In our previous lectures, we explored container concepts, Docker's architecture, and basic Docker commands. Now, we'll focus on perhaps the most important aspect of Docker for developers: creating custom Docker images with Dockerfiles.

A Dockerfile is essentially a text file containing a series of instructions that Docker uses to build an image. It's like a recipe that tells Docker exactly how to prepare and package your application. Just as a cake recipe specifies ingredients, quantities, and baking instructions, a Dockerfile specifies base images, files to include, commands to run, and configuration settings.

graph TD
    A[Dockerfile] -->|docker build| B[Docker Image]
    B -->|docker run| C[Docker Container]
    style A fill:#f9d5e5,stroke:#333,stroke-width:2px
    style B fill:#d5f5e3,stroke:#333,stroke-width:2px
    style C fill:#ebdef0,stroke:#333,stroke-width:2px

Understanding Dockerfile syntax is crucial for several reasons:

It allows you to create custom images tailored to your application's specific needs
It enables reproducible builds, ensuring consistent deployment across environments
It provides a documented, version-controlled way to describe your application's runtime environment
It facilitates optimization for reduced image size, improved security, and better performance

Dockerfile Basics

Anatomy of a Dockerfile

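A Dockerfile consists of a series of instructions that Docker executes in order. Instructions that modify the filesystem, such as RUN, COPY, and ADD, add a new layer to the image; instructions such as EXPOSE and CMD only record metadata. Here's a simple example: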


FROM node:14-alpine

WORKDIR /app

COPY package*.json ./

RUN npm install

COPY . .

EXPOSE 3000

CMD ["npm", "start"]

This Dockerfile creates an image for a Node.js application. Let's break down what each instruction does:

FROM: Specifies the base image (Node.js with Alpine Linux)
WORKDIR: Sets the working directory inside the container
COPY: Copies files from the host to the container
RUN: Executes commands during the build process
EXPOSE: Documents which ports the container listens on
CMD: Specifies the default command to run when the container starts

Think of each instruction as a step in your recipe. Just as a recipe might say "preheat oven to 350°F" before "mix ingredients," your Dockerfile instructions follow a logical sequence for preparing your application environment.

graph TD
    subgraph "Docker Image Layers"
        A[Base Image: node:14-alpine] -->|Layer 1| B[Set WORKDIR]
        B -->|Layer 2| C[Copy package files]
        C -->|Layer 3| D[Run npm install]
        D -->|Layer 4| E[Copy application code]
        E -->|Layer 5| F[Set metadata: EXPOSE]
        F -->|Layer 6| G[Set default command: CMD]
    end
    style A fill:#f9d5e5,stroke:#333,stroke-width:2px
    style B fill:#d5f5e3,stroke:#333,stroke-width:2px
    style C fill:#ebdef0,stroke:#333,stroke-width:2px
    style D fill:#eaeded,stroke:#333,stroke-width:2px
    style E fill:#fdebd0,stroke:#333,stroke-width:2px
    style F fill:#d6eaf8,stroke:#333,stroke-width:2px
    style G fill:#d4efdf,stroke:#333,stroke-width:2px

Building an Image from a Dockerfile

Once you've created a Dockerfile, you build an image from it using the docker build command:


# Basic build command
docker build -t myapp:1.0 .

# Build with a specific Dockerfile
docker build -f Dockerfile.prod -t myapp:prod .

# Build with build arguments
docker build --build-arg VERSION=1.0 -t myapp:1.0 .

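The -t flag tags the image with a name and version. The dot (".") at the end specifies the build context: the directory whose files are sent to the builder and can be referenced by COPY and ADD. Unless -f is given, Docker looks for a file named Dockerfile at the root of that directory.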

Dockerfile Instructions in Detail

FROM

The first instruction in a Dockerfile must be FROM, which specifies the base image to build upon; only comments, parser directives, and ARG instructions may come before it. This is like choosing the foundation for your house.


# Using the latest Node.js image
FROM node:latest

# Using a specific version
FROM node:14.17.0

# Using a minimal Alpine Linux variant
FROM node:14-alpine

# Using scratch (empty image)
FROM scratch

Best practices for FROM:

Use specific versions instead of latest for reproducible builds
Choose lightweight base images (like Alpine) when possible to reduce image size
Consider security-focused base images for production applications
Use multi-stage builds to separate build and runtime environments (more on this later)

WORKDIR

WORKDIR sets the working directory for any subsequent instructions. It's like setting up your workspace before starting a project.


# Set working directory
WORKDIR /app

# Subsequent instructions will run in /app
RUN echo "Hello" > hello.txt
# Creates /app/hello.txt

Best practices for WORKDIR:

Always use absolute paths
Create a dedicated directory for your application instead of using system directories
Use WORKDIR instead of frequent cd commands
Keep paths consistent throughout the Dockerfile for clarity

COPY and ADD

COPY and ADD both transfer files from the build context to the image, but have subtle differences.


# Copy a file
COPY package.json .

# Copy a directory
COPY src/ ./src/

# Copy with pattern matching
COPY *.js .

# Copy and set permissions
COPY --chown=node:node . .

# ADD can fetch remote URLs (remote archives are not extracted; only local tar archives are)
ADD http://example.com/file.tar.gz /tmp/

Best practices for COPY and ADD:

Prefer COPY for simple file copying (most cases)
Use ADD only when you need automatic extraction or remote URL support
Copy only what's needed, not entire directories that include unnecessary files
Use a .dockerignore file to exclude files from the build context
Copy dependency files first, then source code, to optimize caching

RUN

RUN executes commands during the build process. Each RUN instruction creates a new layer in the image.


# Basic usage
RUN npm install

# Multiple commands with shell syntax
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*

# Exec form: invoke bash explicitly instead of the default /bin/sh
RUN ["/bin/bash", "-c", "echo hello > /tmp/hello.txt"]

Best practices for RUN:

Combine related commands in a single RUN instruction to reduce layers
Clean up temporary files in the same RUN instruction where they're created
Use version pinning for packages to ensure reproducible builds
Consider using --no-install-recommends with apt-get to minimize installed packages
Sort multi-line arguments alphabetically to improve readability and make duplicates easy to spot

ENV

ENV sets environment variables in the image, which persist when a container runs.


# Set a single environment variable
ENV NODE_ENV=production

# Set multiple variables
ENV PORT=3000 \
    DEBUG=false \
    LOG_LEVEL=info

# Using variables in other instructions
ENV APP_HOME=/app
WORKDIR $APP_HOME

Best practices for ENV:

Use environment variables for configuration that might change between environments
Group related environment variables in a single ENV instruction
Provide sensible default values that work for most cases
Use environment variables consistently throughout the Dockerfile
Don't store secrets in ENV instructions (they're visible in the image history)
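
Because ENV values are only defaults baked into the image, they can be overridden when a container starts. The sketch below wraps docker run -e in a small helper; the function name run_dev and the image tag passed to it are illustrative assumptions, not part of the lecture's project.

```shell
# Start a container from IMAGE, overriding two ENV defaults at runtime.
# Usage: run_dev IMAGE   (run_dev is a hypothetical helper name)
run_dev() {
  # Each -e KEY=VALUE replaces the value set by ENV in the Dockerfile.
  docker run --rm -e NODE_ENV=development -e LOG_LEVEL=debug "$1"
}
```

Calling run_dev myapp:1.0 leaves the image untouched; only that one container sees the development values.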

EXPOSE

EXPOSE informs Docker that the container listens on specified network ports at runtime.


# Expose a single port
EXPOSE 3000

# Expose multiple ports
EXPOSE 80 443

# Expose UDP ports
EXPOSE 53/udp

# Expose both TCP and UDP
EXPOSE 53/tcp 53/udp

Best practices for EXPOSE:

Document all ports your application uses
EXPOSE is documentation only - you still need -p or -P when running containers
Use standard ports for standard services when possible
Include protocol (TCP/UDP) when exposing non-TCP ports
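
Since EXPOSE does not publish anything by itself, the port mapping has to be given when the container is started. A minimal sketch, assuming a hypothetical helper named run_published and an image that listens on the container port you pass in:

```shell
# Publish CONTAINER_PORT of IMAGE on HOST_PORT of the host.
# Usage: run_published IMAGE HOST_PORT CONTAINER_PORT
run_published() {
  image="$1"
  host_port="$2"
  container_port="$3"
  # -p HOST:CONTAINER creates the mapping; -d runs in the background.
  docker run -d --rm -p "${host_port}:${container_port}" "$image"
}
```

For example, run_published myapp:1.0 8080 3000 would make the application reachable on host port 8080. Using -P instead of -p publishes every exposed port on a host port chosen by Docker.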

CMD and ENTRYPOINT

CMD and ENTRYPOINT specify what command to run when the container starts, but they work differently and can be used together.


# CMD examples
CMD ["npm", "start"]
CMD ["node", "server.js"]
CMD echo "Hello World"

# ENTRYPOINT examples
ENTRYPOINT ["node", "server.js"]
ENTRYPOINT ["docker-entrypoint.sh"]

# ENTRYPOINT with CMD as default arguments
ENTRYPOINT ["npm"]
CMD ["start"]

Best practices for CMD and ENTRYPOINT:

Prefer exec form ["command", "arg1", "arg2"] over shell form command arg1 arg2
Use ENTRYPOINT for the main executable and CMD for default arguments
Include at most one CMD and one ENTRYPOINT instruction; if either appears more than once, only the last one takes effect
Consider creating an entrypoint script for complex startup logic
Make your containers behave like executables when appropriate
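
The entrypoint-script advice above can be sketched as follows. It is written as a function so it can be tried in a local shell; the LOG_LEVEL variable and its default are hypothetical, and a real script would contain whatever setup your application needs.

```shell
# Sketch of the body of a docker-entrypoint.sh (function form for local testing).
main() {
  # Apply a default when the variable is unset or empty (hypothetical setting).
  : "${LOG_LEVEL:=info}"
  export LOG_LEVEL
  echo "entrypoint: LOG_LEVEL=$LOG_LEVEL" >&2
  # Replace the shell with the requested command so that the command,
  # not the shell, receives signals such as SIGTERM.
  exec "$@"
}
```

In a real script the file would end with main "$@", and the Dockerfile would pair it with ENTRYPOINT ["docker-entrypoint.sh"] and a CMD that supplies the default command.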

graph TD
    A[Container Start] --> B{Has ENTRYPOINT?}
    B -->|Yes| C[Use ENTRYPOINT as command]
    B -->|No| D[Use CMD as command]
    C --> E{Has CMD?}
    E -->|Yes| F[Add CMD as arguments to ENTRYPOINT]
    E -->|No| G[Run ENTRYPOINT with no arguments]
    D --> H[Run CMD]
    style A fill:#f9d5e5,stroke:#333,stroke-width:2px
    style B fill:#d5f5e3,stroke:#333,stroke-width:2px
    style C fill:#ebdef0,stroke:#333,stroke-width:2px
    style D fill:#ebdef0,stroke:#333,stroke-width:2px
    style E fill:#eaeded,stroke:#333,stroke-width:2px
    style F fill:#fdebd0,stroke:#333,stroke-width:2px
    style G fill:#fdebd0,stroke:#333,stroke-width:2px
    style H fill:#fdebd0,stroke:#333,stroke-width:2px
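
The decision flow in the diagram can be modelled in a few lines of shell. This is only an illustration of how the final command line is assembled for exec-form instructions, not Docker's actual implementation; the function name effective_command is made up for this sketch.

```shell
# effective_command ENTRYPOINT CMD [RUN_ARGS]
# Each parameter is a space-separated string; pass "" when it is not set.
effective_command() {
  entrypoint="$1"
  cmd="$2"
  run_args="$3"
  # Arguments given after the image name on docker run replace CMD.
  if [ -n "$run_args" ]; then
    cmd="$run_args"
  fi
  if [ -n "$entrypoint" ]; then
    # ENTRYPOINT is the executable; CMD (if any) is appended as arguments.
    echo "$entrypoint${cmd:+ $cmd}"
  else
    echo "$cmd"
  fi
}

effective_command "npm" "start" ""         # prints: npm start
effective_command "npm" "start" "test"     # prints: npm test
effective_command "" "node server.js" ""   # prints: node server.js
```

The second call shows why the ENTRYPOINT plus CMD pattern is convenient: the executable stays fixed while the arguments can be swapped at run time.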

Advanced Dockerfile Features

ARG

ARG defines variables that users can pass at build time with --build-arg. This allows for parameterized builds.


# Define a build argument with a default value
ARG VERSION=latest

# Use the argument
FROM node:${VERSION}

# Define arguments after FROM (limited scope)
ARG NODE_ENV
ENV NODE_ENV=${NODE_ENV:-production}

To use the argument when building:


docker build --build-arg VERSION=14 -t myapp:14 .

Best practices for ARG:

Provide sensible default values for optional arguments
Use ARG for build-time variability, ENV for runtime configuration
Don't use ARG for secrets (they're visible in the image history)
Be aware that an ARG declared before FROM is only available in FROM instructions; to use its value inside a build stage, redeclare it (without a value) after FROM
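
The ${NODE_ENV:-production} form used in the ARG example earlier in this section follows the same default-value rule as shell parameter expansion for the unset and set cases, so it can be tried in a plain shell. The helper name resolve_node_env is an assumption made for this sketch.

```shell
# Mirror of ENV NODE_ENV=${NODE_ENV:-production} using shell parameter expansion.
resolve_node_env() {
  # $1 stands in for the value passed with --build-arg NODE_ENV=... (may be absent).
  if [ $# -gt 0 ]; then
    NODE_ENV="$1"
  else
    unset NODE_ENV
  fi
  echo "${NODE_ENV:-production}"
}

resolve_node_env               # prints: production
resolve_node_env development   # prints: development
```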

VOLUME

VOLUME creates a mount point for persistent or shared data.


# Create a volume at /data
VOLUME /data

# Create multiple volumes
VOLUME ["/data", "/logs", "/config"]

Best practices for VOLUME:

Use volumes for persistent data that shouldn't be included in the image
Document in comments what the volume is for
Don't create volume mount points in locations that could conflict with the base image
Populate a volume's directory before declaring it: with the legacy builder, build steps that change data inside the volume after its declaration have those changes discarded

USER

USER sets the user (and optionally the group) used for subsequent RUN instructions in the stage and for the container's default process at runtime.


# Set user by name
USER nginx

# Set user by ID
USER 1000

# Set user and group
USER node:node

Best practices for USER:

Don't run containers as root when possible
Create users and groups before referencing them
Use numeric IDs for consistency across systems
Consider permissions when changing users
Switch to a non-root user as soon as possible
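
One way to enforce the non-root rule, for instance in a CI job, is to read the default user recorded in the image configuration. The sketch below assumes a hypothetical helper named check_non_root; it only inspects the image default and cannot tell whether someone later overrides it with docker run --user.

```shell
# Exit status 0 if IMAGE is configured with a non-root default user, 1 otherwise.
check_non_root() {
  user=$(docker image inspect --format '{{.Config.User}}' "$1") || return 2
  uid="${user%%:*}"   # drop an optional ":group" suffix
  case "$uid" in
    ""|root|0)
      # An empty value means no USER instruction was set, so root is used.
      echo "$1 runs as root"
      return 1
      ;;
    *)
      echo "$1 runs as $user"
      return 0
      ;;
  esac
}
```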

HEALTHCHECK

HEALTHCHECK tells Docker how to test if the container is still functioning properly.


# Simple HTTP check
HEALTHCHECK --interval=30s --timeout=3s \
  CMD curl -f http://localhost/ || exit 1

# Disable health check from base image
HEALTHCHECK NONE

Best practices for HEALTHCHECK:

Keep health checks lightweight to avoid resource contention
Set appropriate intervals and timeouts for your application
Make health checks verify actual functionality, not just that a process is running
Include reasonable retries to avoid transient failures
Return non-zero exit codes to indicate unhealthy state
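
A health check that verifies actual functionality is often easier to maintain as a small script than as a long one-liner. The sketch below is written as a function for local testing; the URL http://localhost:3000/health and the HEALTHCHECK_URL variable are hypothetical, and curl must be installed in the image for it to work.

```shell
# Sketch of a health-check script body.
healthcheck() {
  url="${HEALTHCHECK_URL:-http://localhost:3000/health}"
  # -f: fail on HTTP error responses, -s: silent, --max-time: bound the request.
  if curl -fs --max-time 2 "$url" >/dev/null; then
    return 0   # healthy
  fi
  return 1     # unhealthy (Docker treats exit code 1 as unhealthy)
}
```

Saved as a script in the image, this could be wired up with a line such as HEALTHCHECK --interval=30s --timeout=3s --retries=3 CMD /usr/local/bin/healthcheck.sh, where the path is again an assumption.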

SHELL

SHELL changes the default shell used for the shell form of commands.


# Change to PowerShell on Windows
SHELL ["powershell", "-command"]

# Now shell form uses PowerShell
RUN Get-ChildItem C:\

# Change back to cmd
SHELL ["cmd", "/S", "/C"]

Best practices for SHELL:

Use the exec form of RUN, CMD, and ENTRYPOINT when possible to avoid shell quirks
Be aware of the differences between shells when switching
Document why you're changing the default shell

Multi-stage Builds

Multi-stage builds allow you to use multiple FROM statements in your Dockerfile. Each FROM instruction starts a new build stage, and you can selectively copy artifacts from one stage to another, leaving behind everything you don't need in the final image.

This is like having a workshop for building components and then only taking the finished components to your assembly area, leaving all the tools, scraps, and intermediate pieces behind.

Basic Multi-stage Build


# Build stage
FROM node:14 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Runtime stage
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

In this example, we use Node.js to build a frontend application in the first stage, then copy only the build artifacts to an Nginx image in the second stage. The final image only contains Nginx and the built application, not Node.js or any of the build tools.

Advanced Multi-stage Techniques


# Base stage with common dependencies
FROM node:14-alpine AS base
WORKDIR /app
COPY package*.json ./
RUN npm config set registry https://registry.npmjs.org/
RUN npm install

# Development stage
FROM base AS development
ENV NODE_ENV=development
CMD ["npm", "run", "dev"]

# Build stage
FROM base AS build
COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine AS production
COPY --from=build /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

This more advanced example shows:

A common base stage with shared dependencies
A development stage for local development
A build stage for creating production artifacts
A production stage with only what's needed to run the application

You can choose which stage to build with the --target flag:


# Build the development image
docker build --target development -t myapp:dev .

# Build the production image
docker build --target production -t myapp:prod .

Best practices for multi-stage builds:

Use multi-stage builds to create minimal production images
Name your stages with AS for clarity
Order stages and instructions from least-changing to most-changing for optimal caching
Consider separate Dockerfiles for very different environments if multi-stage becomes too complex
Use build arguments to control which stages to include or skip

Optimizing Dockerfiles

Leveraging Docker's Build Cache

Docker caches individual layers to speed up subsequent builds. Understanding and optimizing for this caching mechanism can dramatically reduce build times.

The build cache works like this:

Docker checks if it can reuse a cached layer for each instruction
For COPY and ADD instructions, the contents of the files are examined
For other instructions, the instruction string itself is compared
Once a cache miss occurs, all subsequent layers are built from scratch
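
The last rule, that one cache miss forces every later step to be rebuilt, can be illustrated with a toy model. This is not how Docker is implemented; the function name plan_build and the hit/miss inputs are made up for the sketch.

```shell
# plan_build takes one word per instruction:
# "hit" if a cached layer matches, "miss" otherwise.
plan_build() {
  invalidated=0
  step=1
  for state in "$@"; do
    if [ "$invalidated" -eq 0 ] && [ "$state" = "hit" ]; then
      echo "step $step: CACHED"
    else
      # After the first miss, every later step is rebuilt.
      invalidated=1
      echo "step $step: REBUILD"
    fi
    step=$((step + 1))
  done
}

# A changed source file invalidates step 3, so step 4 is rebuilt too,
# even though its own instruction text did not change.
plan_build hit hit miss hit
```

This is why instructions whose inputs change often, such as copying the whole source tree, belong as late in the Dockerfile as possible.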


# Bad caching (entire project rebuilt when any file changes)
FROM node:14
WORKDIR /app
COPY . .
RUN npm install
CMD ["npm", "start"]

# Better caching (dependencies cached unless package.json changes)
FROM node:14
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["npm", "start"]

Minimizing Image Size

Smaller images download faster, start faster, and have a smaller attack surface for security vulnerabilities.


# Size-inefficient Dockerfile
FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
RUN apt-get install -y curl
COPY . /app
CMD ["python3", "/app/app.py"]

# Size-optimized Dockerfile
FROM python:3.9-alpine
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]

Techniques for minimizing image size:

Use smaller base images (Alpine, Slim, Distroless)
Combine related RUN instructions to reduce layers
Clean up package caches and temporary files in the same RUN instruction
Use multi-stage builds to exclude build tools from the final image
Include only necessary files with careful COPY instructions and .dockerignore
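
One way to check whether these techniques pay off is to compare image sizes before and after a change. A minimal sketch, assuming a hypothetical helper named image_size_mb:

```shell
# Print the size of IMAGE in megabytes (1 MB = 1,000,000 bytes).
image_size_mb() {
  # .Size is reported in bytes by docker image inspect.
  bytes=$(docker image inspect --format '{{.Size}}' "$1") || return 1
  echo $((bytes / 1000000))
}
```

Running it against two tags of the same application, for example one built from each Dockerfile above, gives a quick before-and-after figure.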

Using .dockerignore

A .dockerignore file specifies which files and directories should be excluded from the build context, similar to .gitignore for Git.


# Example .dockerignore file
node_modules
npm-debug.log
Dockerfile
.dockerignore
.git
.github
.gitignore
README.md
docker-compose.yml
coverage
.env
dist
build
*.log

Benefits of using .dockerignore:

Reduces build context size for faster uploads to the Docker daemon
Prevents sensitive files from being inadvertently included in the image
Prevents local development files from interfering with the build
Ensures cleaner, more deterministic builds
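
To get a feel for how much a .dockerignore file can shrink the build context, the directory can be archived with and without the largest offenders. This is only a rough estimate: tar's --exclude patterns are not identical to Docker's .dockerignore matching rules, and the two function names are assumptions made for the sketch.

```shell
# Approximate size in bytes of everything under DIR.
context_bytes() {
  tar -cf - -C "$1" . | wc -c | tr -d ' '
}

# Same estimate with node_modules and .git left out.
context_bytes_filtered() {
  tar -cf - -C "$1" --exclude=node_modules --exclude=.git . | wc -c | tr -d ' '
}
```

Comparing context_bytes . with context_bytes_filtered . in a typical Node.js project usually shows that most of the context is dependency and version-control data the build does not need.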

Practical Dockerfile Examples

Node.js Application


FROM node:16-alpine AS base
WORKDIR /app
ENV NODE_ENV=production

# Dependencies stage
FROM base AS dependencies
COPY package*.json ./
RUN npm ci

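# Build stage
FROM dependencies AS build
# NODE_ENV=production makes npm ci skip devDependencies, which the build step typically needs
RUN npm ci --include=dev
COPY . .
RUN npm run build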

# Production stage
FROM base AS production
COPY --from=dependencies /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY package.json .

USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]

Python Web Application


FROM python:3.9-slim AS base
WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1

# Dependencies stage
FROM base AS dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Production stage
FROM dependencies AS production
COPY . .

RUN useradd -m appuser
USER appuser

EXPOSE 5000
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]

Java Spring Boot Application


FROM maven:3.8.4-openjdk-17-slim AS build
WORKDIR /app
COPY pom.xml .
# Download dependencies separately for better caching
RUN mvn dependency:go-offline

COPY src ./src
RUN mvn package -DskipTests

# Runtime stage
FROM openjdk:17-jdk-slim
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar

EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]

Go Application


FROM golang:1.18-alpine AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server .

# Final stage
FROM alpine:3.15
RUN apk --no-cache add ca-certificates
WORKDIR /root/

COPY --from=build /app/server .
COPY --from=build /app/config ./config

EXPOSE 8080
CMD ["./server"]

Best Practices and Common Patterns

Security Best Practices

Use specific versions of base images to avoid unexpected changes
Run containers as non-root users with USER
Remove unnecessary tools and packages
Scan images for vulnerabilities using tools like Trivy, Clair, or Snyk
Don't include secrets in your images
Keep base images updated to get security patches
Use multi-stage builds to reduce attack surface

Development vs. Production Dockerfiles

Development and production environments often have different requirements:

graph TD
    subgraph "Development"
        A1[Include Development Tools]
        B1[Mount Source Code]
        C1[Hot Reloading]
        D1[Debugging Support]
    end
    subgraph "Production"
        A2[Minimal Dependencies]
        B2[Pre-built Artifacts]
        C2[Security Hardening]
        D2[Performance Optimization]
    end
    style A1 fill:#f9d5e5,stroke:#333,stroke-width:2px
    style B1 fill:#f9d5e5,stroke:#333,stroke-width:2px
    style C1 fill:#f9d5e5,stroke:#333,stroke-width:2px
    style D1 fill:#f9d5e5,stroke:#333,stroke-width:2px
    style A2 fill:#d5f5e3,stroke:#333,stroke-width:2px
    style B2 fill:#d5f5e3,stroke:#333,stroke-width:2px
    style C2 fill:#d5f5e3,stroke:#333,stroke-width:2px
    style D2 fill:#d5f5e3,stroke:#333,stroke-width:2px

Approaches to handle these differences:

Use multi-stage builds with different targets
Maintain separate Dockerfiles (e.g., Dockerfile.dev and Dockerfile.prod)
Use build arguments to control behavior
Override CMD and mount volumes for development

Debugging Dockerfile Issues

When things go wrong during a build, these techniques can help:

Use docker build --progress=plain for more verbose output
Add intermediate RUN commands to debug the state (e.g., RUN ls -la)

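Build up to the stage before the failure and run a container interactively:


# If the stage after "build" fails, build only up to "build"
# (--target takes a stage name declared with AS, not a step number)
docker build --target build -t debug-image .
docker run -it debug-image sh
# Then manually run the failing command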

Check for common issues:
- File permissions
- Path errors
- Network access during build
- Case sensitivity
- Platform-specific commands

Dockerfile Linting and Best Practices Tools

Several tools can help ensure your Dockerfiles follow best practices:

hadolint

A popular Dockerfile linter that checks for common mistakes and best practices:


# Install hadolint
brew install hadolint  # macOS
docker pull hadolint/hadolint  # Docker

# Run on a Dockerfile
hadolint Dockerfile

# Using Docker
docker run --rm -i hadolint/hadolint < Dockerfile

Hadolint checks for issues like:

Using latest tags
Missing version in package installations
Using ADD instead of COPY
Not cleaning up package caches
Running apt-get upgrade in containers

Docker Scout

Docker's integrated tool for scanning images for vulnerabilities:


# Analyze an image
docker scout cves myimage:latest

# Compare images
docker scout compare --to myimage:1.0 myimage:2.0

Dive

A tool for exploring layers in Docker images:


# Install dive
brew install dive  # macOS
docker pull wagoodman/dive  # Docker

# Analyze an image
dive myimage:latest

# Using Docker
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive myimage:latest

Dive helps you:

Visualize what files are in each layer
Identify large or redundant files
Find inefficiencies in your image structure
See the commands that created each layer

Conclusion

Dockerfiles are the blueprint for your container images. By understanding the syntax, best practices, and optimization techniques, you can create images that are:

Smaller and more efficient
More secure
Faster to build
More maintainable
Consistent across environments

Remember that a well-crafted Dockerfile is an investment that pays dividends throughout your application's lifecycle. Take the time to optimize your Dockerfiles using the techniques we've covered, and your containerized applications will be more reliable, secure, and efficient.

In our next lecture, we'll explore Docker Compose, which allows you to define and run multi-container Docker applications.

mindmap
  root((Dockerfile))
    Basic Instructions
      FROM
      WORKDIR
      COPY/ADD
      RUN
      ENV
      EXPOSE
      CMD/ENTRYPOINT
    Advanced Features
      Multi-stage builds
      ARG
      USER
      VOLUME
      HEALTHCHECK
    Optimization
      Caching
      Image Size
      Security
      .dockerignore
    Best Practices
      Layer Management
      Non-root User
      Version Pinning
      Readability

Practice Activities

Activity 1: Create a Basic Dockerfile

Create a Dockerfile for a simple HTML website:

Create an index.html file with some basic content
Write a Dockerfile that uses the nginx:alpine base image
Copy your HTML file to the correct location in the Nginx container
Build and run your container, mapping port 8080 on your host to port 80 in the container
Visit http://localhost:8080 to see your website

Activity 2: Optimize a Node.js Dockerfile

Optimize this inefficient Dockerfile for a Node.js application:


FROM node:latest
COPY . /app
WORKDIR /app
RUN npm install
RUN npm run build
EXPOSE 3000
CMD node server.js

Improve it by:

Using a specific version of Node.js with a smaller base image
Optimizing for the build cache
Creating a .dockerignore file
Using a non-root user
Implementing a multi-stage build if appropriate

Activity 3: Multi-stage Build for a React Application

Create a multi-stage Dockerfile for a React application:

Use create-react-app to generate a new React application
Create a multi-stage Dockerfile that:
- Uses Node.js to build the application
- Uses Nginx to serve the production build
- Results in a minimal final image
Build and run the container
Use docker history to examine the layers and sizes

Activity 4: Environment-specific Dockerfiles

Create development and production Dockerfiles for the same application:

Create a simple Express.js application
Create Dockerfile.dev that:
- Includes development dependencies
- Enables hot-reloading
- Mounts your source code as a volume
Create Dockerfile.prod that:
- Only includes production dependencies
- Builds the application for production
- Uses a non-root user
- Implements best security practices
Build and run both versions, noting the differences

Challenge: Create a Comprehensive Dockerfile

Create a Dockerfile for a real-world application that implements all the best practices we've discussed:

Choose a real application you're familiar with (or use an open-source project)
Create a Dockerfile that:
- Uses multi-stage builds for minimal image size
- Optimizes for build cache performance
- Implements proper security practices
- Includes appropriate health checks
- Handles environment variables correctly
- Uses proper base images and version pinning
Use hadolint to verify your Dockerfile follows best practices
Use dive to analyze your built image and look for optimization opportunities
Document your Dockerfile with comments explaining key decisions