Dockerfile Syntax

Week 11: Docker and Containerization - Tuesday Lecture 2

Introduction to Dockerfiles

In our previous lectures, we explored container concepts, Docker's architecture, and basic Docker commands. Now, we'll focus on perhaps the most important aspect of Docker for developers: creating custom Docker images with Dockerfiles.

A Dockerfile is essentially a text file containing a series of instructions that Docker uses to build an image. It's like a recipe that tells Docker exactly how to prepare and package your application. Just as a cake recipe specifies ingredients, quantities, and baking instructions, a Dockerfile specifies base images, files to include, commands to run, and configuration settings.

Diagram: Dockerfile → (docker build) → Docker image → (docker run) → Docker container

Understanding Dockerfile syntax is crucial for several reasons:

Dockerfile Basics

Anatomy of a Dockerfile

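A Dockerfile consists of a series of instructions that Docker executes in order. Instructions that change the filesystem, such as RUN, COPY, and ADD, create new layers in the image; others, such as EXPOSE and CMD, mostly record configuration metadata. Here's a simple example: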


FROM node:14-alpine

WORKDIR /app

COPY package*.json ./

RUN npm install

COPY . .

EXPOSE 3000

CMD ["npm", "start"]
            

This Dockerfile creates an image for a Node.js application. Let's break down what each instruction does:

Think of each instruction as a step in your recipe. Just as a recipe might say "preheat oven to 350°F" before "mix ingredients," your Dockerfile instructions follow a logical sequence for preparing your application environment.
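Here is the same Dockerfile with a comment on each instruction. Like the original, it assumes that package.json defines a start script for npm start to run.

```dockerfile
# Start from the official Node.js 14 image built on Alpine Linux
FROM node:14-alpine

# Run all following instructions from /app (created if it doesn't exist)
WORKDIR /app

# Copy package.json (and package-lock.json, if present) first,
# so the dependency install below can be cached
COPY package*.json ./

# Install the dependencies into /app/node_modules
RUN npm install

# Copy the rest of the application source code
COPY . .

# Document that the app listens on port 3000 (metadata only; the port is not published)
EXPOSE 3000

# Default command when a container starts from this image
CMD ["npm", "start"]
```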

Diagram: Docker image layers for the example above, built in order: base image node:14-alpine, set WORKDIR, copy package files, run npm install, copy application code, set metadata (EXPOSE), set default command (CMD).

Building an Image from a Dockerfile

Once you've created a Dockerfile, you build an image from it using the docker build command:


# Basic build command
docker build -t myapp:1.0 .

# Build with a specific Dockerfile
docker build -f Dockerfile.prod -t myapp:prod .

# Build with build arguments
docker build --build-arg VERSION=1.0 -t myapp:1.0 .
            

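The -t flag names the image and, optionally, gives it a tag such as a version (myapp:1.0). The trailing dot (.) sets the build context: the directory whose contents are sent to the builder and can be referenced by COPY and ADD. By default, Docker looks for a file named Dockerfile at the root of the build context; the -f flag points to a different file.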

Dockerfile Instructions in Detail

FROM

Every Dockerfile must start with a FROM instruction (only parser directives, comments, and ARG instructions may precede it), which specifies the base image to build upon. This is like choosing the foundation for your house.


# Using the latest Node.js image
FROM node:latest

# Using a specific version
FROM node:14.17.0

# Using a minimal Alpine Linux variant
FROM node:14-alpine

# Using scratch (empty image)
FROM scratch
            

Best practices for FROM:

WORKDIR

WORKDIR sets the working directory for any subsequent instructions. It's like setting up your workspace before starting a project.


# Set working directory
WORKDIR /app

# Subsequent instructions will run in /app
RUN echo "Hello" > hello.txt
# Creates /app/hello.txt
            
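A short sketch of two WORKDIR behaviors worth knowing: Docker creates the directory if it doesn't exist, and a relative path is resolved against the previous WORKDIR. It also shows why WORKDIR is preferable to RUN cd, since every RUN starts a fresh shell in the current working directory. The alpine:3.19 base image is just a convenient choice for the illustration.

```dockerfile
FROM alpine:3.19

# Created automatically because /app does not exist yet
WORKDIR /app

# Relative path: resolved against the previous WORKDIR, giving /app/src
WORKDIR src

# Prints /app/src
RUN pwd

# A cd only lasts for the RUN instruction it appears in...
RUN cd /tmp && pwd

# ...so this prints /app/src again
RUN pwd
```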

Best practices for WORKDIR:

COPY and ADD

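COPY and ADD both transfer files into the image, but ADD has two extra behaviors: it can download files from remote URLs, and it automatically extracts local tar archives into the destination. COPY simply copies files and directories from the build context, which makes it the more predictable choice.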


# Copy a file
COPY package.json .

# Copy a directory
COPY src/ ./src/

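# Copy with pattern matching (with several source files, the destination must be a directory ending in /)
COPY *.js ./

# Copy and set file ownership
COPY --chown=node:node . .

# ADD can fetch remote URLs (a remote archive is downloaded as-is, not extracted)
ADD http://example.com/file.tar.gz /tmp/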
            
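The archive-extraction difference is easiest to see side by side. In this sketch, vendor.tar.gz is a hypothetical gzip-compressed tar archive in the build context:

```dockerfile
FROM alpine:3.19

# COPY copies the archive as-is: /tmp/vendor.tar.gz is still a compressed file
COPY vendor.tar.gz /tmp/

# ADD unpacks a local tar archive (gzip, bzip2, or xz compressed) into the destination
ADD vendor.tar.gz /opt/vendor/

# List both locations during the build to compare the results
RUN ls -l /tmp/vendor.tar.gz /opt/vendor/
```

Unless you specifically want that automatic extraction, COPY keeps the behavior explicit.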

Best practices for COPY and ADD:

RUN

RUN executes commands during the build process. Each RUN instruction creates a new layer in the image.


# Basic usage
RUN npm install

# Multiple commands with shell syntax
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*

# Exec form, explicitly invoking bash
RUN ["/bin/bash", "-c", "echo hello > /tmp/hello.txt"]
            
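The exec form (a JSON array) does not run through a shell, which matters for variable expansion. A minimal sketch, where alpine:3.19 and the GREETING variable are arbitrary choices for the illustration:

```dockerfile
FROM alpine:3.19
ENV GREETING=hello

# Shell form: runs via /bin/sh -c, so the shell expands $GREETING
RUN echo "$GREETING from shell form"

# Exec form: no shell is involved, so "$GREETING" is printed literally
RUN ["echo", "$GREETING from exec form"]

# Exec form that starts a shell explicitly gets expansion back
RUN ["/bin/sh", "-c", "echo \"$GREETING from sh -c\""]
```

Shell form is usually the convenient choice for RUN; the exec form is useful when you need a specific executable or want to avoid shell processing.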

Best practices for RUN:

ENV

ENV sets environment variables that are available to all subsequent instructions in the build and persist in containers started from the image.


# Set a single environment variable
ENV NODE_ENV=production

# Set multiple variables
ENV PORT=3000 \
    DEBUG=false \
    LOG_LEVEL=info

# Using variables in other instructions
ENV APP_HOME=/app
WORKDIR $APP_HOME
            
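To see that an ENV value reaches both the build and the running container, consider this sketch (node:14-alpine matches the examples above):

```dockerfile
FROM node:14-alpine

# Visible to later build steps AND to containers started from the image
ENV NODE_ENV=production

# Build step: prints "production" in the build output
RUN node -e "console.log(process.env.NODE_ENV)"

# Run time: prints the container's NODE_ENV value
CMD ["node", "-e", "console.log(process.env.NODE_ENV)"]
```

At run time the value can still be overridden, for example with docker run -e NODE_ENV=development <image>.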

Best practices for ENV:

EXPOSE

EXPOSE documents the network ports the container listens on at runtime. It does not publish them: to make a port reachable from the host, use -p (or -P) with docker run.


# Expose a single port
EXPOSE 3000

# Expose multiple ports
EXPOSE 80 443

# Expose UDP ports
EXPOSE 53/udp

# Expose both TCP and UDP
EXPOSE 53/tcp 53/udp
            
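Because EXPOSE is documentation rather than configuration, publishing still happens at run time. A sketch using the nginx:alpine image, with the relevant docker run commands as comments:

```dockerfile
FROM nginx:alpine

# Documents that nginx listens on port 80 inside the container; nothing is published yet
EXPOSE 80

# Publish at run time instead:
#   docker run -p 8080:80 <image>   maps host port 8080 to container port 80
#   docker run -P <image>           maps every exposed port to a random high host port
```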

Best practices for EXPOSE:

CMD and ENTRYPOINT

CMD and ENTRYPOINT specify what command to run when the container starts, but they work differently and can be used together.


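# CMD examples (only the last CMD in a Dockerfile takes effect)
CMD ["npm", "start"]
CMD ["node", "server.js"]
# Shell form: runs through /bin/sh -c
CMD echo "Hello World"

# ENTRYPOINT examples (likewise, only the last ENTRYPOINT takes effect)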
ENTRYPOINT ["node", "server.js"]
ENTRYPOINT ["docker-entrypoint.sh"]

# ENTRYPOINT with CMD as default arguments
ENTRYPOINT ["npm"]
CMD ["start"]
            

Best practices for CMD and ENTRYPOINT:

Diagram: what runs at container start. If the image has an ENTRYPOINT, it is used as the command, and any CMD is appended as its arguments (with no CMD, the ENTRYPOINT runs with no arguments). If there is no ENTRYPOINT, CMD is run as the command.
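A small experiment makes the interaction concrete. In this sketch, echo stands in for a real application, and both instructions use the exec form; the comments show what each docker run variant does:

```dockerfile
FROM alpine:3.19

# ENTRYPOINT fixes the executable; CMD supplies default arguments
ENTRYPOINT ["echo", "Hello"]
CMD ["world"]

# docker run <image>                     → prints "Hello world"
# docker run <image> Docker              → prints "Hello Docker" (arguments replace CMD)
# docker run --entrypoint ls <image> /   → lists / (--entrypoint replaces ENTRYPOINT)
```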

Advanced Dockerfile Features

ARG

ARG defines variables that users can pass at build time with --build-arg, which allows parameterized builds. Unlike ENV values, ARG values exist only during the build and are not set in the running container.


# Define a build argument with a default value
ARG VERSION=latest

# Use the argument
FROM node:${VERSION}

# An ARG declared before FROM is only visible to FROM lines;
# declare ARGs after FROM to use them inside the build stage
ARG NODE_ENV
ENV NODE_ENV=${NODE_ENV:-production}
            

To use the argument when building:


docker build --build-arg VERSION=14 -t myapp:14 .
            
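Scope is the most common ARG surprise. An ARG declared before the first FROM can be used in FROM lines, but inside a build stage it must be declared again (without a value) to pick up the same value. A sketch, where NODE_VERSION is a name chosen for illustration:

```dockerfile
# Declared before the first FROM: usable in FROM lines only
ARG NODE_VERSION=16
FROM node:${NODE_VERSION}-alpine

# Redeclared without a value: this stage now sees the same value
ARG NODE_VERSION
RUN echo "Building with Node.js ${NODE_VERSION}"

# Build arguments are visible in docker history, so never use them for secrets
```

Build it with, for example, docker build --build-arg NODE_VERSION=18 -t myapp:node18 .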

Best practices for ARG:

VOLUME

VOLUME declares a mount point for persistent or shared data. If no volume is mounted at that path at run time, Docker creates an anonymous volume there.


# Create a volume at /data
VOLUME /data

# Create multiple volumes
VOLUME ["/data", "/logs", "/config"]
            
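One useful consequence: when Docker creates a new volume for that path, it copies in whatever the image already contains there. This sketch seeds /data before declaring the volume; seed.txt is a hypothetical file name:

```dockerfile
FROM alpine:3.19

# Write the initial data BEFORE declaring the volume
RUN mkdir -p /data && echo "initial config" > /data/seed.txt

# New volumes mounted here start with the image's /data contents (seed.txt)
VOLUME /data

# docker run <image> lists seed.txt from the freshly created anonymous volume
CMD ["ls", "/data"]
```

A bind mount (docker run -v "$PWD":/data ...) is not populated this way; it simply hides the image's /data.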

Best practices for VOLUME:

USER

USER sets the user (and optionally the group) that runs all subsequent RUN instructions and, at run time, the container's CMD and ENTRYPOINT process.


# Set user by name
USER nginx

# Set user by ID
USER 1000

# Set user and group
USER node:node
            
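Official Node.js images already include an unprivileged node user, but many base images do not. A sketch for an Alpine-based image, where app is a hypothetical user and group name (on Debian-based images, useradd serves the same purpose):

```dockerfile
FROM alpine:3.19

# Create an unprivileged system group and user, both named "app" (BusyBox syntax)
RUN addgroup -S app && adduser -S -G app app

# Later RUN instructions and the container's main process run as "app"
USER app

# Prints the user and group IDs, confirming the container is not running as root
CMD ["id"]
```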

Best practices for USER:

HEALTHCHECK

HEALTHCHECK tells Docker how to test if the container is still functioning properly.


# Simple HTTP check
HEALTHCHECK --interval=30s --timeout=3s \
  CMD curl -f http://localhost/ || exit 1

# Disable health check from base image
HEALTHCHECK NONE
            
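The curl example above assumes curl is installed in the image, which is often not the case for slim or Alpine-based images. A sketch for an Alpine-based nginx image that uses BusyBox wget instead, assuming (as is typical for Alpine) that it is available:

```dockerfile
FROM nginx:alpine

# Probe every 30s; a probe taking longer than 3s counts as a failure.
# Failures during the first 10s (--start-period) don't count toward --retries,
# and 3 consecutive failures mark the container unhealthy.
# wget --spider requests the page without saving it.
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD wget -q --spider http://localhost/ || exit 1
```

Once the container is running, docker ps shows its health status, and docker inspect --format '{{.State.Health.Status}}' <container> prints starting, healthy, or unhealthy.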

Best practices for HEALTHCHECK:

SHELL

SHELL changes the default shell used for the shell form of commands.


# Change to PowerShell on Windows
SHELL ["powershell", "-command"]

# Now shell form uses PowerShell
# (avoid ending a line with \, which Docker treats as a line continuation)
RUN Get-ChildItem C:\Windows

# Change back to cmd
SHELL ["cmd", "/S", "/C"]
            
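SHELL is useful on Linux too. A common pattern switches to bash with the pipefail option, so that a failure anywhere in a pipeline fails the RUN step instead of only the last command's exit status counting. A sketch, assuming a Debian-based image that ships bash:

```dockerfile
FROM debian:bookworm-slim

# Use bash with pipefail for every shell-form RUN that follows
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# If cat failed here, this step would now fail the build; with the default
# /bin/sh -c, only the exit status of wc (the last command) would decide
RUN cat /etc/os-release | wc -l
```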

Best practices for SHELL:

Multi-stage Builds

Multi-stage builds allow you to use multiple FROM statements in your Dockerfile. Each FROM instruction starts a new build stage, and you can selectively copy artifacts from one stage to another, leaving behind everything you don't need in the final image.

This is like having a workshop for building components and then only taking the finished components to your assembly area, leaving all the tools, scraps, and intermediate pieces behind.

Basic Multi-stage Build


# Build stage
FROM node:14 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Runtime stage
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
            
Diagram: multi-stage build. Build stage: Node.js base, install dependencies, copy source code, build application. Runtime stage: Nginx base, copy build artifacts, configure server. Only the build artifacts are copied from the build stage into the runtime stage.

In this example, we use Node.js to build a frontend application in the first stage, then copy only the build artifacts to an Nginx image in the second stage. The final image only contains Nginx and the built application, not Node.js or any of the build tools.

Advanced Multi-stage Techniques


# Base stage with common dependencies
FROM node:14-alpine AS base
WORKDIR /app
COPY package*.json ./
RUN npm config set registry https://registry.npmjs.org/
RUN npm install

# Development stage
FROM base AS development
ENV NODE_ENV=development
# No COPY of the source here: in development the code is expected to be mounted at run time
CMD ["npm", "run", "dev"]

# Build stage
FROM base AS build
COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine AS production
COPY --from=build /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
            

This more advanced example shows:

You can choose which stage to build with the --target flag:


# Build the development image
docker build --target development -t myapp:dev .

# Build the production image
docker build --target production -t myapp:prod .
            

Best practices for multi-stage builds:

Optimizing Dockerfiles

Leveraging Docker's Build Cache

Docker caches individual layers to speed up subsequent builds. Understanding and optimizing for this caching mechanism can dramatically reduce build times.

The build cache works like this:

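  1. For each instruction, Docker checks whether it can reuse a cached layer
  2. For COPY and ADD instructions, Docker computes a checksum of the files' contents (modification times are ignored)
  3. For other instructions, only the instruction string itself is compared, so RUN apt-get update is not rerun just because the upstream package index has changed
  4. Once a cache miss occurs, all subsequent instructions are rebuilt without the cache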

# Bad caching (entire project rebuilt when any file changes)
FROM node:14
WORKDIR /app
COPY . .
RUN npm install
CMD ["npm", "start"]

# Better caching (dependencies cached unless package.json changes)
FROM node:14
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["npm", "start"]
            

Minimizing Image Size

Smaller images download faster, start faster, and have a smaller attack surface for security vulnerabilities.


# Size-inefficient Dockerfile
FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
RUN apt-get install -y curl
COPY . /app
CMD ["python3", "/app/app.py"]

# Size-optimized Dockerfile
FROM python:3.9-alpine
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]
            
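When you need a full distribution such as Ubuntu, the size-inefficient example can still be improved by pinning the tag and doing the update, install, and cleanup in a single RUN. A sketch, where ubuntu:22.04 is one reasonable pinned tag and app.py comes from the example above:

```dockerfile
FROM ubuntu:22.04

# One RUN for update, install, and cleanup: files deleted in a later RUN
# would still take up space in this layer. Keeping update and install together
# also prevents reusing a stale cached "apt-get update" layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .
CMD ["python3", "app.py"]
```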

Techniques for minimizing image size:

Using .dockerignore

A .dockerignore file specifies which files and directories should be excluded from the build context, similar to .gitignore for Git.


# Example .dockerignore file
node_modules
npm-debug.log
Dockerfile
.dockerignore
.git
.github
.gitignore
README.md
docker-compose.yml
coverage
.env
dist
build
*.log
            

Benefits of using .dockerignore:

Practical Dockerfile Examples

Node.js Application


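FROM node:16-alpine AS base
WORKDIR /app

# Dependencies stage: install everything, including the devDependencies the build needs
# (with NODE_ENV=production set at this point, npm ci would skip devDependencies)
FROM base AS dependencies
COPY package*.json ./
RUN npm ci

# Build stage
FROM dependencies AS build
COPY . .
RUN npm run build

# Production dependencies stage: runtime packages only
FROM base AS prod-dependencies
COPY package*.json ./
RUN npm ci --omit=dev

# Production stage
FROM base AS production
ENV NODE_ENV=production
COPY --from=prod-dependencies /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY package.json .

USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]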
            

Python Web Application


FROM python:3.9-slim AS base
WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1

# Dependencies stage
FROM base AS dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Production stage
FROM dependencies AS production
COPY . .

RUN useradd -m appuser
USER appuser

EXPOSE 5000
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]
            

Java Spring Boot Application


FROM maven:3.8.4-openjdk-17-slim AS build
WORKDIR /app
COPY pom.xml .
# Download dependencies separately for better caching
RUN mvn dependency:go-offline

COPY src ./src
RUN mvn package -DskipTests

# Runtime stage
FROM openjdk:17-jdk-slim
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar

EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
            

Go Application


FROM golang:1.18-alpine AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server .

# Final stage
FROM alpine:3.15
RUN apk --no-cache add ca-certificates
WORKDIR /root/

COPY --from=build /app/server .
COPY --from=build /app/config ./config

EXPOSE 8080
CMD ["./server"]
            

Best Practices and Common Patterns

Security Best Practices

Development vs. Production Dockerfiles

Development and production environments often have different requirements:

Diagram: development vs. production. Development: include development tools, mount source code, hot reloading, debugging support. Production: minimal dependencies, pre-built artifacts, security hardening, performance optimization.

Approaches to handle these differences:

Debugging Dockerfile Issues

When things go wrong during a build, these techniques can help:

Dockerfile Linting and Best Practices Tools

Several tools can help ensure your Dockerfiles follow best practices:

hadolint

A popular Dockerfile linter that checks for common mistakes and best practices:


# Install hadolint
brew install hadolint  # macOS
docker pull hadolint/hadolint  # Docker

# Run on a Dockerfile
hadolint Dockerfile

# Using Docker
docker run --rm -i hadolint/hadolint < Dockerfile
            

Hadolint checks for issues like:
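For illustration, here is a deliberately flawed Dockerfile. The rule codes in the comments are the hadolint rules we would expect it to report; the exact list depends on your hadolint version and configuration:

```dockerfile
# DL3007: avoid the "latest" tag; pin a specific version
FROM ubuntu:latest

# DL3008: pin apt package versions
# DL3009: delete the apt-get lists after installing
# DL3015: use --no-install-recommends
RUN apt-get update && apt-get install -y python3

# DL3020: use COPY instead of ADD for local files
ADD app.py /app/

# DL3025: use JSON (exec form) notation for CMD
CMD python3 /app/app.py
```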

Docker Scout

Docker's integrated tool for scanning images for vulnerabilities:


# Analyze an image
docker scout cves myimage:latest

# Compare images (myimage:2.0 against the myimage:1.0 baseline)
docker scout compare --to myimage:1.0 myimage:2.0
            

Dive

A tool for exploring layers in Docker images:


# Install dive
brew install dive  # macOS
docker pull wagoodman/dive  # Docker

# Analyze an image
dive myimage:latest

# Using Docker
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive myimage:latest
            

Dive helps you:

Conclusion

Dockerfiles are the blueprint for your container images. By understanding the syntax, best practices, and optimization techniques, you can create images that are:

Remember that a well-crafted Dockerfile is an investment that pays dividends throughout your application's lifecycle. Take the time to optimize your Dockerfiles using the techniques we've covered, and your containerized applications will be more reliable, secure, and efficient.

In our next lecture, we'll explore Docker Compose, which allows you to define and run multi-container Docker applications.

Mind map: Dockerfile. Basic instructions: FROM, WORKDIR, COPY/ADD, RUN, ENV, EXPOSE, CMD/ENTRYPOINT. Advanced features: multi-stage builds, ARG, USER, VOLUME, HEALTHCHECK. Optimization: caching, image size, security, .dockerignore. Best practices: layer management, non-root user, version pinning, readability.

Practice Activities

Activity 1: Create a Basic Dockerfile

Create a Dockerfile for a simple HTML website:

  1. Create an index.html file with some basic content
  2. Write a Dockerfile that uses the nginx:alpine base image
  3. Copy your HTML file to the correct location in the Nginx container
  4. Build and run your container, mapping port 8080 on your host to port 80 in the container
  5. Visit http://localhost:8080 to see your website

Activity 2: Optimize a Node.js Dockerfile

Optimize this inefficient Dockerfile for a Node.js application:


FROM node:latest
COPY . /app
WORKDIR /app
RUN npm install
RUN npm run build
EXPOSE 3000
CMD node server.js
            

Improve it by:

  1. Using a specific version of Node.js with a smaller base image
  2. Optimizing for the build cache
  3. Creating a .dockerignore file
  4. Using a non-root user
  5. Implementing a multi-stage build if appropriate

Activity 3: Multi-stage Build for a React Application

Create a multi-stage Dockerfile for a React application:

  1. Use create-react-app to generate a new React application
  2. Create a multi-stage Dockerfile that:
    • Uses Node.js to build the application
    • Uses Nginx to serve the production build
    • Results in a minimal final image
  3. Build and run the container
  4. Use docker history to examine the layers and sizes

Activity 4: Environment-specific Dockerfiles

Create development and production Dockerfiles for the same application:

  1. Create a simple Express.js application
  2. Create Dockerfile.dev that:
    • Includes development dependencies
    • Enables hot-reloading
    • Mounts your source code as a volume
  3. Create Dockerfile.prod that:
    • Only includes production dependencies
    • Builds the application for production
    • Uses a non-root user
    • Implements best security practices
  4. Build and run both versions, noting the differences

Challenge: Create a Comprehensive Dockerfile

Create a Dockerfile for a real-world application that implements all the best practices we've discussed:

  1. Choose a real application you're familiar with (or use an open-source project)
  2. Create a Dockerfile that:
    • Uses multi-stage builds for minimal image size
    • Optimizes for build cache performance
    • Implements proper security practices
    • Includes appropriate health checks
    • Handles environment variables correctly
    • Uses proper base images and version pinning
  3. Use hadolint to verify your Dockerfile follows best practices
  4. Use dive to analyze your built image and look for optimization opportunities
  5. Document your Dockerfile with comments explaining key decisions
