Cache Invalidation Patterns

The Cache Invalidation Challenge

In our previous lectures, we explored how caching can dramatically improve application performance by storing frequently accessed data in fast-access storage like Redis. However, caching introduces a critical challenge: ensuring that cached data remains consistent with the source of truth.

Phil Karlton, a computer scientist at Netscape, famously said: "There are only two hard things in Computer Science: cache invalidation and naming things." Today, we'll tackle the first of these challenges.

flowchart TD
    A[Application] --> B[Cache]
    A --> C[Database]
    B -.-> A
    C -.-> A
    
    subgraph "Cache Invalidation Challenge"
        D[When do we update the cache?]
        E[How do we update the cache?]
        F[How do we prevent stale data?]
        G[What happens during failures?]
    end

Why Cache Invalidation Is Important

Cache invalidation is the process of removing or updating cached data when the source data changes. Without proper invalidation strategies:

Users may see stale data: Leading to confusion, errors, and poor user experience
System consistency is compromised: Different parts of your application may operate on different versions of data
Business logic may fail: Operations dependent on accurate data may produce incorrect results
Security issues can arise: Cached permissions or sensitive data may remain accessible after they should be revoked

The Consistency Spectrum

Different applications have different consistency requirements, ranging from:

Strong consistency: Cached data must always be identical to the source data (e.g., financial transactions)
Eventual consistency: Cached data may be temporarily out of sync but will eventually be updated (e.g., social media likes count)
Weak consistency: Some staleness is acceptable as long as it doesn't affect critical functionality (e.g., view counters)

Cache Invalidation vs. Cache Expiration

Before diving into invalidation patterns, it's important to understand the difference:

Cache invalidation: Proactively removing or updating cached data when the source data changes
Cache expiration: Setting a time-to-live (TTL) for cached data, after which it is automatically removed

Both approaches have their place in a comprehensive caching strategy, and we'll explore how they can be combined effectively.

Common Cache Invalidation Patterns

graph TD
    A[Cache Invalidation Patterns] --> B[Time-Based]
    A --> C[Event-Based]
    A --> D[Query-Based]
    A --> E[Hybrid Approaches]
    
    B --> B1[TTL Expiration]
    B --> B2[Scheduled Refresh]
    
    C --> C1[Write-Through]
    C --> C2[Write-Behind]
    C --> C3[Write-Around]
    C --> C4[Cache-Aside]
    
    D --> D1[Content-Based]
    D --> D2[Query Invalidation]
    
    E --> E1[Leased-Based]
    E --> E2[Versioned Cache Keys]
    E --> E3[Cache Stamping]

Time-Based Invalidation

Time-based invalidation relies on setting expiration timeouts for cached data. After the specified time has elapsed, the data is considered invalid and is removed from the cache.

TTL (Time-To-Live) Expiration

The simplest approach to cache invalidation is to set an expiration time on each cached item.


// Using Redis with TTL expiration
async function getCachedData(key, fetchFunction, ttl = 3600) {
  const client = redis.createClient();
  await client.connect();
  
  // Try to get data from cache
  const cachedData = await client.get(key);
  
  if (cachedData) {
    return JSON.parse(cachedData);
  }
  
  // If no cached data, fetch fresh data
  const freshData = await fetchFunction();
  
  // Store in cache with TTL
  await client.set(key, JSON.stringify(freshData), {
    EX: ttl // Expires in ttl seconds
  });
  
  return freshData;
}

// Usage example
async function getUserProfile(userId) {
  return getCachedData(
    `user:${userId}:profile`,
    async () => {
      // Simulate database query
      return db.query('SELECT * FROM users WHERE id = ?', [userId]);
    },
    1800 // Cache for 30 minutes
  );
}

Advantages:

Simple to implement and understand
Requires no coordination between services
Works well for data that changes infrequently or has predictable update patterns
Automatically handles cached items that are rarely accessed

Disadvantages:

Can result in stale data during the TTL period
Can cause unnecessary refetching if data hasn't changed
Difficult to determine the optimal TTL value
Not suitable for data that requires immediate consistency

Scheduled Refresh

Instead of waiting for a cache miss, proactively refresh cached data on a schedule.


// Using a background job scheduler (like node-cron)
const cron = require('node-cron');
const redis = require('redis');

async function setupCacheRefresh() {
  const client = redis.createClient();
  await client.connect();
  
  // Schedule cache refresh for popular items every 15 minutes
  cron.schedule('*/15 * * * *', async () => {
    console.log('Refreshing popular item cache...');
    
    try {
      // Get list of popular items to refresh
      const popularItemIds = await getPopularItemIds();
      
      // Refresh each item's cache
      await Promise.all(popularItemIds.map(async (itemId) => {
        const freshData = await fetchItemData(itemId);
        await client.set(`item:${itemId}`, JSON.stringify(freshData), {
          EX: 3600 // Set TTL to 1 hour
        });
      }));
      
      console.log(`Refreshed cache for ${popularItemIds.length} items`);
    } catch (error) {
      console.error('Cache refresh failed:', error);
    }
  });
}

async function getPopularItemIds() {
  // Implement logic to determine which items are accessed frequently
  // This could be based on analytics, recent access patterns, etc.
  return db.query('SELECT id FROM items ORDER BY view_count DESC LIMIT 100');
}

async function fetchItemData(itemId) {
  // Fetch fresh data from the database
  return db.query('SELECT * FROM items WHERE id = ?', [itemId]);
}

Advantages:

Reduces cache misses for frequently accessed items
Can prioritize refreshing the most important or popular data
Distributes database load by staggering refreshes
Can be scheduled during off-peak hours for less critical data

Disadvantages:

Requires additional background processing
May refresh data that's not being accessed
Still doesn't provide immediate consistency
Can be complex to implement properly with error handling

Event-Based Invalidation

Event-based invalidation reacts to data changes by updating or invalidating the cache when the underlying data changes.

Write-Through Cache

In a write-through cache, data is written to both the cache and the backend storage simultaneously.

sequenceDiagram
    participant App as Application
    participant Cache as Cache Layer
    participant DB as Database
    
    App->>App: Update operation
    App->>Cache: Update cache
    App->>DB: Update database
    DB-->>App: Confirm update
    Cache-->>App: Confirm update
    App->>App: Operation complete


// Write-through cache implementation
async function updateUserProfile(userId, profileData) {
  const client = redis.createClient();
  await client.connect();
  
  try {
    // Begin transaction
    const transaction = await db.beginTransaction();
    
    try {
      // Update database
      await db.query(
        'UPDATE users SET name = ?, email = ?, updated_at = ? WHERE id = ?',
        [profileData.name, profileData.email, new Date(), userId],
        { transaction }
      );
      
      // Update cache
      await client.set(
        `user:${userId}:profile`,
        JSON.stringify({
          id: userId,
          ...profileData,
          updated_at: new Date()
        }),
        { EX: 3600 } // Cache for 1 hour
      );
      
      // Commit transaction
      await transaction.commit();
      
      return { success: true };
    } catch (error) {
      // Rollback transaction
      await transaction.rollback();
      throw error;
    }
  } catch (error) {
    console.error('Update failed:', error);
    return { success: false, error: error.message };
  } finally {
    await client.quit();
  }
}

Advantages:

Cache is always up-to-date with the database
Provides strong consistency guarantees
Simple to understand and reason about
Read operations are fast since data is already cached

Disadvantages:

Write operations are slower (must update two systems)
If cache update fails but database update succeeds, inconsistency occurs
Can be wasteful for data that's rarely read
Requires tight coupling between cache and database operations

Write-Behind Cache

In a write-behind (or write-back) cache, data is written to the cache first, and then asynchronously written to the backend storage.

sequenceDiagram
    participant App as Application
    participant Cache as Cache Layer
    participant Queue as Message Queue
    participant Worker as Background Worker
    participant DB as Database
    
    App->>App: Update operation
    App->>Cache: Update cache
    App->>Queue: Queue database update
    Queue-->>App: Acknowledge
    App->>App: Operation complete
    
    Queue->>Worker: Process update
    Worker->>DB: Update database
    DB-->>Worker: Confirm update
    Worker->>Cache: Update cache TTL


// Write-behind cache implementation using a message queue
const { Queue } = require('bullmq');
const writeQueue = new Queue('database-writes');

async function updateUserProfile(userId, profileData) {
  const client = redis.createClient();
  await client.connect();
  
  try {
    // Update cache immediately
    await client.set(
      `user:${userId}:profile`,
      JSON.stringify({
        id: userId,
        ...profileData,
        updated_at: new Date(),
        pending: true // Mark as pending database update
      }),
      { EX: 3600 } // Cache for 1 hour
    );
    
    // Queue database update
    await writeQueue.add('update-user', {
      userId,
      profileData,
      timestamp: Date.now()
    });
    
    return { success: true, message: 'Update queued' };
  } catch (error) {
    console.error('Update failed:', error);
    return { success: false, error: error.message };
  } finally {
    await client.quit();
  }
}

// Worker process to handle database writes
const { Worker } = require('bullmq');

const dbWorker = new Worker('database-writes', async (job) => {
  const { userId, profileData, timestamp } = job.data;
  const client = redis.createClient();
  await client.connect();
  
  try {
    // Update database
    await db.query(
      'UPDATE users SET name = ?, email = ?, updated_at = ? WHERE id = ?',
      [profileData.name, profileData.email, new Date(timestamp), userId]
    );
    
    // Update cache status (no longer pending)
    const cachedData = await client.get(`user:${userId}:profile`);
    
    if (cachedData) {
      const parsedData = JSON.parse(cachedData);
      delete parsedData.pending;
      
      await client.set(
        `user:${userId}:profile`,
        JSON.stringify(parsedData),
        { EX: 3600 } // Reset TTL
      );
    }
    
    return { success: true };
  } catch (error) {
    console.error('Database update failed:', error);
    throw error; // Will trigger retry mechanism
  } finally {
    await client.quit();
  }
}, { connection: redisConnection });

Advantages:

Write operations are faster (respond after cache update)
Can batch or coalesce multiple writes to reduce database load
Improves write performance under high load
Provides resilience if the database is temporarily unavailable

Disadvantages:

More complex to implement and debug
Risk of data loss if the system crashes before writing to database
Eventual consistency rather than strong consistency
Requires handling for retry logic and failure scenarios

Write-Around Cache

In a write-around cache, write operations bypass the cache and go directly to the database. The cache is only updated on read misses.

sequenceDiagram
    participant App as Application
    participant Cache as Cache Layer
    participant DB as Database
    
    Note over App: Write Operation
    App->>DB: Write directly to database
    DB-->>App: Confirm write
    
    Note over App: Read Operation
    App->>Cache: Check cache
    alt Cache Hit
        Cache-->>App: Return cached data
    else Cache Miss
        App->>DB: Read from database
        DB-->>App: Return data
        App->>Cache: Update cache
    end


// Write-around cache implementation
async function updateUserProfile(userId, profileData) {
  try {
    // Update database directly, bypassing cache
    await db.query(
      'UPDATE users SET name = ?, email = ?, updated_at = ? WHERE id = ?',
      [profileData.name, profileData.email, new Date(), userId]
    );
    
    // Invalidate cache (optional)
    const client = redis.createClient();
    await client.connect();
    await client.del(`user:${userId}:profile`);
    await client.quit();
    
    return { success: true };
  } catch (error) {
    console.error('Update failed:', error);
    return { success: false, error: error.message };
  }
}

// Read operation with cache-aside pattern
async function getUserProfile(userId) {
  const client = redis.createClient();
  await client.connect();
  
  try {
    // Try to get from cache
    const cachedData = await client.get(`user:${userId}:profile`);
    
    if (cachedData) {
      return JSON.parse(cachedData);
    }
    
    // Cache miss, read from database
    const userData = await db.query(
      'SELECT * FROM users WHERE id = ?',
      [userId]
    );
    
    if (!userData) {
      return null;
    }
    
    // Update cache
    await client.set(
      `user:${userId}:profile`,
      JSON.stringify(userData),
      { EX: 3600 } // Cache for 1 hour
    );
    
    return userData;
  } catch (error) {
    console.error('Get user failed:', error);
    throw error;
  } finally {
    await client.quit();
  }
}

Advantages:

Simpler write path (write only to database)
Works well for data that is infrequently read
Prevents cache pollution with write-heavy data
Good for write-heavy workloads with infrequent reads

Disadvantages:

Cache misses for recently updated data
Higher read latency after writes
Can lead to stale data if cache TTL is long
Not ideal for read-heavy workloads

Cache-Aside (Lazy Loading)

In the cache-aside pattern, the application is responsible for reading from and writing to both the cache and the database.

sequenceDiagram
    participant App as Application
    participant Cache as Cache Layer
    participant DB as Database
    
    Note over App: Read Operation
    App->>Cache: Check cache
    alt Cache Hit
        Cache-->>App: Return cached data
    else Cache Miss
        App->>DB: Read from database
        DB-->>App: Return data
        App->>Cache: Update cache
    end
    
    Note over App: Write Operation
    App->>DB: Update database
    App->>Cache: Invalidate cache


// Cache-aside pattern implementation
class UserRepository {
  constructor() {
    this.redisClient = redis.createClient();
    this.redisClient.connect();
  }
  
  async getUser(userId) {
    const cacheKey = `user:${userId}`;
    
    // Try to get from cache
    const cachedUser = await this.redisClient.get(cacheKey);
    
    if (cachedUser) {
      console.log('Cache hit for user:', userId);
      return JSON.parse(cachedUser);
    }
    
    console.log('Cache miss for user:', userId);
    
    // Get from database
    const user = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
    
    if (!user) {
      return null;
    }
    
    // Update cache
    await this.redisClient.set(cacheKey, JSON.stringify(user), {
      EX: 3600 // Cache for 1 hour
    });
    
    return user;
  }
  
  async updateUser(userId, userData) {
    // Update database
    await db.query(
      'UPDATE users SET name = ?, email = ? WHERE id = ?',
      [userData.name, userData.email, userId]
    );
    
    // Invalidate cache
    await this.redisClient.del(`user:${userId}`);
    
    return { success: true };
  }
  
  async close() {
    await this.redisClient.quit();
  }
}

Advantages:

Simple to implement and understand
Only caches data that is actually requested
Works well for read-heavy workloads
Provides a good balance between performance and consistency

Disadvantages:

Cache misses incur additional latency (database + cache write)
Application code is responsible for maintaining consistency
Potential for race conditions between concurrent requests
Can lead to stale data if invalidation fails

Query-Based Invalidation

Query-based invalidation focuses on invalidating cached data based on the content of the data or the structure of the queries.

Content-Based Invalidation

Invalidate cache entries based on the content or relationships in the data.


// Content-based invalidation example
async function updateProduct(productId, productData) {
  const client = redis.createClient();
  await client.connect();
  
  try {
    // Begin transaction
    const transaction = await db.beginTransaction();
    
    try {
      // Get previous category ID (if changed)
      let previousCategoryId = null;
      if (productData.categoryId) {
        const existingProduct = await db.query(
          'SELECT category_id FROM products WHERE id = ?',
          [productId],
          { transaction }
        );
        
        if (existingProduct && existingProduct.category_id !== productData.categoryId) {
          previousCategoryId = existingProduct.category_id;
        }
      }
      
      // Update database
      await db.query(
        'UPDATE products SET name = ?, price = ?, category_id = ? WHERE id = ?',
        [productData.name, productData.price, productData.categoryId, productId],
        { transaction }
      );
      
      // Invalidate direct product cache
      await client.del(`product:${productId}`);
      
      // Invalidate category-related caches
      if (previousCategoryId) {
        await client.del(`category:${previousCategoryId}:products`);
      }
      
      if (productData.categoryId) {
        await client.del(`category:${productData.categoryId}:products`);
      }
      
      // Invalidate any cached search results or listings
      const productListKeys = await client.keys('product:list:*');
      if (productListKeys.length > 0) {
        await client.del(productListKeys);
      }
      
      // Commit transaction
      await transaction.commit();
      
      return { success: true };
    } catch (error) {
      // Rollback transaction
      await transaction.rollback();
      throw error;
    }
  } catch (error) {
    console.error('Update failed:', error);
    return { success: false, error: error.message };
  } finally {
    await client.quit();
  }
}

Advantages:

More precise invalidation based on actual data relationships
Can help maintain consistency across related data
Reduces unnecessary cache invalidations
Works well with complex data models

Disadvantages:

Requires understanding of data relationships
More complex to implement and maintain
May still miss some indirect relationships
Can be harder to debug when issues occur

Query Invalidation

Track which database tables or objects are involved in each query, and invalidate cached results when those tables change.


// Query invalidation system
class CacheInvalidator {
  constructor(redisClient) {
    this.client = redisClient;
  }
  
  // Track which tables are accessed by a query
  async trackQuery(queryId, tables) {
    // Store mapping from tables to queries
    for (const table of tables) {
      await this.client.sAdd(`table:${table}:queries`, queryId);
    }
    
    // Store mapping from query to tables
    await this.client.sAdd(`query:${queryId}:tables`, ...tables);
  }
  
  // Invalidate all queries affected by a table update
  async invalidateTable(table) {
    // Get all queries that depend on this table
    const affectedQueries = await this.client.sMembers(`table:${table}:queries`);
    
    if (affectedQueries.length === 0) {
      return { invalidated: 0 };
    }
    
    // Delete all affected query results
    const pipeline = this.client.multi();
    
    for (const queryId of affectedQueries) {
      pipeline.del(`cache:${queryId}`);
    }
    
    await pipeline.exec();
    
    return { invalidated: affectedQueries.length };
  }
  
  // Helper to generate a deterministic query ID
  static generateQueryId(query, params) {
    const hash = crypto.createHash('md5');
    hash.update(query);
    hash.update(JSON.stringify(params || {}));
    return hash.digest('hex');
  }
}

// Usage example
async function getProductsByCategory(categoryId, options = {}) {
  const client = redis.createClient();
  await client.connect();
  
  const invalidator = new CacheInvalidator(client);
  
  // Generate a deterministic query ID
  const query = 'SELECT * FROM products WHERE category_id = ? ORDER BY created_at DESC LIMIT ? OFFSET ?';
  const params = [categoryId, options.limit || 20, options.offset || 0];
  const queryId = CacheInvalidator.generateQueryId(query, params);
  
  try {
    // Try to get from cache
    const cachedResult = await client.get(`cache:${queryId}`);
    
    if (cachedResult) {
      return JSON.parse(cachedResult);
    }
    
    // Cache miss, execute query
    const results = await db.query(query, params);
    
    // Store in cache
    await client.set(`cache:${queryId}`, JSON.stringify(results), {
      EX: 3600 // Cache for 1 hour
    });
    
    // Track which tables this query depends on
    await invalidator.trackQuery(queryId, ['products', 'categories']);
    
    return results;
  } finally {
    await client.quit();
  }
}

// When updating a product
async function updateProduct(productId, data) {
  // Update database
  await db.query('UPDATE products SET ... WHERE id = ?', [...Object.values(data), productId]);
  
  // Invalidate all queries that depend on the products table
  const client = redis.createClient();
  await client.connect();
  
  const invalidator = new CacheInvalidator(client);
  const result = await invalidator.invalidateTable('products');
  
  console.log(`Invalidated ${result.invalidated} cached queries`);
  
  await client.quit();
  
  return { success: true };
}

Advantages:

Automatically invalidates all affected queries
Reduces the need for manual invalidation logic
Can handle complex query dependencies
Good for applications with many different query patterns

Disadvantages:

More overhead for tracking query dependencies
Can lead to over-invalidation (table-level granularity)
Complex to implement correctly
May require database-specific integration

Hybrid Approaches

Most real-world applications use a combination of the above strategies, tailored to their specific requirements.

Versioned Cache Keys

Instead of invalidating cache entries, use versioned keys that change when the data changes.


// Versioned cache keys implementation
class VersionedCache {
  constructor(redisClient) {
    this.client = redisClient;
  }
  
  // Get a versioned key
  async getVersionedKey(baseKey) {
    // Get the current version for this key
    const version = await this.client.get(`version:${baseKey}`) || '1';
    return `${baseKey}:v${version}`;
  }
  
  // Increment the version
  async incrementVersion(baseKey) {
    // Increment the version number
    return this.client.incr(`version:${baseKey}`);
  }
  
  // Get cache data with versioned key
  async get(baseKey) {
    const versionedKey = await this.getVersionedKey(baseKey);
    const data = await this.client.get(versionedKey);
    
    return data ? JSON.parse(data) : null;
  }
  
  // Set cache data with versioned key
  async set(baseKey, data, ttl = 3600) {
    const versionedKey = await this.getVersionedKey(baseKey);
    
    await this.client.set(versionedKey, JSON.stringify(data), {
      EX: ttl
    });
    
    return versionedKey;
  }
  
  // Invalidate by incrementing version
  async invalidate(baseKey) {
    return this.incrementVersion(baseKey);
  }
}

// Usage example
async function getUserProfile(userId) {
  const client = redis.createClient();
  await client.connect();
  
  const cache = new VersionedCache(client);
  
  try {
    // Try to get from cache
    const baseKey = `user:${userId}:profile`;
    const cachedData = await cache.get(baseKey);
    
    if (cachedData) {
      return cachedData;
    }
    
    // Cache miss, get from database
    const userData = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
    
    if (!userData) {
      return null;
    }
    
    // Store in cache
    await cache.set(baseKey, userData);
    
    return userData;
  } finally {
    await client.quit();
  }
}

// When updating a user
async function updateUserProfile(userId, data) {
  const client = redis.createClient();
  await client.connect();
  
  try {
    // Update database
    await db.query(
      'UPDATE users SET name = ?, email = ? WHERE id = ?',
      [data.name, data.email, userId]
    );
    
    // Invalidate by incrementing version
    const cache = new VersionedCache(client);
    const newVersion = await cache.invalidate(`user:${userId}:profile`);
    
    console.log(`Invalidated user ${userId} profile, new version: ${newVersion}`);
    
    return { success: true };
  } catch (error) {
    console.error('Update failed:', error);
    return { success: false, error: error.message };
  } finally {
    await client.quit();
  }
}

Advantages:

No need to delete cache entries
Old versions expire naturally via TTL
Less chance of race conditions
Can track version history if needed

Disadvantages:

Requires additional storage for version numbers
Slightly more complex implementation
Can lead to cache bloat with many versions
May need periodic cleanup of old versions

Cache Stamping

Store a timestamp or version with each cached item and verify it's still valid before using it.


// Cache stamping implementation
async function getUserWithStamp(userId) {
  const client = redis.createClient();
  await client.connect();
  
  try {
    // Get latest timestamp from database
    const latestStamp = await db.query(
      'SELECT updated_at FROM users WHERE id = ? LIMIT 1',
      [userId]
    ).then(result => result ? result.updated_at.getTime() : 0);
    
    // Try to get from cache
    const cacheKey = `user:${userId}`;
    const cachedData = await client.get(cacheKey);
    
    if (cachedData) {
      const parsed = JSON.parse(cachedData);
      
      // Check if cache is still valid
      if (parsed._timestamp >= latestStamp) {
        return parsed;
      }
      
      console.log('Cache stamp outdated, refreshing');
    }
    
    // Cache miss or outdated, get from database
    const userData = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
    
    if (!userData) {
      return null;
    }
    
    // Add timestamp and store in cache
    userData._timestamp = latestStamp;
    
    await client.set(cacheKey, JSON.stringify(userData), {
      EX: 3600 // Cache for 1 hour
    });
    
    return userData;
  } finally {
    await client.quit();
  }
}

// When updating a user
async function updateUserWithStamp(userId, data) {
  try {
    // Update database with new timestamp
    const now = new Date();
    await db.query(
      'UPDATE users SET name = ?, email = ?, updated_at = ? WHERE id = ?',
      [data.name, data.email, now, userId]
    );
    
    return { success: true, timestamp: now.getTime() };
  } catch (error) {
    console.error('Update failed:', error);
    return { success: false, error: error.message };
  }
}

Advantages:

Self-validating cache entries
Works well with distributed systems
Can avoid explicit invalidation in many cases
Natural handling of concurrent updates

Disadvantages:

Requires additional database queries for validation
Slightly more complex implementation
Can be less efficient for frequently changing data
Requires a reliable timestamp or version mechanism

Real-World Cache Invalidation Scenarios

Microservice Architectures

In microservice architectures, cache invalidation becomes more challenging because different services may update data independently.

graph TD
    A[User Service] --> B[User Cache]
    C[Order Service] --> D[Order Cache]
    E[Product Service] --> F[Product Cache]
    G[API Gateway] --> B
    G --> D
    G --> F
    H[Event Bus] --> A
    H --> C
    H --> E

Event-Driven Invalidation

Use an event bus to publish data change events that trigger cache invalidation.


// Using an event bus (e.g., RabbitMQ, Kafka) for cache invalidation
const amqp = require('amqplib');

// Publisher service
async function updateUser(userId, userData) {
  try {
    // Update database
    await db.query(
      'UPDATE users SET name = ?, email = ? WHERE id = ?',
      [userData.name, userData.email, userId]
    );
    
    // Publish event
    const connection = await amqp.connect('amqp://localhost');
    const channel = await connection.createChannel();
    
    await channel.assertExchange('data-changes', 'topic', { durable: true });
    
    const event = {
      type: 'user.updated',
      userId,
      timestamp: Date.now()
    };
    
    channel.publish(
      'data-changes',
      'user.updated',
      Buffer.from(JSON.stringify(event))
    );
    
    await channel.close();
    await connection.close();
    
    return { success: true };
  } catch (error) {
    console.error('Update failed:', error);
    return { success: false, error: error.message };
  }
}

// Subscriber service (Cache Manager)
async function startCacheInvalidator() {
  const connection = await amqp.connect('amqp://localhost');
  const channel = await connection.createChannel();
  
  await channel.assertExchange('data-changes', 'topic', { durable: true });
  
  const { queue } = await channel.assertQueue('cache-invalidator', {
    exclusive: false
  });
  
  // Bind to relevant events
  await channel.bindQueue(queue, 'data-changes', 'user.#');
  await channel.bindQueue(queue, 'data-changes', 'order.#');
  await channel.bindQueue(queue, 'data-changes', 'product.#');
  
  console.log('Cache invalidator listening for events');
  
  channel.consume(queue, async (msg) => {
    if (!msg) return;
    
    try {
      const event = JSON.parse(msg.content.toString());
      console.log('Received event:', event);
      
      // Connect to Redis
      const client = redis.createClient();
      await client.connect();
      
      // Handle different event types
      switch (event.type) {
        case 'user.updated':
          await client.del(`user:${event.userId}`);
          await client.del(`user:${event.userId}:profile`);
          console.log(`Invalidated cache for user ${event.userId}`);
          break;
          
        case 'order.created':
        case 'order.updated':
          await client.del(`order:${event.orderId}`);
          // Also invalidate user's orders list
          await client.del(`user:${event.userId}:orders`);
          console.log(`Invalidated cache for order ${event.orderId}`);
          break;
          
        case 'product.updated':
          await client.del(`product:${event.productId}`);
          // Also invalidate category product lists
          await client.del(`category:${event.categoryId}:products`);
          console.log(`Invalidated cache for product ${event.productId}`);
          break;
      }
      
      await client.quit();
      
      // Acknowledge message
      channel.ack(msg);
    } catch (error) {
      console.error('Error processing event:', error);
      // Nack message for requeue
      channel.nack(msg);
    }
  });
}

startCacheInvalidator().catch(console.error);

Multi-Region Deployment

In multi-region deployments, cache invalidation must be coordinated across different geographical locations.

Global Cache Invalidation Service

Use a centralized service to coordinate cache invalidation across regions.


// Using Redis Pub/Sub for cross-region invalidation
const redis = require('redis');

class GlobalCacheInvalidator {
  constructor(options = {}) {
    // Create publisher client
    this.publisher = redis.createClient({
      url: options.redisUrl || 'redis://localhost:6379'
    });
    
    // Create subscriber client
    this.subscriber = this.publisher.duplicate();
    
    // Connect both clients
    Promise.all([
      this.publisher.connect(),
      this.subscriber.connect()
    ]).then(() => {
      console.log('Connected to Redis');
    }).catch(err => {
      console.error('Redis connection error:', err);
    });
    
    // The channel for invalidation events
    this.channel = options.channel || 'cache:invalidations';
    
    // Region identifier
    this.region = options.region || 'default';
    
    // Local cache client
    this.cacheClient = options.cacheClient || redis.createClient({
      url: options.cacheRedisUrl || 'redis://localhost:6379'
    });
    this.cacheClient.connect().catch(console.error);
  }
  
  // Start listening for invalidation events
  async startListening() {
    await this.subscriber.subscribe(this.channel, (message, channel) => {
      try {
        const event = JSON.parse(message);
        
        // Skip events from this region (we already handled them)
        if (event.sourceRegion === this.region) {
          return;
        }
        
        console.log(`Received invalidation from ${event.sourceRegion}:`, event);
        
        // Process the invalidation
        this.processInvalidation(event);
      } catch (error) {
        console.error('Error processing invalidation event:', error);
      }
    });
    
    console.log(`Listening for invalidations on channel: ${this.channel}`);
  }
  
  // Process an invalidation event
  async processInvalidation(event) {
    switch (event.type) {
      case 'key':
        // Invalidate a specific key
        await this.cacheClient.del(event.key);
        console.log(`Invalidated key: ${event.key}`);
        break;
        
      case 'pattern':
        // Invalidate keys matching a pattern
        const keys = await this.cacheClient.keys(event.pattern);
        
        if (keys.length > 0) {
          await this.cacheClient.del(keys);
          console.log(`Invalidated ${keys.length} keys matching pattern: ${event.pattern}`);
        }
        break;
        
      case 'tag':
        // Invalidate keys with a specific tag
        // Requires an implementation of tagged cache keys
        console.log(`Tag invalidation not implemented yet: ${event.tag}`);
        break;
    }
  }
  
  // Publish an invalidation event
  async invalidate(type, target) {
    const event = {
      type,
      sourceRegion: this.region,
      timestamp: Date.now()
    };
    
    // Add the appropriate target property
    switch (type) {
      case 'key':
        event.key = target;
        break;
      case 'pattern':
        event.pattern = target;
        break;
      case 'tag':
        event.tag = target;
        break;
    }
    
    // Publish the event
    await this.publisher.publish(this.channel, JSON.stringify(event));
    
    // Also process locally
    await this.processInvalidation(event);
    
    return true;
  }
  
  // Convenience methods
  async invalidateKey(key) {
    return this.invalidate('key', key);
  }
  
  async invalidatePattern(pattern) {
    return this.invalidate('pattern', pattern);
  }
  
  async invalidateTag(tag) {
    return this.invalidate('tag', tag);
  }
  
  // Clean up resources
  async close() {
    await this.publisher.quit();
    await this.subscriber.quit();
    await this.cacheClient.quit();
  }
}

// Usage example
async function updateUserGlobally(userId, userData) {
  try {
    // Update database
    await db.query(
      'UPDATE users SET name = ?, email = ? WHERE id = ?',
      [userData.name, userData.email, userId]
    );
    
    // Invalidate cache across all regions
    const invalidator = new GlobalCacheInvalidator({
      region: 'us-east-1'
    });
    
    await invalidator.invalidatePattern(`user:${userId}:*`);
    
    await invalidator.close();
    
    return { success: true };
  } catch (error) {
    console.error('Update failed:', error);
    return { success: false, error: error.message };
  }
}

High-Traffic Applications

For high-traffic applications, inefficient cache invalidation can lead to "thundering herd" problems.

Staggered Invalidation

Invalidate cache entries gradually to prevent all clients from hitting the database simultaneously.


// Staggered invalidation implementation
class StaggeredInvalidator {
  constructor(redisClient) {
    this.client = redisClient;
  }
  
  // Add a soft invalidation (probabilistic expiration)
  async softInvalidate(pattern, durationMs = 60000) {
    // Find keys matching the pattern
    const keys = await this.client.keys(pattern);
    
    if (keys.length === 0) {
      return { invalidated: 0 };
    }
    
    const startTime = Date.now();
    const endTime = startTime + durationMs;
    
    // For each key, set a special "invalidation" flag
    // with the end time of the invalidation window
    const pipeline = this.client.multi();
    
    for (const key of keys) {
      pipeline.hSet(`${key}:meta`, 'invalidating', endTime.toString());
    }
    
    await pipeline.exec();
    
    return { invalidated: keys.length, endTime };
  }
  
  // Check if a key is being invalidated
  async isInvalidating(key) {
    const invalidatingUntil = await this.client.hGet(`${key}:meta`, 'invalidating');
    
    if (!invalidatingUntil) {
      return false;
    }
    
    const endTime = parseInt(invalidatingUntil, 10);
    const now = Date.now();
    
    // Check if we're still in the invalidation window
    return now < endTime;
  }
  
  // Decide whether to use cache based on how far we are in the invalidation window
  async shouldUseCache(key) {
    const invalidatingUntil = await this.client.hGet(`${key}:meta`, 'invalidating');
    
    if (!invalidatingUntil) {
      return true; // Not being invalidated, use cache normally
    }
    
    const endTime = parseInt(invalidatingUntil, 10);
    const now = Date.now();
    
    // If invalidation period is over, clear the flag and use cache
    if (now >= endTime) {
      await this.client.hDel(`${key}:meta`, 'invalidating');
      return true;
    }
    
    // Calculate how far we are in the invalidation window (0.0 to 1.0)
    const startTime = endTime - 60000; // Assuming 1-minute window
    const progress = (now - startTime) / (endTime - startTime);
    
    // Probability increases as we get closer to the end of the window
    // This creates a gradual ramp-up of cache refreshes
    return Math.random() > progress;
  }
}

// Usage example with cache-aside pattern
async function getProductWithStaggered(productId) {
  const client = redis.createClient();
  await client.connect();
  
  const invalidator = new StaggeredInvalidator(client);
  
  try {
    const cacheKey = `product:${productId}`;
    
    // Check if we should use the cache
    const useCache = await invalidator.shouldUseCache(cacheKey);
    
    if (useCache) {
      // Try to get from cache
      const cachedData = await client.get(cacheKey);
      
      if (cachedData) {
        return JSON.parse(cachedData);
      }
    }
    
    // Cache miss or bypassing cache during invalidation
    console.log('Getting fresh data from database');
    
    const productData = await db.query(
      'SELECT * FROM products WHERE id = ?',
      [productId]
    );
    
    if (!productData) {
      return null;
    }
    
    // Update cache
    await client.set(cacheKey, JSON.stringify(productData), {
      EX: 3600 // Cache for 1 hour
    });
    
    return productData;
  } finally {
    await client.quit();
  }
}

// When updating a product (e.g., during a flash sale)
async function prepareProductUpdate(productId) {
  const client = redis.createClient();
  await client.connect();
  
  try {
    // Start a 1-minute staggered invalidation before the update
    const invalidator = new StaggeredInvalidator(client);
    await invalidator.softInvalidate(`product:${productId}`, 60000);
    
    console.log(`Started staggered invalidation for product ${productId}`);
    
    return { success: true };
  } finally {
    await client.quit();
  }
}

Best Practices for Cache Invalidation

Design Principles

Choose the Right Invalidation Strategy: Select the most appropriate strategy based on your consistency requirements.
Be Pessimistic: When in doubt, invalidate the cache. It's better to have a cache miss than to serve stale data.
Use TTLs as a Safety Net: Even with active invalidation, set TTLs to prevent indefinitely stale data.
Keep Invalidation Logic Close to Data Mutation: The code that updates data should also be responsible for invalidation.
Consider Partial Updates: For large objects, consider updating just the changed portions rather than invalidating the entire object.

Implementation Tips

Use Consistent Key Naming: Establish naming conventions that make relationships between keys clear.
Batch Invalidations: When invalidating multiple keys, use multi/exec or pipeline operations.
Log Invalidation Events: Keep track of cache invalidations to help debug issues.
Monitor Cache Hit Rates: A sudden drop in hit rate may indicate over-invalidation.
Test Invalidation Paths: Explicitly test cache invalidation in your test suite.

Common Pitfalls

Race Conditions: Be aware of potential race conditions between reads, writes, and invalidations.
Over-Invalidation: Invalidating too much data reduces the effectiveness of your cache.
Under-Invalidation: Missing invalidation paths leads to stale data.
Thundering Herd: Be careful about invalidating popular cache keys simultaneously.
Forgetting about Related Data: Remember to invalidate derived or aggregated data.

Operational Considerations

Cache Warmup: Consider pre-populating caches after invalidation for critical data.
Emergency Purge: Have a mechanism to purge the entire cache in emergencies.
Monitoring and Alerting: Set up monitoring for cache size, hit rates, and invalidation patterns.
Fallback Strategies: Implement graceful degradation if cache service is unavailable.
Documentation: Document cache invalidation strategies for each data type.

Conclusion

Cache invalidation is one of the most challenging aspects of distributed systems, but with the right patterns and strategies, it can be managed effectively. The key is to understand your application's consistency requirements and to choose the appropriate invalidation approach for each type of data.

Remember that there's no one-size-fits-all solution. Most applications will use a combination of approaches:

Time-based expiration for infrequently accessed data
Active invalidation for frequently accessed data
Event-based propagation for distributed systems
Versioning or stamping for complex data models

By applying the principles and patterns we've covered in this lecture, you'll be better equipped to build high-performance, consistent, and reliable applications that leverage the power of caching without falling into its pitfalls.

Practical Exercises

Exercise 1: Implement Cache-Aside Pattern

Implement a cache-aside pattern for a product catalog API. Create endpoints for:

Getting a product by ID
Listing products by category
Updating a product

Ensure that the cache is properly invalidated when products are updated.

Exercise 2: Event-Based Invalidation

Create a simple event bus using Redis Pub/Sub. Implement a system where multiple services can publish data change events, and a cache invalidation service subscribes to these events to keep the cache consistent.

Exercise 3: Versioned Cache Keys

Implement the versioned cache keys pattern for a user profile system. Ensure that profile views always see the latest data without explicitly deleting cache entries.

Exercise 4: Benchmark Different Strategies

Create a benchmark to compare the performance of different cache invalidation strategies under various workloads. Consider factors like:

Read-heavy vs. write-heavy workloads
Low vs. high concurrency
Cold vs. warm cache scenarios

Exercise 5: Cache Invalidation Dashboard

Create a simple dashboard that monitors cache invalidation events and displays metrics like:

Cache hit rate over time
Number of invalidations per minute
Top invalidated keys or patterns
Cache size and memory usage

The Cache Invalidation Challenge

Why Cache Invalidation Is Important

The Consistency Spectrum

Cache Invalidation vs. Cache Expiration

Common Cache Invalidation Patterns

Time-Based Invalidation

TTL (Time-To-Live) Expiration

Scheduled Refresh

Event-Based Invalidation

Write-Through Cache

Write-Behind Cache

Write-Around Cache

Cache-Aside (Lazy Loading)

Query-Based Invalidation

Content-Based Invalidation

Query Invalidation

Hybrid Approaches

Versioned Cache Keys

Cache Stamping

Real-World Cache Invalidation Scenarios

Microservice Architectures

Event-Driven Invalidation

Multi-Region Deployment

Global Cache Invalidation Service

High-Traffic Applications

Staggered Invalidation

Best Practices for Cache Invalidation

Design Principles

Implementation Tips

Common Pitfalls

Operational Considerations

Conclusion

Further Reading

Practical Exercises

Exercise 1: Implement Cache-Aside Pattern

Exercise 2: Event-Based Invalidation

Exercise 3: Versioned Cache Keys

Exercise 4: Benchmark Different Strategies

Exercise 5: Cache Invalidation Dashboard