The Cache Invalidation Challenge
In our previous lectures, we explored how caching can dramatically improve application performance by storing frequently accessed data in fast-access storage like Redis. However, caching introduces a critical challenge: ensuring that cached data remains consistent with the source of truth.
Phil Karlton, a computer scientist at Netscape, famously said: "There are only two hard things in Computer Science: cache invalidation and naming things." Today, we'll tackle the first of these challenges.
flowchart TD
A[Application] --> B[Cache]
A --> C[Database]
B -.-> A
C -.-> A
subgraph "Cache Invalidation Challenge"
D[When do we update the cache?]
E[How do we update the cache?]
F[How do we prevent stale data?]
G[What happens during failures?]
end
Why Cache Invalidation Is Important
Cache invalidation is the process of removing or updating cached data when the source data changes. Without proper invalidation strategies:
- Users may see stale data: Leading to confusion, errors, and poor user experience
- System consistency is compromised: Different parts of your application may operate on different versions of data
- Business logic may fail: Operations dependent on accurate data may produce incorrect results
- Security issues can arise: Cached permissions or sensitive data may remain accessible after they should be revoked
The Consistency Spectrum
Different applications have different consistency requirements, ranging from:
- Strong consistency: Cached data must always be identical to the source data (e.g., financial transactions)
- Eventual consistency: Cached data may be temporarily out of sync but will eventually be updated (e.g., social media likes count)
- Weak consistency: Some staleness is acceptable as long as it doesn't affect critical functionality (e.g., view counters)
Cache Invalidation vs. Cache Expiration
Before diving into invalidation patterns, it's important to understand the difference:
- Cache invalidation: Proactively removing or updating cached data when the source data changes
- Cache expiration: Setting a time-to-live (TTL) for cached data, after which it is automatically removed
Both approaches have their place in a comprehensive caching strategy, and we'll explore how they can be combined effectively.
Common Cache Invalidation Patterns
graph TD
A[Cache Invalidation Patterns] --> B[Time-Based]
A --> C[Event-Based]
A --> D[Query-Based]
A --> E[Hybrid Approaches]
B --> B1[TTL Expiration]
B --> B2[Scheduled Refresh]
C --> C1[Write-Through]
C --> C2[Write-Behind]
C --> C3[Write-Around]
C --> C4[Cache-Aside]
D --> D1[Content-Based]
D --> D2[Query Invalidation]
E --> E1[Leased-Based]
E --> E2[Versioned Cache Keys]
E --> E3[Cache Stamping]
Time-Based Invalidation
Time-based invalidation relies on setting expiration timeouts for cached data. After the specified time has elapsed, the data is considered invalid and is removed from the cache.
TTL (Time-To-Live) Expiration
The simplest approach to cache invalidation is to set an expiration time on each cached item.
// Using Redis with TTL expiration
async function getCachedData(key, fetchFunction, ttl = 3600) {
const client = redis.createClient();
await client.connect();
// Try to get data from cache
const cachedData = await client.get(key);
if (cachedData) {
return JSON.parse(cachedData);
}
// If no cached data, fetch fresh data
const freshData = await fetchFunction();
// Store in cache with TTL
await client.set(key, JSON.stringify(freshData), {
EX: ttl // Expires in ttl seconds
});
return freshData;
}
// Usage example
async function getUserProfile(userId) {
return getCachedData(
`user:${userId}:profile`,
async () => {
// Simulate database query
return db.query('SELECT * FROM users WHERE id = ?', [userId]);
},
1800 // Cache for 30 minutes
);
}
Advantages:
- Simple to implement and understand
- Requires no coordination between services
- Works well for data that changes infrequently or has predictable update patterns
- Automatically handles cached items that are rarely accessed
Disadvantages:
- Can result in stale data during the TTL period
- Can cause unnecessary refetching if data hasn't changed
- Difficult to determine the optimal TTL value
- Not suitable for data that requires immediate consistency
Scheduled Refresh
Instead of waiting for a cache miss, proactively refresh cached data on a schedule.
// Using a background job scheduler (like node-cron)
const cron = require('node-cron');
const redis = require('redis');
async function setupCacheRefresh() {
const client = redis.createClient();
await client.connect();
// Schedule cache refresh for popular items every 15 minutes
cron.schedule('*/15 * * * *', async () => {
console.log('Refreshing popular item cache...');
try {
// Get list of popular items to refresh
const popularItemIds = await getPopularItemIds();
// Refresh each item's cache
await Promise.all(popularItemIds.map(async (itemId) => {
const freshData = await fetchItemData(itemId);
await client.set(`item:${itemId}`, JSON.stringify(freshData), {
EX: 3600 // Set TTL to 1 hour
});
}));
console.log(`Refreshed cache for ${popularItemIds.length} items`);
} catch (error) {
console.error('Cache refresh failed:', error);
}
});
}
async function getPopularItemIds() {
// Implement logic to determine which items are accessed frequently
// This could be based on analytics, recent access patterns, etc.
return db.query('SELECT id FROM items ORDER BY view_count DESC LIMIT 100');
}
async function fetchItemData(itemId) {
// Fetch fresh data from the database
return db.query('SELECT * FROM items WHERE id = ?', [itemId]);
}
Advantages:
- Reduces cache misses for frequently accessed items
- Can prioritize refreshing the most important or popular data
- Distributes database load by staggering refreshes
- Can be scheduled during off-peak hours for less critical data
Disadvantages:
- Requires additional background processing
- May refresh data that's not being accessed
- Still doesn't provide immediate consistency
- Can be complex to implement properly with error handling
Event-Based Invalidation
Event-based invalidation reacts to data changes by updating or invalidating the cache when the underlying data changes.
Write-Through Cache
In a write-through cache, data is written to both the cache and the backend storage simultaneously.
sequenceDiagram
participant App as Application
participant Cache as Cache Layer
participant DB as Database
App->>App: Update operation
App->>Cache: Update cache
App->>DB: Update database
DB-->>App: Confirm update
Cache-->>App: Confirm update
App->>App: Operation complete
// Write-through cache implementation
async function updateUserProfile(userId, profileData) {
const client = redis.createClient();
await client.connect();
try {
// Begin transaction
const transaction = await db.beginTransaction();
try {
// Update database
await db.query(
'UPDATE users SET name = ?, email = ?, updated_at = ? WHERE id = ?',
[profileData.name, profileData.email, new Date(), userId],
{ transaction }
);
// Update cache
await client.set(
`user:${userId}:profile`,
JSON.stringify({
id: userId,
...profileData,
updated_at: new Date()
}),
{ EX: 3600 } // Cache for 1 hour
);
// Commit transaction
await transaction.commit();
return { success: true };
} catch (error) {
// Rollback transaction
await transaction.rollback();
throw error;
}
} catch (error) {
console.error('Update failed:', error);
return { success: false, error: error.message };
} finally {
await client.quit();
}
}
Advantages:
- Cache is always up-to-date with the database
- Provides strong consistency guarantees
- Simple to understand and reason about
- Read operations are fast since data is already cached
Disadvantages:
- Write operations are slower (must update two systems)
- If cache update fails but database update succeeds, inconsistency occurs
- Can be wasteful for data that's rarely read
- Requires tight coupling between cache and database operations
Write-Behind Cache
In a write-behind (or write-back) cache, data is written to the cache first, and then asynchronously written to the backend storage.
sequenceDiagram
participant App as Application
participant Cache as Cache Layer
participant Queue as Message Queue
participant Worker as Background Worker
participant DB as Database
App->>App: Update operation
App->>Cache: Update cache
App->>Queue: Queue database update
Queue-->>App: Acknowledge
App->>App: Operation complete
Queue->>Worker: Process update
Worker->>DB: Update database
DB-->>Worker: Confirm update
Worker->>Cache: Update cache TTL
// Write-behind cache implementation using a message queue
const { Queue } = require('bullmq');
const writeQueue = new Queue('database-writes');
async function updateUserProfile(userId, profileData) {
const client = redis.createClient();
await client.connect();
try {
// Update cache immediately
await client.set(
`user:${userId}:profile`,
JSON.stringify({
id: userId,
...profileData,
updated_at: new Date(),
pending: true // Mark as pending database update
}),
{ EX: 3600 } // Cache for 1 hour
);
// Queue database update
await writeQueue.add('update-user', {
userId,
profileData,
timestamp: Date.now()
});
return { success: true, message: 'Update queued' };
} catch (error) {
console.error('Update failed:', error);
return { success: false, error: error.message };
} finally {
await client.quit();
}
}
// Worker process to handle database writes
const { Worker } = require('bullmq');
const dbWorker = new Worker('database-writes', async (job) => {
const { userId, profileData, timestamp } = job.data;
const client = redis.createClient();
await client.connect();
try {
// Update database
await db.query(
'UPDATE users SET name = ?, email = ?, updated_at = ? WHERE id = ?',
[profileData.name, profileData.email, new Date(timestamp), userId]
);
// Update cache status (no longer pending)
const cachedData = await client.get(`user:${userId}:profile`);
if (cachedData) {
const parsedData = JSON.parse(cachedData);
delete parsedData.pending;
await client.set(
`user:${userId}:profile`,
JSON.stringify(parsedData),
{ EX: 3600 } // Reset TTL
);
}
return { success: true };
} catch (error) {
console.error('Database update failed:', error);
throw error; // Will trigger retry mechanism
} finally {
await client.quit();
}
}, { connection: redisConnection });
Advantages:
- Write operations are faster (respond after cache update)
- Can batch or coalesce multiple writes to reduce database load
- Improves write performance under high load
- Provides resilience if the database is temporarily unavailable
Disadvantages:
- More complex to implement and debug
- Risk of data loss if the system crashes before writing to database
- Eventual consistency rather than strong consistency
- Requires handling for retry logic and failure scenarios
Write-Around Cache
In a write-around cache, write operations bypass the cache and go directly to the database. The cache is only updated on read misses.
sequenceDiagram
participant App as Application
participant Cache as Cache Layer
participant DB as Database
Note over App: Write Operation
App->>DB: Write directly to database
DB-->>App: Confirm write
Note over App: Read Operation
App->>Cache: Check cache
alt Cache Hit
Cache-->>App: Return cached data
else Cache Miss
App->>DB: Read from database
DB-->>App: Return data
App->>Cache: Update cache
end
// Write-around cache implementation
async function updateUserProfile(userId, profileData) {
try {
// Update database directly, bypassing cache
await db.query(
'UPDATE users SET name = ?, email = ?, updated_at = ? WHERE id = ?',
[profileData.name, profileData.email, new Date(), userId]
);
// Invalidate cache (optional)
const client = redis.createClient();
await client.connect();
await client.del(`user:${userId}:profile`);
await client.quit();
return { success: true };
} catch (error) {
console.error('Update failed:', error);
return { success: false, error: error.message };
}
}
// Read operation with cache-aside pattern
async function getUserProfile(userId) {
const client = redis.createClient();
await client.connect();
try {
// Try to get from cache
const cachedData = await client.get(`user:${userId}:profile`);
if (cachedData) {
return JSON.parse(cachedData);
}
// Cache miss, read from database
const userData = await db.query(
'SELECT * FROM users WHERE id = ?',
[userId]
);
if (!userData) {
return null;
}
// Update cache
await client.set(
`user:${userId}:profile`,
JSON.stringify(userData),
{ EX: 3600 } // Cache for 1 hour
);
return userData;
} catch (error) {
console.error('Get user failed:', error);
throw error;
} finally {
await client.quit();
}
}
Advantages:
- Simpler write path (write only to database)
- Works well for data that is infrequently read
- Prevents cache pollution with write-heavy data
- Good for write-heavy workloads with infrequent reads
Disadvantages:
- Cache misses for recently updated data
- Higher read latency after writes
- Can lead to stale data if cache TTL is long
- Not ideal for read-heavy workloads
Cache-Aside (Lazy Loading)
In the cache-aside pattern, the application is responsible for reading from and writing to both the cache and the database.
sequenceDiagram
participant App as Application
participant Cache as Cache Layer
participant DB as Database
Note over App: Read Operation
App->>Cache: Check cache
alt Cache Hit
Cache-->>App: Return cached data
else Cache Miss
App->>DB: Read from database
DB-->>App: Return data
App->>Cache: Update cache
end
Note over App: Write Operation
App->>DB: Update database
App->>Cache: Invalidate cache
// Cache-aside pattern implementation
class UserRepository {
constructor() {
this.redisClient = redis.createClient();
this.redisClient.connect();
}
async getUser(userId) {
const cacheKey = `user:${userId}`;
// Try to get from cache
const cachedUser = await this.redisClient.get(cacheKey);
if (cachedUser) {
console.log('Cache hit for user:', userId);
return JSON.parse(cachedUser);
}
console.log('Cache miss for user:', userId);
// Get from database
const user = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
if (!user) {
return null;
}
// Update cache
await this.redisClient.set(cacheKey, JSON.stringify(user), {
EX: 3600 // Cache for 1 hour
});
return user;
}
async updateUser(userId, userData) {
// Update database
await db.query(
'UPDATE users SET name = ?, email = ? WHERE id = ?',
[userData.name, userData.email, userId]
);
// Invalidate cache
await this.redisClient.del(`user:${userId}`);
return { success: true };
}
async close() {
await this.redisClient.quit();
}
}
Advantages:
- Simple to implement and understand
- Only caches data that is actually requested
- Works well for read-heavy workloads
- Provides a good balance between performance and consistency
Disadvantages:
- Cache misses incur additional latency (database + cache write)
- Application code is responsible for maintaining consistency
- Potential for race conditions between concurrent requests
- Can lead to stale data if invalidation fails
Query-Based Invalidation
Query-based invalidation focuses on invalidating cached data based on the content of the data or the structure of the queries.
Content-Based Invalidation
Invalidate cache entries based on the content or relationships in the data.
// Content-based invalidation example
async function updateProduct(productId, productData) {
const client = redis.createClient();
await client.connect();
try {
// Begin transaction
const transaction = await db.beginTransaction();
try {
// Get previous category ID (if changed)
let previousCategoryId = null;
if (productData.categoryId) {
const existingProduct = await db.query(
'SELECT category_id FROM products WHERE id = ?',
[productId],
{ transaction }
);
if (existingProduct && existingProduct.category_id !== productData.categoryId) {
previousCategoryId = existingProduct.category_id;
}
}
// Update database
await db.query(
'UPDATE products SET name = ?, price = ?, category_id = ? WHERE id = ?',
[productData.name, productData.price, productData.categoryId, productId],
{ transaction }
);
// Invalidate direct product cache
await client.del(`product:${productId}`);
// Invalidate category-related caches
if (previousCategoryId) {
await client.del(`category:${previousCategoryId}:products`);
}
if (productData.categoryId) {
await client.del(`category:${productData.categoryId}:products`);
}
// Invalidate any cached search results or listings
const productListKeys = await client.keys('product:list:*');
if (productListKeys.length > 0) {
await client.del(productListKeys);
}
// Commit transaction
await transaction.commit();
return { success: true };
} catch (error) {
// Rollback transaction
await transaction.rollback();
throw error;
}
} catch (error) {
console.error('Update failed:', error);
return { success: false, error: error.message };
} finally {
await client.quit();
}
}
Advantages:
- More precise invalidation based on actual data relationships
- Can help maintain consistency across related data
- Reduces unnecessary cache invalidations
- Works well with complex data models
Disadvantages:
- Requires understanding of data relationships
- More complex to implement and maintain
- May still miss some indirect relationships
- Can be harder to debug when issues occur
Query Invalidation
Track which database tables or objects are involved in each query, and invalidate cached results when those tables change.
// Query invalidation system
class CacheInvalidator {
constructor(redisClient) {
this.client = redisClient;
}
// Track which tables are accessed by a query
async trackQuery(queryId, tables) {
// Store mapping from tables to queries
for (const table of tables) {
await this.client.sAdd(`table:${table}:queries`, queryId);
}
// Store mapping from query to tables
await this.client.sAdd(`query:${queryId}:tables`, ...tables);
}
// Invalidate all queries affected by a table update
async invalidateTable(table) {
// Get all queries that depend on this table
const affectedQueries = await this.client.sMembers(`table:${table}:queries`);
if (affectedQueries.length === 0) {
return { invalidated: 0 };
}
// Delete all affected query results
const pipeline = this.client.multi();
for (const queryId of affectedQueries) {
pipeline.del(`cache:${queryId}`);
}
await pipeline.exec();
return { invalidated: affectedQueries.length };
}
// Helper to generate a deterministic query ID
static generateQueryId(query, params) {
const hash = crypto.createHash('md5');
hash.update(query);
hash.update(JSON.stringify(params || {}));
return hash.digest('hex');
}
}
// Usage example
async function getProductsByCategory(categoryId, options = {}) {
const client = redis.createClient();
await client.connect();
const invalidator = new CacheInvalidator(client);
// Generate a deterministic query ID
const query = 'SELECT * FROM products WHERE category_id = ? ORDER BY created_at DESC LIMIT ? OFFSET ?';
const params = [categoryId, options.limit || 20, options.offset || 0];
const queryId = CacheInvalidator.generateQueryId(query, params);
try {
// Try to get from cache
const cachedResult = await client.get(`cache:${queryId}`);
if (cachedResult) {
return JSON.parse(cachedResult);
}
// Cache miss, execute query
const results = await db.query(query, params);
// Store in cache
await client.set(`cache:${queryId}`, JSON.stringify(results), {
EX: 3600 // Cache for 1 hour
});
// Track which tables this query depends on
await invalidator.trackQuery(queryId, ['products', 'categories']);
return results;
} finally {
await client.quit();
}
}
// When updating a product
async function updateProduct(productId, data) {
// Update database
await db.query('UPDATE products SET ... WHERE id = ?', [...Object.values(data), productId]);
// Invalidate all queries that depend on the products table
const client = redis.createClient();
await client.connect();
const invalidator = new CacheInvalidator(client);
const result = await invalidator.invalidateTable('products');
console.log(`Invalidated ${result.invalidated} cached queries`);
await client.quit();
return { success: true };
}
Advantages:
- Automatically invalidates all affected queries
- Reduces the need for manual invalidation logic
- Can handle complex query dependencies
- Good for applications with many different query patterns
Disadvantages:
- More overhead for tracking query dependencies
- Can lead to over-invalidation (table-level granularity)
- Complex to implement correctly
- May require database-specific integration
Hybrid Approaches
Most real-world applications use a combination of the above strategies, tailored to their specific requirements.
Versioned Cache Keys
Instead of invalidating cache entries, use versioned keys that change when the data changes.
// Versioned cache keys implementation
class VersionedCache {
constructor(redisClient) {
this.client = redisClient;
}
// Get a versioned key
async getVersionedKey(baseKey) {
// Get the current version for this key
const version = await this.client.get(`version:${baseKey}`) || '1';
return `${baseKey}:v${version}`;
}
// Increment the version
async incrementVersion(baseKey) {
// Increment the version number
return this.client.incr(`version:${baseKey}`);
}
// Get cache data with versioned key
async get(baseKey) {
const versionedKey = await this.getVersionedKey(baseKey);
const data = await this.client.get(versionedKey);
return data ? JSON.parse(data) : null;
}
// Set cache data with versioned key
async set(baseKey, data, ttl = 3600) {
const versionedKey = await this.getVersionedKey(baseKey);
await this.client.set(versionedKey, JSON.stringify(data), {
EX: ttl
});
return versionedKey;
}
// Invalidate by incrementing version
async invalidate(baseKey) {
return this.incrementVersion(baseKey);
}
}
// Usage example
async function getUserProfile(userId) {
const client = redis.createClient();
await client.connect();
const cache = new VersionedCache(client);
try {
// Try to get from cache
const baseKey = `user:${userId}:profile`;
const cachedData = await cache.get(baseKey);
if (cachedData) {
return cachedData;
}
// Cache miss, get from database
const userData = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
if (!userData) {
return null;
}
// Store in cache
await cache.set(baseKey, userData);
return userData;
} finally {
await client.quit();
}
}
// When updating a user
async function updateUserProfile(userId, data) {
const client = redis.createClient();
await client.connect();
try {
// Update database
await db.query(
'UPDATE users SET name = ?, email = ? WHERE id = ?',
[data.name, data.email, userId]
);
// Invalidate by incrementing version
const cache = new VersionedCache(client);
const newVersion = await cache.invalidate(`user:${userId}:profile`);
console.log(`Invalidated user ${userId} profile, new version: ${newVersion}`);
return { success: true };
} catch (error) {
console.error('Update failed:', error);
return { success: false, error: error.message };
} finally {
await client.quit();
}
}
Advantages:
- No need to delete cache entries
- Old versions expire naturally via TTL
- Less chance of race conditions
- Can track version history if needed
Disadvantages:
- Requires additional storage for version numbers
- Slightly more complex implementation
- Can lead to cache bloat with many versions
- May need periodic cleanup of old versions
Cache Stamping
Store a timestamp or version with each cached item and verify it's still valid before using it.
// Cache stamping implementation
async function getUserWithStamp(userId) {
const client = redis.createClient();
await client.connect();
try {
// Get latest timestamp from database
const latestStamp = await db.query(
'SELECT updated_at FROM users WHERE id = ? LIMIT 1',
[userId]
).then(result => result ? result.updated_at.getTime() : 0);
// Try to get from cache
const cacheKey = `user:${userId}`;
const cachedData = await client.get(cacheKey);
if (cachedData) {
const parsed = JSON.parse(cachedData);
// Check if cache is still valid
if (parsed._timestamp >= latestStamp) {
return parsed;
}
console.log('Cache stamp outdated, refreshing');
}
// Cache miss or outdated, get from database
const userData = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
if (!userData) {
return null;
}
// Add timestamp and store in cache
userData._timestamp = latestStamp;
await client.set(cacheKey, JSON.stringify(userData), {
EX: 3600 // Cache for 1 hour
});
return userData;
} finally {
await client.quit();
}
}
// When updating a user
async function updateUserWithStamp(userId, data) {
try {
// Update database with new timestamp
const now = new Date();
await db.query(
'UPDATE users SET name = ?, email = ?, updated_at = ? WHERE id = ?',
[data.name, data.email, now, userId]
);
return { success: true, timestamp: now.getTime() };
} catch (error) {
console.error('Update failed:', error);
return { success: false, error: error.message };
}
}
Advantages:
- Self-validating cache entries
- Works well with distributed systems
- Can avoid explicit invalidation in many cases
- Natural handling of concurrent updates
Disadvantages:
- Requires additional database queries for validation
- Slightly more complex implementation
- Can be less efficient for frequently changing data
- Requires a reliable timestamp or version mechanism
Real-World Cache Invalidation Scenarios
Microservice Architectures
In microservice architectures, cache invalidation becomes more challenging because different services may update data independently.
graph TD
A[User Service] --> B[User Cache]
C[Order Service] --> D[Order Cache]
E[Product Service] --> F[Product Cache]
G[API Gateway] --> B
G --> D
G --> F
H[Event Bus] --> A
H --> C
H --> E
Event-Driven Invalidation
Use an event bus to publish data change events that trigger cache invalidation.
// Using an event bus (e.g., RabbitMQ, Kafka) for cache invalidation
const amqp = require('amqplib');
// Publisher service
async function updateUser(userId, userData) {
try {
// Update database
await db.query(
'UPDATE users SET name = ?, email = ? WHERE id = ?',
[userData.name, userData.email, userId]
);
// Publish event
const connection = await amqp.connect('amqp://localhost');
const channel = await connection.createChannel();
await channel.assertExchange('data-changes', 'topic', { durable: true });
const event = {
type: 'user.updated',
userId,
timestamp: Date.now()
};
channel.publish(
'data-changes',
'user.updated',
Buffer.from(JSON.stringify(event))
);
await channel.close();
await connection.close();
return { success: true };
} catch (error) {
console.error('Update failed:', error);
return { success: false, error: error.message };
}
}
// Subscriber service (Cache Manager)
async function startCacheInvalidator() {
const connection = await amqp.connect('amqp://localhost');
const channel = await connection.createChannel();
await channel.assertExchange('data-changes', 'topic', { durable: true });
const { queue } = await channel.assertQueue('cache-invalidator', {
exclusive: false
});
// Bind to relevant events
await channel.bindQueue(queue, 'data-changes', 'user.#');
await channel.bindQueue(queue, 'data-changes', 'order.#');
await channel.bindQueue(queue, 'data-changes', 'product.#');
console.log('Cache invalidator listening for events');
channel.consume(queue, async (msg) => {
if (!msg) return;
try {
const event = JSON.parse(msg.content.toString());
console.log('Received event:', event);
// Connect to Redis
const client = redis.createClient();
await client.connect();
// Handle different event types
switch (event.type) {
case 'user.updated':
await client.del(`user:${event.userId}`);
await client.del(`user:${event.userId}:profile`);
console.log(`Invalidated cache for user ${event.userId}`);
break;
case 'order.created':
case 'order.updated':
await client.del(`order:${event.orderId}`);
// Also invalidate user's orders list
await client.del(`user:${event.userId}:orders`);
console.log(`Invalidated cache for order ${event.orderId}`);
break;
case 'product.updated':
await client.del(`product:${event.productId}`);
// Also invalidate category product lists
await client.del(`category:${event.categoryId}:products`);
console.log(`Invalidated cache for product ${event.productId}`);
break;
}
await client.quit();
// Acknowledge message
channel.ack(msg);
} catch (error) {
console.error('Error processing event:', error);
// Nack message for requeue
channel.nack(msg);
}
});
}
startCacheInvalidator().catch(console.error);
Multi-Region Deployment
In multi-region deployments, cache invalidation must be coordinated across different geographical locations.
Global Cache Invalidation Service
Use a centralized service to coordinate cache invalidation across regions.
// Using Redis Pub/Sub for cross-region invalidation
const redis = require('redis');
class GlobalCacheInvalidator {
constructor(options = {}) {
// Create publisher client
this.publisher = redis.createClient({
url: options.redisUrl || 'redis://localhost:6379'
});
// Create subscriber client
this.subscriber = this.publisher.duplicate();
// Connect both clients
Promise.all([
this.publisher.connect(),
this.subscriber.connect()
]).then(() => {
console.log('Connected to Redis');
}).catch(err => {
console.error('Redis connection error:', err);
});
// The channel for invalidation events
this.channel = options.channel || 'cache:invalidations';
// Region identifier
this.region = options.region || 'default';
// Local cache client
this.cacheClient = options.cacheClient || redis.createClient({
url: options.cacheRedisUrl || 'redis://localhost:6379'
});
this.cacheClient.connect().catch(console.error);
}
// Start listening for invalidation events
async startListening() {
await this.subscriber.subscribe(this.channel, (message, channel) => {
try {
const event = JSON.parse(message);
// Skip events from this region (we already handled them)
if (event.sourceRegion === this.region) {
return;
}
console.log(`Received invalidation from ${event.sourceRegion}:`, event);
// Process the invalidation
this.processInvalidation(event);
} catch (error) {
console.error('Error processing invalidation event:', error);
}
});
console.log(`Listening for invalidations on channel: ${this.channel}`);
}
// Process an invalidation event
async processInvalidation(event) {
switch (event.type) {
case 'key':
// Invalidate a specific key
await this.cacheClient.del(event.key);
console.log(`Invalidated key: ${event.key}`);
break;
case 'pattern':
// Invalidate keys matching a pattern
const keys = await this.cacheClient.keys(event.pattern);
if (keys.length > 0) {
await this.cacheClient.del(keys);
console.log(`Invalidated ${keys.length} keys matching pattern: ${event.pattern}`);
}
break;
case 'tag':
// Invalidate keys with a specific tag
// Requires an implementation of tagged cache keys
console.log(`Tag invalidation not implemented yet: ${event.tag}`);
break;
}
}
// Publish an invalidation event
async invalidate(type, target) {
const event = {
type,
sourceRegion: this.region,
timestamp: Date.now()
};
// Add the appropriate target property
switch (type) {
case 'key':
event.key = target;
break;
case 'pattern':
event.pattern = target;
break;
case 'tag':
event.tag = target;
break;
}
// Publish the event
await this.publisher.publish(this.channel, JSON.stringify(event));
// Also process locally
await this.processInvalidation(event);
return true;
}
// Convenience methods
async invalidateKey(key) {
return this.invalidate('key', key);
}
async invalidatePattern(pattern) {
return this.invalidate('pattern', pattern);
}
async invalidateTag(tag) {
return this.invalidate('tag', tag);
}
// Clean up resources
async close() {
await this.publisher.quit();
await this.subscriber.quit();
await this.cacheClient.quit();
}
}
// Usage example
async function updateUserGlobally(userId, userData) {
try {
// Update database
await db.query(
'UPDATE users SET name = ?, email = ? WHERE id = ?',
[userData.name, userData.email, userId]
);
// Invalidate cache across all regions
const invalidator = new GlobalCacheInvalidator({
region: 'us-east-1'
});
await invalidator.invalidatePattern(`user:${userId}:*`);
await invalidator.close();
return { success: true };
} catch (error) {
console.error('Update failed:', error);
return { success: false, error: error.message };
}
}
High-Traffic Applications
For high-traffic applications, inefficient cache invalidation can lead to "thundering herd" problems.
Staggered Invalidation
Invalidate cache entries gradually to prevent all clients from hitting the database simultaneously.
// Staggered invalidation implementation
class StaggeredInvalidator {
constructor(redisClient) {
this.client = redisClient;
}
// Add a soft invalidation (probabilistic expiration)
async softInvalidate(pattern, durationMs = 60000) {
// Find keys matching the pattern
const keys = await this.client.keys(pattern);
if (keys.length === 0) {
return { invalidated: 0 };
}
const startTime = Date.now();
const endTime = startTime + durationMs;
// For each key, set a special "invalidation" flag
// with the end time of the invalidation window
const pipeline = this.client.multi();
for (const key of keys) {
pipeline.hSet(`${key}:meta`, 'invalidating', endTime.toString());
}
await pipeline.exec();
return { invalidated: keys.length, endTime };
}
// Check if a key is being invalidated
async isInvalidating(key) {
const invalidatingUntil = await this.client.hGet(`${key}:meta`, 'invalidating');
if (!invalidatingUntil) {
return false;
}
const endTime = parseInt(invalidatingUntil, 10);
const now = Date.now();
// Check if we're still in the invalidation window
return now < endTime;
}
// Decide whether to use cache based on how far we are in the invalidation window
async shouldUseCache(key) {
const invalidatingUntil = await this.client.hGet(`${key}:meta`, 'invalidating');
if (!invalidatingUntil) {
return true; // Not being invalidated, use cache normally
}
const endTime = parseInt(invalidatingUntil, 10);
const now = Date.now();
// If invalidation period is over, clear the flag and use cache
if (now >= endTime) {
await this.client.hDel(`${key}:meta`, 'invalidating');
return true;
}
// Calculate how far we are in the invalidation window (0.0 to 1.0)
const startTime = endTime - 60000; // Assuming 1-minute window
const progress = (now - startTime) / (endTime - startTime);
// Probability increases as we get closer to the end of the window
// This creates a gradual ramp-up of cache refreshes
return Math.random() > progress;
}
}
// Usage example with cache-aside pattern
async function getProductWithStaggered(productId) {
const client = redis.createClient();
await client.connect();
const invalidator = new StaggeredInvalidator(client);
try {
const cacheKey = `product:${productId}`;
// Check if we should use the cache
const useCache = await invalidator.shouldUseCache(cacheKey);
if (useCache) {
// Try to get from cache
const cachedData = await client.get(cacheKey);
if (cachedData) {
return JSON.parse(cachedData);
}
}
// Cache miss or bypassing cache during invalidation
console.log('Getting fresh data from database');
const productData = await db.query(
'SELECT * FROM products WHERE id = ?',
[productId]
);
if (!productData) {
return null;
}
// Update cache
await client.set(cacheKey, JSON.stringify(productData), {
EX: 3600 // Cache for 1 hour
});
return productData;
} finally {
await client.quit();
}
}
// When updating a product (e.g., during a flash sale)
async function prepareProductUpdate(productId) {
const client = redis.createClient();
await client.connect();
try {
// Start a 1-minute staggered invalidation before the update
const invalidator = new StaggeredInvalidator(client);
await invalidator.softInvalidate(`product:${productId}`, 60000);
console.log(`Started staggered invalidation for product ${productId}`);
return { success: true };
} finally {
await client.quit();
}
}
Best Practices for Cache Invalidation
Design Principles
- Choose the Right Invalidation Strategy: Select the most appropriate strategy based on your consistency requirements.
- Be Pessimistic: When in doubt, invalidate the cache. It's better to have a cache miss than to serve stale data.
- Use TTLs as a Safety Net: Even with active invalidation, set TTLs to prevent indefinitely stale data.
- Keep Invalidation Logic Close to Data Mutation: The code that updates data should also be responsible for invalidation.
- Consider Partial Updates: For large objects, consider updating just the changed portions rather than invalidating the entire object.
Implementation Tips
- Use Consistent Key Naming: Establish naming conventions that make relationships between keys clear.
- Batch Invalidations: When invalidating multiple keys, use multi/exec or pipeline operations.
- Log Invalidation Events: Keep track of cache invalidations to help debug issues.
- Monitor Cache Hit Rates: A sudden drop in hit rate may indicate over-invalidation.
- Test Invalidation Paths: Explicitly test cache invalidation in your test suite.
Common Pitfalls
- Race Conditions: Be aware of potential race conditions between reads, writes, and invalidations.
- Over-Invalidation: Invalidating too much data reduces the effectiveness of your cache.
- Under-Invalidation: Missing invalidation paths leads to stale data.
- Thundering Herd: Be careful about invalidating popular cache keys simultaneously.
- Forgetting about Related Data: Remember to invalidate derived or aggregated data.
Operational Considerations
- Cache Warmup: Consider pre-populating caches after invalidation for critical data.
- Emergency Purge: Have a mechanism to purge the entire cache in emergencies.
- Monitoring and Alerting: Set up monitoring for cache size, hit rates, and invalidation patterns.
- Fallback Strategies: Implement graceful degradation if cache service is unavailable.
- Documentation: Document cache invalidation strategies for each data type.
Conclusion
Cache invalidation is one of the most challenging aspects of distributed systems, but with the right patterns and strategies, it can be managed effectively. The key is to understand your application's consistency requirements and to choose the appropriate invalidation approach for each type of data.
Remember that there's no one-size-fits-all solution. Most applications will use a combination of approaches:
- Time-based expiration for infrequently accessed data
- Active invalidation for frequently accessed data
- Event-based propagation for distributed systems
- Versioning or stamping for complex data models
By applying the principles and patterns we've covered in this lecture, you'll be better equipped to build high-performance, consistent, and reliable applications that leverage the power of caching without falling into its pitfalls.
Further Reading
Practical Exercises
Exercise 1: Implement Cache-Aside Pattern
Implement a cache-aside pattern for a product catalog API. Create endpoints for:
- Getting a product by ID
- Listing products by category
- Updating a product
Ensure that the cache is properly invalidated when products are updated.
Exercise 2: Event-Based Invalidation
Create a simple event bus using Redis Pub/Sub. Implement a system where multiple services can publish data change events, and a cache invalidation service subscribes to these events to keep the cache consistent.
Exercise 3: Versioned Cache Keys
Implement the versioned cache keys pattern for a user profile system. Ensure that profile views always see the latest data without explicitly deleting cache entries.
Exercise 4: Benchmark Different Strategies
Create a benchmark to compare the performance of different cache invalidation strategies under various workloads. Consider factors like:
- Read-heavy vs. write-heavy workloads
- Low vs. high concurrency
- Cold vs. warm cache scenarios
Exercise 5: Cache Invalidation Dashboard
Create a simple dashboard that monitors cache invalidation events and displays metrics like:
- Cache hit rate over time
- Number of invalidations per minute
- Top invalidated keys or patterns
- Cache size and memory usage