Introduction to Scheduled Tasks
In modern web applications, certain operations need to run automatically at specific times without user intervention. These scheduled tasks (also called cron jobs, background jobs, or scheduled jobs) are crucial for maintaining application health, processing data regularly, and automating routine operations.
Think of scheduled tasks like the automatic systems in your home. Your heating system runs on a schedule, your dishwasher might run at night, and your smartphone backs up while you sleep. Similarly, web applications need automated processes that run at predetermined times to handle various maintenance and operational tasks.
Common Use Cases for Scheduled Tasks
Database Maintenance
- Removing old or temporary records
- Creating database backups
- Optimizing database performance (rebuilding indexes, etc.)
- Archiving historical data
Data Processing
- Aggregating analytics data
- Processing uploaded files or data feeds
- Training or updating machine learning models
- ETL (Extract, Transform, Load) operations
User Engagement
- Sending daily/weekly newsletters
- Generating "digest" emails
- Re-engagement campaigns for inactive users
- Reminder notifications
Business Operations
- Generating monthly invoices
- Processing recurring payments
- Generating business reports
- Updating product inventory or prices
System Health
- Checking service availability
- Monitoring system resources
- Rotating log files
- Cleaning up temporary files
Real-World Example: E-commerce Platform
In an e-commerce application, you might schedule the following tasks:
- Every 5 minutes: Check inventory levels and alert when stock is low
- Every hour: Update product popularity rankings based on recent views and purchases
- Daily at midnight: Send order fulfillment reports to warehouse
- Every Monday at 8 AM: Send weekly sales summary to management
- First day of month at 2 AM: Generate and send monthly invoices to vendors
- Every 3 months: Archive old order data to cold storage
Understanding Cron
What is Cron?
Cron is a time-based job scheduler in Unix-like operating systems. It enables users to schedule commands or scripts to run automatically at specified dates and times. The name "cron" comes from the Greek word for time, "chronos."
In modern web development, we often use cron-like systems that follow similar principles but are implemented within our application stack rather than at the operating system level.
Cron Syntax
A cron expression is a string representing a schedule. It consists of five or six fields separated by spaces:
┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of the month (1 - 31)
│ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
│ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
│ │ │ │ │
* * * * * [command to execute]
Cron Expression Examples
| Expression | Meaning |
|---|---|
| `* * * * *` | Every minute |
| `0 * * * *` | Every hour (at minute 0) |
| `0 0 * * *` | Every day at midnight (00:00) |
| `0 0 * * 0` | Every Sunday at midnight |
| `0 9 * * 1-5` | Every weekday at 9 AM |
| `0 0 1 * *` | First day of every month at midnight |
| `*/15 * * * *` | Every 15 minutes |
| `0 12 * * MON` | Every Monday at noon |
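As a rough illustration, the e-commerce schedule from the earlier example could be written with five-field expressions like the ones below. The task names are hypothetical, and `0 0 1 */3 *` is only one reading of "every 3 months" (midnight on the first day of January, April, July, and October).

```javascript
// Hypothetical task names mapped to standard five-field cron expressions.
const ecommerceSchedules = {
  checkInventory: '*/5 * * * *',       // every 5 minutes
  updatePopularity: '0 * * * *',       // every hour, at minute 0
  sendFulfillmentReport: '0 0 * * *',  // daily at midnight
  sendWeeklySalesSummary: '0 8 * * 1', // Mondays at 8 AM
  sendVendorInvoices: '0 2 1 * *',     // first day of the month at 2 AM
  archiveOldOrders: '0 0 1 */3 *',     // midnight on the 1st of every third month
};

// Print the schedule as a simple two-column listing.
for (const [task, expression] of Object.entries(ecommerceSchedules)) {
  console.log(`${task.padEnd(24)} ${expression}`);
}
```

Keeping the expressions in one object like this makes the whole schedule reviewable at a glance, whichever scheduling library ends up consuming it.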
Special Characters in Cron Expressions
- `*`: Matches any value (wildcard)
- `,`: Separates multiple values (e.g., `1,3,5`)
- `-`: Specifies a range (e.g., `1-5`)
- `/`: Specifies step values (e.g., `*/5` means "every 5 units")
- Month and day of week can use three-letter abbreviations (e.g., `JAN`, `MON`)
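To make the special characters concrete, here is a minimal sketch of how a five-field expression could be expanded and matched against a date. It is a simplification, not how any particular cron implementation works: the function names are hypothetical, names such as `JAN` or `MON` are not handled, and all five fields must match (classic cron treats day-of-month and day-of-week as alternatives when both are restricted).

```javascript
// Allowed numeric range for each of the five standard fields.
const FIELD_RANGES = [
  [0, 59], // minute
  [0, 23], // hour
  [1, 31], // day of month
  [1, 12], // month
  [0, 6],  // day of week (0 = Sunday)
];

// Expand one field such as "*/15", "1-5", or "1,3,5" into a Set of numbers.
function expandField(field, [min, max]) {
  const values = new Set();
  for (const part of field.split(',')) {
    const match = /^(?:\*|(\d+)(?:-(\d+))?)(?:\/(\d+))?$/.exec(part);
    if (!match) throw new Error(`Invalid cron field: ${field}`);
    const [, first, last, stepText] = match;
    const step = stepText === undefined ? 1 : Number(stepText);
    // "*" covers the whole range; "a/n" is read here as "from a to the maximum, every n".
    const start = first === undefined ? min : Number(first);
    let end = start;
    if (first === undefined || (last === undefined && stepText !== undefined)) end = max;
    if (last !== undefined) end = Number(last);
    if (step < 1 || start < min || end > max || start > end) {
      throw new Error(`Invalid cron field: ${field}`);
    }
    for (let value = start; value <= end; value += step) values.add(value);
  }
  return values;
}

// True if the date, read in the process's local time zone, matches every field.
function matchesCron(expression, date) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error('Expected 5 fields');
  const actual = [
    date.getMinutes(),
    date.getHours(),
    date.getDate(),
    date.getMonth() + 1, // JavaScript months are zero-based
    date.getDay(),
  ];
  return fields.every((field, i) => expandField(field, FIELD_RANGES[i]).has(actual[i]));
}

// 1 January 2024 was a Monday, so "weekdays at 9 AM" matches 09:00 that day.
console.log(matchesCron('0 9 * * 1-5', new Date(2024, 0, 1, 9, 0))); // true
```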
Common Cron Variations
Different implementations of cron may have slight variations:
- Standard cron (5 fields): The classic Unix cron format
- Extended cron (6 fields): Adds seconds as the first field
- Quartz cron (7 fields): Adds seconds and year fields
- Non-standard expressions: Some systems support `@yearly`, `@monthly`, `@weekly`, `@daily`, `@hourly`, and `@reboot` shortcuts
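Because the variants differ mainly in field count, a small helper can tell them apart before an expression is handed to a scheduler. The sketch below is illustrative only: `describeCronFormat` and its return shape are hypothetical, and the shortcut expansions follow the conventional crontab meanings (`@reboot` is left out because it has no time-based equivalent).

```javascript
// Conventional five-field equivalents of the shortcut names.
const CRON_SHORTCUTS = new Map([
  ['@yearly', '0 0 1 1 *'],
  ['@monthly', '0 0 1 * *'],
  ['@weekly', '0 0 * * 0'],
  ['@daily', '0 0 * * *'],
  ['@hourly', '0 * * * *'],
]);

// Labels for the field counts described in the list above.
const FORMATS_BY_FIELD_COUNT = new Map([
  [5, 'standard'],
  [6, 'with-seconds'],
  [7, 'quartz'],
]);

// Classify an expression by shortcut name or by its number of fields.
function describeCronFormat(expression) {
  const trimmed = expression.trim();
  if (CRON_SHORTCUTS.has(trimmed)) {
    return { format: 'shortcut', expanded: CRON_SHORTCUTS.get(trimmed) };
  }
  const fieldCount = trimmed.split(/\s+/).length;
  const format = FORMATS_BY_FIELD_COUNT.get(fieldCount);
  if (!format) {
    throw new Error(`Unrecognized cron expression: ${expression}`);
  }
  return { format, fieldCount };
}

console.log(describeCronFormat('@daily'));
console.log(describeCronFormat('*/15 * * * *'));
console.log(describeCronFormat('0 */15 * * * *'));
```

A check like this matters in practice because a six-field expression passed to a five-field parser (or the reverse) shifts every field by one position.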
Implementing Scheduled Tasks in Node.js
Node.js Libraries for Scheduled Tasks
Several libraries can help implement cron-like functionality in Node.js applications:
| Library | Description | Use Case |
|---|---|---|
| node-cron | Pure JavaScript implementation of cron for Node.js | Simple in-process scheduling |
| node-schedule | Flexible job scheduler with cron-like syntax | More complex scheduling patterns |
| Bull/Agenda | Job queue libraries with scheduling capabilities | Distributed, persistent job scheduling |
| Bree | Modern job scheduler with sandboxed worker threads | High-performance, isolated job execution |
Using node-cron
node-cron is a simple and lightweight cron-like scheduler for Node.js:
// Install with: npm install node-cron
const cron = require('node-cron');
// Schedule a task to run every day at midnight
cron.schedule('0 0 * * *', () => {
console.log('Running a task every day at midnight');
// Your job logic here
performDailyBackup();
});
// Schedule a task to run every hour
cron.schedule('0 * * * *', () => {
console.log('Running a task every hour');
// Your job logic here
checkSystemHealth();
});
Using node-schedule
node-schedule provides more flexible scheduling options:
// Install with: npm install node-schedule
const schedule = require('node-schedule');
// Schedule a job to run at 5 minutes past every hour (use '*/5 * * * *' for every 5 minutes)
const job1 = schedule.scheduleJob('5 * * * *', function() {
console.log('Running job at 5 minutes past the hour');
// Your job logic here
});
// Schedule using a Date object (months are zero-based, so 4 means May)
const date = new Date(2025, 4, 10, 15, 30, 0);
const job2 = schedule.scheduleJob(date, function() {
console.log('Job ran at the specified date');
// One-time job logic here
});
// More complex recurrence rule
const rule = new schedule.RecurrenceRule();
rule.dayOfWeek = [0, new schedule.Range(4, 6)]; // Sunday and Thursday to Saturday
rule.hour = 17;
rule.minute = 0;
const job3 = schedule.scheduleJob(rule, function() {
console.log('Running at 5:00 PM on Sunday, Thursday, Friday, and Saturday');
// Your job logic here
});
Scheduling with Bull
Bull is a Redis-based queue system that also supports scheduled jobs:
// Install with: npm install bull
const Queue = require('bull');
const reportQueue = new Queue('report-generation');
// Schedule a job to run after a delay
reportQueue.add(
{ userId: 123, reportType: 'monthly' },
{ delay: 60 * 1000 } // Run after 1 minute
);
// Schedule a recurring job (every day at 3 AM)
reportQueue.add(
{ reportType: 'daily-summary' },
{
repeat: {
cron: '0 3 * * *' // Every day at 3 AM
}
}
);
// Process the jobs
reportQueue.process(async (job) => {
console.log(`Processing job: ${job.id}`);
const { reportType } = job.data;
switch(reportType) {
case 'monthly':
await generateMonthlyReport(job.data.userId);
break;
case 'daily-summary':
await generateDailySummary();
break;
}
return { success: true };
});
Using Bree
Bree is a modern job scheduler that runs tasks in separate threads for better performance:
// Install with: npm install bree
const Bree = require('bree');
// Initialize the scheduler
const bree = new Bree({
jobs: [
// Cron job (runs at midnight every day)
{
name: 'daily-cleanup',
cron: '0 0 * * *',
path: './jobs/cleanup.js'
},
// Interval job (runs every 5 minutes)
{
name: 'check-health',
interval: '5m',
path: './jobs/health-check.js'
},
// One-time job (runs after 30 seconds)
{
name: 'welcome-email',
timeout: '30s',
path: './jobs/welcome-email.js'
}
]
});
// Start all jobs
bree.start();
// You can also control individual jobs
bree.start('daily-cleanup');
bree.stop('check-health');
Example job file (./jobs/cleanup.js):
// This runs in its own worker thread
console.log('Starting daily cleanup task');
// Your cleanup logic here
async function performCleanup() {
// Delete temporary files
// Archive old data
// etc.
}
// Run the task and handle errors
(async () => {
try {
await performCleanup();
console.log('Cleanup completed successfully');
} catch (error) {
console.error('Cleanup failed:', error);
process.exit(1); // Exit with error
}
})();
Best Practices for Scheduled Tasks
Separate Task Logic from Scheduling
Keep your task logic separate from the scheduling mechanism. This makes your code more maintainable and easier to test.
// Good practice
// tasks.js - Task logic
const tasks = {
async dailyReport() {
// Report generation logic
},
async cleanupTempFiles() {
// Cleanup logic
}
};
module.exports = tasks;
// scheduler.js - Scheduling
const cron = require('node-cron');
const tasks = require('./tasks');
cron.schedule('0 0 * * *', tasks.dailyReport);
cron.schedule('0 2 * * *', tasks.cleanupTempFiles);
Error Handling
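Always handle errors inside scheduled tasks. An unhandled promise rejection in an async callback can terminate the Node.js process (the default behavior since Node.js 15), taking every other schedule in that process down with it.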
cron.schedule('0 0 * * *', async () => {
try {
await generateDailyReport();
console.log('Daily report generated successfully');
} catch (error) {
console.error('Failed to generate daily report:', error);
// Notify administrators or log to monitoring system
await notifyAdmins('Daily report generation failed', error);
}
});
Logging and Monitoring
Implement comprehensive logging to track execution and diagnose issues:
cron.schedule('0 0 * * *', async () => {
console.log(`[${new Date().toISOString()}] Starting daily backup`);
try {
const startTime = Date.now();
await performBackup();
const duration = Date.now() - startTime;
console.log(`[${new Date().toISOString()}] Backup completed successfully in ${duration}ms`);
await metrics.recordTaskExecution('daily-backup', {
success: true,
duration
});
} catch (error) {
console.error(`[${new Date().toISOString()}] Backup failed:`, error);
await metrics.recordTaskExecution('daily-backup', {
success: false,
error: error.message
});
}
});
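The start, success, and failure logging above tends to be repeated in every task, so it can be factored into a wrapper. This is a minimal sketch under a few assumptions: `withLogging` is a hypothetical helper, the logger only needs `info` and `error` methods (so `console` works), and failures are reported in the returned record rather than rethrown.

```javascript
// Wrap a task so that every run is timed and logged in the same way.
function withLogging(taskName, task, logger = console) {
  return async function loggedTask(...args) {
    const startedAt = Date.now();
    logger.info(`[${new Date(startedAt).toISOString()}] Starting ${taskName}`);
    try {
      const result = await task(...args);
      const durationMs = Date.now() - startedAt;
      logger.info(`[${new Date().toISOString()}] ${taskName} completed in ${durationMs}ms`);
      return { success: true, durationMs, result };
    } catch (error) {
      const durationMs = Date.now() - startedAt;
      logger.error(`[${new Date().toISOString()}] ${taskName} failed after ${durationMs}ms:`, error);
      // The error is swallowed here so a failing task cannot crash the scheduler.
      return { success: false, durationMs, error: error.message };
    }
  };
}
```

With node-cron, a wrapped task could then be registered as `cron.schedule('0 0 * * *', withLogging('daily-backup', performBackup))`, and the returned record could be forwarded to whatever metrics system is in use.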
Task Idempotence
Design tasks to be idempotent (can be run multiple times without causing problems):
// Non-idempotent example (problematic)
async function sendDailyNewsletter() {
const users = await User.findAll({ where: { subscribed: true } });
for (const user of users) {
await emailService.send(user.email, 'Daily Newsletter', template);
// If this crashes halfway through, some users get duplicate emails on retry
}
}
// Idempotent example (better)
async function sendDailyNewsletter() {
const today = new Date().toISOString().split('T')[0];
const users = await User.findAll({
where: {
subscribed: true,
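// Skip users who already received today's newsletter: the LEFT JOIN below
// leaves emailRecords.id NULL for users with no record today. (A `!= today`
// comparison would also discard those NULL rows and match nobody.)
'$emailRecords.id$': null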
},
include: [{
model: EmailRecord,
where: { type: 'daily-newsletter', date: today },
required: false
}]
});
for (const user of users) {
await emailService.send(user.email, 'Daily Newsletter', template);
await EmailRecord.create({
userId: user.id,
type: 'daily-newsletter',
date: today
});
}
}
Avoid Overlapping Executions
Ensure long-running tasks don't overlap:
// Using a locking mechanism
const lockKey = 'daily-report-lock';
const lockExpiry = 60 * 60; // 1 hour in seconds
cron.schedule('0 0 * * *', async () => {
// Try to acquire a lock
const locked = await redis.set(lockKey, 'locked', 'EX', lockExpiry, 'NX');
if (!locked) {
console.log('Another instance of this task is already running');
return;
}
try {
await generateDailyReport();
} catch (error) {
console.error('Error generating report:', error);
} finally {
// Release the lock
await redis.del(lockKey);
}
});
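The Redis lock above guards against overlap across processes. Within a single process, a plain in-memory flag is often enough. The following is a minimal sketch of that idea, with `preventOverlap` as a hypothetical helper name; it does nothing for multiple instances, where a shared lock is still needed.

```javascript
// Wrap a task so a new run is skipped while the previous one is still in progress.
function preventOverlap(task) {
  let running = false;
  return async function guardedTask(...args) {
    if (running) {
      return { skipped: true };
    }
    running = true;
    try {
      return { skipped: false, result: await task(...args) };
    } finally {
      // Always clear the flag, even if the task throws.
      running = false;
    }
  };
}

// Demo: the second call arrives while the first is still waiting, so it is skipped.
const slowTask = preventOverlap(async () => {
  await new Promise((resolve) => setTimeout(resolve, 50));
  return 'done';
});
slowTask().then((outcome) => console.log('first run:', outcome));
slowTask().then((outcome) => console.log('second run:', outcome));
```

With node-cron this could be used as `cron.schedule('0 0 * * *', preventOverlap(generateDailyReport))`.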
Time Zone Considerations
Be aware of time zone issues when scheduling tasks:
// Schedule a task to run at midnight in a specific timezone
cron.schedule('0 0 * * *', () => {
console.log('Running at midnight in the specified timezone');
}, {
scheduled: true,
timezone: "America/New_York"
});
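To see why the time zone setting matters, it can help to check what local hour a single instant corresponds to in different zones. The sketch below uses only the built-in `Intl.DateTimeFormat` API; `hourInTimeZone` is a hypothetical helper name, and the sample instant is arbitrary.

```javascript
// Return the wall-clock hour (0-23) of `date` in the given IANA time zone.
function hourInTimeZone(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hour: 'numeric',
    hourCycle: 'h23',
  }).formatToParts(date);
  return Number(parts.find((part) => part.type === 'hour').value);
}

// One instant: 12:00 UTC on 15 January 2024.
const instant = new Date(Date.UTC(2024, 0, 15, 12, 0));
for (const zone of ['America/New_York', 'Europe/London', 'Asia/Tokyo']) {
  console.log(zone, hourInTimeZone(instant, zone));
}
```

In mid-January that instant is 7 AM in New York, noon in London, and 9 PM in Tokyo, so a job scheduled for "midnight" without an explicit time zone runs at whatever midnight means on the server.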
Deployment Considerations
In-Process vs Out-of-Process Scheduling
In-Process Scheduling
Pros:
- Simple to implement
- No additional infrastructure required
- Direct access to application context
Cons:
- Can impact application performance
- Tasks stop if the app crashes
- Scaling issues (tasks run on every instance)
- Memory leaks affect the main application
Out-of-Process Scheduling
Pros:
- Better isolation and reliability
- Won't affect main application performance
- Can scale independently
- Tasks continue even if web server restarts
Cons:
- More complex to set up
- Requires additional infrastructure
- Potential overhead in communication
Choosing the Right Approach
| Use In-Process When: | Use Out-of-Process When: |
|---|---|
| Tasks are lightweight | Tasks are resource-intensive |
| Application has low traffic | Application has high traffic |
| Tasks are tied to app state | Task reliability is critical |
| Simple development is prioritized | System is deployed across multiple servers |
Implementation Options for Production
Option 1: Process Manager (PM2)
Use PM2 to run a separate worker process for scheduled tasks:
// ecosystem.config.js
module.exports = {
apps: [{
name: 'web-server',
script: 'server.js',
instances: 'max',
exec_mode: 'cluster'
}, {
name: 'scheduler',
script: 'scheduler.js',
instances: 1, // Only run one instance of the scheduler
exec_mode: 'fork'
}]
};
Option 2: Dedicated Cron Container in Docker
For containerized applications, use a separate container for scheduled tasks:
# docker-compose.yml
version: '3'
services:
  web:
    build: .
    ports:
      - "3000:3000"
    depends_on:
      - db
      - redis
    environment:
      - NODE_ENV=production
  worker:
    build: .
    command: node scheduler.js
    depends_on:
      - db
      - redis
    environment:
      - NODE_ENV=production
  db:
    image: postgres:13
  redis:
    image: redis:6
Option 3: Specialized Job Systems
- Cloud provider solutions:
- AWS Lambda + EventBridge
- Google Cloud Scheduler + Cloud Functions
- Azure Logic Apps
- Specialized job platforms:
- Apache Airflow
- Jenkins
- Prefect
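Example AWS SAM template (CloudFormation with the Serverless transform) for a scheduled Lambda function. The schedule expression is evaluated in UTC, so this function runs at 3 AM UTC:
Transform: AWS::Serverless-2016-10-31
Resources:
  ScheduledFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./functions/
      Handler: scheduled-task.handler
      Runtime: nodejs16.x
      Events:
        DailyAt3AM:
          Type: Schedule
          Properties:
            Schedule: cron(0 3 * * ? *)
            Enabled: true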
Advanced Patterns
Distributed Scheduling
When running multiple instances of your application, you need to ensure scheduled tasks aren't duplicated.
One approach is using a distributed lock with Redis:
const schedule = require('node-schedule');
const Redis = require('ioredis');
const redis = new Redis();
async function scheduleTasks() {
// Try to acquire a leadership lock
const result = await redis.set('scheduler-leader', process.env.HOSTNAME, 'NX', 'EX', 30);
const isLeader = result === 'OK';
if (isLeader) {
console.log('This instance is the scheduling leader');
// Refresh the lock periodically
const interval = setInterval(async () => {
await redis.expire('scheduler-leader', 30);
}, 10000);
// Set up schedules
const dailyReport = schedule.scheduleJob('0 0 * * *', generateDailyReport);
// Handle shutdown
process.on('SIGTERM', async () => {
clearInterval(interval);
await redis.del('scheduler-leader');
dailyReport.cancel();
process.exit(0);
});
} else {
console.log('Another instance is the scheduling leader');
// Check periodically if the leader has failed
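const watcher = setInterval(async () => {
const leaderExists = await redis.exists('scheduler-leader');
if (!leaderExists) {
// Leader has failed: stop this watcher before retrying, otherwise
// every retry would stack another interval on top of this one
clearInterval(watcher);
scheduleTasks();
}
}, 15000);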
}
}
scheduleTasks();
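One subtlety in the leader lock above is ownership: refreshing or deleting the key without checking who holds it can disturb a lock that has since passed to another instance. The sketch below illustrates the intended rules with an in-memory stand-in. `LeaseStore` is hypothetical and not a Redis client; with Redis, the check and the update would need to happen atomically, which is commonly done with a small Lua script.

```javascript
// In-memory stand-in for a lock service, used only to illustrate the rules.
class LeaseStore {
  constructor() {
    this.leases = new Map(); // key -> { owner, expiresAt }
  }

  // Take the lease if it is free or expired (similar to SET with NX and an expiry).
  acquire(key, owner, ttlMs, now = Date.now()) {
    const current = this.leases.get(key);
    if (current && current.expiresAt > now) return false;
    this.leases.set(key, { owner, expiresAt: now + ttlMs });
    return true;
  }

  // Extend the lease only if `owner` still holds an unexpired lease.
  renew(key, owner, ttlMs, now = Date.now()) {
    const current = this.leases.get(key);
    if (!current || current.owner !== owner || current.expiresAt <= now) return false;
    current.expiresAt = now + ttlMs;
    return true;
  }

  // Release the lease only if `owner` is the current holder.
  release(key, owner) {
    const current = this.leases.get(key);
    if (!current || current.owner !== owner) return false;
    this.leases.delete(key);
    return true;
  }
}

// Demo with explicit timestamps (milliseconds) instead of the real clock.
const store = new LeaseStore();
console.log(store.acquire('scheduler-leader', 'host-a', 30000, 0));     // host-a becomes leader
console.log(store.acquire('scheduler-leader', 'host-b', 30000, 10000)); // lease still held
console.log(store.acquire('scheduler-leader', 'host-b', 30000, 40000)); // lease expired, host-b takes over
console.log(store.release('scheduler-leader', 'host-a'));               // host-a no longer owns it
```

A leader whose `renew` call returns `false` should treat that as having lost leadership and cancel its scheduled jobs.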
Dynamic Scheduling
Sometimes you need to create or modify schedules dynamically based on user preferences or business rules:
// Store jobs in memory for easy access
const activeJobs = new Map();
// Cancel and reschedule a job
function updateJobSchedule(jobId, newCronExpression) {
// Cancel existing job if it exists
if (activeJobs.has(jobId)) {
activeJobs.get(jobId).cancel();
}
// Create new job with updated schedule
const job = schedule.scheduleJob(newCronExpression, async () => {
await executeJob(jobId);
});
// Store the new job
activeJobs.set(jobId, job);
return job;
}
// Example: User updates notification preferences
app.post('/api/notification-settings', async (req, res) => {
const { userId, frequency, time } = req.body;
// Translate user preferences to cron expression
// Convert time (e.g., "09:00") to numeric cron fields before the switch,
// so that every case can use them
const [hour, minute] = time.split(':').map(Number);
let cronExpression;
switch (frequency) {
case 'daily':
cronExpression = `${minute} ${hour} * * *`;
break;
case 'weekly':
cronExpression = `${minute} ${hour} * * 1`; // Mondays
break;
// other cases...
}
// Update the job schedule
updateJobSchedule(`notification-${userId}`, cronExpression);
// Save to database
await User.update({
notificationSchedule: cronExpression
}, {
where: { id: userId }
});
res.json({ success: true });
});
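Because the cron expression here is built from request data, it is worth validating the input before it reaches the scheduler. The following is a minimal sketch of such a check; `buildCronExpression` is a hypothetical helper, and it assumes times arrive as 24-hour `HH:MM` strings and that weekly jobs default to Monday.

```javascript
// Build a five-field cron expression from validated user preferences.
function buildCronExpression(frequency, time, dayOfWeek = 1) {
  // Accept only 24-hour "HH:MM" strings such as "09:00" or "18:30".
  const match = /^([01]\d|2[0-3]):([0-5]\d)$/.exec(time);
  if (!match) {
    throw new Error(`Invalid time: ${time}`);
  }
  const hour = Number(match[1]);
  const minute = Number(match[2]);

  switch (frequency) {
    case 'daily':
      return `${minute} ${hour} * * *`;
    case 'weekly':
      if (!Number.isInteger(dayOfWeek) || dayOfWeek < 0 || dayOfWeek > 6) {
        throw new Error(`Invalid day of week: ${dayOfWeek}`);
      }
      return `${minute} ${hour} * * ${dayOfWeek}`;
    default:
      throw new Error(`Unsupported frequency: ${frequency}`);
  }
}

console.log(buildCronExpression('daily', '09:00'));  // 0 9 * * *
console.log(buildCronExpression('weekly', '18:30')); // 30 18 * * 1
```

In a route handler, a thrown error would typically be turned into a 400 response instead of being passed on to the scheduler or the database.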
Task Dependencies and Workflows
For complex sequences of tasks where one depends on another, consider a workflow approach:
// Define a workflow with dependent tasks
const DailyReportingWorkflow = {
name: 'daily-reporting',
schedule: '0 1 * * *', // Every day at 1 AM
tasks: [
{
name: 'collect-data',
handler: collectData,
retryCount: 3,
retryDelay: 5 * 60 * 1000 // 5 minutes
},
{
name: 'generate-reports',
handler: generateReports,
dependencies: ['collect-data'] // This task depends on the first one
},
{
name: 'send-emails',
handler: sendEmailReports,
dependencies: ['generate-reports'] // This task depends on the second one
}
]
};
// Workflow execution engine
async function executeWorkflow(workflow) {
console.log(`Starting workflow: ${workflow.name}`);
// Track task completion
const completedTasks = new Set();
const failedTasks = new Map();
// Process tasks until all are complete or max retries exceeded
while (completedTasks.size < workflow.tasks.length) {
for (const task of workflow.tasks) {
// Skip if task is already completed
if (completedTasks.has(task.name)) continue;
// Skip if dependencies aren't satisfied
if (task.dependencies && task.dependencies.some(dep => !completedTasks.has(dep))) {
continue;
}
// Skip if the first attempt and all retries have already failed.
// retryCount counts retries after the first attempt, so a task
// without retryCount still runs once.
const failCount = failedTasks.get(task.name) || 0;
if (failCount > (task.retryCount || 0)) continue;
// Execute the task
try {
console.log(`Executing task: ${task.name}`);
await task.handler();
completedTasks.add(task.name);
console.log(`Task completed: ${task.name}`);
} catch (error) {
console.error(`Task failed: ${task.name}`, error);
failedTasks.set(task.name, (failedTasks.get(task.name) || 0) + 1);
// Wait before retry if specified
if (task.retryDelay) {
await new Promise(resolve => setTimeout(resolve, task.retryDelay));
}
}
}
// If no progress is being made, abort
const canProgress = workflow.tasks.some(task => {
// Task is not completed
if (completedTasks.has(task.name)) return false;
// Dependencies are satisfied
if (task.dependencies && task.dependencies.some(dep => !completedTasks.has(dep))) {
return false;
}
// Has attempts left
const failCount = failedTasks.get(task.name) || 0;
return failCount <= (task.retryCount || 0);
});
if (!canProgress) break;
}
// Report workflow completion status
const allCompleted = completedTasks.size === workflow.tasks.length;
console.log(`Workflow ${workflow.name} ${allCompleted ? 'completed' : 'failed'}`);
return allCompleted;
}
// Schedule the workflow
cron.schedule(DailyReportingWorkflow.schedule, () => {
executeWorkflow(DailyReportingWorkflow);
});
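A workflow definition like the one above can be checked before it is ever scheduled, so that a missing or circular dependency fails fast instead of stalling at run time. The sketch below orders tasks with Kahn's topological sort; `resolveTaskOrder` is a hypothetical helper that only reads the `name` and `dependencies` fields used in the workflow definition.

```javascript
// Return task names in an order that respects dependencies.
// Throws if a dependency is unknown or the dependencies form a cycle.
function resolveTaskOrder(tasks) {
  const names = new Set(tasks.map((task) => task.name));
  const remaining = new Map();  // task name -> number of unmet dependencies
  const dependents = new Map(); // task name -> names of tasks that depend on it

  for (const task of tasks) {
    const deps = task.dependencies || [];
    for (const dep of deps) {
      if (!names.has(dep)) {
        throw new Error(`Task "${task.name}" depends on unknown task "${dep}"`);
      }
      if (!dependents.has(dep)) dependents.set(dep, []);
      dependents.get(dep).push(task.name);
    }
    remaining.set(task.name, deps.length);
  }

  // Start with tasks that have no dependencies, then release their dependents.
  const ready = tasks.filter((task) => remaining.get(task.name) === 0).map((task) => task.name);
  const order = [];
  while (ready.length > 0) {
    const name = ready.shift();
    order.push(name);
    for (const dependent of dependents.get(name) || []) {
      const left = remaining.get(dependent) - 1;
      remaining.set(dependent, left);
      if (left === 0) ready.push(dependent);
    }
  }

  if (order.length !== tasks.length) {
    throw new Error('Workflow has a dependency cycle');
  }
  return order;
}

// Same shape as the daily reporting workflow, deliberately listed out of order.
console.log(resolveTaskOrder([
  { name: 'send-emails', dependencies: ['generate-reports'] },
  { name: 'generate-reports', dependencies: ['collect-data'] },
  { name: 'collect-data' },
]));
```

Running this check once at startup means the execution loop only ever sees workflows it can actually finish.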
Case Study: Building a Newsletter System
Let's build a practical example of a scheduled task system for sending newsletters to subscribers. This system includes:
- Daily digest emails sent at specific times
- Weekly newsletters sent on Mondays
- Retry mechanism for failed email deliveries
- Monitoring and reporting
Project Structure
newsletter-system/
├── server.js # Express server for UI and API
├── scheduler.js # Main scheduler process
├── tasks/
│ ├── dailyDigest.js # Daily email task
│ ├── weeklyNewsletter.js # Weekly newsletter task
│ └── utils.js # Shared utilities
├── models/
│ ├── user.js # User model
│ └── emailLog.js # Email delivery logging
├── services/
│ ├── emailService.js # Email sending service
│ └── contentService.js # Content generation service
└── monitoring/
    ├── logger.js # Logging utility
    └── metrics.js # Metrics collection
Initialize the Scheduler
// scheduler.js
const cron = require('node-cron');
const Redis = require('ioredis');
const redis = new Redis();
const logger = require('./monitoring/logger');
const metrics = require('./monitoring/metrics');
const dailyDigest = require('./tasks/dailyDigest');
const weeklyNewsletter = require('./tasks/weeklyNewsletter');
// Acquire a lock to prevent duplicate scheduling
async function initialize() {
const lockAcquired = await redis.set('newsletter-scheduler-lock', process.pid, 'NX', 'EX', 60);
if (!lockAcquired) {
logger.info('Another scheduler is already running');
return;
}
// Maintain the lock
const lockInterval = setInterval(async () => {
await redis.expire('newsletter-scheduler-lock', 60);
}, 30000);
// Handle graceful shutdown
process.on('SIGTERM', async () => {
clearInterval(lockInterval);
await redis.del('newsletter-scheduler-lock');
process.exit(0);
});
// Schedule tasks
setupSchedules();
}
function setupSchedules() {
// Daily digest at 8 AM in each time zone
const timeZones = ['America/New_York', 'Europe/London', 'Asia/Tokyo'];
timeZones.forEach(timeZone => {
cron.schedule('0 8 * * *', async () => {
logger.info(`Starting daily digest for ${timeZone}`);
const task = `daily-digest-${timeZone}`;
metrics.taskStarted(task);
try {
await dailyDigest.sendToTimeZone(timeZone);
metrics.taskCompleted(task);
} catch (error) {
logger.error(`Daily digest for ${timeZone} failed`, error);
metrics.taskFailed(task, error);
}
}, {
scheduled: true,
timezone: timeZone
});
});
// Weekly newsletter every Monday at 10 AM UTC
cron.schedule('0 10 * * 1', async () => {
logger.info('Starting weekly newsletter');
const task = 'weekly-newsletter';
metrics.taskStarted(task);
try {
await weeklyNewsletter.sendToAllSubscribers();
metrics.taskCompleted(task);
} catch (error) {
logger.error('Weekly newsletter failed', error);
metrics.taskFailed(task, error);
}
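}, {
scheduled: true,
timezone: 'Etc/UTC' // Without this, the job follows the server's local time zone
});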
// Process failed emails every hour
cron.schedule('0 * * * *', async () => {
logger.info('Retrying failed emails');
const task = 'retry-failed-emails';
metrics.taskStarted(task);
try {
const retryCount = await retryFailedEmails();
logger.info(`Retried ${retryCount} failed emails`);
metrics.taskCompleted(task, { retryCount });
} catch (error) {
logger.error('Failed to retry emails', error);
metrics.taskFailed(task, error);
}
});
logger.info('All schedules have been set up');
}
// Retry mechanism for failed emails
async function retryFailedEmails() {
const emailService = require('./services/emailService');
const EmailLog = require('./models/emailLog');
// Find failed emails with retry count < 3
const failedEmails = await EmailLog.findAll({
where: {
status: 'failed',
retryCount: { $lt: 3 },
updatedAt: { $lt: new Date(Date.now() - 30 * 60 * 1000) } // 30 minutes ago
}
});
let retryCount = 0;
for (const email of failedEmails) {
try {
await emailService.sendEmail({
to: email.recipient,
subject: email.subject,
template: email.template,
data: JSON.parse(email.data)
});
// Update status
await email.update({
status: 'sent',
sentAt: new Date()
});
retryCount++;
} catch (error) {
// Update retry count
await email.update({
retryCount: email.retryCount + 1,
lastError: error.message
});
// If max retries reached, mark as permanently failed
if (email.retryCount + 1 >= 3) {
await email.update({ status: 'permanently-failed' });
}
}
}
return retryCount;
}
// Start the scheduler
initialize().catch(err => {
logger.error('Failed to initialize scheduler', err);
process.exit(1);
});
Daily Digest Task
// tasks/dailyDigest.js
const User = require('../models/user');
const EmailLog = require('../models/emailLog');
const emailService = require('../services/emailService');
const contentService = require('../services/contentService');
const logger = require('../monitoring/logger');
// Send daily digest to users in a specific time zone
async function sendToTimeZone(timeZone) {
logger.info(`Preparing daily digest for users in ${timeZone}`);
// Get content for today's digest
const content = await contentService.getDailyDigestContent();
// Find users in the specified time zone who are subscribed to daily digests
const users = await User.findAll({
where: {
timeZone,
emailPreferences: {
dailyDigest: true
},
active: true
}
});
logger.info(`Sending daily digest to ${users.length} users in ${timeZone}`);
// Process in batches to avoid memory issues
const batchSize = 100;
for (let i = 0; i < users.length; i += batchSize) {
const batch = users.slice(i, i + batchSize);
await Promise.all(batch.map(async (user) => {
try {
// Personalize content
const personalizedContent = contentService.personalizeContent(content, user);
// Send email
await emailService.sendEmail({
to: user.email,
subject: `Your Daily Digest for ${new Date().toLocaleDateString()}`,
template: 'daily-digest',
data: {
firstName: user.firstName,
content: personalizedContent,
unsubscribeUrl: `https://example.com/unsubscribe?token=${user.unsubscribeToken}`
}
});
// Log successful send
await EmailLog.create({
userId: user.id,
type: 'daily-digest',
recipient: user.email,
subject: `Your Daily Digest for ${new Date().toLocaleDateString()}`,
template: 'daily-digest',
data: JSON.stringify(personalizedContent),
status: 'sent',
sentAt: new Date()
});
} catch (error) {
logger.error(`Failed to send daily digest to ${user.email}`, error);
// Log failed send for retry
await EmailLog.create({
userId: user.id,
type: 'daily-digest',
recipient: user.email,
subject: `Your Daily Digest for ${new Date().toLocaleDateString()}`,
template: 'daily-digest',
data: JSON.stringify(content),
status: 'failed',
retryCount: 0,
lastError: error.message
});
}
}));
logger.info(`Processed batch ${i / batchSize + 1} of ${Math.ceil(users.length / batchSize)}`);
}
return users.length;
}
module.exports = {
sendToTimeZone
};
Monitoring and Metrics
// monitoring/metrics.js
const Prometheus = require('prom-client');
// Create metrics
const taskCounter = new Prometheus.Counter({
name: 'scheduler_task_total',
help: 'Count of scheduler tasks',
labelNames: ['task', 'status']
});
const taskDuration = new Prometheus.Histogram({
name: 'scheduler_task_duration_seconds',
help: 'Duration of scheduler tasks in seconds',
labelNames: ['task'],
buckets: [0.1, 0.5, 1, 2, 5, 10, 30, 60, 120, 300, 600]
});
const emailCounter = new Prometheus.Counter({
name: 'email_sent_total',
help: 'Count of emails sent',
labelNames: ['type', 'status']
});
// Track task execution. Start times are stored per task so that
// concurrently running tasks do not overwrite each other's timers.
const startTimes = new Map();
function taskStarted(taskName) {
startTimes.set(taskName, Date.now());
taskCounter.inc({ task: taskName, status: 'started' });
}
function taskCompleted(taskName, data = {}) {
const startTime = startTimes.get(taskName);
startTimes.delete(taskName);
taskCounter.inc({ task: taskName, status: 'completed' });
if (startTime !== undefined) {
taskDuration.observe({ task: taskName }, (Date.now() - startTime) / 1000);
}
// Track specific metrics based on task (digest tasks are named daily-digest-<timeZone>)
if (taskName.startsWith('daily-digest') && data.emailsSent) {
emailCounter.inc({ type: 'daily-digest', status: 'sent' }, data.emailsSent);
}
}
function taskFailed(taskName, error) {
startTimes.delete(taskName);
taskCounter.inc({ task: taskName, status: 'failed' });
}
// Setup metrics endpoint for Prometheus
function setupMetricsEndpoint(app) {
app.get('/metrics', async (req, res) => {
res.set('Content-Type', Prometheus.register.contentType);
// register.metrics() returns a Promise in prom-client v13 and later
res.end(await Prometheus.register.metrics());
});
}
module.exports = {
taskStarted,
taskCompleted,
taskFailed,
setupMetricsEndpoint
};
Running the System
To run this newsletter system in production, you would set up:
- A PM2 configuration to run the scheduler as a separate process
- Redis for distributed locking
- A database for user data and email logs
- Prometheus for metrics collection
- Grafana for monitoring and alerting
// ecosystem.config.js
module.exports = {
apps: [{
name: 'newsletter-api',
script: 'server.js',
instances: 'max',
exec_mode: 'cluster',
env: {
NODE_ENV: 'production',
PORT: 3000
}
}, {
name: 'newsletter-scheduler',
script: 'scheduler.js',
instances: 1,
exec_mode: 'fork',
env: {
NODE_ENV: 'production'
},
restart_delay: 10000, // Wait 10s before restart
max_memory_restart: '300M'
}]
};
Conclusion and Best Practices Summary
Key Takeaways
- Scheduled tasks are essential for automating routine operations in web applications
- Choose the right scheduling approach based on task complexity, resource requirements, and reliability needs
- Implement proper error handling, logging, and monitoring for all scheduled tasks
- Consider time zone issues, especially for user-facing tasks
- For production systems, use dedicated processes or services for critical scheduled tasks
- Design tasks to be idempotent and handle retry scenarios
- Use distributed locks when running multiple instances of your application
Further Learning
Practice Exercises
Exercise 1: Basic Scheduled Task
Create a simple Node.js application that uses node-cron to schedule and execute a task every minute. The task should write the current timestamp to a file. Add proper error handling and logging.
Exercise 2: Database Cleanup
Build a scheduled task that connects to a database (MongoDB or PostgreSQL) and removes records older than 30 days from a "temporary_data" collection/table. Schedule it to run daily at midnight.
Exercise 3: User Engagement Email
Create a scheduled task that sends a "We miss you" email to users who haven't logged in for 14 days. Use Bull for job scheduling and processing, and implement a retry mechanism for failed email deliveries.
Exercise 4: Dynamically Scheduled Tasks
Build a small application with an API that allows creating, updating, and deleting scheduled tasks. Store task definitions in a database and implement a scheduler that loads and executes these tasks according to their schedules.
Exercise 5: Multi-stage Workflow
Implement a workflow system as described in the "Task Dependencies and Workflows" section. Create a workflow with at least three dependent tasks and schedule it to run daily.