Monitoring and Logging with CloudWatch on AWS

Learn how to monitor your AWS infrastructure with CloudWatch. Metrics, Log Groups, Log Streams, SNS alarms, custom dashboards, CloudWatch Insights, EC2, Lambda, and RDS monitoring, custom metrics, and cost optimization.

Written by Francisco Zapata
February 19, 2026 · 15 min read

Monitoring is essential for keeping applications healthy in production. Amazon CloudWatch is the AWS observability service that collects metrics, centralizes logs, configures alarms, and lets you visualize the state of your entire infrastructure in one place. In this guide, you will learn how to set up CloudWatch to monitor EC2, Lambda, and RDS, create smart alarms, analyze logs with CloudWatch Insights, and build custom dashboards.

What Is Amazon CloudWatch?

CloudWatch is a monitoring and observability service that collects data from your AWS resources in the form of metrics, logs, and events. Most AWS services publish metrics to CloudWatch automatically, and you can add your own custom metrics as well.

Key Concepts:

  • Namespace: A container that isolates metrics from different services (e.g., AWS/EC2, AWS/Lambda)
  • Metric: A time-ordered set of data points (e.g., CPUUtilization)
  • Dimension: A name/value pair that identifies a specific metric (e.g., InstanceId=i-1234)
  • Period: The length of time over which statistics are aggregated (as short as 1 second for high-resolution metrics, otherwise a multiple of 60 seconds)
  • Statistic: Data aggregation (Average, Sum, Minimum, Maximum, p99)
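
To make the role of dimensions concrete, here is a minimal sketch. The `metricKey` helper is hypothetical and not part of any AWS SDK; it only illustrates that the same metric name with a different dimension value identifies a different metric.

```javascript
// Hypothetical helper: builds a string that identifies one metric.
// CloudWatch treats each unique combination of namespace, metric name,
// and dimensions as a separate metric.
function metricKey(namespace, metricName, dimensions = {}) {
  // Sort dimension names so the key does not depend on insertion order
  const dims = Object.keys(dimensions)
    .sort()
    .map((name) => `${name}=${dimensions[name]}`)
    .join(',');
  return `${namespace}:${metricName}{${dims}}`;
}

const a = metricKey('AWS/EC2', 'CPUUtilization', { InstanceId: 'i-1234' });
const b = metricKey('AWS/EC2', 'CPUUtilization', { InstanceId: 'i-5678' });

console.log(a);       // AWS/EC2:CPUUtilization{InstanceId=i-1234}
console.log(a === b); // false: a different dimension value is a different metric
```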

CloudWatch Metrics: What AWS Monitors for You

AWS services publish their metrics automatically. Here are the most important ones by service:

EC2 Metrics

| Metric | Description | Resolution |
|---|---|---|
| CPUUtilization | CPU usage percentage | 5 min (basic) / 1 min (detailed) |
| NetworkIn / NetworkOut | Bytes received / sent | 5 min (basic) / 1 min (detailed) |
| DiskReadOps | Read operations on instance store volumes | 5 min (basic) / 1 min (detailed) |
| StatusCheckFailed | 1 if any status check failed, otherwise 0 | 1 min |

# Enable detailed monitoring on EC2 (1 minute instead of 5)
aws ec2 monitor-instances --instance-ids i-0abcd1234efgh5678

# Query CPUUtilization for an instance
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abcd1234efgh5678 \
  --start-time 2026-02-19T00:00:00Z \
  --end-time 2026-02-19T23:59:59Z \
  --period 3600 \
  --statistics Average Maximum
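
The period and time range together decide how many datapoints a query returns, and a single GetMetricStatistics call returns at most 1,440 of them. The following sketch does that arithmetic; the helper names are hypothetical and the functions do not call AWS.

```javascript
// Maximum datapoints a single GetMetricStatistics call returns
const MAX_DATAPOINTS = 1440;

// How many datapoints a time range produces at a given period (seconds)
function expectedDatapoints(startTime, endTime, periodSeconds) {
  const rangeSeconds = (new Date(endTime) - new Date(startTime)) / 1000;
  return Math.ceil(rangeSeconds / periodSeconds);
}

// Smallest period (a multiple of 60 s) that keeps the range under the limit
function smallestPeriodFor(startTime, endTime) {
  const rangeSeconds = (new Date(endTime) - new Date(startTime)) / 1000;
  const raw = Math.ceil(rangeSeconds / MAX_DATAPOINTS);
  return Math.max(60, Math.ceil(raw / 60) * 60);
}

const start = '2026-02-19T00:00:00Z';
const end = '2026-02-19T23:59:59Z';
console.log(expectedDatapoints(start, end, 3600)); // 24
console.log(smallestPeriodFor(start, end));        // 60
```

For the one-day range used in the command above, a 3600-second period yields 24 hourly datapoints, and even a 60-second period still fits in one call.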

Lambda Metrics

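| Metric | Description |
|---|---|
| Invocations | Number of invocations |
| Duration | Execution time in ms |
| Errors | Invocations that ended in a function error |
| Throttles | Throttled invocation requests |
| ConcurrentExecutions | Simultaneous executions |

Lambda does not publish a cold-start metric in the AWS/Lambda namespace. Cold starts appear as Init Duration in each REPORT log line, which Logs Insights exposes as @initDuration.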

RDS Metrics

| Metric | Description |
|---|---|
| DatabaseConnections | Active connections |
| FreeStorageSpace | Available storage (bytes) |
| ReadIOPS / WriteIOPS | I/O operations per second |
| CPUUtilization | CPU usage percentage |
| FreeableMemory | Available memory (bytes) |

CloudWatch Logs: Centralize Your Records

CloudWatch Logs organizes records in a two-level hierarchy:

  • Log Group: The main container (e.g., /aws/lambda/my-function)
  • Log Stream: A sequence of log events from a single source within the group (e.g., one EC2 instance or one Lambda execution environment)

Sending Logs from a Node.js Application

// logger.js
import {
  CloudWatchLogsClient,
  CreateLogGroupCommand,
  CreateLogStreamCommand,
  PutLogEventsCommand
} from '@aws-sdk/client-cloudwatch-logs';

const client = new CloudWatchLogsClient({ region: 'us-east-1' });
const LOG_GROUP = '/app/my-application/production';
// One stream per day; the name is fixed when the module loads
const LOG_STREAM = `stream-${new Date().toISOString().split('T')[0]}`;

// Run a create call and ignore only the "already exists" error
async function createIfMissing(command) {
  try {
    await client.send(command);
  } catch (e) {
    if (e.name !== 'ResourceAlreadyExistsException') throw e;
  }
}

// Call once at startup, before the first logEvent()
export async function initializeLogs() {
  await createIfMissing(new CreateLogGroupCommand({
    logGroupName: LOG_GROUP
  }));
  await createIfMissing(new CreateLogStreamCommand({
    logGroupName: LOG_GROUP,
    logStreamName: LOG_STREAM
  }));
}

// Send one structured log event to the stream
export async function logEvent(message, level = 'INFO') {
  const command = new PutLogEventsCommand({
    logGroupName: LOG_GROUP,
    logStreamName: LOG_STREAM,
    logEvents: [{
      timestamp: Date.now(),
      message: JSON.stringify({
        level,
        message,
        timestamp: new Date().toISOString()
      })
    }]
  });
  await client.send(command);
}
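
Sending one event per API call gets expensive under load, so applications usually buffer events and send them in batches. Below is a minimal sketch of the batching step only, assuming the documented PutLogEvents limits (10,000 events and 1,048,576 bytes per batch, with 26 bytes of overhead counted per event). The `batchLogEvents` helper is hypothetical, and it does not handle the additional rule that a batch cannot span more than 24 hours.

```javascript
const MAX_BATCH_EVENTS = 10000;  // PutLogEvents limit on events per call
const MAX_BATCH_BYTES = 1048576; // PutLogEvents limit on batch size
const EVENT_OVERHEAD_BYTES = 26; // counted per event on top of the message

// Split events ({ timestamp, message }) into batches that respect both limits
function batchLogEvents(events) {
  // PutLogEvents expects events in chronological order
  const sorted = [...events].sort((x, y) => x.timestamp - y.timestamp);
  const batches = [];
  let current = [];
  let currentBytes = 0;

  for (const event of sorted) {
    const size = Buffer.byteLength(event.message, 'utf8') + EVENT_OVERHEAD_BYTES;
    const full = current.length >= MAX_BATCH_EVENTS ||
      currentBytes + size > MAX_BATCH_BYTES;
    // Close the current batch before it would exceed a limit
    if (full && current.length > 0) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(event);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each returned batch can be passed as the `logEvents` array of a single `PutLogEventsCommand`.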

Configuring Log Retention

By default, CloudWatch Logs retains records indefinitely, which can drive up costs. Set a retention policy:

# Set 30-day retention
aws logs put-retention-policy \
  --log-group-name /aws/lambda/my-function \
  --retention-in-days 30

# Available options: 1, 3, 5, 7, 14, 30, 60, 90,
# 120, 150, 180, 365, 400, 545, 731, 1096, 1827, 2192, 2557, 2922, 3288, 3653
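
Because only specific values are accepted, a requested retention such as 45 days has to be rounded to an allowed setting. A small sketch of that rounding follows; `nearestRetention` is a hypothetical helper, and the list is the set of values documented for put-retention-policy.

```javascript
// Retention values accepted by put-retention-policy, in days
const RETENTION_DAYS = [
  1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
  731, 1096, 1827, 2192, 2557, 2922, 3288, 3653
];

// Smallest allowed retention that keeps logs for at least `days`
function nearestRetention(days) {
  const match = RETENTION_DAYS.find((allowed) => allowed >= days);
  if (match === undefined) {
    throw new RangeError(`No retention setting covers ${days} days`);
  }
  return match;
}

console.log(nearestRetention(45));  // 60
console.log(nearestRetention(365)); // 365
```

Rounding up rather than down is a design choice here: it never deletes logs earlier than requested, at the price of slightly more storage.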

Alarms with SNS: Real-Time Alerts

CloudWatch alarms watch a metric and trigger actions when a threshold is crossed. Combined with SNS (Simple Notification Service), you can receive alerts via email, SMS, or webhook:

# Create an SNS topic for notifications
aws sns create-topic --name production-alerts

# Subscribe an email to the topic
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789012:production-alerts \
  --protocol email \
  --notification-endpoint your-email@example.com

# Create a high CPU alarm for EC2
aws cloudwatch put-metric-alarm \
  --alarm-name "ec2-high-cpu" \
  --alarm-description "CPU above 80% for 10 minutes (two 5-minute periods)" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:production-alerts \
  --dimensions Name=InstanceId,Value=i-0abcd1234efgh5678

# Create a Lambda error alarm
aws cloudwatch put-metric-alarm \
  --alarm-name "lambda-errors" \
  --alarm-description "More than 10 errors in 5 minutes" \
  --metric-name Errors \
  --namespace AWS/Lambda \
  --statistic Sum \
  --period 300 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:production-alerts \
  --dimensions Name=FunctionName,Value=my-products-api

# Create an RDS low storage alarm
aws cloudwatch put-metric-alarm \
  --alarm-name "rds-low-storage" \
  --alarm-description "Less than 5 GB of free storage" \
  --metric-name FreeStorageSpace \
  --namespace AWS/RDS \
  --statistic Average \
  --period 300 \
  --threshold 5368709120 \
  --comparison-operator LessThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:production-alerts \
  --dimensions Name=DBInstanceIdentifier,Value=my-database
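
Two numbers in these commands are easy to get wrong: how long a breach must last before the alarm fires (period multiplied by evaluation periods) and the byte value of a storage threshold. The sketch below shows both calculations; the helper names are hypothetical, and it assumes every evaluated datapoint must breach, which is the behavior when no datapoints-to-alarm setting is given.

```javascript
// Time a metric must stay in breach before the alarm fires, in seconds
function alarmWindowSeconds(periodSeconds, evaluationPeriods) {
  return periodSeconds * evaluationPeriods;
}

// FreeStorageSpace is reported in bytes, so thresholds need converting
function gibToBytes(gib) {
  return gib * 1024 ** 3;
}

console.log(alarmWindowSeconds(300, 2) / 60); // 10 (minutes, ec2-high-cpu)
console.log(alarmWindowSeconds(300, 1) / 60); // 5 (minutes, lambda-errors)
console.log(gibToBytes(5));                   // 5368709120
```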

CloudWatch Logs Insights: Advanced Log Analysis

CloudWatch Logs Insights lets you query and analyze logs with a purpose-built query language that chains commands with pipes (|), somewhat like SQL clauses:

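# Query recent Lambda errors
# (date -d is GNU date; on macOS use: date -v-1H +%s)
aws logs start-query \
  --log-group-name /aws/lambda/my-products-api \
  --start-time $(date -d "1 hour ago" +%s) \
  --end-time $(date +%s) \
  --query-string '
    fields @timestamp, @message
    | filter @message like /ERROR/
    | sort @timestamp desc
    | limit 50
  '

# start-query is asynchronous and returns a queryId; fetch the rows with:
# aws logs get-query-results --query-id <queryId>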

Useful Insights query examples:

# Top 10 slowest Lambda invocations
fields @timestamp, @duration, @requestId
| filter @type = "REPORT"
| sort @duration desc
| limit 10

# Error count by hour
fields @timestamp, @message
| filter @message like /ERROR/
| stats count(*) as errors by bin(1h)

# 99th percentile duration
fields @duration
| filter @type = "REPORT"
| stats avg(@duration) as average,
        max(@duration) as maximum,
        pct(@duration, 99) as p99

# Find cold starts
fields @timestamp, @duration, @initDuration
| filter ispresent(@initDuration)
| sort @initDuration desc
| limit 20
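
When queries are generated from code, it helps to assemble them from stages rather than concatenate strings by hand. The following is a sketch under stated assumptions: `buildQuery` and `errorQuery` are hypothetical helpers, and the escaping step assumes that backslash-escaping works inside a `like /.../` pattern the way it does in common regex dialects.

```javascript
// Hypothetical helper: joins Logs Insights stages with the pipe operator
function buildQuery(stages) {
  return stages.filter(Boolean).join('\n| ');
}

// Error search, parameterized by a plain-text term and a row limit
function errorQuery(term, limit = 50) {
  // Escape characters that would otherwise be read as regex syntax
  const escaped = term.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');
  return buildQuery([
    'fields @timestamp, @message',
    `filter @message like /${escaped}/`,
    'sort @timestamp desc',
    `limit ${limit}`
  ]);
}

console.log(errorQuery('ERROR', 20));
```

The resulting string can be passed as the `--query-string` value of `aws logs start-query`.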

Custom Dashboards

CloudWatch dashboards let you visualize metrics from multiple services in a single pane of glass:

# Create a dashboard
aws cloudwatch put-dashboard \
  --dashboard-name "production-overview" \
  --dashboard-body '{
  "widgets": [
    {
      "type": "metric",
      "x": 0, "y": 0,
      "width": 12, "height": 6,
      "properties": {
        "title": "EC2 CPU Utilization",
        "metrics": [
          ["AWS/EC2", "CPUUtilization", "InstanceId", "i-0abcd1234efgh5678"]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1"
      }
    },
    {
      "type": "metric",
      "x": 12, "y": 0,
      "width": 12, "height": 6,
      "properties": {
        "title": "Lambda Invocations & Errors",
        "metrics": [
          ["AWS/Lambda", "Invocations", "FunctionName", "my-products-api"],
          ["AWS/Lambda", "Errors", "FunctionName", "my-products-api"]
        ],
        "period": 300,
        "stat": "Sum",
        "region": "us-east-1"
      }
    },
    {
      "type": "log",
      "x": 0, "y": 6,
      "width": 24, "height": 6,
      "properties": {
        "title": "Recent Errors",
        "query": "SOURCE '\''/aws/lambda/my-products-api'\'' | fields @timestamp, @message\n| filter @message like /ERROR/\n| sort @timestamp desc\n| limit 20",
        "region": "us-east-1",
        "stacked": false,
        "view": "table"
      }
    }
  ]
}'
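
Hand-written dashboard JSON becomes hard to maintain as widgets are added, mostly because of the x/y coordinates. One way to sketch a generator that places metric widgets on the 24-column grid is shown here; `buildDashboardBody` and its options are hypothetical, and only metric widgets are covered.

```javascript
const GRID_COLUMNS = 24; // the dashboard grid is 24 units wide

// Lay out metric widgets left to right, wrapping onto a new row when full
function buildDashboardBody(widgetSpecs, { width = 12, height = 6, region = 'us-east-1' } = {}) {
  const perRow = Math.floor(GRID_COLUMNS / width);
  const widgets = widgetSpecs.map((spec, index) => ({
    type: 'metric',
    x: (index % perRow) * width,
    y: Math.floor(index / perRow) * height,
    width,
    height,
    properties: {
      title: spec.title,
      metrics: spec.metrics,
      period: 300,
      stat: spec.stat,
      region
    }
  }));
  return JSON.stringify({ widgets });
}

const body = buildDashboardBody([
  { title: 'EC2 CPU Utilization', stat: 'Average',
    metrics: [['AWS/EC2', 'CPUUtilization', 'InstanceId', 'i-0abcd1234efgh5678']] },
  { title: 'Lambda Errors', stat: 'Sum',
    metrics: [['AWS/Lambda', 'Errors', 'FunctionName', 'my-products-api']] },
  { title: 'RDS Connections', stat: 'Average',
    metrics: [['AWS/RDS', 'DatabaseConnections', 'DBInstanceIdentifier', 'my-database']] }
]);

// Grid positions of the three widgets: [0, 0], [12, 0], [0, 6]
console.log(JSON.parse(body).widgets.map((w) => [w.x, w.y]));
```

The returned string has the same shape as the `--dashboard-body` argument above.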

Custom Metrics

You can publish your own metrics to CloudWatch to monitor business logic:

// custom-metrics.js
import {
  CloudWatchClient,
  PutMetricDataCommand
} from '@aws-sdk/client-cloudwatch';

const client = new CloudWatchClient({ region: 'us-east-1' });

export async function publishMetric(metricName, value, unit = 'Count') {
  const command = new PutMetricDataCommand({
    Namespace: 'MyApplication/Production',
    MetricData: [{
      MetricName: metricName,
      Value: value,
      Unit: unit,
      Timestamp: new Date(),
      Dimensions: [
        {
          Name: 'Environment',
          Value: 'production'
        },
        {
          Name: 'Service',
          Value: 'products-api'
        }
      ]
    }]
  });
  await client.send(command);
}

// Usage in your application
await publishMetric('OrdersProcessed', 1);
await publishMetric('PaymentAmount', 49.99, 'None');
await publishMetric('ResponseTime', 234, 'Milliseconds');

# Publish a custom metric from the CLI
aws cloudwatch put-metric-data \
  --namespace "MyApplication/Production" \
  --metric-name "OrdersProcessed" \
  --value 1 \
  --unit Count \
  --dimensions Environment=production,Service=products-api
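
Publishing every sample as its own API call adds up for high-volume values such as response times. A MetricDatum can instead carry a pre-aggregated `StatisticValues` set. The sketch below builds one from raw samples; `toStatisticSet` is a hypothetical helper, and it only constructs the object without sending it.

```javascript
// Collapse raw samples into the StatisticValues shape of a MetricDatum
function toStatisticSet(metricName, samples, unit = 'None') {
  if (samples.length === 0) {
    throw new Error('At least one sample is required');
  }
  return {
    MetricName: metricName,
    Unit: unit,
    StatisticValues: {
      SampleCount: samples.length,
      Sum: samples.reduce((total, value) => total + value, 0),
      Minimum: Math.min(...samples),
      Maximum: Math.max(...samples)
    }
  };
}

const datum = toStatisticSet('ResponseTime', [120, 234, 180, 410], 'Milliseconds');
console.log(datum.StatisticValues);
// { SampleCount: 4, Sum: 944, Minimum: 120, Maximum: 410 }
```

The object can be placed in the `MetricData` array of a `PutMetricDataCommand` in place of a single `Value`. The trade-off is that percentile statistics are generally not available for data published as statistic sets.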

Metrics Retention

CloudWatch retains metrics with the following policy:

| Resolution | Retention |
|---|---|
| < 60 seconds (high resolution) | 3 hours |
| 60 seconds (1 minute) | 15 days |
| 300 seconds (5 minutes) | 63 days |
| 3600 seconds (1 hour) | 455 days (15 months) |

Metrics that receive no new data for 15 months are automatically deleted.
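
In practice, the retention tiers mean that older data can only be queried at coarser periods. The following sketch encodes the table as a lookup; `finestPeriodForAge` is a hypothetical helper, and the 1-second tier applies only to metrics that were published at high resolution in the first place.

```javascript
const HOUR = 3600;
const DAY = 24 * HOUR;

// Retention tiers from the table: [period in seconds, retention in seconds]
const TIERS = [
  [1, 3 * HOUR],      // high resolution (< 60 s)
  [60, 15 * DAY],
  [300, 63 * DAY],
  [3600, 455 * DAY]
];

// Finest period (seconds) still available for a datapoint of a given age
function finestPeriodForAge(ageSeconds) {
  const tier = TIERS.find(([, retention]) => ageSeconds <= retention);
  return tier ? tier[0] : null; // null: the datapoint has expired
}

console.log(finestPeriodForAge(2 * HOUR));  // 1
console.log(finestPeriodForAge(30 * DAY));  // 300
console.log(finestPeriodForAge(500 * DAY)); // null
```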

CloudWatch Pricing

| Resource | Free Tier | Additional Cost (us-east-1) |
|---|---|---|
| Detailed monitoring metrics | 10 metrics (basic 5-minute metrics are free) | $0.30/metric/month |
| Custom metrics | 10 metrics | $0.30/metric/month |
| Dashboards | 3 dashboards (up to 50 metrics each) | $3.00/dashboard/month |
| Alarms | 10 alarms | $0.10/alarm/month |
| Logs ingestion | 5 GB | $0.50/GB |
| Logs storage | 5 GB | $0.03/GB/month |
| Logs Insights | Not included | $0.005/GB scanned |
| API requests | 1 million requests (GetMetricData excluded) | $0.01/1,000 requests (GetMetricData: per 1,000 metrics requested) |

Tip: Set a short retention period on every log group and use subscription filters to forward only relevant logs to downstream systems. Because ingestion is billed per GB, reducing log verbosity at the source lowers costs further.
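
To see how log volume translates into a bill, here is a rough estimator. It takes the table's log figures at face value (including separate 5 GB free allowances for ingestion and storage), so treat the numbers as illustrative; `estimateMonthlyLogCost` and the `LOG_PRICES` object are hypothetical names.

```javascript
// Figures copied from the pricing table; treat them as illustrative
const LOG_PRICES = {
  freeIngestGb: 5,
  freeStorageGb: 5,
  ingestPerGb: 0.50,
  storagePerGbMonth: 0.03
};

// Rough monthly log bill in USD for a given ingestion and storage volume
function estimateMonthlyLogCost(ingestedGb, storedGb, prices = LOG_PRICES) {
  const billableIngest = Math.max(0, ingestedGb - prices.freeIngestGb);
  const billableStorage = Math.max(0, storedGb - prices.freeStorageGb);
  const total = billableIngest * prices.ingestPerGb +
    billableStorage * prices.storagePerGbMonth;
  return Math.round(total * 100) / 100; // round to cents
}

console.log(estimateMonthlyLogCost(105, 205)); // 56
console.log(estimateMonthlyLogCost(3, 2));     // 0
```

In the first example, ingestion accounts for $50 of the $56, which is why reducing what gets ingested usually matters more than trimming storage.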

Conclusion

CloudWatch is the observability hub of AWS. With automatic metrics, centralized logs, smart alarms, and visual dashboards, you have everything you need to monitor your infrastructure effectively. Pair CloudWatch with SNS for proactive alerting and use Insights for deep log analysis. The key is to set alarms on the metrics that truly matter and establish log retention policies that balance visibility with cost.
