# AWS CloudWatch Metrics: The Complete Reference for Grafana Users
If you are serious about observability on AWS, you will spend a lot of time in CloudWatch. It is the single source of truth for service-level metrics, and it integrates directly with Grafana. The problem is that AWS documentation scatters metric information across dozens of service guides.
This article brings it all together. It explains the metrics available from CloudWatch for the most commonly used AWS services, what each metric means, and how you can use them effectively in Grafana.
## How CloudWatch Metrics Work

- **Namespaces** group metrics by service. Examples include `AWS/EC2`, `AWS/RDS`, and `AWS/Lambda`.
- **Dimensions** identify the resource the metric applies to, such as `InstanceId`, `FunctionName`, or `TableName`.
- **Statistics** define how values are summarised: `Average`, `Sum`, `Minimum`, `Maximum`, or percentiles like `p95`.
- **Resolution:** Basic monitoring collects metrics at 5-minute intervals. Detailed monitoring enables 1-minute resolution.
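These four concepts map directly onto the parameters of a CloudWatch `GetMetricData` request. The sketch below builds one query as a plain dict (the instance ID is a placeholder); with credentials configured you would pass the result to `boto3.client("cloudwatch").get_metric_data(**params)`.

```python
# Sketch: how namespace, dimensions, statistic, and period fit together
# in a GetMetricData request. The instance ID is a placeholder.
from datetime import datetime, timedelta, timezone

def cpu_query(instance_id: str, period: int = 300) -> dict:
    """Build one GetMetricData query for EC2 CPUUtilization."""
    return {
        "Id": "cpu",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/EC2",            # namespace: which service
                "MetricName": "CPUUtilization",
                "Dimensions": [                    # dimension: which resource
                    {"Name": "InstanceId", "Value": instance_id},
                ],
            },
            "Period": period,                      # resolution: 300s basic, 60s detailed
            "Stat": "Average",                     # statistic: Average, Sum, p95, ...
        },
        "ReturnData": True,
    }

params = {
    "MetricDataQueries": [cpu_query("i-0123456789abcdef0")],
    "StartTime": datetime.now(timezone.utc) - timedelta(hours=3),
    "EndTime": datetime.now(timezone.utc),
}
```

Grafana's CloudWatch data source fills in the same fields (namespace, metric, dimensions, statistic, period) from its query editor, so the vocabulary carries over directly.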
## EC2 Metrics (`AWS/EC2`)
| Metric | Description | Key Dimensions | Units | Use in Grafana |
|---|---|---|---|---|
| `CPUUtilization` | Percentage of CPU used | `InstanceId` | Percent | Spot bottlenecks, plan scaling |
| `DiskReadOps` / `DiskWriteOps` | Read/write operations on instance store volumes | `InstanceId` | Count | Understand IOPS demand |
| `DiskReadBytes` / `DiskWriteBytes` | Data read/written on instance store volumes | `InstanceId` | Bytes | Analyse throughput patterns |
| `StatusCheckFailed` | Combined instance/system health | `InstanceId` | Count (0/1) | Alert if >0 |
| `StatusCheckFailed_Instance` | Instance-level failure | `InstanceId` | Count (0/1) | Debug configuration/OS issues |
| `StatusCheckFailed_System` | AWS hardware failure | `InstanceId` | Count (0/1) | Indicates AWS-side issue |
| `CPUCreditUsage` | CPU credits consumed (T2/T3/T4 instances) | `InstanceId` | Credits | Track burst usage |
| `CPUCreditBalance` | Remaining CPU credits | `InstanceId` | Credits | Alert when balance is low |
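The "alert if >0" guidance for `StatusCheckFailed` translates into a simple CloudWatch alarm. Here is a minimal sketch of the `put_metric_alarm` parameters, built as a dict; the instance ID and SNS topic ARN are placeholders.

```python
# Sketch: alarm that fires when StatusCheckFailed is non-zero for two
# consecutive minutes. Instance ID and SNS topic ARN are placeholders.
def status_check_alarm(instance_id: str, topic_arn: str) -> dict:
    return {
        "AlarmName": f"status-check-failed-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",       # any failure in the period counts
        "Period": 60,
        "EvaluationPeriods": 2,       # two consecutive breaches before alerting
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }

alarm = status_check_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:eu-west-1:111122223333:ops-alerts",
)
# With credentials: boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Requiring two evaluation periods filters out transient blips while still catching genuine hardware or OS failures quickly.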
## EBS Metrics (`AWS/EBS`)
| Metric | Description | Key Dimensions | Units | Use in Grafana |
|---|---|---|---|---|
| `VolumeReadOps` / `VolumeWriteOps` | Read/write operations per volume | `VolumeId` | Count (per period) | Monitor storage demand |
| `VolumeReadBytes` / `VolumeWriteBytes` | Data throughput | `VolumeId` | Bytes (per period) | Detect throughput-heavy workloads |
| `VolumeTotalReadTime` / `VolumeTotalWriteTime` | Total time spent on read/write operations in the period | `VolumeId` | Seconds | Spot performance bottlenecks |
| `VolumeIdleTime` | Time volume spent idle | `VolumeId` | Seconds | Useful for cost tuning |
| `VolumeQueueLength` | Outstanding I/O requests | `VolumeId` | Count | High queues = contention risk |
## RDS Metrics (`AWS/RDS`)
| Metric | Description | Key Dimensions | Units | Use in Grafana |
|---|---|---|---|---|
| `CPUUtilization` | CPU used by DB instance | `DBInstanceIdentifier` | Percent | High CPU = scaling/queries issue |
| `DatabaseConnections` | Number of DB connections | `DBInstanceIdentifier` | Count | Alert when near max connections |
| `FreeableMemory` | Available RAM | `DBInstanceIdentifier` | Bytes | Low memory = poor performance |
| `FreeStorageSpace` | Disk capacity remaining | `DBInstanceIdentifier` | Bytes | Critical to avoid outages |
| `ReadIOPS` / `WriteIOPS` | Reads/writes per second | `DBInstanceIdentifier` | Count/sec | Understand workload demand |
| `ReadLatency` / `WriteLatency` | Time per read/write | `DBInstanceIdentifier` | Seconds | Latency impacts UX |
| `ReplicaLag` | Replication delay | `DBInstanceIdentifier` | Seconds | Important for read replicas |
## DynamoDB Metrics (`AWS/DynamoDB`)
| Metric | Description | Key Dimensions | Units | Use in Grafana |
|---|---|---|---|---|
| `ConsumedReadCapacityUnits` / `ConsumedWriteCapacityUnits` | Throughput consumed | `TableName`, `GlobalSecondaryIndexName` | Count | Track against provisioned capacity |
| `ThrottledRequests` | Requests rejected due to capacity limits | `TableName` | Count | Alert if >0 |
| `ReadThrottleEvents` / `WriteThrottleEvents` | Breakdown of throttling | `TableName` | Count | Debug workload patterns |
| `SuccessfulRequestLatency` | Latency for successful requests | `TableName` | Milliseconds | Measure user experience impact |
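Tracking throttles against consumed capacity (as the table suggests) is a natural fit for CloudWatch metric math. A sketch of a `GetMetricData` query set, with the table name as a placeholder:

```python
# Sketch: overlay DynamoDB write throttles against consumed write capacity,
# plus a metric-math ratio, so throttling can be read in context.
# The table name passed in is a placeholder.
def dynamo_throttle_queries(table: str) -> list:
    dims = [{"Name": "TableName", "Value": table}]

    def stat(query_id: str, metric: str) -> dict:
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/DynamoDB",
                    "MetricName": metric,
                    "Dimensions": dims,
                },
                "Period": 60,
                "Stat": "Sum",
            },
        }

    return [
        stat("throttles", "WriteThrottleEvents"),
        stat("consumed", "ConsumedWriteCapacityUnits"),
        # Metric math: throttle events per consumed WCU, as a percentage
        {
            "Id": "ratio",
            "Expression": "100 * throttles / consumed",
            "Label": "Write throttle %",
        },
    ]

queries = dynamo_throttle_queries("orders")
```

Grafana's CloudWatch data source supports the same math expressions, so the ratio can live directly in a dashboard panel rather than in code.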
## Lambda Metrics (`AWS/Lambda`)
| Metric | Description | Key Dimensions | Units | Use in Grafana |
|---|---|---|---|---|
| `Invocations` | Number of function calls | `FunctionName` | Count | Track workload volume |
| `Errors` | Failed invocations | `FunctionName` | Count | Alert on spikes |
| `Duration` | Execution time per call | `FunctionName` | Milliseconds | Track p95/p99 for UX |
| `Throttles` | Throttled invocations | `FunctionName` | Count | Indicates concurrency issues |
| `ConcurrentExecutions` | Functions running at once | `FunctionName` | Count | Watch concurrency limits |
| `IteratorAge` | Lag for stream triggers | `FunctionName` | Milliseconds | Spot delays in stream processing |
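The "track p95/p99" advice for `Duration` is just a change of statistic: CloudWatch accepts percentile statistics as strings like `"p95"`. A minimal sketch, with the function name as a placeholder:

```python
# Sketch: request p95 Lambda Duration instead of Average. Percentile
# statistics are passed as strings ("p95", "p99"). Function name is
# a placeholder.
def lambda_duration_query(function_name: str, stat: str = "p95") -> dict:
    return {
        "Id": "duration_" + stat,
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/Lambda",
                "MetricName": "Duration",
                "Dimensions": [
                    {"Name": "FunctionName", "Value": function_name},
                ],
            },
            "Period": 60,
            "Stat": stat,     # "Average" hides tail latency; "p95" shows it
        },
    }

q = lambda_duration_query("checkout-handler")
```

In Grafana the equivalent is typing `p95` into the Statistic field of the CloudWatch query editor.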
## API Gateway Metrics (`AWS/ApiGateway`)
| Metric | Description | Key Dimensions | Units | Use in Grafana |
|---|---|---|---|---|
| `Count` | Number of requests | `ApiName`, `Stage` | Count | Traffic trends |
| `Latency` | End-to-end latency | `ApiName`, `Stage` | Milliseconds | Track user impact (use p95/p99) |
| `IntegrationLatency` | Backend processing time | `ApiName`, `Stage` | Milliseconds | Diagnose backend vs gateway |
| `4XXError` | Client errors | `ApiName`, `Stage` | Count | Spot misuse or auth failures |
| `5XXError` | Server errors | `ApiName`, `Stage` | Count | Alert on backend/system faults |
## CloudFront Metrics (`AWS/CloudFront`)
| Metric | Description | Key Dimensions | Units | Use in Grafana |
|---|---|---|---|---|
| `Requests` | Number of requests | `DistributionId`, `Region` (`Global`) | Count | Request volume |
| `BytesDownloaded` / `BytesUploaded` | Data transfer | `DistributionId`, `Region` | Bytes | Bandwidth usage |
| `TotalErrorRate` | Error % across all requests | `DistributionId`, `Region` | Percent | High values = reliability issue |
| `4xxErrorRate` / `5xxErrorRate` | Client vs server error breakdown | `DistributionId`, `Region` | Percent | Debug caching vs origin issues |
| `CacheHitRate` | % served from cache | `DistributionId`, `Region` | Percent | Optimise caching behaviour |
Note: CloudFront metrics are only available in `us-east-1`, with the dimension `Region=Global`.
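In practice that means two things at once: the CloudWatch client must point at `us-east-1`, and the query must carry the `Region=Global` dimension. A sketch (the distribution ID is a placeholder):

```python
# Sketch: querying CloudFront Requests. Both the client region (us-east-1)
# and the Region=Global dimension are required. Distribution ID is a
# placeholder.
def cloudfront_requests_query(distribution_id: str) -> dict:
    return {
        "Id": "requests",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/CloudFront",
                "MetricName": "Requests",
                "Dimensions": [
                    {"Name": "DistributionId", "Value": distribution_id},
                    {"Name": "Region", "Value": "Global"},
                ],
            },
            "Period": 300,
            "Stat": "Sum",
        },
    }

q = cloudfront_requests_query("E1A2B3C4D5E6F7")
# With credentials, the client must be created in us-east-1:
# boto3.client("cloudwatch", region_name="us-east-1").get_metric_data(...)
```

In Grafana the same applies: point the CloudWatch data source (or the query's region override) at `us-east-1` for CloudFront panels.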
## CloudWatch Logs Metrics (`AWS/Logs`)
| Metric | Description | Key Dimensions | Units | Use in Grafana |
|---|---|---|---|---|
| `IncomingLogEvents` | Number of log events ingested | `LogGroupName` | Count | Track log volume |
| `IncomingBytes` | Data volume ingested | `LogGroupName` | Bytes | Estimate storage cost |
| `DeliveryErrors` | Failed delivery attempts | `LogGroupName`, `DestinationType` | Count | Debug subscription issues |
| `DeliveryThrottling` | Throttled log events | `LogGroupName`, `DestinationType` | Count | Detect limits exceeded |
| `ErrorCount` | API errors | `Service`, `Resource` | Count | Detect problems in log delivery |
## CloudWatch Agent Metrics (`CWAgent`)
| Metric | Description | Key Dimensions | Units | Use in Grafana |
|---|---|---|---|---|
| `cpu_usage_active` / `cpu_usage_idle` / `cpu_usage_system` | CPU breakdown | `Host`, `InstanceId` | Percent | Detailed CPU analysis |
| `mem_used` / `mem_free` / `mem_available` | Memory statistics | `Host` | Bytes | Memory pressure detection |
| `disk_used_percent` / `disk_free` | Disk usage | `MountPath`, `Device` | Percent/Bytes | Disk capacity alerts |
| `swap_used` / `swap_used_percent` | Swap activity | `Host` | Bytes/Percent | Spot performance issues |
| `processes_running` / `processes_sleeping` | Process states | `Host` | Count | Detect overloaded hosts |
## Using Metrics Effectively in Grafana

- **Group by user impact:** Focus dashboards on latency, errors, and throttles.
- **Overlay related metrics:** Plot API latency alongside DynamoDB throttles and consumed capacity to spot correlations.
- **Alert on symptoms, not noise:** A single throttle is irrelevant. A sustained latency increase plus throttles is actionable.
- **Use percentiles, not averages:** Percentiles reflect what users actually see.
- **Keep dashboards alive:** Review and refine them as your architecture evolves.
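The "percentiles, not averages" point is easy to demonstrate with a synthetic latency sample: a workload where most requests are fast but a tail is very slow.

```python
# Sketch: why percentiles beat averages. 90 fast requests, 10 very slow
# ones — the average looks tolerable while the tail is awful.
import statistics

latencies_ms = [20] * 90 + [900] * 10

avg = statistics.mean(latencies_ms)
# Simple nearest-rank p95: the value below which 95% of samples fall
p95 = sorted(latencies_ms)[int(0.95 * len(latencies_ms)) - 1]

print(f"average: {avg:.0f} ms")   # 108 ms — looks almost healthy
print(f"p95:     {p95} ms")       # 900 ms — what the slowest users actually see
```

An average of 108 ms would pass most alert thresholds, while one request in ten takes nearly a second. That is exactly the gap a p95 or p99 panel exposes.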
## Final Thoughts
CloudWatch metrics are AWS’s built-in observability layer. Grafana makes them usable by connecting signals across services. Together, they provide visibility and confidence without requiring a heavy observability stack.
If you want to move faster, reduce wasted effort, and calm down those “will it scale?” conversations, this is where you start: collect the right metrics, display them clearly, and use them to make decisions.
Gary Worthington is a software engineer, delivery consultant, and agile coach who helps teams move fast, learn faster, and scale when it matters. He writes about modern engineering, product thinking, and helping teams ship things that matter.
Through his consultancy, More Than Monkeys, Gary helps startups and scaleups improve how they build software, from tech strategy and agile delivery to product validation and team development.
Visit morethanmonkeys.co.uk to learn how we can help you build better, faster.
Follow Gary on LinkedIn for practical insights into engineering leadership, agile delivery, and team performance.