Load Testing AWS API Gateway with Locust: A Complete Walkthrough

by Gary Worthington, More Than Monkeys

When you move your backend behind AWS API Gateway, the first thing you notice is how easy it is to stand up an HTTPS API. What’s less obvious is how your API behaves under stress.

  • Will API Gateway throttle you earlier than you expect?
  • Will Lambda cold starts ruin your tail latency?
  • Will your DynamoDB tables or Aurora clusters keep up?

That’s where load testing comes in. And one of the best tools for the job is Locust, a Python-based framework that makes it easy to simulate realistic traffic patterns and push your API until you learn its true limits.

This tutorial is designed to be hands-on. By the end, you’ll:

  • Understand what Locust is and how it works.
  • Run a simple test against https://example.com.
  • Add authentication for real API Gateway endpoints.
  • Run distributed load tests in AWS.
  • Interpret results using CloudWatch and X-Ray.

1. What is Locust?

Locust is an open-source load testing tool written in Python.

Instead of writing XML config (like JMeter) or JavaScript scripts (like k6), you write plain Python classes that define how a “user” behaves. Each user runs in a lightweight green thread, so you can run thousands on a laptop.

Example Locust “user”:

from locust import FastHttpUser, task, between

class HelloUser(FastHttpUser):
    wait_time = between(1, 2)

    @task
    def hello(self):
        self.client.get("/")

This user fetches / every 1–2 seconds. Locust spins up many of these “users” at once, and you see aggregated stats in real time.

Why Locust for AWS API Gateway?
Because your API Gateway endpoints are HTTPS APIs, Locust can call them directly. And since you write Python, you can also handle AWS-specific authentication (SigV4, Cognito JWTs, API keys) using the same libraries you’d use in production.

2. Why API Gateway needs special attention

API Gateway is powerful but opinionated. When load testing, you must account for:

Request throttling

  • Default account-level limits (often ~10k RPS per region).
  • Per-method throttling (e.g. 100 RPS for POST /items).
  • If you exceed them, you’ll get HTTP 429 Too Many Requests.

Burst vs steady state

  • Gateway allows short bursts above the steady rate, then enforces limits.

Cold starts

  • If Gateway invokes a Lambda that hasn’t run recently, cold starts add 200–800 ms latency.

Downstream dependencies

  • DynamoDB hot partitions, Aurora connection pools, SQS throughput.
  • Gateway won’t protect you from downstream saturation.

When you test, you’re testing the whole path: TLS termination → Gateway → backend (Lambda/HTTP integration) → database → response.

3. Setting clear performance goals

Load tests without goals just generate graphs. Useful performance goals look like:

  • “p95 latency under 200 ms at 100 RPS sustained for 15 minutes.”
  • “<1% 5xx errors under 500 concurrent users.”
  • “Cold start adds <500 ms and occurs <1% of invocations.”

Tie goals to user experience and business risk.

  • Checkout APIs → optimise for latency.
  • Reporting APIs → optimise for throughput.
  • Admin APIs → optimise for resilience under spikes.

4. Installing Locust

Locust is Python-based:

python -m venv .venv
source .venv/bin/activate
pip install locust

Verify:

locust -V
# locust 2.x

5. Running your first test (example.com)

Create locustfile.py:

from locust import FastHttpUser, task, between

class ExampleUser(FastHttpUser):
    # set this here, or in the UI for greater flexibility
    host = "https://example.com"

    wait_time = between(1, 2)

    @task
    def homepage(self):
        self.client.get("/", name="GET /")

Run:

locust -f locustfile.py

Open http://localhost:8089. Enter:

  • Users: 10
  • Spawn rate: 2

Now you’re load testing example.com.

6. Designing realistic scenarios for API Gateway

Realistic traffic matters more than raw throughput. Define personas:

  • Reader: GET /items
  • Writer: POST /items
  • Health checker: GET /health

Example:

from locust import FastHttpUser, task, between

class Reader(FastHttpUser):
    wait_time = between(1, 3)

    @task(3)
    def list_items(self):
        self.client.get("/items?limit=10", name="GET /items")

class Writer(FastHttpUser):
    wait_time = between(5, 10)

    @task
    def create_item(self):
        payload = {"id": "123", "name": "test"}
        self.client.post("/items", json=payload, name="POST /items")

Here, Reader runs more often than Writer, simulating traffic mix.

7. Adding authentication

API Keys

import os
from locust import FastHttpUser, task

class ApiKeyUser(FastHttpUser):
    def on_start(self):
        self.api_key = os.getenv("API_KEY")

    @task
    def get_items(self):
        self.client.get(
            "/items",
            headers={"x-api-key": self.api_key},
            name="GET /items",
        )

Cognito JWTs

Fetch an ID token with boto3:

import os

import boto3

client = boto3.client("cognito-idp", region_name=os.getenv("AWS_REGION"))

resp = client.initiate_auth(
    AuthFlow="USER_PASSWORD_AUTH",
    ClientId=os.getenv("COGNITO_CLIENT_ID"),
    AuthParameters={
        "USERNAME": os.getenv("COGNITO_USERNAME"),
        "PASSWORD": os.getenv("COGNITO_PASSWORD"),
    },
)

token = resp["AuthenticationResult"]["IdToken"]

Attach it:

# inside a task method, once `token` has been fetched
headers = {"Authorization": f"Bearer {token}"}
self.client.get("/items", headers=headers)

SigV4 (IAM auth)

Use requests-aws4auth or a custom signer to sign requests. This is required if your API uses IAM-based authorisation.

8. Running Locust headless

For automation:

locust -f locustfile.py --headless -u 100 -r 10 -t 5m --csv results

  • -u: users
  • -r: spawn rate
  • -t: duration
  • --csv: save results

This runs a 5-minute test with 100 users, spawning 10 per second.

9. Distributed testing in AWS

To reach higher throughput, run Locust in master/worker mode:

# Master
locust -f locustfile.py --master --expect-workers 4

# Worker
locust -f locustfile.py --worker --master-host <master-ip>

Options for AWS:

  • EC2: Master on t3.medium, workers on c5.large.
  • ECS Fargate: One service for master, another for workers.
  • EKS: Run Locust in Kubernetes via Helm.
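For local experiments or a quick Fargate-style setup, the master/worker topology can also be sketched with Docker Compose using the official image (image tag, paths, and replica count here are assumptions to adapt):

```yaml
services:
  master:
    image: locustio/locust
    ports:
      - "8089:8089"
    volumes:
      - ./locustfile.py:/mnt/locust/locustfile.py
    command: -f /mnt/locust/locustfile.py --master --expect-workers 2
  worker:
    image: locustio/locust
    volumes:
      - ./locustfile.py:/mnt/locust/locustfile.py
    command: -f /mnt/locust/locustfile.py --worker --master-host master
    deploy:
      replicas: 2
```

Workers resolve the master by its Compose service name, mirroring what a service-discovery setup would do in ECS or EKS.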

10. Observability in AWS

Locust shows client metrics. Pair with AWS telemetry:

CloudWatch metrics

  • Latency, IntegrationLatency, 4XXError, 5XXError, Count
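These metrics can be pulled programmatically after a run to compare against your goals. A hedged sketch using boto3's get_metric_statistics, where "my-api" stands in for your API's name in the AWS/ApiGateway namespace:

```python
from datetime import datetime, timedelta, timezone

def fetch_p95_latency(cloudwatch, api_name, minutes=15):
    """Return (timestamp, p95 latency) pairs for the last test window."""
    end = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/ApiGateway",
        MetricName="Latency",
        Dimensions=[{"Name": "ApiName", "Value": api_name}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        ExtendedStatistics=["p95"],  # percentile stats need ExtendedStatistics
    )
    points = sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])
    return [(p["Timestamp"], p["ExtendedStatistics"]["p95"]) for p in points]

# Usage (assumes AWS credentials are configured):
# import boto3
# for ts, p95 in fetch_p95_latency(boto3.client("cloudwatch"), "my-api"):
#     print(ts, p95)
```

Comparing these server-side numbers with Locust's client-side percentiles quickly shows whether latency is added in the network path or inside AWS.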

CloudWatch logs

  • Enable access and execution logging.

X-Ray tracing

  • See where latency is spent (Gateway vs Lambda vs DB).

11. Analysing results

Typical patterns:

  • p95/p99 latency spikes → cold starts or database hot partitions.
  • Throughput caps at round numbers → throttling.
  • Writes slower/failing → DB bottlenecks.

Always cross-reference Locust output with CloudWatch graphs.

12. Automating performance tests

Best practice: integrate load tests into delivery.

  • Smoke tests: Run on every deploy, low RPS.
  • Nightly runs: Catch regressions early.
  • Full load tests: Before major launches.

Export reports:

locust -f locustfile.py --csv results --html report.html --headless -u 100 -r 10 -t 5m

Store artefacts in CI/CD for traceability.

13. Checklist before testing

  • Staging environment matches production config.
  • CloudWatch dashboards/logging enabled.
  • Alarms configured.
  • Rollback plan in place.
  • Test users provisioned in Cognito if needed.

14. Downloadable sample project

To get started quickly:

Download locust-example.zip

Run with:

pip install -r requirements.txt
locust -f locustfile.py

15. Conclusion

Locust makes load testing approachable. API Gateway makes hosting APIs simple. Put them together, and you get a repeatable way to:

  • Simulate real traffic.
  • Measure latency, throughput, and errors.
  • Identify bottlenecks before your users do.

Performance testing isn’t about breaking things — it’s about gaining confidence.

Gary Worthington is a software engineer, delivery consultant, and agile coach who helps teams move fast, learn faster, and scale when it matters. He writes about modern engineering, product thinking, and helping teams ship things that matter.

Through his consultancy, More Than Monkeys, Gary helps startups and scaleups improve how they build software — from tech strategy and agile delivery to product validation and team development.

Visit morethanmonkeys.co.uk to learn how we can help you build better, faster.

Follow Gary on LinkedIn for practical insights into engineering leadership, agile delivery, and team performance.