
Demystifying AWS IGW and NAT GW

by Gary Worthington, More Than Monkeys

If you’ve ever found yourself muttering “why can this EC2 instance download updates but that one can’t?”, you’ve probably stumbled into the swamp of Internet Gateways (IGW) and NAT Gateways (NATGW). AWS networking has a special talent for being both simple and endlessly confusing, usually because people talk about “public subnets” like they are a physical thing, rather than what they actually are: a route table decision.

Let’s sort this out properly.

The mental model that stops you getting lost

There are only a few moving parts you need to hold in your head:

Subnets:

  • just IP ranges. They do not make traffic public or private on their own.

Route tables:

  • where you decide where traffic goes.

Internet Gateway (IGW):

  • the VPC’s doorway to the internet.

NAT Gateway (NATGW):

  • a managed service that lets private things initiate outbound internet access without becoming reachable inbound.

Everything else is just implementation detail and people being dramatic.

What an Internet Gateway actually is

An Internet Gateway is a VPC-level component you attach to a VPC. It provides a path between your VPC and the public internet.

Key points that save time in design reviews:

  • IGW is not “a public subnet”. It is attached to the VPC, not a subnet.
  • IGW does not magically expose your instances to the world.
  • IGW does not do NAT for your instances. If your instance has only a private RFC 1918 address (like 10.0.1.25), the internet cannot route traffic back to it.

For an instance to be internet-reachable via an IGW, it needs:

  1. A route to the IGW (0.0.0.0/0 -> igw-...) in its subnet’s route table.
  2. A public IPv4 address (auto-assigned public IP or an Elastic IP).
  3. Security Group rules that allow the inbound traffic you care about.
  4. Network ACLs that are not sabotaging you.

That is it. No incense, no chanting.
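
To make the four conditions concrete, here is a minimal Terraform sketch of an internet-reachable instance, using the subnet names from the full example later in this article. The AMI ID is a placeholder and the security group is an assumption (inbound HTTPS only); adjust both to taste.

# Hypothetical security group: inbound 443 only, all egress allowed (condition 3)
resource "aws_security_group" "web" {
  vpc_id = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "web" {
  ami           = "ami-xxxxxxxx" # placeholder
  instance_type = "t3.micro"

  # Condition 1: public_a's route table sends 0.0.0.0/0 to the IGW
  subnet_id = aws_subnet.public_a.id

  # Condition 2: a public IPv4 address
  associate_public_ip_address = true

  vpc_security_group_ids = [aws_security_group.web.id]

  # Condition 4: the default NACL allows everything, so nothing to do here
}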

“Public subnet” is just a route table label

A subnet is typically called “public” when its route table has:

  • 0.0.0.0/0 -> Internet Gateway

If that default route points somewhere else (or nowhere), it is not “public”, regardless of what someone named it in Terraform.

A subnet is typically called “private” when its route table has:

  • 0.0.0.0/0 -> NAT Gateway (for outbound internet access)
  • or no 0.0.0.0/0 route at all (isolated, no internet)

That is the whole distinction. The subnet itself did not change. Your routing did.

Why NAT Gateway exists

Most real systems want this shape:

  • App servers should be able to fetch OS updates, call third-party APIs, pull containers, talk to AWS services.
  • But app servers should not be directly reachable from the internet.

That is exactly what a NAT Gateway gives you: outbound-only internet access for resources in private subnets.

The NAT GW sits in a public subnet, has an Elastic IP (for the “public NAT” flavour), and private subnets route their internet-bound traffic to it. The NAT GW then translates the source IP to its own public IP and sends the traffic out via the IGW.

Importantly:

  • NATGW does not allow unsolicited inbound connections from the internet to your private instances.
  • It supports connections initiated from the inside (stateful return traffic is allowed).
  • If your private instance cannot reach the internet, the NATGW often gets the blame, but it is rarely the root cause. Routing usually is.

The canonical layout

Here’s the common pattern you see in production VPCs:

Public subnets:

  • Internet-facing load balancer (ALB/NLB) or bastion (if you hate yourself)
  • NAT Gateways (one per AZ if you like resilient systems)

Private subnets:

  • ECS tasks, EC2 app servers, internal ALBs, Lambda ENIs

Isolated subnets:

  • Databases, caches, anything you want to keep firmly away from the internet

That pattern works because you constrain inbound access to a small, controlled surface (usually the load balancer), while still letting private workloads initiate outbound connections.

Route tables: the bit everyone hand-waves and then regrets

If you remember nothing else, remember this:

  • Public subnet route table: 0.0.0.0/0 -> IGW
  • Private subnet route table: 0.0.0.0/0 -> NATGW
  • Isolated subnet route table: no default route

When people say “move it into a private subnet”, what they usually mean is “change its route table association”.
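
In Terraform, that usually means editing a single association. A minimal sketch, assuming a hypothetical app subnet and the route tables from the example below:

resource "aws_route_table_association" "app_a" {
  subnet_id      = aws_subnet.app_a.id        # hypothetical subnet
  route_table_id = aws_route_table.private.id # was aws_route_table.public.id
}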

A Terraform example you can actually use

Here’s a minimal, conventional setup: VPC, IGW, public and private route tables, a NATGW, and routes.

resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = { Name = "demo-vpc" }
}

resource "aws_internet_gateway" "igw" {
vpc_id = aws_vpc.main.id
tags = { Name = "demo-igw" }
}

# Public subnet (example)
resource "aws_subnet" "public_a" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.0.0/24"
availability_zone = "eu-west-2a"
map_public_ip_on_launch = true
tags = { Name = "public-a" }
}

# Private subnet (example)
resource "aws_subnet" "private_a" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.10.0/24"
availability_zone = "eu-west-2a"
tags = { Name = "private-a" }
}

# Public route table: default route to IGW
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
tags = { Name = "rt-public" }
}

resource "aws_route" "public_internet" {
route_table_id = aws_route_table.public.id
destination_cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.igw.id
}

resource "aws_route_table_association" "public_a" {
subnet_id = aws_subnet.public_a.id
route_table_id = aws_route_table.public.id
}

# NAT Gateway lives in a public subnet and needs an EIP
resource "aws_eip" "nat" {
domain = "vpc"
tags = { Name = "eip-nat" }
}

resource "aws_nat_gateway" "nat_a" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public_a.id
tags = { Name = "nat-a" }
depends_on = [aws_internet_gateway.igw]
}

# Private route table: default route to NATGW
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
tags = { Name = "rt-private" }
}

resource "aws_route" "private_internet" {
route_table_id = aws_route_table.private.id
destination_cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.nat_a.id
}

resource "aws_route_table_association" "private_a" {
subnet_id = aws_subnet.private_a.id
route_table_id = aws_route_table.private.id
}

If you deploy that, instances in public_a can reach the internet directly (assuming they have a public IP and their security groups allow the traffic). Instances in private_a can reach the internet via the NAT GW, but are not directly reachable inbound.

High availability: one NAT GW is not a strategy

NAT Gateway is an AZ-scoped service. If you run private subnets in multiple AZs (you should), you usually want:

One NATGW per AZ

  • Private subnet in AZ-a routes to NATGW-a
  • Private subnet in AZ-b routes to NATGW-b

Why?

  • Resilience: an AZ issue should not take out outbound connectivity for everything.
  • Cost: routing private subnet traffic across AZs to reach a NATGW in another AZ incurs cross-AZ data transfer charges on top of the NATGW’s own data processing charge.

It is one of those things that feels “over-engineered” until the day you are trying to patch instances during an incident and they cannot reach package repos.
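
A hedged sketch of the per-AZ pattern, assuming the public and private subnets are created with for_each and keyed by AZ name (this replaces the single aws_eip, aws_nat_gateway, and private route table resources from the earlier example):

# One EIP and one NATGW per AZ
resource "aws_eip" "nat" {
  for_each = aws_subnet.public
  domain   = "vpc"
}

resource "aws_nat_gateway" "nat" {
  for_each      = aws_subnet.public
  allocation_id = aws_eip.nat[each.key].id
  subnet_id     = each.value.id
  depends_on    = [aws_internet_gateway.igw]
}

# One private route table per AZ, each defaulting to its local NATGW
resource "aws_route_table" "private" {
  for_each = aws_subnet.private
  vpc_id   = aws_vpc.main.id
}

resource "aws_route" "private_internet" {
  for_each               = aws_route_table.private
  route_table_id         = each.value.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.nat[each.key].id
}

resource "aws_route_table_association" "private" {
  for_each       = aws_subnet.private
  subnet_id      = each.value.id
  route_table_id = aws_route_table.private[each.key].id
}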

Cost and when to avoid NAT GW

NAT Gateway is convenient, and also very good at quietly becoming a meaningful line item.

NAT GW is typically charged:

  • per hour it exists
  • plus per GB processed
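
For a rough sense of scale (illustrative numbers, not a quote; rates vary by region, so check current AWS pricing): at roughly $0.05 per hour, a single NATGW costs about 730 × $0.05 ≈ $36.50 a month just to exist, and pushing 1 TB through it at roughly $0.05 per GB adds about $50 more. Run one per AZ across three AZs and the hourly charge alone triples.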

So a sensible question is: do you actually need internet egress from those private subnets?

Common ways to reduce or eliminate NATGW usage:

VPC Endpoints (sketched at the end of this section)

  • Gateway endpoints for S3 and DynamoDB (these are free)
  • Interface endpoints (PrivateLink) for services like ECR, CloudWatch Logs, SSM, STS, Secrets Manager, KMS, etc.

Use SSM Session Manager instead of bastions

  • Less inbound exposure, fewer moving parts

Isolate workloads that do not need egress

  • Databases rarely need to talk to the public internet

IPv6 design

  • If you go IPv6-first, you do not NAT in the same way. For outbound-only IPv6, AWS provides an egress-only internet gateway. (Different tool, different model.)

The best setup is often: endpoints for AWS services, NATGW only for the genuinely external stuff.
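
A minimal sketch of the endpoint approach, reusing aws_vpc.main, aws_route_table.private, and aws_subnet.private_a from the earlier example. Gateway endpoints are free; interface endpoints are billed per hour and per GB, so do the maths against your NATGW traffic. The endpoint security group here is hypothetical (it should allow inbound 443 from the VPC).

# S3 traffic from private subnets bypasses the NATGW entirely
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.eu-west-2.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

# Interface endpoint (PrivateLink) for the ECR API
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.eu-west-2.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.private_a.id]
  private_dns_enabled = true
  security_group_ids  = [aws_security_group.endpoints.id] # hypothetical SG
}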

Troubleshooting checklist (the stuff that bites people daily)

When “private instance can’t reach the internet” happens, work through this in order:

Does the private subnet route table have 0.0.0.0/0 -> NATGW?

Is the NATGW in a public subnet with 0.0.0.0/0 -> IGW?

Does the NATGW have an EIP (for public NATGW)?

Are NACLs blocking ephemeral ports?

  • Return traffic uses ephemeral ports. NACLs are stateless. They will happily ruin your day (see the sketch after this checklist).

Are Security Groups blocking egress?

  • Less common, but it happens in “locked down” accounts.

Is DNS working?

  • Surprisingly often, “no internet” is actually “no DNS”. Check that VPC DNS support and DNS hostnames are enabled, and review any resolver rules.

Are you accidentally routing to the wrong place?

  • Multiple route tables, a wrong association, copy-pasted subnet IDs. Classic.

If you only do one diagnostic step, check the route table association for the subnet. Half of AWS networking is “you associated the wrong thing to the wrong table”.
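
To illustrate the ephemeral-ports point from the checklist: NACLs evaluate return traffic like any other traffic, so a custom NACL on a private subnet needs an explicit inbound allow for the high ports. A hedged sketch, assuming a hypothetical aws_network_acl.private resource:

# Inbound allow for TCP ephemeral ports, so return traffic gets back in
resource "aws_network_acl_rule" "ephemeral_in" {
  network_acl_id = aws_network_acl.private.id # hypothetical NACL
  rule_number    = 100
  egress         = false
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
  from_port      = 1024
  to_port        = 65535
}
# UDP (e.g. DNS responses from resolvers outside the VPC) needs a similar rule.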

The one-line summary

  • IGW is how a VPC talks to the internet.
  • A “public subnet” is a subnet whose route table points 0.0.0.0/0 at the IGW.
  • NAT GW is how private subnets initiate outbound internet access without being reachable inbound.
  • Most problems are routing, not gateways.

Once you stop treating “public/private subnet” as a mystical property and start treating it as a route table choice, AWS networking becomes far less of a guessing game.

Gary Worthington is a software engineer, delivery consultant, and fractional CTO who helps teams move fast, learn faster, and scale when it matters. He writes about modern engineering, product thinking, and helping teams ship things that matter.

Through his consultancy, More Than Monkeys, Gary helps startups and scaleups improve how they build software, from tech strategy and agile delivery to product validation and team development.

Visit morethanmonkeys.co.uk to learn how we can help you build better, faster.

Follow Gary on LinkedIn for practical insights into engineering leadership, agile delivery, and team performance.