VPC Design
Designing secure, scalable Virtual Private Clouds on AWS and GCP — CIDR planning, subnet architecture, routing, security controls, and connectivity patterns.
AWS VPC
CIDR Planning Best Practices
- Use /16 for each VPC — gives 65,536 addresses with room to grow
- Never overlap with on-premises ranges — plan a master IP address management (IPAM) scheme
- Reserve large, non-overlapping blocks per cloud (e.g. 10.0.0.0/9 for AWS, 10.128.0.0/9 for GCP)
- Leave gaps between VPC CIDRs for future peering expansion
- Audit which RFC 1918 ranges are already in use on-prem before allocating — AWS IPAM centralizes tracking and flags overlaps
# Example enterprise IP plan
# AWS Production VPCs (10.0.0.0/13 → 10.0.0.0 – 10.7.255.255)
10.0.0.0/16 → prod-us-east-1 VPC
10.1.0.0/16 → prod-eu-west-1 VPC
10.2.0.0/16 → prod-ap-southeast-1 VPC
10.3.0.0/16 → prod-us-west-2 VPC
# AWS Non-prod VPCs (10.8.0.0/13)
10.8.0.0/16 → staging-us-east-1 VPC
10.9.0.0/16 → dev-us-east-1 VPC
# Shared Services / Hub
10.100.0.0/16 → shared-services VPC (DNS, bastion, monitoring)
# On-premises (avoid in cloud)
192.168.0.0/16 → HQ LAN (never use this in cloud VPCs)
172.16.0.0/12 → DC network
# GCP VPCs (10.128.0.0/9)
10.128.0.0/16 → gcp-prod subnet region us-central1
10.132.0.0/16 → gcp-prod subnet region europe-west1
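A plan like this is easy to break with one typo, so it is worth validating mechanically. The sketch below (Python stdlib only, using a subset of the example ranges above — the names are illustrative) checks that no two allocations overlap:

```python
from ipaddress import ip_network
from itertools import combinations

# Subset of the example plan above; on-prem ranges are included on
# purpose, since cloud VPCs must not collide with them either.
allocations = {
    "prod-us-east-1":    "10.0.0.0/16",
    "prod-eu-west-1":    "10.1.0.0/16",
    "staging-us-east-1": "10.8.0.0/16",
    "shared-services":   "10.100.0.0/16",
    "gcp-us-central1":   "10.128.0.0/16",
    "hq-lan":            "192.168.0.0/16",
}

def find_overlaps(cidrs):
    """Return every pair of names whose networks overlap."""
    nets = {name: ip_network(c) for name, c in cidrs.items()}
    return [(a, b) for a, b in combinations(nets, 2)
            if nets[a].overlaps(nets[b])]

print(find_overlaps(allocations))  # [] — the plan is clean
```

Running this in CI against the source-of-truth IP plan catches overlaps before a VPC is ever created.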
3-Tier Subnet Architecture (Multi-AZ)
# prod-us-east-1 VPC: 10.0.0.0/16
# PUBLIC TIER — Internet-facing (ALB, NAT GW, Bastion)
10.0.0.0/24 → public-us-east-1a (251 usable)
10.0.1.0/24 → public-us-east-1b (251 usable)
10.0.2.0/24 → public-us-east-1c (251 usable)
# PRIVATE APP TIER — Application servers, ECS tasks, Lambda VPC
10.0.10.0/23 → private-app-us-east-1a (507 usable — /23 for larger fleets)
10.0.12.0/23 → private-app-us-east-1b (507 usable)
10.0.14.0/23 → private-app-us-east-1c (507 usable)
# PRIVATE DATA TIER — RDS, ElastiCache, OpenSearch
10.0.20.0/24 → private-data-us-east-1a (251 usable)
10.0.21.0/24 → private-data-us-east-1b (251 usable)
10.0.22.0/24 → private-data-us-east-1c (251 usable)
# RESERVED — future use, VPC endpoints, Transit GW attachments
10.0.30.0/24 → reserved-us-east-1a
10.0.31.0/24 → reserved-us-east-1b
10.0.32.0/24 → reserved-us-east-1c
# NOTE: AWS reserves 5 IPs per subnet:
# .0 Network address, .1 VPC router, .2 DNS, .3 Future, .255 Broadcast
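Because of those 5 reserved addresses, usable capacity is 2^(32−prefix) − 5 rather than the textbook 2^(32−prefix) − 2. A small helper makes the subnet sizing above verifiable:

```python
from ipaddress import ip_network

AWS_RESERVED = 5  # network, VPC router, DNS, future use, broadcast

def usable_ips(cidr: str) -> int:
    """Usable addresses in an AWS subnet after the 5 reserved IPs."""
    return ip_network(cidr).num_addresses - AWS_RESERVED

print(usable_ips("10.0.0.0/24"))   # 251
print(usable_ips("10.0.10.0/23"))  # 507
print(usable_ips("10.0.0.0/28"))   # 11 — /28 is the smallest subnet AWS allows
```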
Internet Gateway & NAT Gateway
| Component | Purpose | HA consideration | Cost |
|---|---|---|---|
| Internet Gateway (IGW) | Bidirectional internet access for public subnets | Highly available by default (AWS-managed, no AZ affinity) | Free (data transfer charges apply) |
| NAT Gateway (managed) | Outbound-only internet for private subnets | AZ-specific — deploy one per AZ for HA | ~$0.045/hr + $0.045/GB processed |
| NAT Instance (self-managed) | Same as NAT GW but on EC2 | Must configure your own HA (ASG) | EC2 cost only — cheaper, but you own patching and failover |
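To see what the NAT Gateway rates in the table imply, here is a hypothetical cost model (the function name is mine; rates are the us-east-1 figures from the table above, and 730 is the conventional hours-per-month):

```python
def nat_gateway_monthly(gb_processed: float, azs: int = 3,
                        hourly: float = 0.045, per_gb: float = 0.045,
                        hours: int = 730) -> float:
    """Monthly NAT Gateway cost: one gateway per AZ (hourly charge)
    plus a per-GB data-processing charge."""
    return azs * hourly * hours + gb_processed * per_gb

# 3 AZs, 1 TB/month of processed traffic:
print(f"${nat_gateway_monthly(1000):.2f}/month")  # ≈ $143.55/month
```

Note that the hourly charge dominates at low traffic — which is why Gateway Endpoints for S3/DynamoDB (free, and they bypass NAT entirely) are usually the first cost optimization.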
Route Tables
# Public subnet route table
Destination Target
0.0.0.0/0 igw-0abc123def456789a # Default route → Internet Gateway
10.0.0.0/16 local # VPC local (automatic)
10.100.0.0/16 tgw-0xyz987 # Shared services via Transit Gateway
# Private app subnet route table (AZ-a)
Destination Target
0.0.0.0/0 nat-0aaa111bbb222ccc33 # NAT Gateway in same AZ
10.0.0.0/16 local
10.100.0.0/16 tgw-0xyz987
192.168.0.0/16 tgw-0xyz987 # On-premises via TGW+DX
# Blackhole route — drop traffic to decommissioned VPC
10.5.0.0/16 blackhole
# Route propagation from VGW (BGP learned routes from Direct Connect)
# Enable in route table: Actions → Edit route propagation → Enable
Security Groups
Security Group Characteristics
- Stateful — return traffic is automatically allowed
- Applied at the ENI (instance) level
- Allow rules only — no explicit deny is possible
- Rules are evaluated as a set: if any rule permits the traffic, it is allowed
- Can reference other security groups as source/destination
# Web tier SG (ALB-facing)
# Inbound:
Type Protocol Port Source
HTTPS TCP 443 0.0.0.0/0, ::/0 # Internet HTTPS
HTTP TCP 80 0.0.0.0/0, ::/0 # HTTP (redirect to HTTPS)
# Outbound:
Type Protocol Port Destination
Custom TCP 8080 sg-app-tier # Forward to app tier
# App tier SG (EC2/ECS)
# Inbound:
Type Protocol Port Source
Custom TCP 8080 sg-web-tier # Only from web tier SG
# Outbound:
Type Protocol Port Destination
Custom TCP 5432 sg-data-tier # PostgreSQL
Custom TCP 6379 sg-data-tier # Redis
HTTPS TCP 443 0.0.0.0/0 # AWS APIs, package repos
# Data tier SG (RDS, ElastiCache)
# Inbound:
Type Protocol Port Source
Custom TCP 5432 sg-app-tier # PostgreSQL from app only
Custom TCP 6379 sg-app-tier # Redis from app only
Network ACLs
# Network ACL for public subnet
# INBOUND rules (numbered — lower = higher priority)
Rule Type Protocol Port Range Source Action
100 HTTPS TCP 443 0.0.0.0/0 ALLOW
110 HTTP TCP 80 0.0.0.0/0 ALLOW
120 Custom TCP 1024-65535 0.0.0.0/0 ALLOW # ephemeral return
130 SSH TCP 22 10.100.0.0/16 ALLOW # from bastion VPC
* All All All 0.0.0.0/0 DENY # implicit deny
# OUTBOUND rules
Rule Type Protocol Port Range Destination Action
100 HTTPS TCP 443 0.0.0.0/0 ALLOW
110 HTTP TCP 80 0.0.0.0/0 ALLOW
120 Custom TCP 1024-65535 0.0.0.0/0 ALLOW # ephemeral ports
* All All All 0.0.0.0/0 DENY
VPC Endpoints
| Type | Mechanism | Services | Cost |
|---|---|---|---|
| Gateway Endpoint | Route table entry, no ENI | S3, DynamoDB only | Free |
| Interface Endpoint (PrivateLink) | ENI with private IP in subnet | 100+ AWS services, custom services | ~$0.01/hr + $0.01/GB |
# Gateway endpoint for S3 (route table based)
# Add to route table:
Destination Target
pl-63a5400a (S3) vpce-0123456789abcdef0 # S3 prefix list → endpoint (route added automatically on endpoint creation)
# Interface endpoint — private DNS enabled
# EC2 in private subnet can reach s3.amazonaws.com without internet
# Endpoint DNS: vpce-xxx.s3.us-east-1.vpce.amazonaws.com (private)
# With private DNS enabled: s3.amazonaws.com resolves to endpoint IP
# Endpoint policy (restrict S3 access to specific bucket)
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": "*",
"Action": ["s3:GetObject", "s3:PutObject"],
"Resource": "arn:aws:s3:::my-prod-bucket/*"
}]
}
VPC Peering
# VPC Peering characteristics:
# - Non-transitive: A↔B, B↔C does NOT mean A↔C
# - No overlapping CIDR blocks allowed
# - Works cross-account and cross-region
# - DNS resolution for peered VPCs must be explicitly enabled
# Setup (Terraform)
resource "aws_vpc_peering_connection" "prod_to_shared" {
vpc_id = aws_vpc.prod.id
peer_vpc_id = aws_vpc.shared.id
peer_owner_id = var.shared_account_id
auto_accept = false
tags = { Name = "prod-to-shared" }
}
# Route table update required on BOTH sides
resource "aws_route" "prod_to_shared" {
route_table_id = aws_route_table.private_app.id
destination_cidr_block = "10.100.0.0/16"
vpc_peering_connection_id = aws_vpc_peering_connection.prod_to_shared.id
}
AWS Transit Gateway
Hub-and-Spoke Topology
Transit Gateway (TGW) acts as a regional hub — attach multiple VPCs, VPNs, and Direct Connect gateways to a single TGW instead of building a full mesh of VPC peering connections. Supports up to 5,000 attachments per TGW. Routing is transitive, and route isolation is achieved with separate TGW route tables.
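The scaling argument for a hub is simple combinatorics — a full mesh of n VPCs needs n(n−1)/2 peering connections, while a TGW needs only n attachments:

```python
def mesh_connections(n: int) -> int:
    """VPC peering connections required for a full mesh of n VPCs."""
    return n * (n - 1) // 2

def hub_attachments(n: int) -> int:
    """TGW attachments for the same n VPCs (one per VPC)."""
    return n

for n in (5, 20, 100):
    print(f"{n} VPCs: {mesh_connections(n)} peerings vs "
          f"{hub_attachments(n)} TGW attachments")
```

At 20 VPCs that is already 190 peering connections (each needing route-table entries on both sides) versus 20 attachments — which is why full-mesh peering rarely survives past a handful of VPCs.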
# TGW route table design — route isolation
# Separate TGW route tables per security domain:
# prod-rt:
# Associates: prod VPCs
# Propagates from: prod VPCs, shared-services, on-prem (via DX)
# Static: none
# dev-rt:
# Associates: dev/staging VPCs
# Propagates from: dev VPCs, shared-services
# DOES NOT propagate from: prod (isolation)
# shared-services-rt:
# Associates: shared-services VPC
# Propagates from: ALL VPCs (can reach anywhere)
resource "aws_ec2_transit_gateway" "main" {
description = "Main TGW hub"
default_route_table_association = "disable" # use custom route tables
default_route_table_propagation = "disable"
auto_accept_shared_attachments = "enable"
dns_support = "enable"
vpn_ecmp_support = "enable"
tags = { Name = "main-tgw" }
}
resource "aws_ec2_transit_gateway_vpc_attachment" "prod" {
subnet_ids = aws_subnet.private_tgw[*].id
transit_gateway_id = aws_ec2_transit_gateway.main.id
vpc_id = aws_vpc.prod.id
tags = { Name = "prod-vpc-attachment" }
}
Terraform — Complete Multi-AZ VPC
# vpc/main.tf — Production VPC with full 3-tier architecture
variable "vpc_cidr" { default = "10.0.0.0/16" }
variable "environment" { default = "prod" }
variable "azs" { default = ["us-east-1a", "us-east-1b", "us-east-1c"] }
locals {
public_cidrs = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
app_cidrs = ["10.0.10.0/23", "10.0.12.0/23", "10.0.14.0/23"]
data_cidrs = ["10.0.20.0/24", "10.0.21.0/24", "10.0.22.0/24"]
}
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = { Name = "${var.environment}-vpc" }
}
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = { Name = "${var.environment}-igw" }
}
resource "aws_subnet" "public" {
count = 3
vpc_id = aws_vpc.main.id
cidr_block = local.public_cidrs[count.index]
availability_zone = var.azs[count.index]
map_public_ip_on_launch = true
tags = { Name = "public-${var.azs[count.index]}", Tier = "public" }
}
resource "aws_subnet" "private_app" {
count = 3
vpc_id = aws_vpc.main.id
cidr_block = local.app_cidrs[count.index]
availability_zone = var.azs[count.index]
tags = { Name = "private-app-${var.azs[count.index]}", Tier = "app" }
}
resource "aws_subnet" "private_data" {
count = 3
vpc_id = aws_vpc.main.id
cidr_block = local.data_cidrs[count.index]
availability_zone = var.azs[count.index]
tags = { Name = "private-data-${var.azs[count.index]}", Tier = "data" }
}
# NAT Gateway per AZ (HA)
resource "aws_eip" "nat" {
count = 3
domain = "vpc"
tags = { Name = "nat-eip-${var.azs[count.index]}" }
}
resource "aws_nat_gateway" "main" {
count = 3
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
depends_on = [aws_internet_gateway.main]
tags = { Name = "nat-gw-${var.azs[count.index]}" }
}
# Route tables
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = { Name = "public-rt" }
}
resource "aws_route_table" "private_app" {
count = 3
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main[count.index].id
}
tags = { Name = "private-app-rt-${var.azs[count.index]}" }
}
resource "aws_route_table_association" "public" {
count = 3
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private_app" {
count = 3
subnet_id = aws_subnet.private_app[count.index].id
route_table_id = aws_route_table.private_app[count.index].id
}
# VPC Flow Logs
resource "aws_flow_log" "main" {
vpc_id = aws_vpc.main.id
traffic_type = "ALL"
iam_role_arn = aws_iam_role.flow_log.arn
log_destination = aws_cloudwatch_log_group.vpc_flow.arn
}
GCP VPC
Global VPC Concept
GCP vs AWS VPC Architecture
GCP VPC is global — a single VPC spans all regions. Subnets are regional (not AZ-specific). A VM in us-central1 and a VM in europe-west1 can be in the same VPC and communicate over private IPs using Google's global backbone, without VPC peering or Transit Gateway.
AWS VPC is regional — one VPC per region. Cross-region requires VPC peering or Transit Gateway.
Subnet Modes & Secondary Ranges
# Auto mode VPC: GCP automatically creates one subnet per region (10.128.0.0/9)
# Custom mode VPC: You define all subnets — recommended for production
# Create custom mode VPC
gcloud compute networks create prod-vpc \
--subnet-mode=custom \
--bgp-routing-mode=global \
--mtu=1460
# Create regional subnet
gcloud compute networks subnets create prod-us-central1 \
--network=prod-vpc \
--region=us-central1 \
--range=10.0.0.0/20 \
--enable-private-ip-google-access
# Secondary IP ranges (required for GKE Pods and Services)
gcloud compute networks subnets create gke-nodes \
--network=prod-vpc \
--region=us-central1 \
--range=10.10.0.0/20 \
--secondary-range pods=10.100.0.0/14,services=10.96.0.0/20
# pods range: /14 = 262,144 addresses (GKE assigns a /24 per node → up to 1,024 nodes)
# services range: /20 = 4,096 ClusterIPs
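The node-count arithmetic is worth checking when sizing secondary ranges — each node consumes one /24 from the pods range (GKE's default for the standard 110-pods-per-node limit), so capacity is 2^(24 − pods_prefix):

```python
from ipaddress import ip_network

def max_gke_nodes(pods_cidr: str, per_node_prefix: int = 24) -> int:
    """Nodes supported by a pods secondary range when each node
    receives one /per_node_prefix block (GKE default: /24)."""
    pods = ip_network(pods_cidr)
    return 2 ** (per_node_prefix - pods.prefixlen)

print(max_gke_nodes("10.100.0.0/14"))                 # 1024 nodes
print(ip_network("10.96.0.0/20").num_addresses)       # 4096 ClusterIPs
```

Shrinking the per-node allocation (e.g. a /25 via `--default-max-pods-per-node=64`) doubles the node capacity of the same pods range.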
Shared VPC
Shared VPC Architecture
One host project owns the VPC and subnets. Multiple service projects deploy resources into shared subnets. Centralizes network management while allowing team autonomy. Requires the Shared VPC Admin role (roles/compute.xpnAdmin), granted at the organization or folder level.
# Enable Shared VPC on host project
gcloud compute shared-vpc enable HOST_PROJECT_ID
# Associate service project
gcloud compute shared-vpc associated-projects add SERVICE_PROJECT_ID \
--host-project=HOST_PROJECT_ID
# Grant subnet access to service project's service account
gcloud projects add-iam-policy-binding HOST_PROJECT_ID \
--member="serviceAccount:[email protected]" \
--role="roles/compute.networkUser"
# Grant per-subnet access (more granular)
gcloud compute networks subnets add-iam-policy-binding prod-us-central1 \
--region=us-central1 \
--member="serviceAccount:${SA_EMAIL}" \
--role="roles/compute.networkUser" \
--project=HOST_PROJECT_ID
# Terraform Shared VPC
resource "google_compute_shared_vpc_host_project" "host" {
project = var.host_project_id
}
resource "google_compute_shared_vpc_service_project" "service" {
host_project = var.host_project_id
service_project = var.service_project_id
}
GCP VPC Peering
# GCP VPC Peering: non-transitive (same as AWS)
# DNS is NOT shared by default — must configure DNS peering separately
# CIDR must not overlap
gcloud compute networks peerings create prod-to-shared \
--network=prod-vpc \
--peer-project=shared-services-project \
--peer-network=shared-vpc \
--import-custom-routes \
--export-custom-routes
# DNS peering (to resolve shared-services DNS zones)
gcloud dns managed-zones create shared-peering \
--dns-name="internal.example.com." \
--description="Peering to shared services DNS" \
--networks=prod-vpc \
--target-network=shared-vpc \
--target-project=shared-services-project \
--visibility=private
Cloud NAT
# Cloud NAT: fully managed, no NAT instance to maintain
# Region-level (covers all subnets in a region by default)
# Can assign static external IPs (for IP allowlisting)
gcloud compute routers create prod-router \
--network=prod-vpc \
--region=us-central1
gcloud compute routers nats create prod-nat \
--router=prod-router \
--region=us-central1 \
--nat-all-subnet-ip-ranges \
--auto-allocate-nat-external-ips \
--min-ports-per-vm=64 \
--enable-logging
# Static external IPs for NAT (for IP allowlisting with partners)
gcloud compute addresses create nat-ip-1 --region=us-central1
gcloud compute routers nats update prod-nat \
--router=prod-router \
--nat-external-ip-pool=nat-ip-1
Private Google Access & VPC Service Controls
# Private Google Access: VMs without public IPs can reach Google APIs
# Enable per subnet:
gcloud compute networks subnets update prod-us-central1 \
--enable-private-ip-google-access \
--region=us-central1
# For GKE, Cloud SQL Proxy, Pub/Sub — no internet needed with PGA enabled
# DNS: restricted.googleapis.com (199.36.153.4/30) — serves only VPC Service Controls-supported APIs
# Requires a private DNS zone mapping *.googleapis.com to 199.36.153.4-7, plus this route:
gcloud compute routes create private-google-access \
--network=prod-vpc \
--destination-range=199.36.153.4/30 \
--next-hop-gateway=default-internet-gateway
# VPC Service Controls — perimeter around Google APIs
# Service perimeter: group projects + allowed APIs
# Prevents data exfiltration even with stolen credentials
gcloud access-context-manager perimeters create prod-perimeter \
--title="Production Perimeter" \
--resources="projects/123456" \
--restricted-services="bigquery.googleapis.com,storage.googleapis.com" \
--policy=POLICY_NAME
GCP VPC — Terraform Example
# gcp_vpc/main.tf
resource "google_compute_network" "prod" {
name = "prod-vpc"
auto_create_subnetworks = false
routing_mode = "GLOBAL"
mtu = 1460
}
resource "google_compute_subnetwork" "app" {
name = "app-us-central1"
ip_cidr_range = "10.0.0.0/20"
region = "us-central1"
network = google_compute_network.prod.id
private_ip_google_access = true
secondary_ip_range {
range_name = "pods"
ip_cidr_range = "10.100.0.0/14"
}
secondary_ip_range {
range_name = "services"
ip_cidr_range = "10.96.0.0/20"
}
}
resource "google_compute_router" "prod" {
name = "prod-router"
region = "us-central1"
network = google_compute_network.prod.id
}
resource "google_compute_router_nat" "prod" {
name = "prod-nat"
router = google_compute_router.prod.name
region = "us-central1"
nat_ip_allocate_option = "AUTO_ONLY"
source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
log_config {
enable = true
filter = "ERRORS_ONLY"
}
}
# Firewall rules (GCP VPC-level, unlike AWS SG which is instance-level)
resource "google_compute_firewall" "allow_internal" {
name = "allow-internal"
network = google_compute_network.prod.id
allow {
protocol = "tcp"
ports = ["0-65535"]
}
allow {
  protocol = "udp"
  ports    = ["0-65535"]
}
allow { protocol = "icmp" }
source_ranges = ["10.0.0.0/8"]
}
resource "google_compute_firewall" "allow_https" {
name = "allow-https-lb"
network = google_compute_network.prod.id
allow {
  protocol = "tcp"
  ports    = ["443", "80"]
}
source_ranges = ["0.0.0.0/0"]
target_tags = ["web-server"]
}
- Non-overlapping CIDRs across all environments and on-premises
- 3-tier subnet design: public / private-app / private-data per AZ
- NAT Gateway per AZ (AWS) or Cloud NAT per region (GCP)
- VPC Flow Logs enabled for security and troubleshooting
- Security Groups follow least-privilege with SG-to-SG references
- VPC Endpoints for AWS S3/DynamoDB to avoid NAT costs
- Transit Gateway / Shared VPC for centralized connectivity