Who's Calling the Internet? Hunting Unknown Egress with VPC Flow Logs and R53 Resolver Logs
Learn how to enable VPC flow logs on your NAT gateway ENI, store them in S3, and query with Athena to identify which services in your private cluster are making unexpected external DNS calls.
Your private EKS cluster is making calls to the public internet and you have no idea which service is doing it — or why.
- Your services run in private subnets with no direct internet access, so you assume they’re safe
- Egress goes through a NAT gateway, masking individual pod IPs behind a single IP
- No alerts fire; traffic looks normal, but the calls are happening silently
- The fix: enable VPC flow logs on the NAT gateway’s ENI, ship logs to S3, and query with Athena to see every external connection — including pods bypassing your VPC resolver with direct DNS calls
TL;DR
- Prerequisite: Use AWS VPC CNI with `AWS_VPC_K8S_CNI_EXTERNALSNAT=true` so pods get real VPC IPs and SNAT happens at the NAT gateway, not the node.
- Enable ENI-level VPC flow logs on the NAT gateway’s network interface → send to S3 (not CloudWatch — too expensive).
- Query with Athena to see every external IP your pods are hitting (`flow_direction = 'ingress'`). Filter on `dstport = 53` to catch pods bypassing the VPC resolver with direct public DNS calls.
- Enable Route 53 Resolver query logging → same S3 bucket. Join `pkt_dstaddr` from flow logs against `answers.Rdata` in resolver logs to resolve external IPs to domain names.
- Caveat: R53 Resolver logs show `172.20.0.10` (CoreDNS ClusterIP) as `srcaddr`, not the pod IP — use flow logs for pod attribution and resolver logs only for IP → domain mapping.
The Problem
I first noticed this during a security review. We were auditing outbound traffic from our private EKS clusters and found connections to external IPs we couldn’t attribute to any known service. No alerts had fired, no deployments had gone out recently, and the APM dashboards looked clean. But something was clearly talking to the internet.
The tricky part: in a private subnet setup, all outbound traffic exits through the NAT gateway. Every pod and VM funnels through the same gateway IP. Once traffic leaves through the NAT, external systems only see the NAT’s public IP — the pod IP is gone. Standard flow logs at the VPC or subnet level capture this post-NAT traffic, so you can’t tell which pod originated the connection. That’s where the NAT gateway ENI’s pkt-srcaddr field comes in — covered in Step 5.
This is a common blind spot for platform and security teams. Developers add SDK calls, telemetry agents, or third-party libraries that phone home during initialization. In a monolith, you’d catch it in a network scan. In EKS with dozens of microservices, it goes unnoticed for months.
Symptoms to Watch For
If you’re seeing any of these, you likely have the same problem:
- NAT gateway data processing charges are higher than expected for the volume of traffic your services nominally generate
- Unknown destination IPs in VPC flow logs at the VPC or subnet level, but you can’t trace them back to a specific service
- Port 53 traffic flowing outbound through the NAT — DNS queries to public resolvers (like `8.8.8.8`) rather than your Route 53 private resolver
- Security scanner alerts showing your private workloads reaching external domains you didn’t whitelist
The port 53 traffic is the one most people miss. If your pods are using a custom DNS config or if a sidecar is bypassing the cluster DNS, those queries go out to the public internet through the NAT gateway. You’d never see it in CloudWatch metrics or standard monitoring.
The Solution: Flow Logs on the NAT Gateway ENI
The NAT gateway has its own elastic network interface (ENI). Flow logs at this specific ENI capture every connection that passes through the NAT. Crucially, they include the pkt-srcaddr field — the original pod IP before SNAT — alongside the external destination IP. That’s exactly the visibility we need.
Sending to S3 instead of CloudWatch Logs is deliberate. A busy NAT gateway can generate millions of flow records per day. CloudWatch Logs ingestion at $0.50/GB adds up fast. S3 at a fraction of that cost, combined with Athena’s pay-per-query model, makes this practical to run continuously.
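To put the cost argument in numbers, here's a rough back-of-the-envelope sketch. The $0.50/GB ingestion price comes from the paragraph above; the daily volume and the S3 storage price (roughly $0.023/GB-month for S3 Standard) are illustrative assumptions:

```python
# Rough monthly delivery-cost comparison (illustrative figures, not a quote).
# Assumed: 5 GB/day of flow records; CloudWatch Logs ingestion at $0.50/GB;
# S3 Standard storage at ~$0.023/GB-month. Athena scan costs excluded.
GB_PER_DAY = 5
DAYS = 30

cloudwatch_ingestion = GB_PER_DAY * DAYS * 0.50   # ingestion alone, per month
s3_storage = GB_PER_DAY * DAYS * 0.023            # one month of accumulated logs

print(f"CloudWatch Logs ingestion: ${cloudwatch_ingestion:.2f}/month")
print(f"S3 storage:                ${s3_storage:.2f}/month")
```

Even before CloudWatch storage and query charges, ingestion alone comes out an order of magnitude more expensive than keeping the same data in S3.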
Prerequisite — AWS VPC CNI with external SNAT enabled
This approach only works if your pods have real VPC CIDR IPs and SNAT happens at the NAT gateway, not at the node. Two things need to be true:
- Use AWS VPC CNI, not an overlay CNI like Calico or Flannel. Overlay CNIs assign pod IPs from a separate private range (e.g. `192.168.0.0/16`) and perform SNAT at the node before traffic ever reaches the NAT gateway. The NAT ENI flow logs then only see the node IP — pod-level attribution is lost.
- Set `AWS_VPC_K8S_CNI_EXTERNALSNAT=true` on the VPC CNI DaemonSet. By default this is `false`, which means the VPC CNI itself does SNAT at the node (replacing the pod IP with the node IP). Setting it to `true` disables node-level SNAT and delegates it to the NAT gateway — so the pod’s real VPC IP is preserved all the way to the NAT ENI.

Check the current setting with:

kubectl describe daemonset aws-node -n kube-system | grep AWS_VPC_K8S_CNI_EXTERNALSNAT

With this in place, each pod gets an IP directly from the VPC CIDR, and `pkt-srcaddr` in the NAT gateway ENI flow logs will be the actual pod IP.
Other pitfalls to watch for:
- Flow logs have up to a 10-minute aggregation window — this isn’t real-time monitoring
- Parquet format for flow logs is faster to query in Athena and uses roughly 20% less storage — enable it at creation time with `--destination-options FileFormat=parquet` (note that the Athena DDL in Step 4 assumes the default plain-text format, so adjust it if you do); retrofitting a format change later is painful
Step 1: Find Your NAT Gateway ENI
# List NAT gateways in your VPC
aws ec2 describe-nat-gateways \
--filter "Name=vpc-id,Values=vpc-XXXXXXXXXXXXXXXXX" \
--query "NatGateways[*].{NatGatewayId:NatGatewayId, State:State, NetworkInterfaceId:NatGatewayAddresses[0].NetworkInterfaceId}" \
--output table
Note the NetworkInterfaceId — this is the ENI ID you’ll enable flow logs on. It looks like eni-XXXXXXXXXXXXXXXXX.
Step 2: Create an S3 Bucket for Flow Logs
# Create the bucket (use a unique name)
aws s3api create-bucket \
--bucket my-vpc-flow-logs-bucket \
--region <region> \
--create-bucket-configuration LocationConstraint=<region>
# Block all public access
aws s3api put-public-access-block \
--bucket my-vpc-flow-logs-bucket \
--public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
Add a lifecycle rule to auto-expire old logs (optional but recommended):
aws s3api put-bucket-lifecycle-configuration \
--bucket my-vpc-flow-logs-bucket \
--lifecycle-configuration '{
"Rules": [{
"ID": "expire-flow-logs-90-days",
"Status": "Enabled",
"Filter": {"Prefix": "nat-gateway-flow-logs/"},
"Expiration": {"Days": 90}
}]
}'
Step 3: Enable Flow Logs on the NAT Gateway ENI
AWS automatically attaches the required S3 bucket policy when you create the flow log — no manual bucket policy setup needed if you own the bucket.
aws ec2 create-flow-logs \
--resource-type NetworkInterface \
--resource-ids eni-XXXXXXXXXXXXXXXXX \
--traffic-type ALL \
--log-destination-type s3 \
--log-destination "arn:aws:s3:::my-vpc-flow-logs-bucket/nat-gateway-flow-logs/" \
--log-format '${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status} ${vpc-id} ${subnet-id} ${flow-direction} ${traffic-path} ${pkt-srcaddr} ${pkt-dstaddr}' \
--max-aggregation-interval 60
Key choices here:
- `--max-aggregation-interval 60` gives 1-minute granularity instead of the default 10 minutes
- The custom `--log-format` adds `pkt-srcaddr` and `pkt-dstaddr` (v3 fields), which capture the original pod IP before NAT translation — `srcaddr` alone would only show the NAT gateway’s own private IP
- `flow-direction` and `traffic-path` fields help distinguish inbound vs. outbound
- We’re logging `ALL` traffic, not just `ACCEPT`, so we catch rejected egress attempts too
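One reading aid for the queries later on: flow logs report `protocol` as an IANA protocol number, not a name. A minimal lookup helper (only the values you'll commonly see here are mapped; everything else falls through to a generic label):

```python
# Map IANA protocol numbers from flow log records to readable names.
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}

def proto_name(num: int) -> str:
    # Fall back to a generic label for anything not in the map
    return PROTOCOLS.get(num, f"protocol-{num}")

print(proto_name(6), proto_name(17), proto_name(47))
```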
Verify the flow log was created:
aws ec2 describe-flow-logs \
--filter "Name=resource-id,Values=eni-XXXXXXXXXXXXXXXXX" \
--query "FlowLogs[*].{FlowLogId:FlowLogId, Status:FlowLogStatus, Destination:LogDestination}" \
--output table
It takes a few minutes to start collecting data. Check your S3 bucket after ~5 minutes:
aws s3 ls s3://my-vpc-flow-logs-bucket/nat-gateway-flow-logs/ --recursive | head -20
Step 4: Set Up Athena to Query the Logs
First, create an Athena results bucket if you don’t have one:
aws s3api create-bucket \
--bucket my-athena-query-results \
--region <region> \
--create-bucket-configuration LocationConstraint=<region>
Set the query result location in Athena (Console: Athena → Settings → Manage → Query result location):
s3://my-athena-query-results/
Now create the Athena table. Run this in the Athena query editor:
CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs_nat (
version INT,
account_id STRING,
interface_id STRING,
srcaddr STRING,
dstaddr STRING,
srcport INT,
dstport INT,
protocol BIGINT,
packets BIGINT,
bytes BIGINT,
`start` BIGINT,
`end` BIGINT,
action STRING,
log_status STRING,
vpc_id STRING,
subnet_id STRING,
flow_direction STRING,
traffic_path STRING,
pkt_srcaddr STRING,
pkt_dstaddr STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://my-vpc-flow-logs-bucket/nat-gateway-flow-logs/AWSLogs/<account-id>/vpcflowlogs/<region>/'
TBLPROPERTIES (
'skip.header.line.count'='1'
);
Replace <account-id> and <region> with your actual values. AWS writes flow logs to this path structure automatically.
Step 5: Query the Flow Logs
`srcaddr` vs `pkt-srcaddr` on a NAT gateway ENI: `srcaddr` records the IP of the intermediate layer — on a NAT gateway ENI that is the NAT’s own private IP. `pkt-srcaddr` records the original packet-level source IP before SNAT, which is the real pod IP. The AWS docs explicitly call out NAT gateways and EKS pods as the primary use case for this field. Always use `pkt_srcaddr`/`pkt_dstaddr` when querying NAT gateway flow logs.
Traffic arriving at the NAT ENI from private subnet pods is flow_direction = 'ingress' — at that point the pod IP has not yet been translated. Use ingress records to get the real pod IP in pkt_srcaddr.
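To make the field layout concrete, here's a sketch that splits a single record in the custom log format from Step 3. The record values are invented; on the ingress leg both `srcaddr` and `pkt-srcaddr` still carry the pod IP, while on the egress leg `srcaddr` flips to the NAT's own private IP and only `pkt-srcaddr` keeps the pod:

```python
# Field order matches the custom --log-format from Step 3.
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status vpc_id subnet_id "
          "flow_direction traffic_path pkt_srcaddr pkt_dstaddr").split()

# Invented ingress record on the NAT gateway ENI: pod 10.0.1.45 opening a
# TCP connection to 142.250.80.46:443. traffic-path is "-" on ingress.
record = ("3 111111111111 eni-0abc 10.0.1.45 142.250.80.46 43210 443 6 10 "
          "8400 1700000000 1700000060 ACCEPT OK vpc-0abc subnet-0abc "
          "ingress - 10.0.1.45 142.250.80.46")

parsed = dict(zip(FIELDS, record.split()))
print(parsed["flow_direction"], parsed["pkt_srcaddr"], "->", parsed["pkt_dstaddr"])
```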
All external calls from pods
Start here. This gives you every external IP your pods are reaching, ranked by connection count. No port filter — you want the full picture first.
SELECT
pkt_srcaddr AS pod_ip,
pkt_dstaddr AS external_ip,
dstport,
protocol,
COUNT(*) AS connection_count,
SUM(bytes) AS total_bytes,
MIN(from_unixtime("start")) AS first_seen,
MAX(from_unixtime("end")) AS last_seen
FROM vpc_flow_logs_nat
WHERE
flow_direction = 'ingress'
AND action = 'ACCEPT'
AND pkt_dstaddr NOT LIKE '10.%'
AND NOT regexp_like(pkt_dstaddr, '^172\.(1[6-9]|2[0-9]|3[01])\.') -- full 172.16.0.0/12; LIKE '172.16.%' covers only one /16 of it
AND pkt_dstaddr NOT LIKE '192.168.%'
GROUP BY pkt_srcaddr, pkt_dstaddr, dstport, protocol
ORDER BY connection_count DESC
LIMIT 100;
Take the pkt_dstaddr values from this result into Step 6 — matching them against the Rdata field in Route 53 Resolver query logs will resolve these IPs to domain names.
To map a pod IP back to a Kubernetes workload:
1
kubectl get pods --all-namespaces -o wide | grep "<pkt_srcaddr-from-query>"
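If you post-process query results outside Athena, Python's `ipaddress` module gets the private ranges right. One caveat worth knowing about simple prefix filters: the full RFC 1918 block `172.16.0.0/12` runs from `172.16.x.x` through `172.31.x.x`, so a plain `172.16.%` string match covers only a sixteenth of it:

```python
import ipaddress

# The three RFC 1918 private ranges. Membership tests via ip_network cover
# the whole 172.16.0.0/12 block, unlike a string prefix match on "172.16.".
PRIVATE = [ipaddress.ip_network(c) for c in
           ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_private(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PRIVATE)

print(is_private("172.31.5.9"))     # inside 172.16.0.0/12, missed by a 172.16. prefix match
print(is_private("142.250.80.46"))  # genuinely external
```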
Check for pods bypassing the VPC resolver (direct external DNS)
As a separate check, filter specifically on port 53. If any pods show up here, they are sending DNS queries directly to a public resolver (e.g. 8.8.8.8, 1.1.1.1) instead of going through CoreDNS and the VPC resolver. This is a misconfiguration worth fixing — it bypasses your private hosted zone resolution and any DNS Firewall rules you have in place.
SELECT
pkt_srcaddr AS pod_ip,
pkt_dstaddr AS external_dns_resolver,
COUNT(*) AS query_count,
SUM(bytes) AS total_bytes,
MIN(from_unixtime("start")) AS first_seen,
MAX(from_unixtime("end")) AS last_seen
FROM vpc_flow_logs_nat
WHERE
dstport = 53
AND action = 'ACCEPT'
AND flow_direction = 'ingress'
AND pkt_dstaddr NOT LIKE '10.%'
AND NOT regexp_like(pkt_dstaddr, '^172\.(1[6-9]|2[0-9]|3[01])\.') -- full 172.16.0.0/12; LIKE '172.16.%' covers only one /16 of it
AND pkt_dstaddr NOT LIKE '192.168.%'
GROUP BY pkt_srcaddr, pkt_dstaddr
ORDER BY query_count DESC
LIMIT 50;
Step 6: Resolve IPs to Domain Names with Route 53 Resolver Query Logs
VPC flow logs tell you that pod 10.0.1.45 connected to 142.250.80.46 on port 443 — but they don’t tell you the domain name. To get the actual hostname, you need Route 53 Resolver query logging. This is different from Route 53 public hosted zone query logging — it captures DNS queries made by resources inside your VPC, including every pod in your EKS cluster.
Each resolver query log entry is a JSON record that includes the domain name queried, the DNS response code, and the resolved IP in the answers field. This lets you map an external IP back to its domain name — which is the gap VPC flow logs leave.
EKS caveat: The `srcaddr` in Resolver logs is the CoreDNS ClusterIP (`172.20.0.10`), not the originating pod IP — see Lessons Learned for the full explanation.
Caching note: Resolver query logging only records unique queries. Subsequent lookups served from the VPC resolver’s cache (within TTL) are not logged, so you may not see every call.
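Each log line is a single JSON object. A minimal parsing sketch with an invented record, shaped like the fields in the Athena table defined later in this step:

```python
import json

# Invented Resolver query log record; field names follow the resolver log
# schema used in the Athena DDL (query_name, rcode, answers[].Rdata, srcaddr).
line = json.dumps({
    "version": "1.100000",
    "account_id": "111111111111",
    "region": "us-east-1",
    "vpc_id": "vpc-0abc",
    "query_timestamp": "2024-01-15T12:00:00Z",
    "query_name": "api.example.com.",
    "query_type": "A",
    "query_class": "IN",
    "rcode": "NOERROR",
    "answers": [{"Rdata": "142.250.80.46", "Type": "A", "Class": "IN"}],
    "srcaddr": "172.20.0.10",  # CoreDNS ClusterIP, not the pod that asked
})

record = json.loads(line)
resolved_ips = [a["Rdata"] for a in record["answers"]]
print(record["query_name"], "->", resolved_ips)
```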
Enable Resolver Query Logging
Unlike VPC flow logs (which auto-attach the required S3 policy), Route 53 Resolver requires you to add the bucket policy manually first — otherwise you get RSLVR-01605 Missing permission to log destination.
# Save this as resolver-bucket-policy.json
cat > resolver-bucket-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AWSLogDeliveryWrite",
"Effect": "Allow",
"Principal": { "Service": "delivery.logs.amazonaws.com" },
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::my-vpc-flow-logs-bucket/resolver-query-logs/AWSLogs/<account-id>/*",
"Condition": {
"StringEquals": {
"s3:x-amz-acl": "bucket-owner-full-control",
"aws:SourceAccount": "<account-id>"
}
}
},
{
"Sid": "AWSLogDeliveryAclCheck",
"Effect": "Allow",
"Principal": { "Service": "delivery.logs.amazonaws.com" },
"Action": "s3:GetBucketAcl",
"Resource": "arn:aws:s3:::my-vpc-flow-logs-bucket",
"Condition": {
"StringEquals": { "aws:SourceAccount": "<account-id>" }
}
}
]
}
EOF
# Apply the policy (replace <account-id> with your actual account ID)
aws s3api put-bucket-policy \
--bucket my-vpc-flow-logs-bucket \
--policy file://resolver-bucket-policy.json
Now create the logging config:
aws route53resolver create-resolver-query-log-config \
--name eks-vpc-resolver-logs \
--destination-arn "arn:aws:s3:::my-vpc-flow-logs-bucket/resolver-query-logs/" \
--creator-request-id "eks-resolver-logs-$(date +%s)"
Note the Id from the response — you’ll need it to associate with your VPC:
# Associate the config with your VPC
aws route53resolver associate-resolver-query-log-config \
--resolver-query-log-config-id rqlc-XXXXXXXXXXXXXXXXX \
--resource-id vpc-XXXXXXXXXXXXXXXXX
Verify the association is active:
aws route53resolver list-resolver-query-log-config-associations \
--filters Name=Status,Values=ACTIVE \
--query "ResolverQueryLogConfigAssociations[*].{ConfigId:ResolverQueryLogConfigId,VPC:ResourceId,Status:Status}" \
--output table
Create the Athena Table
Resolver logs land in S3 as newline-delimited JSON (one record per line), so the table uses the OpenX JSON SerDe:
CREATE EXTERNAL TABLE IF NOT EXISTS resolver_query_logs (
version STRING,
account_id STRING,
region STRING,
vpc_id STRING,
query_timestamp STRING,
query_name STRING,
query_type STRING,
query_class STRING,
rcode STRING,
answers ARRAY<STRUCT<
Rdata: STRING,
Type: STRING,
Class: STRING
>>,
srcaddr STRING,
srcport STRING,
transport STRING,
srcids STRUCT<
instance: STRING,
resolver_endpoint: STRING
>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-vpc-flow-logs-bucket/resolver-query-logs/AWSLogs/<account-id>/vpcdnsquerylogs/<vpc-id>/';
Resolve an IP to Its Domain Name
Given an external IP you spotted via pkt_dstaddr in VPC flow logs (e.g. 142.250.80.46), find which domain it resolved from:
SELECT
query_name AS domain,
query_type,
rcode,
answer.Rdata AS resolved_ip,
query_timestamp
FROM resolver_query_logs
CROSS JOIN UNNEST(answers) AS t(answer)
WHERE answer.Rdata = '142.250.80.46'
ORDER BY query_timestamp DESC
LIMIT 20;
Join Flow Logs and Resolver Logs for the Full Picture
Join on the resolved IP (answer.Rdata = fl.pkt_dstaddr) — not on srcaddr, because the source in Resolver logs is CoreDNS, not the individual pod:
SELECT
fl.pkt_srcaddr AS pod_ip,
rl.query_name AS domain,
fl.pkt_dstaddr AS external_ip,
fl.dstport,
SUM(fl.bytes) AS total_bytes,
COUNT(*) AS connection_count
FROM vpc_flow_logs_nat fl
JOIN (
SELECT query_name, answer.Rdata AS resolved_ip
FROM resolver_query_logs
CROSS JOIN UNNEST(answers) AS t(answer)
) rl ON fl.pkt_dstaddr = rl.resolved_ip
WHERE
fl.flow_direction = 'ingress'
AND fl.action = 'ACCEPT'
AND fl.pkt_dstaddr NOT LIKE '10.%'
GROUP BY fl.pkt_srcaddr, rl.query_name, fl.pkt_dstaddr, fl.dstport
ORDER BY total_bytes DESC
LIMIT 50;
This gives you the pod IP from VPC flow logs, the domain name behind the destination IP from Resolver logs, and how much data was transferred. You still get the full picture — you just can’t trace the DNS query itself back to the individual pod without CoreDNS logs.
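The join logic itself is simple enough to sanity-check in plain Python. A sketch with invented rows, mirroring the query structure (resolver answers keyed by resolved IP, flow rows looked up against that map):

```python
# Invented sample rows standing in for the two Athena tables.
flow_rows = [  # (pod_ip, external_ip, bytes) from vpc_flow_logs_nat
    ("10.0.1.45", "142.250.80.46", 8400),
    ("10.0.2.17", "203.0.113.9", 120),
]
resolver_rows = [  # (query_name, resolved_ip) from unnested resolver answers
    ("telemetry.vendor.com.", "142.250.80.46"),
]

# Key resolver answers by IP, then look each flow destination up in the map.
ip_to_domain = {ip: name for name, ip in resolver_rows}

for pod_ip, ext_ip, nbytes in flow_rows:
    domain = ip_to_domain.get(ext_ip, "(no matching DNS answer)")
    print(f"{pod_ip} -> {ext_ip} {domain} {nbytes}B")
```

A flow row with no matching resolver answer usually means the connection used a hardcoded IP, or the DNS answer was served from the resolver cache and never logged.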
Lessons Learned
A few things I got wrong or had to figure out the hard way while setting this up.
Route 53 Resolver logs give you CoreDNS’s IP, not the pod IP
When you look at the srcaddr field in Route 53 Resolver query logs, you’ll see something like 172.20.0.10 as the source for almost every DNS query. That’s not a pod — it’s the ClusterIP of the CoreDNS (kube-dns) service in EKS, assigned from the cluster’s service CIDR when the cluster is created (depending on your VPC CIDR, it may be 10.100.0.10 instead).
The actual DNS resolution chain in EKS, and how the pod then uses the resolved IP to reach the external service:
flowchart LR
subgraph dns[" DNS Resolution "]
direction TB
pod1([" EKS Pod 10.0.1.45 "]):::podStyle
coredns([" CoreDNS 172.20.0.10 "]):::corednsStyle
vpc([" VPC Resolver 169.254.169.253 "]):::resolverStyle
r53([" Route 53 Resolver "]):::r53Style
pod1 -- "① query: api.example.com" --> coredns
coredns -- "② forward external" --> vpc
vpc -- "③ resolve" --> r53
r53 -. "④ 142.250.80.46" .-> vpc
vpc -. "⑤" .-> coredns
coredns -. "⑥ return IP to pod" .-> pod1
end
subgraph conn[" TCP Connection "]
direction TB
pod2([" EKS Pod 10.0.1.45 "]):::podStyle
nat([" NAT Gateway ENI pkt-srcaddr = pod IP "]):::natStyle
ext([" External Service 142.250.80.46 "]):::extStyle
pod2 -- "⑦ connect :443" --> nat
nat -- "⑧ SNAT → EIP" --> ext
ext -. "⑨ response" .-> nat
nat -. "⑩ DNAT → pod" .-> pod2
end
dns -. " resolved IP triggers connection " .-> conn
classDef podStyle fill:#eef2ff,stroke:#6366f1,color:#3730a3
classDef corednsStyle fill:#f5f3ff,stroke:#8b5cf6,color:#5b21b6
classDef resolverStyle fill:#ecfeff,stroke:#06b6d4,color:#155e75
classDef r53Style fill:#fffbeb,stroke:#f59e0b,color:#92400e
classDef natStyle fill:#ecfdf5,stroke:#10b981,color:#064e3b
classDef extStyle fill:#f8fafc,stroke:#64748b,color:#334155
By the time the query reaches the Route 53 Resolver, the original pod IP is gone — CoreDNS is the caller. So Resolver logs are useful for mapping IPs to domain names (via the answers.Rdata field), but they cannot tell you which pod initiated the DNS lookup. That’s why the join query in Step 6 joins on pkt_dstaddr = resolved_ip, not on source IP.
CoreDNS / node-local-dns logs can give you pod IP + domain, but with trade-offs
If you need to trace a specific pod to a specific DNS query, CoreDNS query logs are the right tool — they log the client pod IP and the queried domain directly. You can enable them by editing the CoreDNS ConfigMap:
kubectl edit configmap coredns -n kube-system
Add `log` to the Corefile block:
.:53 {
log # <-- adds per-query logging
errors
health
...
}
But be aware of the trade-offs before you flip this on in production:
- Scope: CoreDNS logs only cover workloads inside the EKS cluster. VMs, Lambda functions, ECS tasks, or any other compute that routes through the same NAT gateway won’t appear here.
- Resource cost: Enabling query-level logging (the `log` plugin) significantly increases CoreDNS CPU and memory usage under load. In a busy cluster, DNS query volume is high — every pod startup, every service call, every health check generates DNS queries. This has caused CoreDNS OOM kills in clusters that weren’t sized for it.
- Log volume: CoreDNS query logs are verbose. Without a log aggregation pipeline that can handle the volume, you’ll either drop logs or run up a large CloudWatch bill.
If you just need occasional ad-hoc investigation (not continuous monitoring), a better approach is to temporarily enable CoreDNS logging, capture what you need, then disable it — rather than leaving it on permanently.
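For those ad-hoc sessions, a small parser helps turn captured lines into pod IP and domain pairs. The sample line below is an invention shaped roughly like the `log` plugin's common log format; verify the regex against your cluster's actual output before relying on it:

```python
import re

# Invented line approximating CoreDNS `log` plugin output.
line = ('[INFO] 10.0.1.45:43210 - 12345 "A IN api.example.com. udp 45 false '
        '512" NOERROR qr,rd,ra 100 0.000123s')

# Capture client IP, query type, and query name from the quoted section.
m = re.search(r'(\d+\.\d+\.\d+\.\d+):\d+ - \d+ "(\S+) \S+ (\S+)', line)
if m:
    client_ip, qtype, qname = m.groups()
    print(client_ip, qtype, qname)
```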
Conclusion
VPC flow logs on the NAT gateway ENI — queried via pkt-srcaddr — tell you which pod is talking to which external IP and how much. Route 53 Resolver query logs tell you the domain name behind that IP. R53 Resolver logs won’t expose individual pod IPs (CoreDNS is the caller), but joining on the destination IP closes the loop: you go from “something is hitting 142.250.80.46” to “pods are connecting to telemetry.vendor.com” in a single Athena query.
Related Resources
- VPC Flow Log Record Fields Reference
- Creating Flow Logs to Amazon S3
- VPC Flow Logs Basics
- Querying VPC Flow Logs with Athena
- Route 53 Resolver Query Logging