How to identify pods and capture tcpdump on them in a TKGI environment

Article ID: 400925


Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

This article describes how to collect a tcpdump on pods in order to verify traffic and to review the network status, open ports, sessions, and other useful information on pod interfaces.

Environment

TKGI

Resolution

Steps to collect a tcpdump on pods running in a TKGI environment:

  • Find the IP of the worker node where the pod is running.

    kubectl -n <namespace> get pods -o wide

     

    ubuntu@opsmanager-3-0:~$ kubectl -n kube-system get pods -o wide | grep -i coredns
    
    coredns-85dd57f7df-h7hp4                                          1/1     Running   0          9h    10.200.7.9     7152a27b-3de3-4a04-9dbe-02538602c3e7   <none>           <none>
    
    Find the IP of the node:
    
    ubuntu@opsmanager-3-0:~$ kubectl get nodes -o wide | grep -i 7152a27b-3de3-4a04-9dbe-02538602c3e7
    7152a27b-3de3-4a04-9dbe-02538602c3e7   Ready    <none>   9h    v1.30.7+vmware.1   192.168.4.19   192.168.4.19   Ubuntu 22.04.5 LTS   5.15.0-134-generic   containerd://1.7.23+vmware.2
    
    Identify the BOSH worker node with that IP:
    
    ubuntu@opsmanager-3-0:~$ bosh vms | grep -i 192.168.4.19
    worker/241b0dd3-dafd-4680-b606-5cc37a3ec089	running	az2	192.168.4.19	vm-94d6bcc5-bdc9-4256-ae60-e663003b0434	xlarge.disk	true	bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.803
    
    
  • SSH to the node:

     bosh -d service-instance_ee4b47b0-4445-43e5-8804-2cd60b659a66 ssh worker/241b0dd3-dafd-4680-b606-5cc37a3ec089

     

     

  • Find the container running the pod - "crictl ps | grep -i podname"

    worker/241b0dd3-dafd-4680-b606-5cc37a3ec089:~# crictl ps | grep -i coredns
    cc36a7e81dd07       7a745fb0f1cfd       10 hours ago        Running             coredns                     0                   940b1e255bc5a       coredns-85dd57f7df-h7hp4



  • Find the container's PID - "crictl inspect <container-id-from-last-command> | grep -i pid"

    worker/241b0dd3-dafd-4680-b606-5cc37a3ec089:~# crictl inspect cc36a7e81dd07 | grep -i pid
        "pid": 11643,
    

 

  • Enter the network namespace where the pod is running - "nsenter -n -t <pid>"

    worker/241b0dd3-dafd-4680-b606-5cc37a3ec089:~# nsenter -n -t 11643

 

  • To confirm you are in the pod's namespace, verify the pod IP:

    worker/241b0dd3-dafd-4680-b606-5cc37a3ec089:~# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: eth0@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
        link/ether aa:72:a5:ac:d6:04 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 10.200.7.9/24 brd 10.200.7.255 scope global eth0
           valid_lft forever preferred_lft forever



  • Then capture the tcpdump.
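For example, a capture run from inside the pod's network namespace might look like the following (the interface name, port filter, and output path are illustrative; adjust them to the traffic you need to inspect):

```shell
# Watch DNS traffic live on the pod's eth0 interface
# (-n: do not resolve names, keeping the output fast and unambiguous).
tcpdump -i eth0 -n port 53

# Or write the capture to a pcap file for later analysis in Wireshark;
# -c 1000 stops after 1000 packets so the file does not grow unbounded.
tcpdump -i eth0 -n -c 1000 -w /tmp/pod-capture.pcap
```

The pcap file can then be copied off the worker node (for example with bosh scp) for offline analysis.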

Additional Information

Alternatively, use the same approach as above to identify the pod and worker node and SSH to the worker node, but instead of nsenter, exec into the CNI network namespace directly:

ip a | awk -F ": " '{print $2}' | awk -F "@" '{if (NF > 1) print $1}' | while read iface; do echo "$iface"; ns=$(ip a show "$iface" | grep -o 'cni[^ ]*') && sudo ip netns exec "$ns" ip addr show dev eth0 && echo "ns is $ns"; done

This command works with both the NSX and Antrea CNIs. Identify the IP assigned to the pod, then exec into its network namespace. Here is sample output from Antrea:

2: eth0@if29: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether e2:c1:f0:e4:8b:c2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.xxx.0.24/24 brd 10.xxx.0.255 scope global eth0
       valid_lft forever preferred_lft forever
ns is cni-cc2f32c2-972b-0224-2225-6aba15c261fb
fluent-b-3bc209
2: eth0@if30: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether c6:82:fa:c1:11:c8 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.xxx.0.25/24 brd 10.xxx.0.255 scope global eth0
       valid_lft forever preferred_lft forever
ns is cni-f3e6f625-a798-52fa-471a-08b2657cb7a3

Exec into the namespace of the container:

ip netns exec cni-cc2f32c2-972b-0224-2225-6aba15c261fb bash

Confirm the IP of the pod:

ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0@if29: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether e2:c1:f0:e4:8b:c2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.xxx.0.24/24 brd 10.xxx.0.255 scope global eth0
       valid_lft forever preferred_lft forever

Run your tcpdump command 

tcpdump -i any -n

Run other commands related to the same network:

netstat -putan 
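Note that netstat comes from the net-tools package, which is not always present on minimal images; ss from iproute2 (which the ip commands above already rely on) provides the same view. A minimal equivalent:

```shell
# List TCP and UDP sockets in the current network namespace
# (-t TCP, -u UDP, -n numeric addresses, -a all states, -p owning process).
ss -tunap
```

Run inside the pod's namespace, this shows only the pod's sockets, which is a quick way to confirm which ports the pod is actually listening on.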

Once the troubleshooting is complete, exit the namespace with "exit".

Below is sample output from the above command using NSX as the CNI:

251: eth0@if252: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 04:50:56:00:34:01 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.xx.xxx.4/24 scope global eth0
       valid_lft forever preferred_lft forever
ns is cni-1f5a5494-2a94-88d4-5c45-ba9803ddccfc
49e66171594c042
253: eth0@if254: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 04:50:56:00:bc:0b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.xx.xxx.3/24 scope global eth0
       valid_lft forever preferred_lft forever
ns is cni-54bcbec3-76f7-bc99-a138-1b40eb046711