Antrea cluster application not accessible when using NodePortLocal

Article ID: 415757

Products

  • VMware NSX
  • VMware Tanzu Kubernetes Grid
  • VMware vSphere Kubernetes Service
  • VMware Avi Load Balancer

Issue/Introduction

A Kubernetes (K8S) Ingress service becomes inaccessible with a "Secure Connection failed" error in the following setup:

  • Antrea CNI is used in the K8S cluster
  • NodePortLocal (NPL) is enabled in the Antrea Agent
  • An external load balancer (for example, Avi Load Balancer (ALB)) consumes the NPL port mappings published by the Antrea Agent

In the Avi Web UI (Applications > Pools > Servers), the target pool displays a mixed health status, containing both healthy (green) and unhealthy (red/orange) pods.

The Antrea Agent logs report a missing Pod IP address:

# kubectl -n kube-system logs antrea-agent-xxxxxx
I0918 13:26:09.700085       1 npl_controller.go:404] IP address not set for Pod: <NAMESPACE>/<POD_NAME>
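
When the log is long, the affected Pods can be collected by filtering for this message. The following is a minimal sketch, not an official tool: the function name npl_pods_without_ip is hypothetical, and it assumes the log lines have the same format as the example above.

```shell
# Hypothetical helper: reads antrea-agent log lines on stdin and prints the
# unique <NAMESPACE>/<POD_NAME> values from "IP address not set for Pod" lines.
npl_pods_without_ip() {
  sed -n 's/.*IP address not set for Pod: \([^[:space:]]*\).*/\1/p' | sort -u
}

# Usage (replace antrea-agent-xxxxxx with the agent Pod on the affected node):
# kubectl -n kube-system logs antrea-agent-xxxxxx | npl_pods_without_ip
```

The output is one namespace/name pair per line, which can be compared against the unhealthy servers shown in the Avi pool.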

On the worker node, the iptables output contains a DNAT rule without the endpoint Pod IP address:

# iptables -t nat -S ANTREA-NODE-PORT-LOCAL
-A ANTREA-NODE-PORT-LOCAL -p tcp -m tcp --dport <port> -j DNAT --to-destination <MISSING POD IP>:<port>
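
On a node with many NPL mappings, the incomplete rules can be picked out with a small filter. This is a sketch under stated assumptions: the function name npl_rules_without_pod_ip is hypothetical, the cluster is assumed to use IPv4 Pod addresses, and because the exact rendering of the missing address is not shown above, any DNAT target that is not of the form IPv4:port is flagged.

```shell
# Hypothetical helper: reads `iptables -t nat -S ANTREA-NODE-PORT-LOCAL` output
# on stdin and prints DNAT rules whose --to-destination is not <IPv4>:<port>.
npl_rules_without_pod_ip() {
  awk '
    /-j DNAT/ {
      dest = ""
      # Find the value that follows the --to-destination option.
      for (i = 1; i < NF; i++) if ($i == "--to-destination") dest = $(i + 1)
      # Assumption: a healthy rule targets an IPv4 address and a port.
      if (dest !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+$/) print
    }'
}

# Usage on the worker node:
# iptables -t nat -S ANTREA-NODE-PORT-LOCAL | npl_rules_without_pod_ip
```

No output means that every DNAT rule in the chain has a complete destination.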

Environment

  • vSphere Kubernetes Service
  • Antrea CNI (NodePortLocal)

Cause

After the Antrea Agent starts, NodePortLocal (NPL) rules can be missing the Pod IP address. Traffic that matches these rules is dropped at the worker node.

Resolution

The issue is fixed in VKR 1.33.6 and 1.32.10, and the fix is backported to VKR 1.31.14.

As a workaround, do either of the following:

  • Restart the antrea-agent pod
  • Recreate the endpoint Pods
