Intermittent 503 Responses from contour with envoy in a VKS Guest Cluster

Article ID: 433847

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • Intermittent 503 (Service Unavailable) responses are returned when connecting to an otherwise healthy upstream service on a VKS cluster.
  • Grafana dashboards show that an application is returning 503 responses.
  • The vpxd logs show MAC address registrations for VMs, one of which has an IP address that overlaps with a TKG worker node:
    YYYY-MM-DDT10:34:20.996+##:## verbose vpxd[#73##1] [Originator@6876 sub=InvtId opID=DbParallelLoad-#######] Register (<mac address>, vm-####): newId:true
    YYYY-MM-DDT10:34:20.996+##:## verbose vpxd[#73##1] [Originator@6876 sub=InvtId opID=DbParallelLoad-#######] Register (<mac address>, vm-####): newId:true
    YYYY-MM-DDT10:34:20.996+##:## verbose vpxd[#73##1] [Originator@6876 sub=InvtId opID=DbParallelLoad-#######] Register (<mac address>, vm-####): newId:true
    YYYY-MM-DDT10:34:20.996+##:## verbose vpxd[#73##1] [Originator@6876 sub=InvtId opID=DbParallelLoad-#######] Register (<mac address>, vm-####): newId:true
    YYYY-MM-DDT10:34:20.996+##:## verbose vpxd[#73##1] [Originator@6876 sub=InvtId opID=DbParallelLoad-#######] Register (VM-Name-01:<IP Address 1>, vm-####): newId:true
    YYYY-MM-DDT10:34:20.996+##:## verbose vpxd[#73##1] [Originator@6876 sub=InvtId opID=DbParallelLoad-#######] Register (VM-Name-02:<IP Address 2>, vm-####): newId:true
    YYYY-MM-DDT10:34:20.996+##:## verbose vpxd[#73##1] [Originator@6876 sub=InvtId opID=DbParallelLoad-#######] Register (VM-Name-03:<IP Address 3>, vm-####): newId:true
  • The envoy logs contain 503 entries that suggest the errors originate from the backend service:
    kubectl logs envoy-#### -n tanzu-system-ingress -c envoy | grep -i "503 "
    YYYY-MM-DDTHH:17:22.1832]"GET /api/.../... HTTP/2" 503 UF 0 91 2000 - "<Client IP>" "curl/7.76.1" "###-2431-4c17-96b1-###" "service-name.domain-name.com" "<Backend IP>:<PORT>"
    YYYY-MM-DDTHH:30:11.2252]"GET /api/.../... HTTP/2" 503 UF 0 91 2000 - "<Client IP>" "curl/7.76.1" "###-db09-4f37-abf8-###" "service-name.domain-name.com" "<Backend IP>:<PORT>"
    YYYY-MM-DDTHH:30:15.0222]"GET /api/.../... HTTP/2" 503 UF 0 91 2000 - "<Client IP>" "curl/7.76.1" "###-9308-4462-ae93-###" "service-name.domain-name.com" "<Backend IP>:<PORT>"
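
In Envoy access logs, the UF response flag indicates an upstream connection failure, which is consistent with traffic not reaching the backend pod. The following is a minimal sketch, not a supported tool, for summarizing such entries per backend. It assumes the access log layout shown in the samples above (status code and response flag directly after the quoted request, upstream address as the last quoted field); the function name count_503_uf is hypothetical.

```shell
#!/bin/sh
# Count "503 UF" access log entries per upstream (backend) address.
# Assumption: lines follow the layout of the samples above, i.e. the
# status code and response flag come right after the quoted request
# line and the upstream "<IP>:<PORT>" is the last quoted field.
count_503_uf() {
  awk -F'"' '
    {
      # $3 holds the unquoted part after the request: " 503 UF 0 91 2000 - "
      split($3, f, " ")
      if (f[1] == "503" && f[2] == "UF") hits[$(NF - 1)]++
    }
    END { for (u in hits) print hits[u], u }
  ' | sort -rn
}

# Example usage (pod name is a placeholder, as in the command above):
#   kubectl logs envoy-#### -n tanzu-system-ingress -c envoy | count_503_uf
```

If most of the failures point at pods scheduled on a single worker node, that node's IP address is a good candidate to check for a duplicate assignment.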

Environment

VMware vSphere Kubernetes Service
Contour with Envoy Tanzu Package 

Cause

The issue is caused by a misconfiguration in which the vDS network configured as the Supervisor Workload Network is also used by VMs that are not managed by the Supervisor.

In this case, a non-Supervisor-managed VM shares an IP address with a TKG worker node on the same VLAN.
Because of the duplicate IP assignment, the non-Supervisor VM answers ARP queries for the worker node's IP address. Tunneled Antrea overlay traffic intended for the TKG worker node is then intermittently delivered to the wrong interface, which surfaces as 503 responses from Envoy.

Resolution

 

  • Identify the non-supervisor managed VM responding to ARP requests for the TKG worker node IP on the Supervisor Workload Network.

  • Remove the Network Interface Card (NIC) that holds the conflicting IP address from that VM.

  • Validate that the vDS network configured as the Supervisor Workload Network is completely isolated from non-supervisor managed VMs.
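
One possible way to approach the first step is to look for an IP address that resolves to more than one MAC address in the ARP/neighbor tables of Linux hosts on the Workload Network VLAN. The sketch below is illustrative only: it assumes the input is the output of the iproute2 command "ip neigh show", gathered from one or more hosts, and the function name find_ip_conflicts is hypothetical. The reported MAC addresses can then be compared with the MAC registrations seen in the vpxd logs.

```shell
#!/bin/sh
# Report IP addresses that appear with more than one MAC address.
# Assumption: stdin contains "ip neigh show" output, for example
#   10.0.0.5 dev eth0 lladdr 00:50:56:aa:bb:cc REACHABLE
# collected from one or more hosts on the Workload Network VLAN.
find_ip_conflicts() {
  awk '
    {
      mac = ""
      # The MAC address is the field that follows the "lladdr" keyword.
      for (i = 2; i < NF; i++) if ($i == "lladdr") mac = tolower($(i + 1))
      if (mac == "") next            # skip entries without a MAC (e.g. FAILED)
      key = $1 SUBSEP mac
      if (!(key in seen)) {          # count each IP/MAC pair only once
        seen[key] = 1
        macs[$1] = macs[$1] " " mac
        n[$1]++
      }
    }
    END { for (ip in n) if (n[ip] > 1) print ip ":" macs[ip] }
  ' | sort
}

# Example usage on a single host:
#   ip neigh show | find_ip_conflicts
```

An empty result does not rule out a conflict, since a neighbor table only reflects the most recent ARP reply the host has cached; repeating the check over time or across several hosts gives a better picture.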

 

Additional Information

Requirements for Cluster Supervisor Deployment with Avi Load Balancer and VDS Networking