Cluster Bootstrap Fails with SSL_ERROR_SYSCALL due to NSX Edge TEP MTU Mismatch


Article ID: 429885


Updated On:

Products

VMware vSphere Kubernetes Service
VMware NSX

Issue/Introduction

A newly created Guest Cluster Control Plane node hangs during bootstrap initialization.

The status of the machine object remains Provisioned.

Inside the node, /var/log/cloud-init-output.log shows repeated connection timeouts or OpenSSL SSL_connect: SSL_ERROR_SYSCALL errors when attempting to reach the Supervisor VIP (Kubernetes API).
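To spot these signatures quickly, a small filter function can be run against the bootstrap log (a minimal sketch; the log path and error strings are the ones reported above):

```shell
# Scan the cloud-init bootstrap log for the failure signatures described
# in this article: connection timeouts and SSL_ERROR_SYSCALL handshake errors.
# The default path is /var/log/cloud-init-output.log; pass another file to override.
scan_bootstrap_log() {
  grep -E 'SSL_ERROR_SYSCALL|Connection timed out' "${1:-/var/log/cloud-init-output.log}"
}
```

Repeated matches over several minutes indicate the node cannot complete a TLS handshake with the Supervisor VIP.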

Standard connectivity tests (small ping) may succeed, but TLS handshakes fail.

Prerequisite checks:

Use the following checks to validate the MTU and isolate the failure domain.

Phase 1 - Verify the issue from the Control Plane node (Guest OS) by performing a ping test to determine whether large packets are being dropped.

SSH into the failing Control Plane node using the username vmware-system-user.

Then test connectivity with a standard payload (1472-byte ICMP payload + 28 bytes of ICMP/IP headers = 1500 bytes on the wire):

ping -c 4 -M do -s 1472 <Supervisor_VIP_IP>

Result: If this fails (100% loss), 1500-byte frames are being dropped on the path (the -M do flag sets Don't Fragment, so oversized packets are discarded rather than fragmented).

Test connectivity with a reduced payload (1372 + 28 = 1400 bytes on the wire):

ping -c 4 -M do -s 1372 <Supervisor_VIP_IP>

Result: If this succeeds, the issue is confirmed as an MTU restriction on the path.
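The two probes above can be combined into one loop. This is a sketch, not part of the article's procedure: SUPERVISOR_VIP is a placeholder environment variable you must export first, and the helper encodes the frame-to-payload arithmetic (frame = ICMP payload + 28 bytes of ICMP/IP headers).

```shell
#!/bin/sh
# Compute the ICMP payload (-s value) for a target on-wire IPv4 frame size:
# frame = payload + 8 (ICMP header) + 20 (IPv4 header)
payload_for_frame() {
  echo $(( $1 - 28 ))
}

# SUPERVISOR_VIP is an assumed placeholder; export it before running.
if [ -n "${SUPERVISOR_VIP:-}" ]; then
  for frame in 1500 1400; do
    size=$(payload_for_frame "$frame")
    # -M do sets Don't Fragment, so intermediate hops cannot split the probe
    if ping -c 4 -M do -s "$size" "$SUPERVISOR_VIP" >/dev/null 2>&1; then
      echo "frame ${frame}: OK"
    else
      echo "frame ${frame}: DROPPED (path MTU is below ${frame})"
    fi
  done
fi
```

A result of "frame 1500: DROPPED" together with "frame 1400: OK" matches the MTU-restriction pattern this article describes.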

Phase 2 - Isolate the Physical Bottleneck (ESXi Host)
Use vmkping to determine if the issue is local (Host Uplink) or remote (Path to Edge).

1. Identify IPs:

Local TEP Gateway: Run 'esxcli network ip route ipv4 list -N vxlan' to find the gateway for the vxlan stack.

Destination Edge TEP: In NSX Manager, go to System > Fabric > Nodes > Edge Transport Nodes > [Node] > Tunnels to find the Edge TEP IP.
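The vmkping commands below assume vmk10 is the TEP vmkernel interface; on your host it may differ. One way to find the TEP vmks is to filter the output of esxcli for interfaces on the vxlan netstack (a sketch; the "Netstack Instance" field name is assumed from typical esxcli list formatting):

```shell
# Extract vmk interface names bound to the vxlan netstack from
# `esxcli network ip interface list` output.
find_vxlan_vmks() {
  awk '/^vmk/ { vmk = $1 } /Netstack Instance: vxlan/ { print vmk }'
}

# Usage on the ESXi host:
#   esxcli network ip interface list | find_vxlan_vmks
```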

2. Test Local Uplink Support (Source Check)
Verify your ESXi host can put a large packet on the wire.

Note: -s 1572 produces a 1600-byte frame on the wire (1572 + 28 bytes of ICMP/IP headers), which exercises jumbo-frame support.

vmkping ++netstack=vxlan -I vmk10 -s 1572 -d <Local_TEP_Gateway_IP>

Success: Local Physical Switch and Host NIC are correctly configured for Jumbo Frames.

Failure: The issue is the local physical switch port.

3. Test Path to Edge Node (Destination Check)
Verify the packet can reach the Edge Node where the Supervisor VIP lives.

vmkping ++netstack=vxlan -I vmk10 -s 1572 -d <Edge_TEP_IP>
Failure: Confirms the physical network drops Jumbo Frames somewhere between the Host and the Edge Node.
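When the jumbo probe fails, it can help to sweep downward to find the largest frame the path actually carries. This sketch separates the sweep logic from the probe itself so the probe can be swapped in; wrapping vmkping as shown (with vmk10 and the vxlan netstack assumed, as above) makes it usable on the ESXi host:

```shell
# Step down through candidate on-wire frame sizes and report the largest
# one that the given probe command passes. The probe receives the ICMP
# payload size (frame - 28) and the destination IP.
largest_passing_frame() {
  probe="$1"; dest="$2"
  for frame in 9000 1600 1500 1400; do
    if "$probe" $(( frame - 28 )) "$dest"; then
      echo "$frame"
      return 0
    fi
  done
  echo "none"
  return 1
}

# Production probe for the ESXi shell (vmk10 assumed to be the TEP vmk):
vmkping_probe() {
  vmkping ++netstack=vxlan -I vmk10 -s "$1" -d "$2" >/dev/null 2>&1
}

# Usage on the host:
#   largest_passing_frame vmkping_probe <Edge_TEP_IP>
```

A reported maximum of 1500 on the path to the Edge TEP, with 1600 passing toward the local gateway, pinpoints the non-jumbo segment between the host and the Edge node.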

Environment

VCF 9.0.1

VMware NSX-T Data Center 

VKS 3.4.1+v1.33

Cause

The issue is caused by a path MTU (Maximum Transmission Unit) Mismatch.
NSX-T uses Geneve encapsulation, which adds approximately 100 bytes of overhead to every packet.
A standard 1500-byte payload sent by a TKG VM becomes ~1600 bytes on the physical wire.

If the physical network path between the Workload ESXi Host TEPs and the NSX Edge Node TEPs does not support Jumbo Frames (MTU 1600+), these large packets are dropped silently.
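The arithmetic behind the failure can be stated in two lines (using the ~100-byte Geneve overhead figure given above):

```shell
# Back-of-envelope from the Cause above: Geneve adds roughly 100 bytes
# (per this article) to every encapsulated frame.
GENEVE_OVERHEAD=100
wire_frame() { echo $(( $1 + GENEVE_OVERHEAD )); }

wire_frame 1500   # 1600 bytes on the wire, dropped by any 1500-MTU hop
```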

Resolution

Configure the physical network devices (switches and routers) between the ESXi TEP VLAN and the Edge TEP VLAN to support an MTU of at least 1600; configuring full jumbo frames (MTU 9000) is common practice.
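On many datacenter switches this is a per-interface setting. The fragment below is illustrative only, in Cisco NX-OS-style syntax with a hypothetical interface name; verify the exact command and maximum supported MTU for your platform:

```
interface Ethernet1/1
  description ESXi/Edge TEP uplink
  mtu 9216
```

Apply the setting on every switch and router port in the path between the host TEP VLAN and the Edge TEP VLAN, then re-run the vmkping checks from Phase 2 to confirm.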