ESXi is not able to connect with VASA Provider due to TCP Failure
Article ID: 412434
Updated On:
Products
VMware vSphere ESXi
Issue/Introduction
When a VASA Provider experiences an unexpected restart due to infrastructure events such as power failures, system upgrades, or operational maintenance, all connected ESXi hosts must simultaneously re-establish their connections with the VASA Provider.
Other events, such as certificate rotation or network infrastructure resets, can also trigger a large number of concurrent `setContext` calls. In these cases the VASA Provider may become overwhelmed by the sudden influx of parallel `setContext` requests from multiple ESXi hosts.
In such cases, setContext failures are reported in the /var/run/vvold logs.
Cause
The root cause stems from inadequate admission control mechanisms within the VASA Provider architecture.
When the provider lacks proper request throttling and queuing mechanisms, it accepts connection requests beyond its processing capacity.
This results in resource exhaustion and system unresponsiveness, where the provider cannot effectively manage the concurrent load generated by multiple ESXi hosts attempting simultaneous reconnection.
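To illustrate what admission control means here, the following is a minimal Python sketch of a request gate that admits a fixed number of in-flight requests and turns the rest away immediately instead of accepting them. It is not how any particular VASA Provider is implemented; `AdmissionController`, `handle_set_context`, and the `"busy"` reply are hypothetical names chosen for this example.

```python
import threading
from typing import Callable


class AdmissionController:
    """Admit at most max_in_flight requests; reject the rest immediately."""

    def __init__(self, max_in_flight: int) -> None:
        # Each admitted request holds one slot until it finishes.
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_admit(self) -> bool:
        # Non-blocking: returns False when every slot is taken.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def handle_set_context(controller: AdmissionController,
                       work: Callable[[], str]) -> str:
    """Hypothetical request handler: do the work only if a slot is free."""
    if not controller.try_admit():
        # Over capacity: tell the caller to retry later.
        return "busy"
    try:
        return work()
    finally:
        controller.release()
```

A provider without such a gate accepts every incoming request, so a reconnection storm turns directly into resource exhaustion.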
Resolution
There is no code fix for this issue.
The following workaround can be applied.
In environments where the VASA Provider lacks built-in admission control capabilities, manual intervention is required to implement controlled connection management.
The recommended approach involves temporarily stopping the VVOL daemon on all ESXi hosts and implementing a phased restart strategy to prevent connection storms.
Implementation Steps:
1. Stop the VVOL daemon on all ESXi hosts: /etc/init.d/vvold stop
2. Plan a phased restart:
   - Restart the VVOL daemons in small batches (recommended: 5-10 hosts per batch).
   - Allow 2-3 minutes between batches so that connections can stabilize.
   - Monitor VASA Provider response times before proceeding to the next batch.
3. Start the VVOL daemon on the hosts in each batch: /etc/init.d/vvold start
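The phased restart can be scripted from an administration workstation. The Python sketch below is one possible approach, not a supported tool. It assumes key-based SSH access as root to each ESXi host. The function names, the batch size of 5, and the 180-second pause are illustrative choices taken from the recommended ranges above.

```python
import subprocess
import time
from typing import Callable, Iterator, List, Sequence


def make_batches(hosts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size hosts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(hosts), batch_size):
        yield list(hosts[i:i + batch_size])


def start_vvold_over_ssh(host: str) -> bool:
    """Start vvold on one host (assumes key-based SSH access as root)."""
    try:
        result = subprocess.run(
            ["ssh", f"root@{host}", "/etc/init.d/vvold", "start"],
            capture_output=True, text=True, timeout=120,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def phased_start(hosts: Sequence[str],
                 batch_size: int = 5,
                 pause_seconds: float = 180,
                 start: Callable[[str], bool] = start_vvold_over_ssh,
                 sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Start vvold batch by batch; return hosts whose start command failed."""
    failed: List[str] = []
    all_batches = list(make_batches(hosts, batch_size))
    for index, batch in enumerate(all_batches):
        for host in batch:
            if not start(host):
                failed.append(host)
        # Pause between batches, but not after the last one.
        if index < len(all_batches) - 1:
            sleep(pause_seconds)
    return failed
```

Calling `phased_start(["esx01", "esx02", "esx03"])` with real host names would start the daemons in order and return the hosts that need attention. The script does not measure VASA Provider response times. The pause between batches is where that check would be done before continuing.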
This connection storm phenomenon has been observed with specific VASA Provider implementations that lack sophisticated load balancing and admission control features.