Users may observe widespread virtual machine backup failures across multiple ESXi clusters managed by a single vCenter Server Appliance (VCSA). Common indicators include:
info vpxd[4085937] [Originator@6876 sub=vmomi.soapStub[774949] opID=7ac8f6ad] SOAP request returned HTTP failure; <<io_obj p:0x00007f740d5e8b48, h:250, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-150117/sdk>, method: fileManagement; code: 500(Internal Server Error); fault: (vim.fault.NotAuthenticated) {--> faultCause = (vmodl.MethodFault) null,--> faultMessage = <unset>,--> object = 'vim.Datastore:62aef28d-fe0e61a8-3da4-9efa524000e2',--> privilegeId = "Datastore.FileManagement"--> msg = "Received SOAP response fault from [...]: fileManagement. The session is not authenticated."--> }vCenter Server 8.x
The issue is caused by management plane resource exhaustion and session token cache desynchronization within the vpxd and envoy-hgw services.
In vSphere 8.x, third-party backup proxies using VADP authenticate via vCenter, which proxies data requests through the Envoy Host Gateway. A massive "API storm" (caused by concurrent heavy automation scripts and high-volume backup streams) can exceed session limits. When snapshots are bulk-requested, session tokens may sit idle in the proxy queue beyond the default 5-minute (300-second) timeout. Once these tokens expire, the management plane rejects the connections, potentially leading to a self-inflicted denial-of-service that hangs SSH/WinSCP sessions.
To resolve this issue, perform a full restart of the vCenter server services by running the following command:
service-control --stop --all && service-control --start --all
To resolve this without a full vCenter reboot, implement the following optimizations:
1. Stagger Operational Windows
Ensure that high-volume automation tasks (e.g., BladeLogic) and backup schedules do not run concurrently. Separate these windows to reduce the peak API transaction load.
2. Configure Client-Side Throttling
3. Targeted Service Remediation
If the NotAuthenticated state reoccurs, recover the proxy pipeline by restarting the specific services via SSH:
service-control --restart vmware-vpxd vmware-envoy-hgw
4. Proactive Alerting
Configure an alert in Aria Operations for Logs (formerly Log Insight) to monitor for proxy failure thresholds: