Symptoms:
While the Veritas cluster (VCS) is monitoring the VMDKs on ESXi, communication with the ESXi host fails with the following error.
Error: 2025/03/10 18:40:01 VCS ERROR V-16-10061-22516 VMwareDisks:FAPI_INT_Vmware_disks:monitor:Failed to check if the disk '[Datastore] VM_Name/VM_name.vmdk' is valid on ESX 10.##.##.## with error 'SOAP 1.1 fault: SOAP-ENV:Client [no subcode]"Error observed by underlying SSL/TLS BIO: Connection reset by peer"Detail: SSL_connect error in tcp_connect()
Environment:
VMware ESXi 7.0
VMware ESXi 8.0
When a TCP connection reset occurs between the virtual machine and the ESXi host, communication fails and the errors above are logged on the Veritas cluster VM.
A review of the packet capture shows both healthy and failing exchanges.
- When VCS logs in to the ESXi host and queries the disks, the operation sometimes completes successfully. A successful exchange looks like this:
68525 276.411417 10.##.##.196 10.##.##.87 TCP 74 43297 → 443 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM TSval=1886951776 TSecr=0 WS=128
68526 276.411431 10.##.##.87 10.##.##.196 TCP 74 443 → 43297 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=512 SACK_PERM TSval=2509415166 TSecr=1886951776
68527 276.411564 10.##.##.196 10.##.##.87 TCP 66 43297 → 443 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=1886951776 TSecr=2509415166
68528 276.411565 10.##.##.196 10.##.##.87 TLSv1.2 583 Client Hello
68529 276.413838 10.##.##.87 10.##.##.196 TLSv1.2 1632 Server Hello, Certificate, Server Key Exchange, Server Hello Done
68530 276.414396 10.##.##.196 10.##.##.87 TCP 66 43297 → 443 [ACK] Seq=518 Ack=1567 Win=17536 Len=0 TSval=1886951778 TSecr=2509415167
68531 276.414397 10.##.##.196 10.##.##.87 TLSv1.2 192 Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message
68532 276.415002 10.##.##.87 10.##.##.196 TLSv1.2 117 Change Cipher Spec, Encrypted Handshake Message
68533 276.415188 10.##.##.196 10.##.##.87 TLSv1.2 1325 Application Data
68538 276.416322 10.##.##.87 10.##.##.196 TLSv1.2 854 Application Data
68539 276.416474 10.##.##.87 10.##.##.196 TCP 66 443 → 43297 [FIN, ACK] Seq=2406 Ack=1903 Win=66560 Len=0 TSval=2509415167 TSecr=1886951780
68540 276.416637 10.##.##.196 10.##.##.87 TLSv1.2 97 Encrypted Alert
68541 276.416637 10.##.##.196 10.##.##.87 TCP 66 43297 → 443 [FIN, ACK] Seq=1934 Ack=2406 Win=20480 Len=0 TSval=1886951781 TSecr=2509415167
- In the failing case, the connection is reset (RST), which causes the SSL handshake to fail and is reported on the guest as an SSL error:
68542 276.416659 10.##.##.87 10.##.##.196 TCP 60 443 → 43297 [RST] Seq=2406 Win=0 Len=0
68543 276.416663 10.##.##.87 10.##.##.196 TCP 60 443 → 43297 [RST] Seq=2406 Win=0 Len=0
68544 276.416665 10.##.##.196 10.##.##.87 TCP 66 43297 → 443 [ACK] Seq=1935 Ack=2407 Win=20480 Len=0 TSval=1886951781 TSecr=2509415167
68545 276.416671 10.##.##.87 10.##.##.196 TCP 60 443 → 43297 [RST] Seq=2407 Win=0 Len=0
The RST packets above confirm that the connection was reset and communication was broken.
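When a capture is long, the reset packets can be picked out programmatically. The following is a minimal sketch, not part of any Veritas or VMware tooling. It assumes the capture has been exported as text with one packet per line in the column order shown above (number, time, source, destination, protocol, info); the function name `find_resets` is hypothetical.

```python
import re

# Assumed column order: packet number, time, source, destination, protocol, info.
SUMMARY_LINE = re.compile(r"^\s*(\d+)\s+([\d.]+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)$")
# TCP flags appear in square brackets in the info column, e.g. [SYN, ACK] or [RST].
FLAGS = re.compile(r"\[([A-Z, ]+)\]")


def find_resets(lines):
    """Return one dict per packet whose TCP flags include RST."""
    hits = []
    for line in lines:
        match = SUMMARY_LINE.match(line)
        if not match:
            continue  # skip lines that are not packet summaries
        number, time, src, dst, _proto, info = match.groups()
        flags = FLAGS.search(info)
        if flags is None:
            continue  # e.g. TLS records, which carry no flag list
        names = [name.strip() for name in flags.group(1).split(",")]
        if "RST" in names:
            hits.append(
                {"packet": int(number), "time": float(time), "src": src, "dst": dst}
            )
    return hits


if __name__ == "__main__":
    capture = [
        "68541 276.416637 10.##.##.196 10.##.##.87 TCP 66 43297 → 443 [FIN, ACK] Seq=1934 Ack=2406 Win=20480 Len=0",
        "68542 276.416659 10.##.##.87 10.##.##.196 TCP 60 443 → 43297 [RST] Seq=2406 Win=0 Len=0",
    ]
    for hit in find_resets(capture):
        print(hit["packet"], hit["src"], "->", hit["dst"])
```

The source address of each hit shows which side sent the reset, which is a useful starting point for the network investigation.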
This requires a network investigation of the path between the Veritas virtual machine and the ESXi host. Engage a network engineer to look into the issue.
Because the connection resets are causing the SSL errors, the investigation should focus on the network between the host and the client.
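To help the network team gauge how often the resets occur, a simple probe can be run from the Veritas VM against the ESXi host. The following is a minimal sketch under stated assumptions, not a Veritas or VMware utility. It assumes the ESXi host listens on port 443 (as in the capture) and may present a self-signed certificate, so certificate verification is disabled for diagnostic purposes only. The function names `probe_tls` and `probe_many` are hypothetical.

```python
import socket
import ssl


def probe_tls(host, port=443, timeout=5.0):
    """Attempt one TCP connect plus TLS handshake and classify the outcome."""
    ctx = ssl.create_default_context()
    # Assumption: the host may use a self-signed certificate, so verification
    # is turned off. Use this for diagnostics only.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host):
                return "ok"
    except ConnectionResetError:
        # The peer (or a device in the path) sent a TCP RST.
        return "reset"
    except OSError as exc:
        # Covers timeouts, refused connections, and other TLS/socket errors.
        return "error: " + type(exc).__name__


def probe_many(host, attempts=20, port=443):
    """Repeat the probe and count each outcome."""
    counts = {}
    for _ in range(attempts):
        outcome = probe_tls(host, port)
        counts[outcome] = counts.get(outcome, 0) + 1
    return counts
```

For example, `probe_many("esxi-host.example", attempts=50)` (with a placeholder host name) returns a dictionary of outcome counts. A non-zero "reset" count would be consistent with the behavior seen in the capture, but it does not identify which device in the path is sending the reset.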