When migrating a SimpliVity hyperconverged environment from standard virtual switches (vSS) to distributed virtual switches (vDS), ESXi hosts may become unresponsive or experience unexpected reboots. Symptoms include:
Host commands such as esxcli network nic list fail with "Connection failed".
This issue affects the entire hyperconverged infrastructure, potentially causing widespread VM inaccessibility and service disruption.
The logs from affected systems typically show a specific sequence of events:
NetPort: disabled port (PORTID)
Net: disconnected client from port (PORTID)
PCIPassthru: Freeing intr cookies of device 0000:##:00.0 for type:4, devIntrtype:4, devSts:0)
WARNING: CpuSched: Automatic relation removal from ######(vmx-vcpu-0:OmniStackVC-##-##-##-##, zombie) to ######(LSI-######:0)
NFS: Status:File system timeout (Ok to retry). Retrying synchronous write I/O 3 of 25 times
SunRPC: Synchronous RPC cancel for client 0x########## IP ##.##.##.##.#.# proc 1 xid 0x###### attempt 1 of 3
NFS: Status:No connection. Retrying synchronous write I/O 1 of 25 times
NFS: Status:No connection. Retrying synchronous write I/O 2 of 25 times
BC: write to host-####-hb (#### ## ######## ######## ######## ######## ######## ######## ######## ########) 8 bytes failed: File system timeout (Ok to retry)
Log: Generating backtrace for ######: worker
BC: write to host-####-hb (#### ## ######## ######## ######## ######## ######## ######## ######## ########) 8 bytes failed: No connection
Log: Generating backtrace for ######: fdm
ALERT: BC: File host-####-hb closed with dirty buffers. Possible data loss.
WARNING: NFSLock: Unable to remove expired or lost primary lockfile .lck-############
Daemon amsd deactivated.
Daemon ntpd deactivated.
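To check whether a host shows this sequence, the signature messages can be counted in a saved copy of its log. The following is a minimal sketch, not an official diagnostic: the function name scan_signature is hypothetical, the log file path is passed in by the caller (on ESXi the kernel log is commonly /var/log/vmkernel.log, but verify this for your build), and the patterns are fixed strings copied from the excerpt above.

```shell
#!/bin/sh
# scan_signature LOGFILE
# Print how many times each signature message from the excerpt above
# appears in LOGFILE. The function name and pattern list are illustrative.
scan_signature() {
    logfile="$1"
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    # Fixed-string patterns (-F), so punctuation is matched literally.
    for pattern in \
        "NetPort: disabled port" \
        "NFS: Status:File system timeout" \
        "NFS: Status:No connection" \
        "closed with dirty buffers" \
        "NFSLock: Unable to remove expired or lost primary lockfile"
    do
        # grep -c prints 0 and exits non-zero when nothing matches;
        # "|| true" keeps the loop going in that case.
        count=$(grep -c -F -- "$pattern" "$logfile" || true)
        echo "$count  $pattern"
    done
}
```

Running scan_signature against a log copy prints one count per pattern. Non-zero counts for the NetPort message together with the NFS retry messages suggest the host lost its storage path during the switch migration.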
The migration from standard virtual switches to distributed virtual switches creates a circular dependency failure when not performed in the correct sequence.
This occurs because in a hyperconverged environment, the storage services are provided by VMs running on the same hosts that depend on that storage, creating a "chicken-and-egg" scenario when network changes affect both simultaneously.
To resolve this issue for affected hosts:
1. Verify basic network connectivity by confirming you can ping and SSH to the affected hosts.
2. For hosts that are operational but showing certificate errors:
   a. Change the vpxd.certmgmt.mode from vmca to thumbprint mode in vCenter Server to allow all host certificates.
   b. Regenerate certificates on the affected hosts.
   c. Restart the management agents.
3. For hosts that are completely unresponsive:
   a. Access the host via SSH.
   b. Check and verify the endpoint.conf file is correctly configured.
   c. Regenerate host certificates using the following command:
      /sbin/generate-certificates
   d. Restart management agents:
      /etc/init.d/hostd restart
      /etc/init.d/vpxa restart
4. After making these changes, wait approximately 30 minutes for the services to fully restart and establish connections.
5. Once hosts reconnect to vCenter, restart the OmniStack Virtual Controller (OVC) VMs.
6. Allow sufficient time (may be several hours) for the OVCs to synchronize and restore data services.
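The first check in the procedure above (ping and SSH reachability) can be scripted when several hosts are affected. This is a minimal sketch under stated assumptions: it runs from an administrator workstation, not from the ESXi host; the local nc is assumed to support the -z (scan only) and -w (timeout) options; and the function name check_host and the host names in the usage note are hypothetical.

```shell
#!/bin/sh
# check_host HOST
# Report whether HOST answers ICMP echo and accepts TCP connections on
# port 22 (SSH). Returns non-zero on the first failed check.
check_host() {
    host="$1"
    if ping -c 3 "$host" >/dev/null 2>&1; then
        echo "$host: ping ok"
    else
        echo "$host: ping FAILED"
        return 1
    fi
    # Assumption: nc supports -z and -w. This only tests that the port
    # is open; it does not attempt a login.
    if nc -z -w 5 "$host" 22 >/dev/null 2>&1; then
        echo "$host: ssh port open"
    else
        echo "$host: ssh port CLOSED"
        return 1
    fi
}
```

For example, `for h in esx01 esx02; do check_host "$h"; done` would report each host in turn. A host that fails either check needs console or out-of-band access before the certificate and agent steps can be attempted.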
Review HPE SimpliVity documentation for the proper sequence of migrating hyperconverged environments to distributed virtual switches.
Ensure OVC connectivity is maintained throughout the migration by:
Creating the distributed switch and port groups first.
Migrating one physical uplink at a time.
Validating connectivity at each step.
Always keeping at least one management network connection active.
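Validating connectivity at each step can be made repeatable with a small helper run in the ESXi shell after every uplink move. The sketch below is illustrative only: it assumes vmkping is available with its -c (count) and -I (vmkernel interface) options, the function name validate_paths is hypothetical, and the interface name and target addresses in the usage note are placeholders to replace with your own management, storage, and federation addresses.

```shell
#!/bin/sh
# validate_paths VMK TARGET...
# Ping every TARGET through the vmkernel interface VMK and report the
# result. Returns non-zero if any target is unreachable, so a migration
# step can be halted before the next uplink is moved.
validate_paths() {
    vmk="$1"
    shift
    failed=0
    for target in "$@"; do
        if vmkping -c 3 -I "$vmk" "$target" >/dev/null 2>&1; then
            echo "ok   $vmk -> $target"
        else
            echo "FAIL $vmk -> $target"
            failed=1
        fi
    done
    return "$failed"
}
```

A call such as `validate_paths vmk0 192.0.2.1 192.0.2.10` after each uplink is moved gives a quick pass or fail signal. If any line reports FAIL, stop and restore the previous uplink assignment before continuing.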