Recent MTU changes were made on the Virtual Distributed Switch (VDS) just prior to the outage.
NSX Manager is offline or unreachable via its management interface. Attempts to ping or SSH into NSX Manager fail.
ESXi hosts show disconnected or inaccessible datastores, including those where the NSX VMs' files are located.
The MTU on the Virtual Distributed Switch (VDS) was decreased (from 9000 to 1500, for example), resulting in a mismatch with the VMkernel interfaces and NFS storage configured for jumbo frames. Traffic between ESXi hosts and the storage backend began traversing a VDS path that could not accommodate the larger frame size, leading to packet fragmentation or drops. This disrupted access to datastores hosting NSX Manager and vCenter Server VMs, rendering them unreachable.
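To confirm this kind of mismatch on an affected host, the configured MTU values can be compared from the ESXi shell, for example:
# Show the MTU configured on each VMkernel interface
esxcli network ip interface list
# Show the MTU configured on standard and distributed virtual switches
esxcli network vswitch standard list
esxcli network vswitch dvs vmware list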
Temporary Network Reconfiguration
*Many of the steps below can be done with UI access to an ESXi host or by using the command line on ESXi.
*For similar CLI steps, refer to Configuring Standard vSwitch (vSS) or virtual Distributed Switch (vDS) from the command line in ESXi
Create a Temporary Virtual Standard Switch (VSS) if one does not already exist
Configure with MTU 9000
Create a temporary portgroup (PG) on the VSS.
Move a VMNIC from the VDS to the new VSS as an uplink (CLI example below).
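A minimal CLI sketch of the steps above, using placeholder names for the temporary switch, portgroup, and VMNIC; detaching the uplink from the VDS itself can be done through the host UI as described in the referenced article:
# Create the temporary standard switch and set its MTU to 9000
esxcli network vswitch standard add --vswitch-name=<TEMP_VSWITCH_NAME>
esxcli network vswitch standard set --vswitch-name=<TEMP_VSWITCH_NAME> --mtu=9000
# Create a temporary portgroup on the new switch
esxcli network vswitch standard portgroup add --portgroup-name=<TEMP_PG_NAME> --vswitch-name=<TEMP_VSWITCH_NAME>
# Attach the physical NIC (after detaching it from the VDS) as an uplink on the temporary switch
esxcli network vswitch standard uplink add --uplink-name=vmnic<VMNIC_NUMBER> --vswitch-name=<TEMP_VSWITCH_NAME>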
Reconfigure the vmk that was being used to connect ESXi to the network-backed storage. It must be deleted from the VDS before a new one is added on the temporary VSS PG.
To remove a vmk from the VDS by command line:
esxcli network ip interface remove --interface-name=vmk<VMK_NUMBER>
To recreate the vmk on the VSS PG by command line:
# 1. Create a VMkernel interface on the temporary port group
esxcli network ip interface add --interface-name=vmk<VMK_NUMBER> --portgroup-name=<TEMP_PG_NAME>
# 2. Assign a static IPv4 address and netmask to the interface
esxcli network ip interface ipv4 set --interface-name=vmk<VMK_NUMBER> --ipv4=<IP_ADDRESS> --netmask=<NETMASK> --type=static
# 3. Add a static route for the storage network (only needed if the storage server is reached through a gateway)
esxcli network ip route ipv4 add --gateway=<GATEWAY_IP> --network=<NETWORK_CIDR>
Set the VLAN on the Temporary Portgroup, if applicable, ensuring it matches the NFS network configuration.
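If setting the VLAN from the command line, a sketch with placeholder names:
# Tag the temporary portgroup with the storage VLAN
esxcli network vswitch standard portgroup set --portgroup-name=<TEMP_PG_NAME> --vlan-id=<VLAN_ID>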
Verify backend storage is configured properly and will be able to communicate with the ESXi host
Test Connectivity from ESXi to the network-backed storage server
# Basic connectivity test from the VMkernel interface
vmkping -I vmk<VMK_NUMBER> <StorageServer_IP_OR_FQDN>
# Jumbo frame test: 8972-byte payload plus 28 bytes of IP/ICMP headers equals 9000, with the don't-fragment flag set
vmkping -I vmk<VMK_NUMBER> -d -s 8972 <StorageServer_IP_OR_FQDN>
Mount the NFS Datastore (if needed)
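If the datastore does need to be remounted from the CLI, a minimal sketch for an NFSv3 export (server, export path, and datastore name are placeholders; NFS 4.1 uses the esxcli storage nfs41 namespace instead):
# Mount the NFS v3 export as a datastore on this host
esxcli storage nfs add --host=<StorageServer_IP_OR_FQDN> --share=<EXPORT_PATH> --volume-name=<DATASTORE_NAME>
# Confirm the datastore is mounted and accessible
esxcli storage nfs list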
Verify vCenter VM Visibility and Network Access
Confirm the vCenter Server VM's files are visible on the datastore
Ensure the VM is registered and can power on (example commands below)
If network access fails, edit the vCenter VM's settings to attach a NIC to the temporary VSS portgroup
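Example shell commands for these checks; the datastore, folder, and VM names are placeholders:
# List the vCenter Server VM's files on the datastore
ls /vmfs/volumes/<DATASTORE_NAME>/<VCENTER_VM_FOLDER>/
# Check whether the VM is registered and note its Vmid
vim-cmd vmsvc/getallvms
# Register the VM if it is missing from the list, then power it on
vim-cmd solo/registervm /vmfs/volumes/<DATASTORE_NAME>/<VCENTER_VM_FOLDER>/<VCENTER_VM_NAME>.vmx
vim-cmd vmsvc/power.on <VMID>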
Restore VDS MTU to 9000 via vCenter UI
*This should re-establish storage access for other ESXi hosts
Final Cleanup
Migrate back to VDS and remove the temporary VSS
Place the host with the temporary VSS into Maintenance Mode
Use the vCenter UI to migrate the physical uplink and VMkernel adapter (vmnic2 and vmk3 in this example) back to the VDS as they were previously configured
*Refer to Migrate VMkernel Adapters to a vSphere Distributed Switch
Validate the network configuration (example commands below), then remove the temporary VSS through the vCenter or host UI and exit Maintenance Mode
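One way to spot-check from the host before and after removal; the interface and switch names are placeholders:
# Confirm the VMkernel interface is back on the VDS portgroup with the expected MTU
esxcli network ip interface list
# Re-test jumbo frame connectivity to the storage server over the restored path
vmkping -I vmk<VMK_NUMBER> -d -s 8972 <StorageServer_IP_OR_FQDN>
# If removing the temporary VSS by CLI instead of the UI
esxcli network vswitch standard remove --vswitch-name=<TEMP_VSWITCH_NAME>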
Refer also to the NSX documentation: Guidance to Set Maximum Transmission Unit
Handling Log Bundles for offline review with Broadcom support: