Virtual machine network interface is blocked on ESXi host

Products

VMware NSX

Issue/Introduction

Virtual machines lose network connectivity after a vMotion migration.
The affected VM ports enter a blocked state, which directly impacts production workloads.
The NIC may appear disconnected inside the guest OS.

On the ESXi host running the affected VM, the DVS shows the switchport for that VM is "blocked".

Retrieve the switchport ID with the command: net-stats -l

PortNum        Type SubType SwitchName       MACAddress         ClientName
####123           4       0 DvsPortset-#     00:50:56:##:##:##  ########.eth0
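When many VMs run on the host, the matching row can be picked out of saved net-stats -l output with a short script. The following is a minimal Python sketch, assuming the output is whitespace-separated in the column order shown above; the function name find_ports and the sample port numbers, MAC addresses, and client names are hypothetical.

```python
def find_ports(net_stats_output, client_name_fragment):
    """Return (port_num, client_name) pairs whose ClientName contains the fragment."""
    matches = []
    for line in net_stats_output.splitlines():
        fields = line.split()
        # Data rows have at least six columns and start with a numeric PortNum;
        # this also skips the header row and blank lines.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        port_num = fields[0]
        # ClientName is the last column; join in case the name contains spaces.
        client_name = " ".join(fields[5:])
        if client_name_fragment in client_name:
            matches.append((port_num, client_name))
    return matches


# Hypothetical sample that mirrors the column layout shown above.
sample = """PortNum        Type SubType SwitchName       MACAddress         ClientName
67108123           4       0 DvsPortset-0     00:50:56:aa:bb:cc  web01.eth0
67108124           4       0 DvsPortset-0     00:50:56:aa:bb:cd  db01.eth0
"""
print(find_ports(sample, "web01"))
```

The returned port number is the value to look for as portID in the net-dvs -l output.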

Identify the switchport status with the command: net-dvs -l

com.vmware.common.port.volatile.status = inUse linkUp blocked portID=####123 Port blocked by admin propType = RUNTIME
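To list every blocked port from saved net-dvs -l output, a minimal Python sketch is given here. It assumes the status property appears on a single line, as in the excerpt above; the function name blocked_port_ids, the sample port IDs, and the layout of the unblocked sample line are hypothetical.

```python
import re

STATUS_KEY = "com.vmware.common.port.volatile.status"


def blocked_port_ids(net_dvs_output):
    """Return portID values from status lines that carry the 'blocked' flag."""
    blocked = []
    for line in net_dvs_output.splitlines():
        if STATUS_KEY not in line:
            continue
        # Everything after the first '=' holds the status flags and the portID.
        _, _, value = line.partition("=")
        port = re.search(r"portID=(\S+)", value)
        if port and "blocked" in value.split():
            blocked.append(port.group(1))
    return blocked


# Hypothetical sample: one blocked port and one healthy port.
sample = (
    "com.vmware.common.port.volatile.status = inUse linkUp blocked "
    "portID=67108123 Port blocked by admin propType = RUNTIME\n"
    "com.vmware.common.port.volatile.status = inUse linkUp "
    "portID=67108124 propType = RUNTIME\n"
)
print(blocked_port_ids(sample))
```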

Log lines similar to the following appear on the ESXi host in /var/log/nsx-syslog.log.
The HOSTD_ATTACH_PORT call is followed by repeated SYNC_ATTACH_PORT calls for the impacted VM until the issue is remediated or the VM is migrated off the ESXi host.

nsx-opsagent[3287033]: NSX 3287033 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="24216973" level="INFO"] [DoVifPortOperation] request=[opId:[CdrsLoadBalancer-] op:[HOSTD_ATTACH_PORT(1)] vif:[########-####-####-####-############] ls:[########-####-####-####-############] vmx:[/vmfs/volumes/########-####-####-####-############/########/########.vmx] lp:
nsx-opsagent[3287033]: NSX 3287033 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="3287483" level="INFO"] [DoVifPortOperation] request=[opId:[sync-attach-5] op:[SYNC_ATTACH_PORT(1001)] vif:[########-####-####-####-############] ls:[########-####-####-####-############] vmx:[/vmfs/volumes/########-####-####-####-############/########/########.vmx] lp:
nsx-opsagent[3287033]: NSX 3287033 - [nsx@6876 comp="nsx-esx" subcomp="mpa-client" tid="3287483" level="INFO"] [SwitchingVertical] SendRequest: To Master APH,  type (com.vmware.nsx.switching.VifMsg) correlationId () trackingIdStr (########-####-####-####-########12ab) Success.
nsx-proxy[3286484]: NSX 3286484 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="mpa-proxy-lib" tid="3286484" level="INFO"] MessagingClientService: Heartbeat message received in FrameworkUnifiedMsg from endpoint: ssl://###.###.##.##:1234 client_id: ########-####-####-####-############
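The repeated SYNC_ATTACH_PORT pattern can be spotted by counting operations per vif in a copy of the log. The following Python sketch assumes the op:[...] vif:[...] layout shown in the excerpts above; the function names, the threshold of three repeats, and the shortened sample lines are hypothetical.

```python
import re
from collections import Counter

# Matches "op:[NAME(code)] vif:[id]" as it appears in the excerpts above.
OP_PATTERN = re.compile(r"op:\[(?P<op>[A-Z_]+)\(\d+\)\]\s+vif:\[(?P<vif>[^\]]+)\]")


def count_port_ops(log_text):
    """Count (vif, operation) pairs found in DoVifPortOperation log lines."""
    counts = Counter()
    for line in log_text.splitlines():
        if "[DoVifPortOperation]" not in line:
            continue
        match = OP_PATTERN.search(line)
        if match:
            counts[(match.group("vif"), match.group("op"))] += 1
    return counts


def vifs_with_repeated_sync(counts, threshold=3):
    """Return vifs that logged SYNC_ATTACH_PORT at least `threshold` times."""
    return sorted(
        vif
        for (vif, op), n in counts.items()
        if op == "SYNC_ATTACH_PORT" and n >= threshold
    )


# Hypothetical, shortened log lines for illustration only.
sample_log = "\n".join(
    ["[DoVifPortOperation] request=[opId:[a] op:[HOSTD_ATTACH_PORT(1)] vif:[vif-a] ls:[ls-1]"]
    + ["[DoVifPortOperation] request=[opId:[sync-attach-5] op:[SYNC_ATTACH_PORT(1001)] vif:[vif-a] ls:[ls-1]"] * 3
    + ["[DoVifPortOperation] request=[opId:[sync-attach-6] op:[SYNC_ATTACH_PORT(1001)] vif:[vif-b] ls:[ls-1]"]
)
counts = count_port_ops(sample_log)
print(vifs_with_repeated_sync(counts))
```

A vif that keeps accumulating SYNC_ATTACH_PORT entries is a candidate for the condition described in this article; the threshold is only a starting point and should be tuned to the time window of the log being examined.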

Log lines similar to the following appear on the NSX Manager in /var/log/proton/nsxapi.log.
The ESXi log lines above identify the trackingIdStr ########-####-####-####-########12ab and the IP address ###.###.##.## of the NSX Manager assigned to this ESXi transport node; check the logs on that NSX Manager.
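Both values can be pulled from a copy of the ESXi nsx-syslog.log with a short script. This is a minimal Python sketch, assuming the trackingIdStr (...) and "from endpoint: ssl://host:port" layouts shown in the ESXi excerpts above; the function name correlation_details and the sample UUID, IP address, and client_id are hypothetical.

```python
import re

TRACKING_RE = re.compile(r"trackingIdStr \((?P<tid>[^)]+)\)")
ENDPOINT_RE = re.compile(r"from endpoint: ssl://(?P<host>[^:\s]+):(?P<port>\d+)")


def correlation_details(log_text):
    """Collect unique tracking IDs and NSX Manager endpoint hosts, in log order."""
    tracking_ids = []
    manager_endpoints = []
    for line in log_text.splitlines():
        tracking = TRACKING_RE.search(line)
        if tracking and tracking.group("tid") not in tracking_ids:
            tracking_ids.append(tracking.group("tid"))
        endpoint = ENDPOINT_RE.search(line)
        if endpoint and endpoint.group("host") not in manager_endpoints:
            manager_endpoints.append(endpoint.group("host"))
    return {"tracking_ids": tracking_ids, "manager_endpoints": manager_endpoints}


# Hypothetical, shortened log lines for illustration only.
sample_log = "\n".join([
    "[SwitchingVertical] SendRequest: To Master APH,  type (com.vmware.nsx.switching.VifMsg) "
    "correlationId () trackingIdStr (11111111-2222-3333-4444-5555555512ab) Success.",
    "MessagingClientService: Heartbeat message received in FrameworkUnifiedMsg "
    "from endpoint: ssl://192.0.2.10:1234 client_id: example-client",
])
details = correlation_details(sample_log)
print(details)
```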

ERROR GmleClientBlockingOpsThread-1 Lease 77237 - [nsx@6876 comp="nsx-manager" errorCode="GML206" level="ERROR" s2comp="lease" subcomp="manager"] Unable to get LeadershipLease for service POLICY_SVC_ROUTING on member ########-####-####-####-########12ab of group ########-####-####-####-############.
org.bouncycastle.crypto.fips.FipsOperationError: proportionate test failed

Log lines similar to the following appear on the NSX Manager in /var/log/syslog.

NSX 78039 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP100" level="ERROR" subcomp="manager"] Handler dispatch failed; nested exception is org.bouncycastle.crypto.fips.FipsOperationError: proportionate test failed
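To check a copy of either NSX Manager log for this error, a minimal Python sketch follows. It searches for the exception class name shown in the excerpts above and reports the errorCode when one is present on the same line; the function name fips_error_lines and the shortened sample lines are hypothetical.

```python
import re

FIPS_ERROR = "org.bouncycastle.crypto.fips.FipsOperationError"
ERROR_CODE_RE = re.compile(r'errorCode="(?P<code>[^"]+)"')


def fips_error_lines(log_text):
    """Return (line_number, error_code_or_None) for each line naming the FIPS error."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if FIPS_ERROR in line:
            match = ERROR_CODE_RE.search(line)
            hits.append((number, match.group("code") if match else None))
    return hits


# Hypothetical, shortened log lines for illustration only.
sample_log = "\n".join([
    '[nsx@6876 comp="nsx-manager" errorCode="GML206" level="ERROR"] Unable to get LeadershipLease',
    "org.bouncycastle.crypto.fips.FipsOperationError: proportionate test failed",
    '[nsx@6876 comp="nsx-manager" errorCode="MP100" level="ERROR"] Handler dispatch failed; '
    "nested exception is org.bouncycastle.crypto.fips.FipsOperationError: proportionate test failed",
])
hits = fips_error_lines(sample_log)
print(hits)
```

In nsxapi.log the exception can appear on its own line after the GML206 entry, so a hit without an errorCode is expected there.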

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.

Environment

VMware NSX 9.x
VMware NSX 4.x

Cause

Modules running on NSX Manager are FIPS compliant and use the BouncyCastle FIPS (BCFIPS) module to maintain this compliance.

The error org.bouncycastle.crypto.fips.FipsOperationError: proportionate test failed indicates that BouncyCastle's FIPS-certified cryptographic module failed its continuous self-test requirements. This is a built-in safety mechanism in FIPS 140-2/140-3 certified cryptographic modules. When this self-test fails, the modules running on NSX Manager initialize but enter an error state.
Because all modules on that NSX Manager are in an error state, port attach calls are not serviced when a VM is migrated to, or powered on at, an ESXi host connected to that NSX Manager. As a result, the VM port goes into a blocked state.

Resolution

This is a known issue impacting VMware NSX.

Workaround:
Reboot the NSX Manager node on which the error org.bouncycastle.crypto.fips.FipsOperationError is observed (see the Issue/Introduction section for the log files and exact error text).
The reboot resolves the current occurrence, but the issue may recur.