NSX Host Status Degraded due to Mellanox InfiniBand pNIC Down State

Article ID: 438539

Products

VMware NSX

Issue/Introduction

In VMware Cloud Foundation (VCF) and Telco Cloud environments that use Mellanox InfiniBand adapters, NSX Manager may report an ESXi host status as Degraded. This occurs even when the InfiniBand adapters do not carry any NSX-managed traffic.

Symptoms:

  • NSX Manager UI displays a "Degraded" or "Down" status for specific Transport Nodes.
  • The status is attributed to a physical NIC (pNIC) in the Down state.
  • The affected pNIC is a Mellanox InfiniBand adapter configured for SR-IOV.
  • RoCE (RDMA over Converged Ethernet) is disabled on the physical switch, resulting in a continuous "Link Down" state at the Ethernet layer.

Environment

VMware NSX


Cause

NSX health monitoring does not distinguish between a pNIC that carries NSX traffic (such as VM networks) and a pNIC used for a specialized purpose (such as InfiniBand SR-IOV). Because Mellanox adapters require a vSwitch/VDS uplink to provide SR-IOV Virtual Functions (VFs), the "Link Down" state of the InfiniBand port (caused by the absence of a RoCE/Ethernet link) is incorrectly treated by NSX as a failure in the host's networking fabric, regardless of whether the pNIC is part of an NSX Transport Node Profile.

Resolution

This is currently expected behavior of the NSX health monitoring architecture. Use the following steps to confirm that the pNIC NSX reports as down is the InfiniBand port and not a genuine Ethernet failure:

  1. On the ESXi host, run esxcli network nic list and confirm that the interface reported as down corresponds to the InfiniBand adapter.
  2. Confirm the pNIC is not assigned to an NSX Uplink Profile or used by any NSX Segment.
  3. Once you have confirmed that the down pNIC is the non-NSX InfiniBand adapter, navigate in the NSX Manager UI to the Host Transport Node dashboard and acknowledge the Degraded status.
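Steps 1 and 2 can be partly scripted. The following is a minimal, hypothetical Python sketch meant to run on a workstation against output copied from the host. It assumes that the default table output of esxcli network nic list separates columns with two or more spaces. It also flags a pNIC as Mellanox by a heuristic (assumption): the Description contains "Mellanox" or the Driver begins with "nmlx". The set of NSX uplink pNICs is a hypothetical input that you collect yourself, for example from the uplink mapping shown in NSX Manager. The helper names and the sample rows are illustrative only and are not from a real host. A Mellanox match does not prove that the port is InfiniBand, so confirm any flagged pNIC against the host's hardware inventory.

```python
import re
from dataclasses import dataclass


@dataclass
class Nic:
    name: str
    driver: str
    link_status: str
    description: str


def parse_nic_list(text: str) -> list[Nic]:
    """Parse the default table output of `esxcli network nic list`.

    Assumption: columns are separated by two or more spaces, and the second
    non-empty line is the dashed separator under the header.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = re.split(r"\s{2,}", lines[0].strip())
    nics = []
    for row in lines[2:]:  # skip the header and the dashed separator line
        # maxsplit keeps any extra spacing inside the last (Description) column
        values = re.split(r"\s{2,}", row.strip(), maxsplit=len(header) - 1)
        fields = dict(zip(header, values))
        nics.append(Nic(fields["Name"], fields["Driver"],
                        fields["Link Status"], fields["Description"]))
    return nics


def down_mellanox_outside_nsx(nics: list[Nic], nsx_uplink_pnics: set[str]) -> list[str]:
    """Return names of down pNICs that look like Mellanox adapters and are
    not in the (hypothetical, operator-supplied) set of NSX uplink pNICs.

    Heuristic only: 'Mellanox' in the description or an 'nmlx' driver prefix.
    """
    return [
        n.name
        for n in nics
        if n.link_status.lower() == "down"
        and ("mellanox" in n.description.lower() or n.driver.startswith("nmlx"))
        and n.name not in nsx_uplink_pnics
    ]


# Illustrative sample output (fabricated values, not from a real host).
SAMPLE = """\
Name    PCI Device    Driver      Admin Status  Link Status  Speed  Duplex  MAC Address        MTU  Description
------  ------------  ----------  ------------  -----------  -----  ------  -----------------  ----  -----------
vmnic0  0000:18:00.0  ntg3        Up            Up            1000  Full    00:50:56:00:00:01  1500  Broadcom NetXtreme BCM5720 Gigabit Ethernet
vmnic4  0000:3b:00.0  nmlx5_core  Up            Down             0  Half    00:50:56:00:00:02  1500  Mellanox Technologies MT28908 Family [ConnectX-6]
vmnic5  0000:3b:00.1  nmlx5_core  Up            Down             0  Half    00:50:56:00:00:03  1500  Mellanox Technologies MT28908 Family [ConnectX-6]
"""

if __name__ == "__main__":
    # Hypothetical: pNICs the operator knows are mapped to NSX/VDS uplinks.
    nsx_uplinks = {"vmnic0", "vmnic5"}
    print(down_mellanox_outside_nsx(parse_nic_list(SAMPLE), nsx_uplinks))
```

With this sample, only vmnic4 is returned, which makes it a candidate for the InfiniBand case described in this article. vmnic5 is also down, but it is excluded because it is listed as an NSX uplink. A down pNIC that is mapped to NSX should be investigated as a genuine failure and not acknowledged.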