You see the following error when attempting to add a storage controller.
The task "Discover NVMe over Fabrics controllers" fails with the error "An error occurred during host configuration. Operation failed, diagnostics report: Unable to discover"
ESXi 7.x
ESXi 8.x
ESX 9.x
ESXi hosts are unable to vmkping the NVMe-oF (TCP or RDMA) controller IP addresses on a storage array.
Connectivity fails with both standard (1500) and jumbo (9000) frame sizes.
NVMe Discovery service fails to return any targets.
Status of the NVMe-oF adapter in the vSphere Client shows as "Online" but no paths are discovered.
Ensure the physical path is active before testing the logical stack.
Verify the physical uplink (vmnic) is "Up" and negotiated at the correct speed: esxcli network nic get -n vmnicX
Confirm the VMkernel port (vmk) is associated with the correct Virtual Switch and physical uplink.
Check the ARP table to see if the ESXi host has learned the MAC address of the storage array: esxcli network ip neighbor list
Note: If the MAC address is "Incomplete" or missing, the issue is at Layer 2 (VLAN tagging, cabling, or switch port configuration).
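For example, a quick check of the physical and Layer 2 path might look like the following (vmnic2, vmk1, and 192.168.50.10 are placeholder names and addresses; substitute your own uplink, VMkernel port, and array IP):
esxcli network nic get -n vmnic2   # confirm the link is up and negotiated at the expected speed
esxcli network ip interface list   # confirm vmk1 is attached to the intended virtual switch and port group
esxcli network ip neighbor list | grep 192.168.50.10   # a missing or "Incomplete" entry points to a Layer 2 problem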
NVMe-oF is highly sensitive to MTU mismatches. Even if "Jumbo Frames" are enabled, a single device in the path (Host, Switch, or Array) at 1500 MTU will drop packets.
Perform a "Do Not Fragment" ping test to find the failure point:
Standard (1500 MTU): vmkping -I vmkX -d -s 1472 <Array_IP>
Jumbo (9000 MTU): vmkping -I vmkX -d -s 8972 <Array_IP>
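As a worked example (vmk1 and 192.168.50.10 are placeholder values), the payload sizes above leave room for the 20-byte IP header and 8-byte ICMP header, so 1500 - 28 = 1472 and 9000 - 28 = 8972:
vmkping -I vmk1 -d -s 1472 192.168.50.10   # should succeed on any correctly cabled path
vmkping -I vmk1 -d -s 8972 192.168.50.10   # fails if any hop (host, switch, or array) is still at 1500 MTU
If the 1472-byte test succeeds but the 8972-byte test fails, at least one device in the path has not been set to 9000 MTU.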
Testing with netcat (nc) is the most reliable way to check whether a specific TCP port is open on the storage array from the ESXi host.
For the NVMe Discovery service (port 8009): nc -z <Array_IP> 8009
For NVMe/TCP I/O traffic (port 4420): nc -z <Array_IP> 4420
Success: Connection to <Array_IP> 8009 port [tcp/*] succeeded!
Failure: Connection refused or Connection timed out.
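A minimal sketch for checking both ports in one pass from the ESXi shell (192.168.50.10 is a placeholder array IP):
for port in 8009 4420; do nc -z 192.168.50.10 $port || echo "port $port unreachable"; done
nc returns a non-zero exit code when the connection is refused or times out, so only the blocked ports are reported.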
Unlike iSCSI, some NVMe-oF controllers do not respond to ICMP pings until a valid NQN (NVMe Qualified Name) handshake or discovery attempt is initiated, or while the host is not yet mapped on the array side.
Retrieve Host NQN: esxcli nvme info get
Ensure this NQN is whitelisted/registered on the storage array.
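It can also help to confirm which storage adapter (vmhba) is bound to the NVMe-oF transport, since that adapter name is used by later discovery commands:
esxcli nvme adapter list   # lists the NVMe adapters and their transport type (TCP or RDMA)
Host NQNs follow the NVMe naming convention nqn.<yyyy-mm>.<reverse-domain>:<identifier>; copy the value reported by the host exactly as displayed, since NQN matching is an exact string comparison.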
Verify the NVMe-TCP or RDMA modules are loaded: esxcli system module list | grep nvme
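For example (nvmetcp and nvmerdma are typical in-box module names on recent ESXi releases, but names can vary with vendor drivers):
esxcli system module list | grep -i nvme   # the transport module should show Is Loaded and Is Enabled as true
esxcli system module load -m nvmetcp       # loads the NVMe/TCP module if it is enabled but not yet loaded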
If using NVMe-over-RDMA, standard pings may fail if the RDMA fabric is not "Lossless."
Ensure Priority Flow Control (PFC) or Global Pause is configured on all physical switches.
Check that the RDMA device is active: esxcli rdma device list
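If the NIC driver exposes Data Center Bridging state to ESXi, the PFC settings can also be sanity-checked from the host side (vmnic2 is a placeholder):
esxcli network nic dcb status get -n vmnic2   # verify DCB is enabled and the priority carrying RoCE traffic is marked no-drop
The same priority must be configured as lossless end to end: on the host, on every physical switch in the path, and on the array ports.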
Once these steps are completed and verified, the NVMe Controllers/Drives will appear in the vSphere Client under Storage Adapters or as selectable PCI Devices.
Running the command esxcli nvme controller list should now return a list of active controllers, and the paths to the NVMe namespaces (the NVMe equivalent of LUNs) will be present and "Active" in the storage path view.
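A final command-line verification might look like the following. The adapter name (vmhba65) and array IP (192.168.50.10) are placeholders, and the exact parameters of the fabrics commands can vary by release, so confirm them with esxcli nvme fabrics discover --help before running:
esxcli nvme fabrics discover -a vmhba65 -i 192.168.50.10 -p 8009   # should return the array's subsystem NQNs
esxcli nvme controller list   # connected controllers should be listed
esxcli nvme namespace list    # namespaces should now be visible as devices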
IF STEPS FAIL: Reach out to the customer's networking administrator to investigate potential firewall blocks or switch-level VLAN mismatches.
Port 8009: Ensure the NVMe-oF Discovery service port (default 8009) is not blocked by physical firewalls between the host and array.
VLAN Tagging: If using a tagged VLAN, ensure the VMkernel port group has the correct VLAN ID set and the physical switch port is in "Trunk" mode.
Helpful Commands
| Action | Command | Expected Result |
| Verify Port | nc -z <Array_IP> 8009 | Connection succeeded |
| Verify MTU | vmkping -I vmkX -d -s 8972 <Array_IP> | 0% packet loss with jumbo frames |
| Verify Session | esxcli network ip connection list | ESTABLISHED connections to the array on ports 8009/4420 |
| Discovery | esxcli nvme fabrics discover | Array subsystems/targets returned |
| Controller | esxcli nvme controller list | Active controllers listed |
| Device | esxcli nvme namespace list | NVMe namespaces listed |