An ESXi host boots without the NSX vdrPort connected and logical routing does not function

Article ID: 321278

Products

VMware NSX for vSphere

Issue/Introduction

Symptoms:

  • After rebooting an ESXi host, East/West DLR routing does not function and VMs cannot communicate between Logical Switches.
  • Creation of the vdrPort fails.
  • In the /var/log/netcpa.log file of the ESXi host, you see entries similar to:

    2018-02-08T23:44:19.431Z error netcpa[3FFEDA29700] [Originator@6876 sub=Default] Failed to add vdr port on dvs 96 ff 2c 50 ## ## ## ##-## ## ## ## 90 19 41 5d, Not found

    Note: This happens because the locally cached configuration of the vSphere Distributed Switch cannot be found on the host.
     
  • In the /var/log/hostd.log file of the ESXi host, the DVS registration occurs after the attempt to add the vdrPort, with entries similar to:

    2018-02-08T23:44:28.389Z info hostd[4F540B70] [Originator@6876 sub=Hostsvc.DvsTracker] Registered Dvs 96 ff 2c 50 ## ## ## ##-## ## ## ## 90 19 41 5d
     
  • Running the net-vdr -C -l command on the ESXi host does not list any connected vdrPorts.

    For example:

    [root@ESXi00002:~] net-vdr -C -l
    Host locale Id: ########-####-####-####-##########97
    Connection Information:
    -----------------------
    DvsName VdrPort NumLifs VdrVmac
    ------- ------- ------- -------


    Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
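
The last symptom can be checked mechanically. The following is a minimal sketch, assuming the net-vdr -C -l output keeps the layout shown above (a "DvsName VdrPort ..." header, a dashed separator, then one row per connected vdrPort). The function name count_vdr_ports is hypothetical and is not part of NSX.

```shell
#!/bin/sh
# Count vdrPort rows in `net-vdr -C -l` output read from stdin.
# Assumption: rows appear only after the "DvsName VdrPort ..." header
# and its dashed separator, as in the sample output in this article.
count_vdr_ports() {
    awk '
        /^[[:space:]]*DvsName[[:space:]]/ { in_table = 1; next }  # table header
        in_table && /^[[:space:]]*-+/     { next }                # dashed separator
        in_table && NF > 0                { n++ }                 # one row per vdrPort
        END { print n + 0 }
    '
}
```

On a host this could be run as net-vdr -C -l | count_vdr_ports. A result of 0 corresponds to the empty table shown in the example.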



Environment

VMware NSX for vSphere 6.3.x
VMware NSX for vSphere 6.4.x

Cause

This issue occurs due to a race condition in which the host attempts to connect the vdrPort before the Distributed Switch initialization activities have completed. The operation fails and is not retried. The registration of the ESXi host with the distributed switch can be delayed for a number of reasons, including slow vCenter Server performance or variation in the speed of post-boot host profile activities on stateless Auto Deploy ESXi hosts.
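
To confirm the ordering on an affected host, the two log timestamps can be compared. The following is a rough sketch, assuming each log line starts with an ISO 8601 UTC timestamp as in the excerpts above. The function name check_vdr_race is hypothetical. The sketch looks only at the first matching entry in each file and does not correlate the DVS UUIDs, so logs that span several boots may need to be trimmed first.

```shell
#!/bin/sh
# Usage: check_vdr_race <netcpa log> <hostd log>
# Prints "race" if the first vdrPort failure predates the first DVS
# registration, "no-race" otherwise, "inconclusive" if either is missing.
check_vdr_race() {
    netcpa_log=$1
    hostd_log=$2
    # The first whitespace-separated field of a matching line is its timestamp.
    fail_ts=$(awk '/Failed to add vdr port/ { print $1; exit }' "$netcpa_log")
    reg_ts=$(awk '/Registered Dvs/ { print $1; exit }' "$hostd_log")
    if [ -z "$fail_ts" ] || [ -z "$reg_ts" ]; then
        echo "inconclusive"
        return 0
    fi
    # Fixed-width ISO 8601 UTC timestamps sort lexicographically in time order.
    earliest=$(printf '%s\n%s\n' "$fail_ts" "$reg_ts" | LC_ALL=C sort | head -n 1)
    if [ "$fail_ts" != "$reg_ts" ] && [ "$earliest" = "$fail_ts" ]; then
        echo "race"
    else
        echo "no-race"
    fi
}
```

For example: check_vdr_race /var/log/netcpa.log /var/log/hostd.log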

Resolution

This issue is resolved in:



Workaround:
To work around this issue without upgrading, restart the control plane service (netcpa) on the ESXi host after the DVS is registered. This triggers the connection of the vdrPort.

This can be done using the /etc/init.d/netcpad restart command.
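
Where the restart has to be scripted, one possible approach is to wait for the DVS registration entry before restarting the service. The following is a sketch only, not a supported procedure. The function name, the HOSTD_LOG and NETCPA_INIT variables, the 300-second default, and the 5-second polling interval are all illustrative assumptions. It also assumes that a "Registered Dvs" entry in hostd.log belongs to the current boot.

```shell
#!/bin/sh
# Hypothetical overrides so the sketch can be dry-run away from a host.
: "${HOSTD_LOG:=/var/log/hostd.log}"
: "${NETCPA_INIT:=/etc/init.d/netcpad}"

# Wait until hostd logs a DVS registration, then restart netcpa.
# $1: maximum seconds to wait (illustrative default: 300).
restart_netcpa_after_dvs_registration() {
    timeout=${1:-300}
    waited=0
    until grep -q 'Registered Dvs' "$HOSTD_LOG" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timeout"
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    # Same command as in the workaround above.
    "$NETCPA_INIT" restart
}
```

After the function returns successfully, net-vdr -C -l can be run again to verify that a vdrPort is now listed.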

Additional Information

Impact/Risks:
When the vdrPort does not exist on the ESXi host, all traffic destined to the DLR module on the host is dropped, so DLR routing does not function. If an ESG exists on the host and a Logical Switch is used as a transit network between the DLR and the ESG, North/South traffic flows may also be impacted.