The netcpa agent on an ESXi host fails to communicate with NSX controller(s) in VMware NSX for vSphere 6.x

Article ID: 339193

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • Routes from dynamic routing protocols may not be present in the VMware NSX for vSphere 6.x Edge Services Gateway (ESG) or Distributed Logical Router (DLR) when the Control VM is running on the affected ESXi host
  • Virtual machines on the affected ESXi host fail to communicate with virtual machines running on other ESXi hosts
  • Running the esxcli network vswitch dvs vmware vxlan network list --vds-name=Name_VDS command on the ESXi host displays the VNIs as down

    For example:

    ~ # esxcli network vswitch dvs vmware vxlan network list --vds-name=Compute_VDS
    VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count
    --------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------
    5001      N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.203 (down) 1           1                0
    5000      N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.201 (down) 1           0                0
  • In the /var/log/cloudnet/cloudnet_java-vnet-controller.***.log file on the NSX controller node, you see entries similar to:

    2015-07-12 11:51:35,323 7017610548 [worker 1] INFO com.vmware.controller.apps.core.CoreApp - Close Connection [ip=10.1.8.71, cnnId=609] due to keepalive timeout
  • In the /var/log/netcpa.log file on the affected ESXi host, you see entries similar to:

    2015-07-12T11:51:12.005Z [668CEB70 warning 'ThreadPool'] Thread pool usage : Total:[Working: 22/28 (Running: 23)] Task:[Working: 20/20 (Peak: 20), Queued: 2 (Peak: 2)]
    2015-07-12T11:51:12.133Z [668CEB70 warning 'ThreadPool'] Thread pool usage : Total:[Working: 22/28 (Running: 23)] Task:[Working: 20/20 (Peak: 20), Queued: 4 (Peak: 4)]
    2015-07-12T11:51:13.575Z [6690FB70 info 'ThreadPool'] Thread enlisted
    2015-07-12T11:51:15.031Z [6690FB70 warning 'ThreadPool'] Thread pool usage : Total:[Working: 23/28 (Running: 24)] Task:[Working: 20/20 (Peak: 20), Queued: 8 (Peak: 8)]
  • Connections to the Controller from netcpa may show CLOSED or CLOSE_WAIT status

    Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
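
The VNI check described in the symptoms can be scripted. The following is a minimal sketch, assuming the command output has the column layout shown in the example above. The function name list_down_vnis is hypothetical and is not part of any VMware tooling. It prints the VXLAN ID of every row whose Controller Connection is reported as (down).

```shell
#!/bin/sh
# Hypothetical helper (not a VMware-provided tool).
# Reads the output of
#   esxcli network vswitch dvs vmware vxlan network list --vds-name=<VDS>
# on stdin and prints the VXLAN ID (first column) of each data row
# whose Controller Connection column contains "(down)".
list_down_vnis() {
    # Data rows start with a numeric VXLAN ID; header and separator rows do not.
    awk '$1 ~ /^[0-9]+$/ && /\(down\)/ { print $1 }'
}

# Example use on the ESXi host (Compute_VDS is the VDS name from the example):
#   esxcli network vswitch dvs vmware vxlan network list --vds-name=Compute_VDS | list_down_vnis
```

If the function prints nothing, no VNI on that VDS reports a down controller connection.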


Environment

VMware NSX for vSphere 6.1.x
VMware NSX for vSphere 6.2.x

Cause

This issue occurs due to a known bug in the vmacore library.

Resolution

This issue is resolved in:
If you are unable to upgrade at this time, follow the workaround.

To work around this issue, restart the netcpa agent on the affected ESXi host.
  1. Log in to the ESXi host as root using either SSH or the ESXi host console.
  2. Restart the netcpa agent on the affected ESXi host by running this command:

    /etc/init.d/netcpad restart
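
The restart can be wrapped in a small function that also checks the service afterwards. This is a sketch only. It assumes that the netcpad init script accepts a status action in addition to restart. The function name restart_netcpa and the variables NETCPAD_INIT and NETCPA_SETTLE_SECS are hypothetical and were introduced for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the workaround; not a VMware-provided tool.
# NETCPAD_INIT:       path to the init script (default taken from this article).
# NETCPA_SETTLE_SECS: seconds to wait before querying status (arbitrary default).
NETCPAD_INIT=${NETCPAD_INIT:-/etc/init.d/netcpad}
NETCPA_SETTLE_SECS=${NETCPA_SETTLE_SECS:-5}

restart_netcpa() {
    # Restart the agent; stop here if the init script reports a failure.
    "$NETCPAD_INIT" restart || return 1
    # Give the agent a moment to start and reconnect to the Controllers.
    sleep "$NETCPA_SETTLE_SECS"
    # Assumption: the init script supports a "status" action.
    "$NETCPAD_INIT" status
}

# Example use on the affected ESXi host:
#   restart_netcpa
```

A successful restart does not by itself confirm that the Controller connections are back. Check the connection state as described under Additional Information.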


Additional Information

To check whether connections from netcpa to the Controllers are in the CLOSED or CLOSE_WAIT state, run this command:

esxcli network ip connection list |grep "1234.*netcpa*" | egrep "CLOSED|CLOSE_WAIT"

If netcpa has been down for a long time, the connections may not be present at all. To validate this, run the following command, which should output one ESTABLISHED connection for each Controller:

esxcli network ip connection list |grep "1234.*netcpa*" |grep ESTABLISHED

Note: If any connection is missing, netcpa may not be working properly.
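
The comparison between established connections and the number of Controllers can be automated. The following is a minimal sketch. The function name check_netcpa_connections and its EXPECTED argument are hypothetical, and you must supply the number of Controllers deployed in your environment. The filter on port 1234 and the netcpa name is the same one used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware-provided tool).
# Usage: esxcli network ip connection list | check_netcpa_connections EXPECTED
# EXPECTED is the number of NSX Controllers in your environment (assumption:
# you know this value; the script does not discover it).
# Succeeds only if the number of ESTABLISHED netcpa connections on port 1234
# equals EXPECTED.
check_netcpa_connections() {
    expected=$1
    # Same filter as the manual command, but count the matching lines.
    found=$(grep '1234.*netcpa' | grep -c ESTABLISHED)
    echo "ESTABLISHED netcpa connections: $found (expected $expected)"
    [ "$found" -eq "$expected" ]
}

# Example use on the ESXi host, assuming three Controllers:
#   esxcli network ip connection list | check_netcpa_connections 3
```

A non-zero exit status means that at least one Controller connection is missing or not established.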
