Any addition, deletion, or modification of configuration on the NSX Edge or DLR fails.
NSX Manager logs begin reporting the following messages for the edge/DLR:
2018-07-16 08:03:34.361 AEST INFO messagingTaskExecutor-7 VseRpcResponseHandler:111 - Received empty response for request f3244e8c-918c-####-####-####-########c88 from appliance: 501d5055-8dbd-####-####-####-########49c, vm vm-2879
2018-07-16 08:03:34.362 AEST ERROR http-nio-127.0.0.1-7441-exec-7976 BaseRestController:452 - REST API failed : 'null' java.lang.NullPointerException
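To see which appliances are reporting this symptom, the NSX Manager log text can be searched for the "Received empty response" message. The following is a minimal sketch in Python, assuming the log lines follow the format of the sample above. The function name `find_affected_vms` and the regular expression are illustrative and not part of any NSX tooling, and how the log text is obtained is left to the reader.

```python
import re

# Pattern derived from the sample "Received empty response" line in this article.
# Assumption: other occurrences of the message use the same layout.
EMPTY_RESPONSE = re.compile(
    r"VseRpcResponseHandler:\d+ - Received empty response for request (?P<request>\S+) "
    r"from appliance: (?P<appliance>[^,\s]+), vm (?P<vm>vm-\d+)"
)


def find_affected_vms(log_text):
    """Return a dict mapping each VM ID to the number of matching messages."""
    counts = {}
    for match in EMPTY_RESPONSE.finditer(log_text):
        vm = match.group("vm")
        counts[vm] = counts.get(vm, 0) + 1
    return counts
```

A VM ID that appears repeatedly after an HA event is a candidate for the condition described in this article. The match alone does not confirm that the edge tmpfs is full.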
The /run/vmware/vshield/cmdOut/ha.cid.debug file grows and fills up the temporary file system (tmpfs) of DLRs and Edge VMs.
The issue affects only the Edges and DLRs configured in HA.
The ha.cid.debug file creation is triggered only after an HA event (like HA failover or split-brain), and it can take approximately four weeks for the ha.cid.debug file to fill up edge tmpfs on a DLR or compact edge. Thus, it could be weeks after the original HA event that the customer notices this issue.
No datapath impact is expected when the file system is full. Only new configuration changes fail for the affected edges.
This issue is resolved in:
VMware NSX for vSphere 6.3.7, available at Support Documents and Downloads (broadcom.com).
VMware NSX for vSphere 6.4.2, available at Support Documents and Downloads (broadcom.com).
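The fixed releases listed above can be turned into a simple version check. The following is a minimal sketch in Python, assuming the version is available as a dotted string such as `6.4.1`. The names `FIXED_IN`, `parse_version`, and `has_fix` are hypothetical helpers. Release trains not named in this article return `None` rather than a guess.

```python
# Fixed releases exactly as listed in this article, keyed by release train.
# Assumption: trains not listed here (for example 6.2.x) are simply unknown.
FIXED_IN = {(6, 3): (6, 3, 7), (6, 4): (6, 4, 2)}


def parse_version(text):
    """Convert a dotted version string to a tuple of up to three integers."""
    return tuple(int(part) for part in text.strip().split(".")[:3])


def has_fix(version_text):
    """Return True or False for a listed train, None for an unlisted one."""
    version = parse_version(version_text)
    fixed = FIXED_IN.get(version[:2])
    if fixed is None:
        return None  # this article says nothing about this release train
    return version >= fixed
```

For example, `has_fix("6.3.6")` returns `False` and `has_fix("6.4.2")` returns `True`. A `False` result only means the release predates the fix. The issue still requires an Edge or DLR configured in HA and a prior HA event.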
Workaround: The Active Edge VM must be rebooted to get it back into a working state.
The file ha.cid.debug does not exist until an HA event happens, i.e. a failover or split-brain recovery.
If you deploy a fresh Edge with HA enabled, the file does not exist until you trigger a failover.
The /run/vmware/vshield/cmdOut/ha.cid.Out file is filled with the following logs on Edge appliances.
This issue is seen in NSX for vSphere 6.3.6 and in 6.4.x releases earlier than 6.4.2.