In /var/log/messages (which contains information about the HCX tunnel, HA failover, BFD events, and other relevant services), you may find that there is no log information/relevant information.
In /var/log/messages, a similar output is displayed:
<4>1 <timestamps> <hostname> kernel - - syslog-ng invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
<6>1 <timestamps> <hostname> kernel - - syslog-ng cpuset=/ mems_allowed=0
<4>1 <timestamps> <hostname> kernel - - CPU: 0 PID: 20039 Comm: syslog-ng Tainted: G OE 4.19.245-1.ph3-esx #1-photon
<4>1 <timestamps> <hostname> kernel - - Call Trace:
<4>1 <timestamps> <hostname> kernel - - dump_stack+0x6d/0x8b
<4>1 <timestamps> <hostname> kernel - - dump_header+0x65/0x275
<4>1 <timestamps> <hostname> kernel - - ? __delayacct_freepages_end+0x25/0x30
<4>1 <timestamps> <hostname> kernel - - oom_kill_process+0x26b/0x2a0
<4>1 <timestamps> <hostname> kernel - - ? oom_badness.part.6+0xd/0x110
<4>1 <timestamps> <hostname> kernel - - out_of_memory+0xf3/0x2b0
<4>1 <timestamps> <hostname> kernel - - __alloc_pages_nodemask+0x87e/0xd40
<4>1 <timestamps> <hostname> kernel - - filemap_fault+0x342/0x660
<4>1 <timestamps> <hostname> kernel - - ext4_filemap_fault+0x2c/0x40
<4>1 <timestamps> <hostname> kernel - - __do_fault+0x32/0xa0
<4>1 <timestamps> <hostname> kernel - - do_fault+0x121/0x6b0
<4>1 <timestamps> <hostname> kernel - - ? ep_read_events_proc+0xb0/0xb0
<4>1 <timestamps> <hostname> kernel - - __handle_mm_fault+0x5de/0x680
<4>1 <timestamps> <hostname> kernel - - handle_mm_fault+0x10a/0x200
<4>1 <timestamps> <hostname> kernel - - __do_page_fault+0x1fa/0x3f0
<4>1 <timestamps> <hostname> kernel - - do_page_fault+0x22/0x30
<4>1 <timestamps> <hostname> kernel - - ? page_fault+0x8/0x30
<4>1 <timestamps> <hostname> kernel - - page_fault+0x1e/0x30
<4>1 <timestamps> <hostname> kernel - - RIP: 0033:0x7f686411e4d0
<132>1 <timestamps> <hostname> cgw 1104 - - [Warning-ops] : Memory usage is probably high (free: %3)
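To check quickly whether a log file contains the signature shown in the sample output, a minimal sketch along the following lines may help. The function name count_oom_events is hypothetical and not an HCX tool. The sketch assumes the log is readable at the path passed in, and it only counts lines that contain the "invoked oom-killer" string.

```shell
# count_oom_events FILE
# Prints the number of lines in FILE that contain "invoked oom-killer".
# Hypothetical helper for illustration; not part of HCX.
count_oom_events() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is masked to keep callers simple.
    grep -c 'invoked oom-killer' "$1" || true
}
```

Example usage on an appliance shell: count_oom_events /var/log/messages. A non-zero count suggests the kernel OOM killer ran while that log was being written.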
Memory usage is high):
VMware HCX
A memory leak affecting the ndd process has been found on the HCX Fleet Appliances.
This causes high memory usage, and the Fleet Appliance is unable to allocate resources.
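One way to watch for this growth is to sample the resident memory of the ndd process over time. The sketch below is only an illustration. It assumes a procps-style ps is available on the appliance and that the process name is ndd, neither of which this article confirms. The helper names sum_rss_kib and ndd_rss_kib are hypothetical.

```shell
# sum_rss_kib: reads one RSS value (KiB) per line on stdin and prints the total.
# Prints 0 when stdin is empty.
sum_rss_kib() {
    awk '{ total += $1 } END { print total + 0 }'
}

# ndd_rss_kib: prints the total resident memory (KiB) of all processes named ndd.
# Assumption: procps ps, where -C selects by command name and "rss=" drops the header.
ndd_rss_kib() {
    ps -C ndd -o rss= | sum_rss_kib
}
```

Running ndd_rss_kib at intervals and comparing the values gives a rough indication of whether the process keeps growing.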
This issue is resolved in VMware HCX 4.11.1, available at Broadcom downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
Note: If you are experiencing this issue in HCX Fleet appliances 4.11.1 or higher, please open a support case with Broadcom Support and refer to this KB article.
For more information, see Creating and managing Broadcom support cases.
admin user.
ccli
list
go # (where # is the NE appliance ID)
ssh
systemctl stop ndd
systemctl disable ndd
Note: After disabling the ndd service on the NE Appliance VM, there is no impact on traffic forwarding or system stability. However, the Transport Analytics feature will be non-functional for those NE Appliances. On-demand bandwidth testing can be used as an alternative to the Transport Analytics feature.
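After the systemctl stop and disable commands in this workaround have been run, the service state can be confirmed with standard systemctl queries. The following is a minimal sketch, assuming a shell on the NE appliance. The helper name ndd_is_off is hypothetical, and the SYSTEMCTL variable exists only so that the check can be exercised with a stand-in command.

```shell
# ndd_is_off: succeeds only if the ndd unit is neither active nor enabled.
# Hypothetical helper; set SYSTEMCTL to substitute another command for testing.
ndd_is_off() {
    ctl=${SYSTEMCTL:-systemctl}
    # "is-active --quiet" exits 0 when the unit is running.
    if "$ctl" is-active --quiet ndd; then
        return 1
    fi
    # "is-enabled --quiet" exits 0 when the unit is enabled to start at boot.
    if "$ctl" is-enabled --quiet ndd; then
        return 1
    fi
    return 0
}
```

Example usage: ndd_is_off && echo "ndd is stopped and disabled".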
Note: If you are running HCX 4.11.0 or earlier, we recommend proactively implementing Workaround 2 on all appliances to prevent this issue. It must be implemented on both the HCX NE-I (source/initiator) and NE-R (target/receiver) appliances.
The /var/log/messages output is fundamental for troubleshooting complex issues. If nothing is logged to /var/log/messages because of a syslog issue, the ability to determine a root cause is significantly affected.
VMware HCX 4.11.1 Release Notes
Fixed Issue 3528977: Long running Network Detection Daemon (ndd) process can cause the system to run out of memory on Network Extension (NE) and Interconnect (IX) appliances.
When the system is kept running for a long time, the ndd process will continue consuming memory and can eventually consume all available memory leading to system kernel errors.