HCX NE tunnels down due to high memory usage

Article ID: 420839

Updated On:

Products

VMware HCX

Issue/Introduction

  • HCX 4.11.0

  • HCX NE tunnels report status as down.

  • There are no traceroute events logged in the messages logs for the NE appliance.

  • /var/log/message.log in the NE appliance reports the following errors:

    <131>1 2025-12-01T03:31:25+00:00 ServiceMesh-NE-I1 cgw 1459 - - [Err-Tasker] : cmd (/usr/local/sbin/conntrack -I -p esp -s 192.##.##.## -d 192.##.##.## -m 34 -t 600) failed </usr/local/sbin/conntrack -I -p esp -s 192.##.##.## -d 192.##.##.## -m 34 -t 600: exit status 1:conntrack v1.4.6 (conntrack-tools): Operation failed: Such conntrack exists, try -U to update>
    <134>1 2025-12-01T03:31:25+00:00 ServiceMesh-NE-I1 cgw 1459 - - [Info-opsEvent] : new system event: SystemEvent[2025-12-01T03:31:24Z, 2025-12-01T03:31:24Z, 60002, critical, Memory usage is high, map[balloon:0] MB cache:14688256 free:97075200 total:3109163008 used:3012087808]]

  • Rebooting the NE appliance that reports the above entries brings the NE tunnels back up, but only temporarily.
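To confirm how close the appliance is to memory exhaustion, the byte counters in the "Memory usage is high" event can be converted to a percentage. The following is a minimal sketch, not an HCX tool: the function name `parse_memory_event` is hypothetical, and the field layout (`cache`, `free`, `total`, `used` as `key:value` pairs in bytes) is assumed from the single sample entry above.

```python
import re

# Matches the "key:value" byte counters seen in the sample event.
# Assumption: the field layout follows the single log entry quoted above.
FIELD_RE = re.compile(r"\b(cache|free|total|used):(\d+)")


def parse_memory_event(line):
    """Return the byte counters plus used_pct for a high-memory event line.

    Returns None if the line is not a "Memory usage is high" event or
    lacks usable total/used counters.
    """
    if "Memory usage is high" not in line:
        return None
    fields = {key: int(value) for key, value in FIELD_RE.findall(line)}
    if "used" not in fields or fields.get("total", 0) == 0:
        return None
    fields["used_pct"] = 100.0 * fields["used"] / fields["total"]
    return fields


# The sample entry from this article, split across lines for readability.
sample = (
    "<134>1 2025-12-01T03:31:25+00:00 ServiceMesh-NE-I1 cgw 1459 - - "
    "[Info-opsEvent] : new system event: SystemEvent[2025-12-01T03:31:24Z, "
    "2025-12-01T03:31:24Z, 60002, critical, Memory usage is high, "
    "map[balloon:0] MB cache:14688256 free:97075200 "
    "total:3109163008 used:3012087808]]"
)

event = parse_memory_event(sample)
print(f"used {event['used_pct']:.1f}% of {event['total']} bytes")
```

For the sample entry this reports roughly 96.9% of memory in use, which is consistent with the appliance being unable to allocate resources for tunnel operations.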

Environment

VMware HCX

Cause

The failure of NE appliance tunnels and other functions (such as extension, unextension, and enabling MON) is caused by a known memory leak in the ndd process of HCX 4.11.0. The leak results in high memory usage, leaving the NE appliance unable to allocate critical resources for these tasks.

Resolution

Upgrade HCX Manager and the service mesh to 4.11.1, where this issue is resolved, or to the latest version, i.e. 4.11.3.

Additional Information

HCX Network tasks failing due to high memory usage by the "ndd" process