HCX NE/IX appliance tunnels flapping due to CPU/memory contention on the host

Article ID: 313630

Products

VMware Cloud on AWS

Issue/Introduction

This article explains the root cause of tunnel flapping and instability on HCX IX/NE and other appliances and how to address it. It also covers the high memory alerts raised on all hosts.

Symptoms:

Tunnels on HCX IX/NE and other appliances flap or are unstable, and high memory usage alerts are raised on all hosts in the cluster.

HCX Manager logs show that the appliances had insufficient memory, making them unstable.

system_events

{"id":60002,"level":6,"timestamp":1690140168,"UTC":"<date> <time> +0000 UTC","message":"Memory usage is high","metadata":{"cache":" ","free":" ","total":" ","used":" "}}

NE appliance messages log:

NE-R1 GatewayLogs[1052]: [Info-opsEvent] : new system event: SystemEvent[<date> <time> +0000 UTC, <date> <time> +0000 UTC, 60002, critical, Memory usage is high, map[cache: free: total: used: ]]

NE-R1 GatewayLogs[1052]: [Warning-ops] : Memory usage is probably high (free: %3)

SM-IX-R1 GatewayLogs[1136]: [Info-opsEvent] : new system event: SystemEvent[<date> <time> +0000 UTC, <date> <time> +0000 UTC, 60002, critical, Memory usage is high, map[cache: free: total: used: ]]
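
To see which appliances report the event and how often, the message log lines can be tallied per appliance. The sketch below assumes each line carries the appliance name immediately before `GatewayLogs[...]`, as in the excerpts above. The helper name `count_high_memory_events` is hypothetical, and the sample lines reuse the redacted excerpts.

```python
import re
from collections import Counter

# Matches "new system event" lines carrying event id 60002, capturing the
# token that precedes "GatewayLogs[pid]:" as the appliance name (assumption).
EVENT_RE = re.compile(
    r"(?P<appliance>\S+) GatewayLogs\[\d+\]: .*\b60002\b.*Memory usage is high"
)


def count_high_memory_events(lines):
    """Count high-memory system events per appliance name."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[match.group("appliance")] += 1
    return counts


sample = [
    "NE-R1 GatewayLogs[1052]: [Info-opsEvent] : new system event: "
    "SystemEvent[<date> <time> +0000 UTC, <date> <time> +0000 UTC, 60002, "
    "critical, Memory usage is high, map[cache: free: total: used: ]]",
    "NE-R1 GatewayLogs[1052]: [Warning-ops] : Memory usage is probably high (free: %3)",
    "SM-IX-R1 GatewayLogs[1136]: [Info-opsEvent] : new system event: "
    "SystemEvent[<date> <time> +0000 UTC, <date> <time> +0000 UTC, 60002, "
    "critical, Memory usage is high, map[cache: free: total: used: ]]",
]
counts = count_high_memory_events(sample)
for appliance, total in sorted(counts.items()):
    print(appliance, total)
```

The warning line is deliberately not counted because it does not carry the 60002 event id.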

Cause

The issue is caused by memory contention on the host where the appliances are running. The contention can result from high memory usage by workload VMs, which impacts the stability of the HCX appliances.

Resolution

The "tunnel flapping" issue may re-appear if memory condition is unstable on ESXi hosts. Please ensure to have host memory under less contention to avoid getting HCX appliances impacted.

Additional Information

Impact/Risks:

Extended segments that use HCX NE are critically impacted, which affects workload connectivity. Extending new segments might fail due to NE tunnel flapping.
