NSX Manager cluster instability due to Proton (Manager) service Out Of Memory

Article ID: 438137

Updated On:

Products

VMware NSX

Issue/Introduction

  • The NSX UI is intermittently unavailable and displays:
    Some appliance components are not functioning properly.
    Component health: MANAGER:DOWN, SEARCH:DOWN, NODE_MGMT:UP, UI:UP.
    Error code: 101
  • Under System > Appliances, the cluster is degraded and one or more Manager nodes show:
    1. MANAGER down
    2. HTTP down
  • An alarm is present: Application on NSX node has crashed.
  • You might have an open Edge global ARP table usage high alarm.
    • If the alarm has not triggered, you can confirm high ARP table usage by reviewing /controller/adaptor-ufo/adaptor_ufo_dump in an NSX Manager support bundle.
      • Count the entries with either of the following commands:
        grep -w arp_entry_type adaptor_ufo_dump | wc -l
        
        or
        
        grep -w entity_id adaptor_ufo_dump | wc -l
  • You might have an open Edge NIC link status down alarm.
  • The NSX Manager generates a Java heap dump in /image/core:
    proton_oom.hprof.gz
  • Manager instability prevents configuration realization, resulting in VM deployment failures.
  • On a Manager node, /var/log/proton-tomcat-wrapper.log contains:
    STATUS | wrapper  | ####/##/## ##:##:## | The JVM has run out of memory.  Requesting thread dump. 
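The checks above can be scripted for a quick triage pass. The following is a minimal bash sketch, not a VMware-provided tool. The function names (`count_arp_entries`, `arp_status`, `check_proton_oom`) are hypothetical, and the default paths are the ones quoted in this article. These paths may differ in other NSX versions or support-bundle layouts. The sketch assumes that adaptor_ufo_dump aggregates ARP entries for all logical routers. Under that assumption, a total at or below the 50,000 per-router limit from the Cause section means no single router can exceed it. A higher total only indicates that per-router counts need review.

```bash
#!/usr/bin/env bash
# Hypothetical triage helpers for the Proton OOM symptoms described above.
# Default paths are those quoted in this article; adjust them as needed.

# Supported maximum ARP entries per logical router (see the Cause section).
ARP_LIMIT=50000

# Print the number of ARP entries in an adaptor_ufo_dump file.
# Equivalent to: grep -w arp_entry_type adaptor_ufo_dump | wc -l
count_arp_entries() {
    local dump_file=$1
    if [ ! -r "$dump_file" ]; then
        echo "cannot read $dump_file" >&2
        return 2
    fi
    # grep -c exits 1 when nothing matches but still prints 0.
    grep -cw arp_entry_type "$dump_file" || true
}

# Classify a total entry count against the per-router limit.
# Assumes the dump aggregates all routers, so OK means no router can exceed it.
arp_status() {
    local total=$1
    if [ "$total" -le "$ARP_LIMIT" ]; then
        echo "OK"
    else
        echo "REVIEW"
    fi
}

# Return 0 (and print what was found) if either OOM indicator is present.
check_proton_oom() {
    local core_dir=${1:-/image/core}
    local wrapper_log=${2:-/var/log/proton-tomcat-wrapper.log}
    local found=1
    if [ -e "$core_dir/proton_oom.hprof.gz" ]; then
        echo "heap dump found: $core_dir/proton_oom.hprof.gz"
        found=0
    fi
    if [ -r "$wrapper_log" ] && grep -qF 'The JVM has run out of memory' "$wrapper_log"; then
        echo "OOM message found in $wrapper_log"
        found=0
    fi
    return "$found"
}
```

For example, run `count=$(count_arp_entries adaptor_ufo_dump)` and then `arp_status "$count"` from the extracted support bundle's controller/adaptor-ufo directory. Run `check_proton_oom` with no arguments directly on a Manager node.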

Environment

VMware NSX 4.1.2.7

Cause

The maximum supported number of ARP entries per logical router is 50,000. When this limit is significantly exceeded, the Proton (Manager) service can run out of memory (OOM). This is expected behavior for a configuration outside the supported limits.

Resolution

Segment the IP address range across multiple logical routers so that each router stays within the supported limit of 50,000 ARP entries.
For further recommendations, review the Edge global ARP table usage high alarm.
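As a rough sizing aid for the Resolution step, you can estimate the minimum number of logical routers. Divide the expected ARP entry count by the 50,000 per-router limit and round up. The bash sketch below is hypothetical: `min_logical_routers` is an illustrative name, not an NSX command. It assumes entries can be spread evenly across routers, which real IP segmentation rarely achieves, so treat the result as a lower bound.

```bash
#!/usr/bin/env bash
# Hypothetical sizing helper for the Resolution step.
# Assumes ARP entries can be split evenly across routers (a lower bound).

ARP_LIMIT_PER_ROUTER=50000   # supported maximum per logical router

# Usage: min_logical_routers TOTAL_ENTRIES [LIMIT]
# Prints the smallest router count that keeps each router at or below LIMIT.
min_logical_routers() {
    local total=$1
    local limit=${2:-$ARP_LIMIT_PER_ROUTER}
    if [ "$total" -le "$limit" ]; then
        echo 1      # a single router is within the limit
    else
        # Integer ceiling division: ceil(total / limit)
        echo $(( (total + limit - 1) / limit ))
    fi
}
```

For example, `min_logical_routers 120000` prints 3. Passing a smaller limit leaves headroom below the supported maximum. For instance, `min_logical_routers 120000 30000` prints 4.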

Additional Information

Configuration Maximum