NSX Load Balancer: All Virtual Servers on Edge go into a "Down" state.

Article ID: 388179

Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

  • Edge Memory alarm is triggered for the mbuf_pool_socket_0 memory pool.
  • Load Balancer alarm is triggered for all Virtual Servers on an Edge.
  • All traffic traversing the Virtual Servers is affected.
  • Edge syslog shows the KNI is out of memory.

    /var/log/syslog.log
    [Timestamp] NSX 5088 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="intel-rte" level="WARN"] KNI: Out of memory

    [Timestamp] NSX 5088 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="intel-rte" level="WARN"]  message repeated 83 times: [KNI: Out of memory]

    [Timestamp] NSX 5088 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="intel-rte" level="WARN"] KNI: Out of memory

    [Timestamp] NSX 5088 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="intel-rte" level="WARN"]  message repeated 362 times: [KNI: Out of memory]


  • Edge syslog shows the mbuf_pool_socket_0 memory pool is exhausted.

    /var/log/syslog.log
    [Timestamp] NSX 5088 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="stats" level="INFO"] mempool exhausted, usage: 99, threshold: 85, pool: mbuf_pool_socket_0

    [Timestamp] NSX 5088 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="stats" level="INFO"] mempool exhausted, usage: 99, threshold: 85, pool: mbuf_pool_socket_0

    [Timestamp] NSX 5088 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="stats" level="INFO"] mempool exhausted, usage: 100, threshold: 85, pool: mbuf_pool_socket_0

  • Edge kernel log shows repeated failures to enqueue mbufs into the KNI tx_q.

    /var/log/kern.log
    [Timestamp] [FQDN] kernel - - - [9851859.669838] rte_kni: Fail to enqueue mbuf into tx_q
    [Timestamp] [FQDN] kernel - - - [9851859.670701] rte_kni: Fail to enqueue mbuf into tx_q
    [Timestamp] [FQDN] kernel - - - [9851860.367296] rte_kni: Fail to enqueue mbuf into tx_q
    [Timestamp] [FQDN] kernel - - - [9851860.369680] rte_kni: Fail to enqueue mbuf into tx_q
    [Timestamp] [FQDN] kernel - - - [9851865.276970] rte_kni: Fail to enqueue mbuf into tx_q
    [Timestamp] [FQDN] kernel - - - [9851865.277896] rte_kni: Fail to enqueue mbuf into tx_q
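To check a collected log bundle for these symptoms, the signatures above can be counted with a short script. The following is a minimal Python sketch, not a VMware tool: it assumes the log lines keep exactly the text shown in this article, and the function name count_symptoms, the counter labels, and the script name kni_symptoms.py are hypothetical. Because syslog collapses duplicates into "message repeated N times", the sketch adds N rather than counting the summary line once.

```python
import re
import sys
from collections import Counter

# Signature strings copied from the log excerpts in this article (assumed stable).
KNI_OOM = re.compile(r"KNI: Out of memory")
# syslog collapses duplicates into "message repeated N times: [...]".
KNI_OOM_REPEATED = re.compile(r"message repeated (\d+) times: \[KNI: Out of memory\]")
MEMPOOL_EXHAUSTED = re.compile(r"mempool exhausted, .*pool: mbuf_pool_socket_0")
KNI_TXQ_FAIL = re.compile(r"rte_kni: Fail to enqueue mbuf into tx_q")


def count_symptoms(lines):
    """Count each symptom signature in an iterable of log lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        repeated = KNI_OOM_REPEATED.search(line)
        if repeated:
            # Add the collapsed count instead of counting the summary line once.
            counts["kni_out_of_memory"] += int(repeated.group(1))
        elif KNI_OOM.search(line):
            counts["kni_out_of_memory"] += 1
        if MEMPOOL_EXHAUSTED.search(line):
            counts["mbuf_pool_socket_0_exhausted"] += 1
        if KNI_TXQ_FAIL.search(line):
            counts["rte_kni_txq_enqueue_failures"] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical): python3 kni_symptoms.py syslog.log kern.log
    total = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            total += count_symptoms(f)
    for name, n in sorted(total.items()):
        print(f"{name}: {n}")
```

Run against copies of /var/log/syslog.log and /var/log/kern.log, non-zero counts for all three labels match the symptom pattern described in this article.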

 

Environment

VMware NSX-T Data Center 3.2.x
VMware NSX 4.x

Cause

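A memory leak occurs in the KNI (Kernel NIC Interface) when the volume of traffic sent from the kernel to the datapath is too high. As leaked mbufs accumulate, the mbuf_pool_socket_0 memory pool is exhausted, and communication between the Load Balancer service and its server pool members eventually fails.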
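To illustrate why a leak on this path ends in exhaustion rather than a gradual slowdown, here is a toy Python model of a fixed-size buffer pool in which each burst of traffic returns all but a few buffers. This is a simplified assumption for illustration only: BufferPool, simulate, and the burst and leak sizes are hypothetical and do not reflect how NSX or DPDK KNI actually manage mbufs. The 85% threshold mirrors the threshold value in the mempool log lines.

```python
class BufferPool:
    """Hypothetical fixed-size pool; a toy model, not the NSX or DPDK implementation."""

    def __init__(self, size):
        self.size = size
        self.in_use = 0

    def alloc(self, n):
        # Fail the whole request if the pool cannot satisfy it (analogous to "Out of memory").
        if self.in_use + n > self.size:
            return False
        self.in_use += n
        return True

    def free(self, n):
        self.in_use -= n

    def usage_pct(self):
        return 100 * self.in_use // self.size


def simulate(pool_size, burst, leaked_per_burst, threshold=85, max_bursts=10_000):
    """Return (first burst at/over threshold, first burst that fails to allocate).

    Each burst allocates `burst` buffers and returns all but `leaked_per_burst`.
    Either value is None if it does not happen within `max_bursts` bursts.
    """
    pool = BufferPool(pool_size)
    first_alarm = None
    for i in range(max_bursts):
        if not pool.alloc(burst):
            return first_alarm, i
        pool.free(burst - leaked_per_burst)  # leaked buffers are never returned
        if first_alarm is None and pool.usage_pct() >= threshold:
            first_alarm = i
    return first_alarm, None


print(simulate(1000, 50, 10))  # (84, 96)
print(simulate(1000, 50, 0))   # (None, None)
```

With a 1000-buffer pool, bursts of 50 and 10 buffers leaked per burst, the usage threshold is crossed at burst 84 and allocation first fails at burst 96. In this toy model the threshold is reached some time before allocations fail, which is consistent with acting on the mempool alarm as described in the Resolution section; without a leak, usage never accumulates.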

Resolution

This is a known issue impacting VMware NSX.

Workaround:

Place the affected Edge into maintenance mode, then take it out of maintenance mode.

To prevent the issue, if the "Edge Datapath mempool usage high" alarm is triggered for mbuf_pool_socket_0, place the Edge into maintenance mode and then take it out of maintenance mode.
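The preventive step above is tied to the alarm. As a supplementary, hypothetical check, for example when reviewing a copy of an Edge's /var/log/syslog.log, the Python sketch below reads the mempool lines and reports whether the most recent mbuf_pool_socket_0 reading is at or above its threshold. The function name needs_maintenance_cycle is hypothetical, and the check only reports; it does not place the Edge into maintenance mode.

```python
import re

# Pattern taken from the mempool log lines shown in Issue/Introduction (assumed stable).
MEMPOOL_LINE = re.compile(
    r"mempool exhausted, usage: (?P<usage>\d+), threshold: (?P<threshold>\d+), pool: (?P<pool>\S+)"
)


def needs_maintenance_cycle(lines, pool="mbuf_pool_socket_0"):
    """Return True if the most recent reading for `pool` is at or above its threshold."""
    latest = None
    for line in lines:
        m = MEMPOOL_LINE.search(line)
        if m and m.group("pool") == pool:
            # Keep only the latest reading for the pool of interest.
            latest = (int(m.group("usage")), int(m.group("threshold")))
    if latest is None:
        return False  # no reading for this pool in the given lines
    usage, threshold = latest
    return usage >= threshold
```

Keeping only the latest reading means an earlier spike that has since cleared does not flag the Edge; this is a design assumption of the sketch, not NSX alarm behavior.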