Dataplane is not able to start after increasing ring buffer size

Article ID: 314225

Products

VMware NSX

Issue/Introduction

This article explains why the dataplane service fails to start after the ring buffer size is increased, and how to resolve the issue.

Symptoms:
After the rx/tx ring buffer size is increased to 4096 on a bare metal (BM) Edge, the dataplane service fails to start.

Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 3.x
VMware NSX-T Data Center 4.x

Cause

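The larger ring buffers consume more hugepage memory, which leaves too little heap memory on socket 0 for the dataplane service to start. In the example below (relevant lines marked with >>>), socket_0 has only 1,915,968 bytes (about 1.9 MB) free out of a 32 GiB heap, and the largest contiguous free block (Greatest_free_size) is just 15,232 bytes.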

$ cat ./edge/memory-malloc-heap
[
    {
        "Alloc_count": 37373,
        "Alloc_size": 34357822400,
        "Free_count": 1439,
        "Free_size": 1915968, >>>
        "Greatest_free_size": 15232,
        "Heap id": 0,
        "Heap name": "socket_0",
        "Heap_size": 34359738368 >>>
    },
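
The heap figures above can be checked with a short script. The following is a minimal sketch, not an NSX tool: it assumes the memory-malloc-heap file is a JSON list of per-socket entries with the field names shown above, and that the >>> markers are annotations in this article rather than part of the file. The helper name summarize_heaps is hypothetical, and the sample closes the list only so that the fragment parses.

```python
import json

# Sample copied from the output above. The closing bracket is added only so
# the fragment parses; a real file may contain entries for further sockets.
SAMPLE = """
[
    {
        "Alloc_count": 37373,
        "Alloc_size": 34357822400,
        "Free_count": 1439,
        "Free_size": 1915968,
        "Greatest_free_size": 15232,
        "Heap id": 0,
        "Heap name": "socket_0",
        "Heap_size": 34359738368
    }
]
"""


def summarize_heaps(text):
    """Return one summary dict per heap entry found in the JSON text."""
    summaries = []
    for heap in json.loads(text):
        size = heap["Heap_size"]
        free = heap["Free_size"]
        summaries.append({
            "name": heap["Heap name"],
            "heap_gib": size / 2**30,
            "free_mib": free / 2**20,
            # Guard against a socket that has no heap at all.
            "free_pct": 100.0 * free / size if size else 0.0,
            "largest_free_block": heap["Greatest_free_size"],
        })
    return summaries


if __name__ == "__main__":
    for s in summarize_heaps(SAMPLE):
        print(f'{s["name"]}: {s["heap_gib"]:.0f} GiB heap, '
              f'{s["free_mib"]:.2f} MiB free ({s["free_pct"]:.4f}%), '
              f'largest free block {s["largest_free_block"]} bytes')
```

For the sample, Alloc_size plus Free_size equals Heap_size, so less than 0.01% of the socket_0 heap is still free.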

Resolution

Starting with NSX 3.2.3.2 and 4.1.1, a bare metal Edge (BME) supports 128 GB of hugepage memory, compared to 64 GB in earlier versions. On these versions the issue does not occur with a 4K (4096) rx/tx ring buffer.

For other versions, refer to the Workaround section.

Workaround:
Monitor whether enough heap memory is available, and decrease the ring buffer size to 2048 or 1024.
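
One way to monitor the heap is to flag any socket whose free memory falls below a chosen limit. The sketch below assumes heap entries with the same field names as the memory-malloc-heap output in the Cause section. The two thresholds and the function names heap_is_low and low_heaps are hypothetical, picked only for illustration; they are not NSX defaults or NSX APIs.

```python
# Hypothetical thresholds chosen for illustration; they are not NSX defaults.
MIN_FREE_PCT = 5.0      # minimum free share of the heap, in percent
MIN_FREE_BLOCK = 2**20  # minimum size of the largest free block (1 MiB)


def heap_is_low(heap, min_free_pct=MIN_FREE_PCT, min_free_block=MIN_FREE_BLOCK):
    """Return True if the heap entry falls below either threshold."""
    size = heap["Heap_size"]
    if size == 0:
        return False  # no heap on this socket, so there is nothing to check
    free_pct = 100.0 * heap["Free_size"] / size
    return free_pct < min_free_pct or heap["Greatest_free_size"] < min_free_block


def low_heaps(heaps):
    """Return the names of all heaps that fall below the thresholds."""
    return [h["Heap name"] for h in heaps if heap_is_low(h)]


if __name__ == "__main__":
    # Values copied from the socket_0 entry shown in the Cause section.
    socket_0 = {
        "Free_size": 1915968,
        "Greatest_free_size": 15232,
        "Heap name": "socket_0",
        "Heap_size": 34359738368,
    }
    print(low_heaps([socket_0]))  # prints ['socket_0']
```

Checking the largest free block as well as the free percentage matters because a fragmented heap can fail a large allocation even when the total free size looks sufficient.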

Additional Information

Impact/Risks:
If rte_heap_memory is exhausted, the Edge enters and then exits maintenance mode (MM), and systemd restarts all Edge services. This is an attempt to mitigate the exhaustion. The impact of rte_heap_memory exhaustion depends on how much memory is still available and on the configuration of the Edge.


When a few percent of the memory is still available, most operations continue to work. Datapath packet forwarding does not use the rte_heap, so it is unaffected. However, configuration changes and state synchronization may use the heap and may start to fail for services such as firewall or load balancer (LB).