NSX Manager /nonconfig Partition Full Due to Unpurged IDS Event Data.

Article ID: 417465

Products

VMware vDefend Firewall

Issue/Introduction

In NSX version 3.2.x, the /nonconfig partition on NSX Manager may become full or nearly full due to the ids_event_data table not being purged.

This occurs when the purge job fails to fetch Corfu records, leading to a continuous accumulation of IDS/IPS event data. As a result, the Search Indexing process fails with Java OutOfMemory errors, and system performance can degrade significantly.

Environment

VMware NSX

vDefend Firewall

Cause

In NSX 3.2.x releases, IDS/IPS event retention is configured for 14 days or a maximum of 1.5 million records, whichever is reached first.

There are two purge jobs responsible for event cleanup:

  1. One that monitors event count and purges older records when the count exceeds the threshold.

  2. Another that deletes events older than 14 days.
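The intended retention rule can be modeled with a short shell sketch. This is only an illustration of the two limits stated above, not the NSX purge implementation; the function name `purge_reason` and its arguments are hypothetical.

```shell
#!/bin/sh
# Illustrative model of the retention rule described in this article.
# It is NOT the NSX purge job; names here are made up for the example.
MAX_EVENTS=1500000   # record cap stated in the article
MAX_AGE_DAYS=14      # age cap stated in the article

# purge_reason <event_count> <oldest_event_age_days>
# Prints which limit calls for a purge: "count", "age", or "none".
# The count limit is checked first, so it wins when both are exceeded.
purge_reason() {
    count=$1
    age_days=$2
    if [ "$count" -gt "$MAX_EVENTS" ]; then
        echo "count"
    elif [ "$age_days" -gt "$MAX_AGE_DAYS" ]; then
        echo "age"
    else
        echo "none"
    fi
}
```

For example, `purge_reason 40000000 3` reports `count`, which matches the observed cases where the record count alone far exceeded the threshold.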

In affected versions (including 3.2.2.1):

  • The 1.5 million event threshold is a soft limit, only triggering an alarm.

  • The purge job can encounter an OutOfMemory (OOM) exception when fetching the keyset for large datasets (~1.5M+ records).

  • Once this failure occurs, the purge job continuously fails, allowing events to accumulate uncontrollably (observed cases reached 40M+ events).

  • Logs show failures acquiring the distributed lock, preventing the purge job from proceeding:

log/idps-reporting/idps.log

INFO DistributedLockThread DistributedLockImpl 7418 - [nsx@6876 comp="distributed-lock" level="INFO" subcomp="DistributedLockImpl"] Unable to acquire distributed lock ids_events_purge_distributed_lock due to com.vmware.nsx.platform.clustering.persistence.exceptions.DuplicateObjectException

Additionally, Search Indexing failures are observed due to memory exhaustion:

log/idps-reporting/idps.log

ERROR pool-103-thread-1 UfoIndexingServiceImpl 6367 - [nsx@6876 comp="nsx-manager" errorCode="MP60503" level="ERROR" subcomp="idps-reporting"] [Indexing:ProcessTable] Exception during indexing table ids_event_data
java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.resize(HashMap.java:705)
    at java.util.HashMap.putVal(HashMap.java:664)
    at java.util.HashMap.put(HashMap.java:613)
    at java.util.HashSet.add(HashSet.java:220)
    at org.corfudb.runtime.collections.PersistedStreamingMap.keySet(PersistedStreamingMap.java:249)
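To confirm whether a given log shows both signatures above, the two messages can be counted with `grep`. The sketch below is a minimal helper under the assumption that the log file is passed in by the caller; the function name `count_idps_failures` is hypothetical, and the match strings are copied from the log excerpts in this article.

```shell
#!/bin/sh
# count_idps_failures <logfile>
# Counts lines matching the two failure signatures quoted in this article.
# -F treats the patterns as fixed strings; -c prints the number of matching lines.
count_idps_failures() {
    log=$1
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # function usable in scripts that run with "set -e".
    lock_failures=$(grep -cF 'Unable to acquire distributed lock ids_events_purge_distributed_lock' "$log" || true)
    oom_errors=$(grep -cF 'java.lang.OutOfMemoryError: Java heap space' "$log" || true)
    echo "lock_failures=$lock_failures oom_errors=$oom_errors"
}
```

Usage, with the path given relative to the log root as in the excerpts above: `count_idps_failures log/idps-reporting/idps.log`. Non-zero values for both counters are consistent with the failure mode described here, but they are not proof on their own.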

Resolution

For systems that are already affected and where the /nonconfig partition is full, follow the steps below carefully to clean up the partition manually and restore functionality.

Sample df output from affected Managers, where /nonconfig usage is full or nearly full:

nsx_manager_********_20251001_093540/system/df_-alT:/dev/mapper/nsx-secondary   ext4   >102G   100% /nonconfig
nsx_manager_********_20251001_093543/system/df_-alT:/dev/mapper/nsx-secondary   ext4   >102G    78% /nonconfig
nsx_manager_********_20251001_094622/system/df_-alT:/dev/mapper/nsx-secondary   ext4   >102G    88% /nonconfig
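A usage check like the one shown in these samples can be scripted. The following is a minimal sketch that reads `df`-style lines on standard input and reports the /nonconfig usage against a threshold. The function names and the default threshold of 80 are assumptions chosen for illustration, not values from NSX.

```shell
#!/bin/sh
# nonconfig_usage_pct: reads df-style lines on stdin and prints the use
# percentage (without the % sign) of the first line whose last field is
# /nonconfig. Assumes the percentage is the second-to-last field.
nonconfig_usage_pct() {
    awk '$NF == "/nonconfig" { gsub(/%/, "", $(NF-1)); print $(NF-1); exit }'
}

# check_nonconfig [threshold]
# Threshold is a hypothetical percentage; 80 is an arbitrary default.
check_nonconfig() {
    threshold=${1:-80}
    pct=$(nonconfig_usage_pct)
    if [ -z "$pct" ]; then
        echo "UNKNOWN: /nonconfig not found in input"
        return 2
    fi
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARN: /nonconfig at ${pct}%"
        return 1
    fi
    echo "OK: /nonconfig at ${pct}%"
}
```

On a live Manager this could be run as `df -P /nonconfig | check_nonconfig 80`; the same function also accepts the bundle lines shown above, because only the last two fields are inspected.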

Workaround:

Step-by-Step Procedure:

  1. Take a Manager backup before proceeding.

  2. Monitor /nonconfig partition usage on all NSX Managers:

    df -h | grep nonconfig
    du -sh /nonconfig/*

    Example output: 

    # df -h | grep nonconfig
    /dev/mapper/nsx-secondary     98G  1.6G   92G   2% /nonconfig

    # du -sh /nonconfig/*
    4.0K    /nonconfig/browser
    98M    /nonconfig/corfu
    736M    /nonconfig/diskonlycorfutable
    16K    /nonconfig/lost+found
    689M    /nonconfig/search

  3. Stop IDPS Reporting Service on all 3 Managers:

    /etc/init.d/idps-reporting-service stop

  4. Stop Corfu Nonconfig Server on all 3 Managers:

    /etc/init.d/corfu-nonconfig-server stop

  5. Manually clear nonconfig data (run on all 3 Managers):

    Before deleting any files, verify the Corfu layout file consistency:

    • Open /nonconfig/corfu/corfu/LAYOUT_CURRENT.ds and ensure that there is only one entry in the "segments" array, with:

      "start": 0,
      "end": -1

    • This ensures that deleting the data avoids unnecessary hole fills and state transfers when the Corfu nonconfig servers restart.

    Once confirmed, proceed to clear accumulated data:

    rm -rf /nonconfig/corfu/corfu/*SEGMENT*.ds
    rm -rf /nonconfig/corfu/corfu/log/*
    rm -rf /nonconfig/browser/*
    rm -rf /nonconfig/diskonlycorfutable/idps/*

  6. Start Corfu Nonconfig Server on all 3 Managers:

    /etc/init.d/corfu-nonconfig-server start

  7. Start IDPS Reporting Service on all 3 Managers:

    /etc/init.d/idps-reporting-service start

  8. Verify service status:

    su admin -c "get cluster status"

  9. Resync IDPS Reporting Search Index:

    su admin -c "start search resync idps-reporting"

  10. Re-check /nonconfig partition usage to ensure cleanup succeeded:

    df -h | grep nonconfig
    du -sh /nonconfig/*

    Example post-cleanup output:

    # df -h | grep nonconfig
    /dev/mapper/nsx-secondary     98G  1.2G   92G   2% /nonconfig

    # du -sh /nonconfig/*
    4.0K    /nonconfig/browser
    840K    /nonconfig/corfu
    478M    /nonconfig/diskonlycorfutable
    16K    /nonconfig/lost+found
    527M    /nonconfig/search
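The layout check in step 5 can be partly automated before any files are deleted. The sketch below is a rough textual check, not a JSON parser. It assumes that LAYOUT_CURRENT.ds is JSON text and that the "start" key appears only inside entries of the "segments" array; both are assumptions, and the function name is hypothetical. Inspect the file manually if the check fails.

```shell
#!/bin/sh
# layout_has_single_open_segment <layout-file>
# Returns 0 only if the file contains exactly one "start" key and the
# values "start": 0 and "end": -1 are present. Rough textual check under
# the assumptions stated in the lead-in.
layout_has_single_open_segment() {
    file=$1
    # Count every occurrence of the "start" key, even on a single-line file.
    starts=$(awk '{ n += gsub(/"start"/, "&") } END { print n + 0 }' "$file")
    [ "$starts" -eq 1 ] || return 1
    # Values must be followed by a comma, a closing brace, or end of line.
    grep -Eq '"start"[[:space:]]*:[[:space:]]*0[[:space:]]*([,}]|$)' "$file" || return 1
    grep -Eq '"end"[[:space:]]*:[[:space:]]*-1[[:space:]]*([,}]|$)' "$file" || return 1
    return 0
}
```

A possible guard on each Manager before the `rm` commands: `layout_has_single_open_segment /nonconfig/corfu/corfu/LAYOUT_CURRENT.ds && echo "layout OK" || echo "layout needs manual review"`.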

Additional Information

For environments where disk usage is moderate and database cleanup is possible, refer to the official KB for clearing older IDPS data:

Broadcom KB 385626 – Clearing Old IDPS Events from Database