NSX NAPP Platform Disk Usage High Alarm

Article ID: 324174

Updated On:

Products

VMware NSX

Issue/Introduction

  • NAPP version is 4.0.1
  • The NSX UI shows an alarm stating that disk storage usage is above 75%
  • Some NAPP services might not work as expected
  • The problem can be confirmed by running the du (disk usage) command in any of the minio-* pods in the nsxi-platform namespace.
Open a shell in the minio-0 pod (as an example):
  $ kubectl -n nsxi-platform exec -it minio-0 -- /bin/bash
Run the du command:
  $ du -ah --max-depth=1 /data/minio
  20G /data/minio/druid
  549M /data/minio/feature-service
  22M /data/minio/llanta
  79G /data/minio/iccheckpoints  <----- Unexpectedly large directory
  4.0K /data/minio/events
  4.0K /data/minio/icfeatures
  514M /data/minio/processing-checkpoints
  16K /data/minio/lost+found
  12K /data/minio/ntaflow-checkpoints
  59M /data/minio/.minio.sys
  2.6G /data/minio/data-service
  102G /data/minio
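
To make the outlier easier to spot, the du output above can be sorted by size. The following is a minimal sketch, not part of the official procedure: the helper name largest_dirs is hypothetical, and it assumes a sort that supports human-readable sizes (-h, as in GNU coreutils) on the workstation where kubectl runs.

```shell
# Hypothetical helper: read `du -ah --max-depth=1` output on stdin and
# print the three largest entries, biggest first.
# Assumes `sort` supports -h (human-readable sizes such as 79G or 549M).
largest_dirs() {
  sort -rh | head -n 3
}

# Example use (requires access to the cluster):
#   kubectl -n nsxi-platform exec minio-0 -- du -ah --max-depth=1 /data/minio | largest_dirs
```

With the sample output shown above, the first line would be the /data/minio total and the second line the iccheckpoints directory.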



Environment

VMware NSX 4.0.0.1

Cause

The MinIO disk gradually fills with Spark data checkpoints written by the InfraClassifier (IC) job, which runs every hour. When the stored data grows too large, it can impair other Intelligence services that also use MinIO.

Resolution

This issue is resolved in NSX NAPP 4.1.1.

Workaround:
Apply the attached YAML file, which contains a CronJob and a ConfigMap that clean up the iccheckpoints directory every night at midnight local time.

To apply the YAML file, run this command:
  $ kubectl -n nsxi-platform apply -f clean-checkpoints-with-annotations.yaml
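
After applying the file, it may be useful to confirm that the CronJob was created. This article does not list the resource names defined in the attached file, so the sketch below simply lists every CronJob in the namespace. The helper name list_cleanup_cronjobs is hypothetical.

```shell
# Hypothetical helper: list all CronJobs in the nsxi-platform namespace so
# the newly applied cleanup job and its schedule can be checked by eye.
# The name of the CronJob created by the attached YAML is not assumed here.
list_cleanup_cronjobs() {
  kubectl -n nsxi-platform get cronjobs
}

# Example use (requires access to the cluster):
#   list_cleanup_cronjobs
```

The SCHEDULE column of the output shows the cron expression. A daily run at midnight is conventionally written as 0 0 * * *, although the exact expression used by the attached file is an assumption here.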

Additional Information

Impact/Risks:
If the iccheckpoints directory continues to fill the disk, the processing pipeline will eventually stall and Intelligence will not be able to process data.
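
To track whether usage is approaching the 75% alarm threshold, the disk capacity can be checked from outside the pod. This is a rough sketch under several assumptions: the helper name usage_at_or_above is hypothetical, df is assumed to be available in the minio pod, and the NSX alarm may calculate usage differently from df.

```shell
# Hypothetical helper: read `df -P <path>` output on stdin and return
# success (exit 0) when the Capacity column of the first data row is at or
# above the given percentage (default 75). Returns failure otherwise,
# including when no data row is present.
usage_at_or_above() {
  awk -v limit="${1:-75}" '
    NR == 2 { sub(/%/, "", $5); hit = ($5 + 0 >= limit) }
    END     { exit !hit }
  '
}

# Example use (requires access to the cluster; assumes df exists in the pod):
#   kubectl -n nsxi-platform exec minio-0 -- df -P /data/minio \
#     | usage_at_or_above 75 && echo "MinIO disk usage is at or above 75%"
```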

Attachments

clean-checkpoints-with-annotations