Symptoms

There is an alarm in the NSX UI stating that Disk Storage is above 75%.
Some NAPP services might not be working as expected
The problem can be identified by running the du command on any of the minio-* pods in the nsxi-platform namespace.
Exec into the minio-0 pod (as an example):

$ kubectl -n nsxi-platform exec -it minio-0 -- /bin/bash

Run the du command:

$ du -ah --max-depth=1 /data/minio
20G     /data/minio/druid
549M    /data/minio/feature-service
22M     /data/minio/llanta
79G     /data/minio/iccheckpoints   <----- Unexpectedly large directory
4.0K    /data/minio/events
4.0K    /data/minio/icfeatures
514M    /data/minio/processing-checkpoints
16K     /data/minio/lost+found
12K     /data/minio/ntaflow-checkpoints
59M     /data/minio/.minio.sys
2.6G    /data/minio/data-service
102G    /data/minio
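To correlate the directory sizes with the 75% alarm, the overall usage of the MinIO data volume can also be checked without opening an interactive shell. This is an illustrative one-liner (using minio-0 again; any minio-* pod should work); the Use% column should roughly match the alarm threshold:

$ kubectl -n nsxi-platform exec minio-0 -- df -h /data/minio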
Environment
VMware NSX 4.0.0.1
Cause
The MinIO disk gradually fills up with Spark data checkpoints written by the InfraClassifier (IC) job, which runs every hour. When the used space on the disk grows too large, it can impact other Intelligence services that also use MinIO.
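To confirm that the hourly IC checkpoint data is what is accumulating, the largest entries under iccheckpoints can be listed from inside the pod. This is an illustrative command, assuming GNU coreutils (sort -h) are available in the minio container:

$ kubectl -n nsxi-platform exec minio-0 -- sh -c 'du -sh /data/minio/iccheckpoints/* | sort -h | tail -10'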
Resolution
This issue is resolved in NSX NAPP 4.1.1.
Workaround: Apply the attached YAML file, which contains a CronJob and a ConfigMap that clean up the iccheckpoints directory each night at midnight local time.
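For reference only, below is a minimal sketch of what such a cleanup CronJob could look like. This is not the attached file: the name clean-iccheckpoints, the bitnami/kubectl image, the cleanup-sa service account, and the rm-based cleanup command are all assumptions for illustration. Use the attached clean-checkpoints-with-annotations.yaml as-is.

# Illustrative sketch only; use the attached clean-checkpoints-with-annotations.yaml.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: clean-iccheckpoints        # assumed name
  namespace: nsxi-platform
spec:
  schedule: "0 0 * * *"            # midnight in the cluster's local time zone
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cleanup-sa   # assumed; needs RBAC to exec into pods
          containers:
          - name: clean-iccheckpoints
            image: bitnami/kubectl:latest  # assumed image that provides kubectl
            command:
            - /bin/sh
            - -c
            # Assumed cleanup approach: remove checkpoint data from the MinIO
            # data volume by exec'ing into the minio-0 pod.
            - kubectl -n nsxi-platform exec minio-0 -- sh -c 'rm -rf /data/minio/iccheckpoints/*'
          restartPolicy: OnFailure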
To apply the YAML file, run the following command:

$ kubectl -n nsxi-platform apply -f clean-checkpoints-with-annotations.yaml
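After applying, the CronJob can be verified, and a one-off run can be triggered manually rather than waiting for midnight. The CronJob name below is taken from the illustrative sketch above and may differ in the attached file:

$ kubectl -n nsxi-platform get cronjob
$ kubectl -n nsxi-platform create job manual-cleanup --from=cronjob/clean-iccheckpoints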
Additional Information
Impact/Risks: If iccheckpoints continues to fill the disk, the processing pipeline will eventually stall and Intelligence will not be able to process data.