The Minio disk gradually fills up with Spark checkpoint data from the InfraClassifier (IC) jobs that run every hour. When disk usage grows too large, it can impact other Intelligence services that also use Minio.
NAPP raises an alarm about high disk usage for Data Storage, but not for Analytics. Storage usage can be seen in the alarms and by reviewing the Core Services tab in NAPP.
The problem can be identified by running a disk-usage command inside any of the minio-* pods in the nsxi-platform namespace. The IC checkpoints directory grows quite large:
export KUBECONFIG=/config/vmware/napps/.kube/config
napp-k exec -it minio-0 -- /bin/bash
du -h -d 1 /data/minio
20G /data/minio/druid
549M /data/minio/feature-service
22M /data/minio/llanta
79G /data/minio/iccheckpoints <-------------- NOTE: LARGE SIZE, 79G!
4.0K /data/minio/events
4.0K /data/minio/icfeatures
514M /data/minio/processing-checkpoints
16K /data/minio/lost+found
12K /data/minio/ntaflow-checkpoints
59M /data/minio/.minio.sys
2.6G /data/minio/data-service
102G /data/minio
VMware NSX 4.0.0.1
This issue is resolved in NSX Intelligence 4.1.1.
Workaround:
To work around this issue, the iccheckpoints directory needs to be cleaned up. This can be done without impact to InfraClassifier or any other service.
The attached YAML file contains a CronJob and ConfigMap that clean up the iccheckpoints directory each evening at midnight local time.
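The attached file is authoritative. For illustration only, a cleanup CronJob of roughly this shape could do the job; the schedule matches the description above, but the job name, container image, Minio endpoint, bucket alias, and credential variables below are assumptions, not the contents of the attachment:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: clean-iccheckpoints        # assumed name; the attachment may differ
  namespace: nsxi-platform
spec:
  schedule: "0 0 * * *"            # midnight each night, per the description
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cleaner
            image: minio/mc        # assumed image; the attachment may differ
            command: ["/bin/sh", "-c"]
            args:
            # Assumed endpoint and credential env vars; in practice these
            # would come from the cluster's Minio service and secrets.
            - >
              mc alias set minio http://minio:9000
              "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY" &&
              mc rm --recursive --force minio/iccheckpoints/
```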
Download the attached file, clean-checkpoints-with-annotations.yaml, and apply it:

$ napp-k apply -f clean-checkpoints-with-annotations.yaml