The Minio disk gradually fills up with Spark data checkpoints from InfraClassifier (IC), which runs every hour. When disk usage grows too large, it can impact other Intelligence services that also use Minio.
NAPP also raises an alarm about high disk usage for Data Storage, but none for Analytics. Storage usage can be seen in the alarms and by reviewing the Core Services tab in NAPP.
The problem can be identified by running a disk-usage command (such as du) inside any of the minio-* pods in the nsxi-platform namespace.
The IC checkpoints directory grows quite large.
export KUBECONFIG=/config/vmware/napps/.kube/config
napp-k exec -it minio-0 -- /bin/bash
du -h -d 1 /data/minio
20G   /data/minio/druid
549M  /data/minio/feature-service
22M   /data/minio/llanta
79G   /data/minio/iccheckpoints   <-------------- NOTE: LARGE SIZE 79G!
4.0K  /data/minio/events
4.0K  /data/minio/icfeatures
514M  /data/minio/processing-checkpoints
16K   /data/minio/lost+found
12K   /data/minio/ntaflow-checkpoints
59M   /data/minio/.minio.sys
2.6G  /data/minio/data-service
102G  /data/minio
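The manual check above can be scripted. The sketch below is illustrative only; the function name, threshold, and test path are assumptions, not part of the product. In the cluster, the same du call would be run inside the minio pod (for example via napp-k exec minio-0 -- du -s -k /data/minio/iccheckpoints).

```shell
#!/bin/sh
# Report the size of a checkpoints directory and warn once it crosses a
# threshold. Arguments: directory to measure, threshold in kilobytes.
# Hypothetical helper for illustration; not shipped with NAPP.
check_checkpoint_usage() {
    target_dir="$1"
    threshold_kb="$2"
    # du -s -k prints total size in KB; awk keeps only the number.
    used_kb=$(du -s -k "$target_dir" | awk '{print $1}')
    if [ "$used_kb" -gt "$threshold_kb" ]; then
        echo "WARN: $target_dir uses ${used_kb}K (threshold ${threshold_kb}K)"
    else
        echo "OK: $target_dir uses ${used_kb}K"
    fi
}
```

For example, check_checkpoint_usage /data/minio/iccheckpoints 52428800 would warn once the directory passes roughly 50G.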
VMware NSX 4.0.0.1
This issue is resolved in NSX Intelligence 4.1.1
Workaround:
To work around this issue, the iccheckpoints directory needs to be cleaned up. This can be done without impact to InfraClassifier or any other service.
The attached yaml file contains a cronjob and configmap, which will clean up the iccheckpoints each evening at midnight local time.
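The attached file is not reproduced here. As an illustration only, a CronJob of the following general shape would run nightly at midnight; the object name, image, credentials handling, and cleanup command below are assumptions for the sketch, not the contents of the attached file.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: clean-ic-checkpoints      # assumed name; see the attached yaml
  namespace: nsxi-platform
spec:
  schedule: "0 0 * * *"           # every night at midnight local time
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cleanup
            image: minio/mc       # illustrative image; the attached file may differ
            command: ["/bin/sh", "-c"]
            args:
            # Register the Minio endpoint, then delete the checkpoint objects.
            # Endpoint and credential variables are placeholders.
            - mc alias set minio http://minio:9000 "$ACCESS_KEY" "$SECRET_KEY"
              && mc rm -r --force minio/iccheckpoints
```

Use the attached clean-checkpoints-with-annotations.yaml rather than this sketch; it is shown only to clarify what the workaround installs.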
Download the attached clean-checkpoints-with-annotations.yaml and apply it:

$ napp-k apply -f clean-checkpoints-with-annotations.yaml