NSX Application Platform Data Storage Service (Minio) crashes after scale-out
Under NAPP > Core Services, the Data Storage service is reported as DOWN.
NAPP 3.2.0 and above
During scale-out, new hosts are added along with the new Minio pods. However, if Minio is scaled out more than once, the hosts that are already present are added again. The resulting duplicate host entries cause the Minio pods to crash.
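To illustrate the cause, here is a minimal Python sketch that checks a Minio argument list for host entries that appear more than once and builds a cleaned list. The function names `find_duplicate_hosts` and `remove_duplicate_hosts` are hypothetical and are not part of any NAPP tooling. The sample list is shortened and only mirrors the host naming pattern used by the Minio statefulset.

```python
from collections import Counter


def find_duplicate_hosts(args):
    """Return host entries that occur more than once, in first-seen order."""
    # Only host URLs are checked; flags such as --console-address are ignored.
    hosts = [a for a in args if a.startswith("https://")]
    counts = Counter(hosts)
    duplicates = []
    for host in hosts:
        if counts[host] > 1 and host not in duplicates:
            duplicates.append(host)
    return duplicates


def remove_duplicate_hosts(args):
    """Keep the first occurrence of each host entry and preserve the order."""
    seen = set()
    cleaned = []
    for arg in args:
        if arg.startswith("https://") and arg in seen:
            continue  # repeated host entry, skip it
        seen.add(arg)
        cleaned.append(arg)
    return cleaned


# Shortened sample: two host entries, each added twice by repeated scale-out.
suffix = ".minio-headless.nsxi-platform.svc.cluster.local/data/minio"
sample_args = [
    "server",
    "--console-address=:9001",
    "https://minio-{0...3}" + suffix,
    "https://minio-{4...7}" + suffix,
    "https://minio-{0...3}" + suffix,
    "https://minio-{4...7}" + suffix,
]

print(find_duplicate_hosts(sample_args))
print(remove_duplicate_hosts(sample_args))
```

This is only a way to reason about the argument list offline. The actual fix is made by editing the statefulset as described in the steps that follow.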
Remove the duplicate host entries from the Minio statefulset:
1. Open the Minio statefulset for editing:
SSH to the NSX Manager.
Run "export KUBE_EDITOR=/usr/bin/vim.tiny"
Run "napp-k edit sts minio"
2. Locate the host list that contains the duplicates. Sample host list with duplicates:
Args:
server
--console-address=:9001
https://minio-{0...3}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
https://minio-{4...7}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
https://minio-{8...11}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
https://minio-{0...3}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
https://minio-{4...7}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
https://minio-{8...11}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
In the output above, the following host entries are duplicated:
{0...3}
{4...7}
{8...11}
3. Remove the duplicate host entries. In vim, the "dd" command deletes the line under the cursor.
Sample correct host list after removal:
Args:
server
--console-address=:9001
https://minio-{0...3}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
https://minio-{4...7}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
https://minio-{8...11}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
If you make a mistake and want to quit without saving the changes, run the command ":q!"
4. Save the changes and quit with the command:
:wq!
5. The Minio pods are restarted by a rolling update and return to a healthy state in 10 to 15 minutes. To check the Minio pod status:
SSH to the NSX Manager.
Run "napp-k get pod | grep minio"
Once the Data Storage service has finished scaling out, the Data Storage down or degraded alarm should clear, and the NSX Application Platform status should show "stable" under System -> NSX Application Platform.
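The pod status check in step 5 can also be done on saved output. The Python sketch below assumes that "napp-k get pod" prints the usual kubectl columns (NAME, READY, STATUS, RESTARTS, AGE). That layout is an assumption and is not confirmed by this article. The function name `unhealthy_minio_pods` is hypothetical, and the sample listing is invented for illustration only, not captured from a real system.

```python
def unhealthy_minio_pods(listing):
    """Return names of minio pods that are not Running with all containers ready."""
    unhealthy = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and pods that are not minio pods.
        if len(fields) < 3 or not fields[0].startswith("minio-"):
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        # READY is assumed to look like "1/1" (ready containers / total containers).
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            unhealthy.append(name)
    return unhealthy


# Invented sample listing (assumed column layout, illustrative values).
sample_listing = """\
NAME      READY   STATUS             RESTARTS   AGE
minio-0   1/1     Running            0          12m
minio-1   0/1     CrashLoopBackOff   7          12m
minio-2   1/1     Running            0          11m
"""

print(unhealthy_minio_pods(sample_listing))
```

An empty result suggests that every listed Minio pod is running and ready. The alarm state and platform status in the NSX UI remain the authoritative check.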