NSX Application Platform Data Storage Service (Minio) crashes after scale-out

Article ID: 375740

Products

VMware vDefend Firewall

VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

The NSX Application Platform Data Storage service (Minio) crashes after a scale-out.

Under NAPP -> Core Services, the Data Storage service is shown as DOWN.

Environment

NAPP 3.2.0 and above 

Cause

During a scale-out, new hosts are added to the Minio host list along with the new Minio pods. However, if Minio is scaled out more than once, the same hosts are added again. The same hosts then appear multiple times in the host list, which causes the Minio pods to crash.

Resolution

Remove the duplicated hosts from the Minio statefulset.

1. Run the following commands to edit the Minio statefulset:

SSH to the NSX Manager

run "export KUBE_EDITOR=/usr/bin/vim.tiny"
run "napp-k edit sts minio"

2. Locate the erroneous host list. Sample host list with duplicated entries:

Args:
    server
    --console-address=:9001
    https://minio-{0...3}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
    https://minio-{4...7}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
    https://minio-{8...11}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
    https://minio-{0...3}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
    https://minio-{4...7}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
    https://minio-{8...11}.minio-headless.nsxi-platform.svc.cluster.local/data/minio

 

In the output above, each of the following host entries appears twice:

{0...3}

{4...7}

{8...11}
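On a long argument list the duplicates can be hard to spot by eye. The following is a minimal sketch, not part of the documented procedure, that lists every host argument appearing more than once in text read from standard input. The function name `dup_hosts` is hypothetical, and the sketch assumes that each host argument sits on its own line and contains the literal text `minio-{`.

```shell
# Hypothetical helper: print each MinIO host argument that occurs more than
# once in the text on stdin. Assumes one host argument per line.
dup_hosts() {
  # 1. keep only the host argument lines
  # 2. strip indentation and any leading YAML list dash so lines compare equal
  # 3. sort, then print each repeated line once
  grep -F 'minio-{' |
    sed 's/^[[:space:]-]*//' |
    sort |
    uniq -d
}
```

Assuming `napp-k` passes standard kubectl options through, it could be used as `napp-k get sts minio -o yaml | dup_hosts`. Empty output would mean that no host argument is repeated.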

 

3. Remove the duplicated host entries. In vim, move the cursor to a duplicated line and type "dd" to delete that line.

Sample correct host list after removal:

Args:
    server
    --console-address=:9001
    https://minio-{0...3}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
    https://minio-{4...7}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
    https://minio-{8...11}.minio-headless.nsxi-platform.svc.cluster.local/data/minio
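To preview what the corrected list should look like before editing by hand, a small filter can drop the repeated host arguments while keeping the first occurrence of each. This is a sketch under assumptions: the function name `dedupe_hosts` is hypothetical, host arguments are assumed to sit one per line and to contain `minio-{`, and the actual change is still made in the editor as described in this step.

```shell
# Hypothetical helper: copy stdin to stdout, dropping every repeated MinIO
# host argument (first occurrence kept). All other lines pass through as-is.
dedupe_hosts() {
  awk '
    index($0, "minio-{") {
      key = $0
      sub(/^[ \t-]+/, "", key)   # ignore indentation and a YAML list dash
      sub(/[ \t]+$/, "", key)    # ignore trailing whitespace
      if (seen[key]++) next      # already printed once: skip this line
    }
    { print }
  '
}
```

Assuming `napp-k` accepts standard kubectl options, `napp-k get sts minio -o yaml | dedupe_hosts` would print the statefulset with the extra host lines removed, which can be compared against the list in the editor.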

To discard a mistaken edit and quit without saving the change, run the command ":q!"

4. Save the change and quit with the command:
  
:wq!

5. The Minio pods go through a rolling update and return to a healthy state in 10 to 15 minutes. To check the Minio pod status:

SSH to the NSX Manager
run "napp-k get pod | grep minio"
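While the rolling update is in progress, it can help to filter the pod list down to the Minio pods that are not yet ready. The sketch below is illustrative only: the function name `unready_minio_pods` is hypothetical, and it assumes the default kubectl column layout (NAME, READY, STATUS, RESTARTS, AGE) and pod names of the form `minio-<number>`.

```shell
# Hypothetical helper: read "get pod" output on stdin and print each
# minio-<n> pod that is not Running or whose READY count is incomplete.
unready_minio_pods() {
  awk '
    $1 ~ /^minio-[0-9]+$/ {
      split($2, ready, "/")   # READY column looks like "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }
  '
}
```

It could be used as `napp-k get pod | unready_minio_pods`. Empty output would mean that every listed Minio pod is Running and fully ready.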

Once the Data Storage service has finished scaling out, the Data Storage down or degraded alarm should clear, and the NSX Application Platform status should show "stable" under System -> NSX Application Platform.