Worker Nodes are unhealthy after migrating to a new Storage Policy in vSphere with Tanzu

Article ID: 383814

Products

VMware vSphere Kubernetes Service

Issue/Introduction

Supervisor worker nodes (ESXi hosts) are showing as unhealthy after migrating Storage Policies in vSphere with Tanzu. Based on the Spherelet configuration, the nodes are still trying to use both the old and new Storage Policies and Datastores, and the old Storage Policy and Datastore are reported as unhealthy.

Conditions:
  Type           Status  LastHeartbeatTime                 LastTransitionTime                Reason                         Message
  ----           ------  -----------------                 ------------------                ------                         -------
  Ready          True    Fri, 15 Nov 2024 11:25:57 +0000   Fri, 15 Nov 2024 11:25:57 +0000   KubeletReady                   Spherelet is ready.
  DiskPressure   True    Mon, 01 Jan 0001 00:00:00 +0000   Mon, 01 Jan 0001 00:00:00 +0000   Host can't access datastores   failed to find any accessible datastores for storage policy <old-storage-policy-uid> datastore URLs: [ds:///vmfs/volumes/<old-datastore-id>/]
Addresses:
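
To see which worker nodes are affected and to view these conditions from the Supervisor control plane, you can use standard kubectl commands such as the following (the node name is a placeholder):

kubectl get nodes

kubectl describe node <worker-node-name>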

 

Environment

vSphere with Tanzu 8.0 U3d and above

Cause

The old Storage Policy remains in the Supervisor desired configuration.

In vCenter, under Workload Management > Supervisor > Configuration > Storage policy, you can see that the new storage policy is assigned to "Control Plane nodes", but no other policies are shown.

Get a list of the available storage policies:

 dcli +server 'http://localhost/api' com vmware vcenter storage policies list

|--------------------|-------------|------------------------|
|name                |description  |policy                  |
|--------------------|-------------|------------------------|
|old-storage-policy  |             |old-storage-policy-uid  |
|new-storage-policy  |             |new-storage-policy-uid  |
|--------------------|-------------|------------------------|

Checking the storage policy UIDs assigned in the WCP desired configuration for the Supervisor, you find a match for the old-storage-policy UID under EphemeralStoragePolicy and ImageStorage.StoragePolicy:
 
/usr/lib/vmware-wcp/wcp-db-dump.py | grep -i storage
          "ImageStorage": {
            "StoragePolicy": "<old-storage-policy-uid>"
          "MasterStoragePolicy": "<new-storage-policy-uid>",
          "EphemeralStoragePolicy": "<old-storage-policy-uid>",
        "storage_svcacct_pwd": "CENSORED",
        "last_storage_pwd_rotation_timestamp": xxxxxxxxxxx,
 

You will also find that the Spherelet ConfigMap in the kube-system namespace references both the old and new Storage Policies and Datastores.

kubectl get cm -n kube-system spherelet -o yaml | grep datastores:


  datastores: '{"<new-storage-policy-uid>":["<new-datastore-url>"],"<old-storage-policy-uid>":["<old-datastore-url>"]}'
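
If the datastores mapping is hard to read, the JSON value can be pretty-printed. This is a minimal sketch that assumes python3 is available where kubectl is run:

kubectl get cm -n kube-system spherelet -o jsonpath='{.data.datastores}' | python3 -m json.tool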

Resolution

Before you start, check the storage policy UIDs currently assigned in the WCP desired configuration:

    /usr/lib/vmware-wcp/wcp-db-dump.py | grep -i storage
    
          "ImageStorage": {
            "StoragePolicy": "<old-storage-policy-uid>"
          "MasterStoragePolicy": "<new-storage-policy-uid>",
          "EphemeralStoragePolicy": "<old-storage-policy-uid>",
        "storage_svcacct_pwd": "CENSORED",
        "last_storage_pwd_rotation_timestamp": 1733324403,
    

In a shell session on the vCenter appliance, log in to dcli in interactive mode:

dcli +i

1. List clusters and take note of the cluster moid, <domain-cxx>

    com vmware vcenter namespacemanagement clusters list

2. List the storage policies, check the <new-storage-policy>, and take note of its UID, <new-storage-policy-uid>

    storage policies list

3. Get the cluster configuration and check master_storage_policy, ephemeral_storage_policy, and image_storage -> storage_policy

    namespacemanagement clusters get --cluster <domain-cxx>

4. Update ephemeral_storage_policy and image_storage to use the UID of the target MasterStoragePolicy, <new-storage-policy-uid>

    namespacemanagement clusters update --cluster  <domain-cxx> --ephemeral-storage-policy <new-storage-policy-uid>

    namespacemanagement clusters update --cluster  <domain-cxx>   --image-storage-storage-policy <new-storage-policy-uid>

5. Restart the WCP service on vCenter; this will reconfigure/reconcile the cluster (a status check is shown after step 6)

    service-control --stop wcp && service-control --start wcp

    
6. After you make the change, check that the <new-storage-policy-uid> is now set for ImageStorage.StoragePolicy, MasterStoragePolicy, and EphemeralStoragePolicy

    /usr/lib/vmware-wcp/wcp-db-dump.py | grep -i storage
    
          "ImageStorage": {
            "StoragePolicy": "<new-storage-policy-uid>"
          "MasterStoragePolicy": "<new-storage-policy-uid>",
          "EphemeralStoragePolicy": "<new-storage-policy-uid>",
        "storage_svcacct_pwd": "CENSORED",
        "last_storage_pwd_rotation_timestamp": 1733324403,

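After restarting WCP in step 5, you can confirm the service came back up before re-checking the configuration:

    service-control --status wcp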

On the Supervisor:

7. Check whether the old storage class (tkgs-storage-policy in this example) is still present or in use:

kubectl get sc -A

kubectl get cm -n kube-system spherelet -o yaml

8. Check that the Spherelet ConfigMap in the kube-system namespace only references the new Storage Policy and Datastore:

kubectl get cm -n kube-system spherelet -o yaml | grep datastores:


  datastores: '{"<new-storage-policy-uid>":["<new-datastore-url>"]}'
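
Once the Spherelet ConfigMap references only the new Storage Policy, the DiskPressure condition on the affected worker nodes should clear. A quick way to re-check (the node name is a placeholder):

kubectl describe node <worker-node-name> | grep -A 10 'Conditions:'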

 

If the old storage class is gone, it should be safe to remove the old Storage Policy from vCenter.
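
As an additional check before removing the old Storage Policy from vCenter, you can confirm that no PersistentVolumeClaims or PersistentVolumes still reference the old storage class (the storage class name is a placeholder):

kubectl get pvc -A | grep <old-storage-class-name>

kubectl get pv | grep <old-storage-class-name>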