Troubleshooting Pod Eviction Failures for Portworx Storage Pods in Kubernetes—Diagnosis and Identification
Article ID: 418658

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • Kubernetes administrators may encounter situations where cluster nodes are unable to drain successfully because certain pods cannot be evicted, leaving nodes stuck in maintenance, scheduling-disabled, or deletion states.
  • This scenario often arises when Portworx storage pods are protected by strict Pod Disruption Budgets (PDBs), and eviction would violate minimum availability constraints.
  • Understanding how to identify and resolve Portworx-related pod eviction failures helps maintain cluster stability and efficient lifecycle operations.

Environment

VMware vSphere Kubernetes Service

Cause

  • When node maintenance, upgrades, or remediation is triggered (e.g., due to health issues or infrastructure events), Kubernetes attempts to drain affected nodes by evicting hosted pods.
  • Portworx pods, deployed as part of a storage cluster, are governed by PDBs which enforce a minimum number of healthy pods.
  • If draining a node would reduce the number of available storage pods below the PDB threshold, Kubernetes refuses to evict those pods, blocking node deletion or replacement.
  • This situation is common in Portworx clusters whose storage pods are at or near their minimum required count.
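The rule Kubernetes applies here can be sketched simply: an eviction is permitted only if the number of available pods after the eviction would still meet the PDB's minAvailable. A minimal shell illustration of that check (the pod counts are hypothetical):

```shell
#!/bin/sh
# Sketch of the PDB rule the eviction API enforces:
# an eviction is permitted only if (currentAvailable - 1) >= minAvailable.
evict_allowed() {
  min_available=$1
  current_available=$2
  if [ $((current_available - 1)) -ge "$min_available" ]; then
    echo "eviction allowed"
  else
    echo "eviction blocked: would violate the pod's disruption budget"
  fi
}

evict_allowed 7 8   # one pod of headroom -> allowed
evict_allowed 7 7   # already at the minimum -> blocked
```

With minAvailable of 7 and only 7 pods currently available, every eviction request is refused, which is exactly why the drain stalls.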

Resolution

Identification Steps:

  1. Describe the problematic pod:

Run kubectl describe pod <pod-name> -n <namespace> on pods stuck during the node drain. Look for messages indicating:

cannot evict pod as it would violate the pod's disruption budget

  2. Review the ownerReferences in the pod spec. Portworx pods typically have:

ownerReferences:
- apiVersion: core.libopenstorage.org/v1
  kind: StorageCluster

  3. Check the PDB:

List the PDBs in the namespace with kubectl get pdb -n portworx and look for entries like:

NAME         MIN AVAILABLE   CURRENT AVAILABLE   AGE
px-storage   7               7                   20d

If CURRENT AVAILABLE equals MIN AVAILABLE, eviction is blocked.

  4. Check pod scheduling:

Run kubectl get pods -n portworx -o wide to list the nodes on which Portworx pods are running. Nodes marked for deletion or in maintenance mode that still host Portworx pods will remain stuck until those pods can be safely evicted.
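The PDB check above can be scripted. The sketch below flags any PDB whose current availability equals its minimum; sample output is stubbed in here, but in a live cluster you would pipe in kubectl get pdb -n portworx instead:

```shell
#!/bin/sh
# Flag PDBs that are at their minimum availability, i.e. where any
# further eviction would be refused. The here-string stands in for:
#   kubectl get pdb -n portworx
pdb_output='NAME         MIN AVAILABLE   CURRENT AVAILABLE   AGE
px-storage   7               7                   20d'

# Skip the header row; compare MIN AVAILABLE ($2) to CURRENT AVAILABLE ($3).
echo "$pdb_output" | awk 'NR > 1 && $2 == $3 {
  print $1 ": at minimum availability, eviction is blocked"
}'
```

Any PDB printed by this check will block node drains until availability is raised or the storage cluster releases the node.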

Remediation Steps:

The following steps can be performed to allow the nodes to drain successfully.

  • Work with Storage Vendor Support (e.g., Portworx Support) to gracefully disable or remove the affected nodes from the Portworx cluster, allowing pods to terminate or migrate without violating the PDB.
  • Increase Node Count (Optional):
    • Temporarily scale up cluster nodes to allow new pods to schedule elsewhere and maintain PDB health.

Once Portworx confirms node removal and pods can be safely evicted, retry drain/remediation operations, which should now complete successfully.
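Because the drain may still contend with transient conditions, a simple retry wrapper is useful when re-attempting it. The sketch below is a generic retry helper; the drain command shown in the comment is illustrative, and the placeholder commands in the usage line stand in for a real kubectl drain invocation:

```shell
#!/bin/sh
# Retry a command (e.g. kubectl drain <node-name> --ignore-daemonsets)
# a few times, pausing between attempts.
retry() {
  attempts=$1; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      echo "succeeded on attempt $i"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "failed after $attempts attempts"
  return 1
}

# Example usage (placeholder command; in practice something like:
#   retry 5 kubectl drain <node-name> --ignore-daemonsets)
retry 3 true
```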

Additional Information

This workflow ensures administrators can identify Portworx pod eviction failures and confirm their linkage to StorageCluster ownership and PDBs. Ensuring close collaboration with the storage team and validating pod and PDB status before node drain or maintenance minimizes disruption and preserves data availability in stateful Kubernetes environments.