Volume Node Affinity Conflict - Incorrect Node Labels During Tanzu Hub Upgrade When VMs Are Recreated



Article ID: 430293


Products

VMware Tanzu Platform - Hub

Issue/Introduction

During an upgrade that involves BOSH VM changes (a stemcell update, or any other change that recreates the BOSH VMs), the previous nodes are drained and the VMs are recreated.

In such cases, the newly created VMs do not retain the existing node labels, and the errand assigns labels based on the new VM IP address. As a result, stateful pods (anything that depends on the node label kubernetes.io/hostname) remain in a Pending state.

For example, the postgres pods and PVs.
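To confirm the symptom, list the pods stuck in Pending and inspect one of them. The tanzusm namespace is the one used later in this article; the pod name below is only an example:

# Pods stuck in Pending (tanzusm is the namespace used in this article)
kubectl get pods -n tanzusm --field-selector=status.phase=Pending

# Scheduling events of an affected pod (pod name is an example)
kubectl describe pod postgresql-0 -n tanzusm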

The PV for postgresql-pg-data-postgresql-0 has a required nodeAffinity rule that points to the node label kubernetes.io/hostname=<IP>:

nodeAffinity:
  required:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - 10.###.##.##

If the IP address of the postgres VM changes, the new node's kubernetes.io/hostname label carries the new IP, while the PV still references the old one, causing a mismatch. The pod hangs in the Pending state with a scheduling error:

"1 node(s) had volume node affinity conflict".

Environment

Tanzu Hub 10.3.0

Cause

The kubelet manifest does not assign stable hostnames; BOSH sets the hostname to the IP address of the VM. If the IP address changes during VM recreation, the hostname changes as well, and the existing PV can no longer be bound to the node, because the local-path provisioner hardcodes the PV's node affinity to the hostname.
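This can be seen on the VMs themselves: BOSH sets the hostname to the VM's IP address, and kubelet registers the node under that name. A quick check (the instance name below is an example; use any instance from the deployment):

# The VM's hostname is its IP address (instance name is an example)
bosh ssh postgres/0 -c 'hostname'

# Node names and internal IPs match one-to-one
kubectl get nodes -o wide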

Resolution

This will be fixed in a future release of Tanzu Hub.

---

Manual Steps to Fix

  • Set the BOSH_DEPLOYMENT environment variable to the Tanzu Hub deployment name
    export BOSH_DEPLOYMENT=$(bosh deployments | grep -E '^hub-[0-9a-f]{20}' | awk '{print $1}')
  • List all VMs with their IP addresses and attached PVs
    bosh instances --details --json | jq -r '.Tables[0].Rows[] | select(.disk_cids != "") | "\(.instance) \(.ips)"' | while read inst ip; do echo "=== $inst ($ip) ==="; bosh ssh "$inst" -c "ls -l /var/vcap/store" | grep -E 'pvc-|==='; done
  • Log in to the Registry VM
    bosh ssh registry
  • Identify which PVs are bound to nodes that no longer exist (nodes whose IP changed)
    existing=$(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); kubectl get pv -o json | jq -r --arg nodes "$existing" '.items[] | select(.spec.nodeAffinity != null) | .metadata.name as $pv | .spec.nodeAffinity.required.nodeSelectorTerms[].matchExpressions[] | select(.key == "kubernetes.io/hostname") | .values[] | select(. as $v | ($nodes | split(" ") | index($v)) == null) | "\($pv) -> \(.)"'
  • IMPORTANT: Make sure the reclaim policy of all affected PVs is 'Retain' (check the RECLAIM POLICY column)
    kubectl get pv
  • For each affected PV:
    • Export the PV configuration stripping unnecessary fields
      kubectl get pv pvc-116dde38-9943-49c3-b2b4-4f410e23eba6 -o json | jq 'del(.metadata.creationTimestamp,.metadata.resourceVersion,.metadata.uid,.metadata.managedFields,.spec.claimRef,.status)' >pv.json
    • Edit the PV configuration, changing the node IP address to the new one from the instance list in the first step (see the jq alternative after these steps):
      vi pv.json
    • Delete the PV and remove its finalizers
      kubectl delete pv pvc-116dde38-9943-49c3-b2b4-4f410e23eba6 --wait=false
      kubectl patch pv pvc-116dde38-9943-49c3-b2b4-4f410e23eba6 -p '{"metadata":{"finalizers":null}}' --type=merge
      kubectl wait --for=delete pv/pvc-116dde38-9943-49c3-b2b4-4f410e23eba6
    • Re-create the PV with the new configuration
      kubectl apply -f pv.json
    • Make sure the corresponding PVC returns to the 'Bound' state
      kubectl get pvc -n tanzusm
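As an alternative to editing pv.json by hand in vi, the pinned hostname can be rewritten with jq. This is a sketch, assuming the PV carries a single kubernetes.io/hostname match expression; NEW_IP is a placeholder for the node's new IP address taken from the instance list in the first step:

# NEW_IP is a placeholder; substitute the new node IP from the instance list
NEW_IP=10.###.##.##
jq --arg ip "$NEW_IP" '(.spec.nodeAffinity.required.nodeSelectorTerms[].matchExpressions[] | select(.key == "kubernetes.io/hostname") | .values) = [$ip]' pv.json > pv.new.json && mv pv.new.json pv.json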
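Once every affected PV has been re-created, a final check is that no pod is left Pending:

# Should return no resources once all PVs are rebound and the pods are scheduled
kubectl get pods -n tanzusm --field-selector=status.phase=Pending

Re-running the stale-PV check from the earlier step should likewise produce no output.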