During an upgrade that includes BOSH VM changes (a stemcell update, or any other change that recreates the BOSH VMs), the existing nodes are drained and the VMs are recreated.
The newly created VMs do not retain the existing node labels, and the errand assigns labels based on the new VM IP addresses. As a result, stateful pods (anything that depends on the node label kubernetes.io/hostname) remain in a Pending state.
For example, the postgres pods and PVs.
The PV for ql-pg-data-postgresql-0 has a required nodeAffinity rule that points to the node label kubernetes.io/hostname=<IP>:
nodeAffinity:
  required:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - 10.###.##.##
If the IP address of the postgres VM changes, the new node carries the kubernetes.io/hostname label with the new IP, but the PV still references the old IP, causing a mismatch. The pod hangs in the Pending state with a scheduling error:
"1 node(s) had volume node affinity conflict".
Tanzu Hub 10.3.0
The kubelet manifest does not assign stable hostnames; BOSH sets the hostname to the IP address of the VM. If the IP address changes during VM recreation, the hostname changes as well, and the existing PV cannot be rebound to the node, because the local-path provisioner hardcodes the PV node affinity to the hostname.
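A quick way to see that node names track the VM IP addresses (illustrative only):

# node names are the VM IPs, so a recreated VM with a new IP registers as a new node
kubectl get nodes -o wide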
This will be fixed in a future release of Tanzu Hub.
---
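# Set BOSH_DEPLOYMENT to the Tanzu Hub deployment name (matched by its hub-<hex> prefix)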
export BOSH_DEPLOYMENT=$(bosh deployments | grep -E '^hub-[0-9a-f]{20}' | awk '{print $1}')
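# List the BOSH instances that have persistent disks and show the pvc- directories stored on each VM, to map PV data to instance IPs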
bosh instances --details --json | jq -r '.Tables[0].Rows[] | select(.disk_cids != "") | "\(.instance) \(.ips)"' | while read inst ip; do echo "=== $inst ($ip) ==="; bosh ssh "$inst" -c "ls -l /var/vcap/store" | grep -E 'pvc-|==='; done
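# SSH to the registry instance and run the kubectl commands below from there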
bosh ssh registry
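# Find PVs whose kubernetes.io/hostname node affinity references a node name that no longer exists in the cluster (the old VM IP)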
existing=$(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); kubectl get pv -o json | jq -r --arg nodes "$existing" '.items[] | select(.spec.nodeAffinity != null) | .metadata.name as $pv | .spec.nodeAffinity.required.nodeSelectorTerms[].matchExpressions[] | select(.key == "kubernetes.io/hostname") | .values[] | select(. as $v | ($nodes | split(" ") | index($v)) == null) | "\($pv) -> \(.)"'
kubectl get pv
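# Export the affected PV definition without the fields that would block re-creation (replace the PV name with the one identified above)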
kubectl get pv pvc-116dde38-9943-49c3-b2b4-4f410e23eba6 -o json | jq 'del(.metadata.creationTimestamp,.metadata.resourceVersion,.metadata.uid,.metadata.managedFields,.spec.claimRef,.status)' >pv.json
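# Edit pv.json and change the kubernetes.io/hostname value under nodeAffinity to the new node name (the new IP of the VM that holds the data)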
vi pv.json
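# Delete the stale PV; clear its finalizers (it is held by the kubernetes.io/pv-protection finalizer) so the deletion can complete, then wait for it to be removed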
kubectl delete pv pvc-116dde38-9943-49c3-b2b4-4f410e23eba6 --wait=false
kubectl patch pv pvc-116dde38-9943-49c3-b2b4-4f410e23eba6 -p '{"metadata":{"finalizers":null}}' --type=merge
kubectl wait --for=delete pv/pvc-116dde38-9943-49c3-b2b4-4f410e23eba6
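# Recreate the PV from the edited manifest; it should rebind to the existing PVC, now pointing at the new node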
kubectl apply -f pv.json
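# Verify that the PVCs in the tanzusm namespace are Bound again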
kubectl get pvc -n tanzusm