Troubleshooting Kubernetes pods not starting after changing the virtual machine hostname

Article ID: 326025

Products

VMware Aria Suite

Issue/Introduction

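This article describes the symptoms you may encounter when the hostname of an Aria Automation virtual machine changes from its FQDN to its short name, and the steps used to recover.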

Symptoms:
  • You deploy and install a new Aria Automation 8.x instance.
  • The hostname of the VM changes from the FQDN to the short name.
  • Running kubectl get nodes -o wide lists a duplicate K8s node: the original node named by the FQDN (good) and a second node named by the short name (bad).
  • The original K8s node (the master / control-plane) is listed as NotReady, while the bad node is listed as Ready.
  • As a result of the hostname change, the critical kube-system pods are not functional.
  • Pods try to use the bad / ghost K8s node instead of the original node named by the FQDN.
  • The prelude pods try to start but also fail.
  • You try to remove the bad / ghost node, but it always returns after a VM reboot or after a restart of the kubelet service.
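
The duplicate-node symptom above can be checked with a small helper. This is a minimal sketch, not part of the product: the function name find_ghost_nodes is hypothetical, and it assumes that the ghost node's name is exactly the first label of the original node's FQDN. It reads node names from standard input, one per line, and prints every short name that also appears as the leading label of an FQDN in the same list.

```shell
#!/bin/sh
# Hypothetical helper: reads node names (one per line) on stdin and prints
# each name that is the first DNS label of another, fully qualified node name.
# Assumption: the ghost node is named after the short form of the FQDN.
find_ghost_nodes() {
    awk '
        { names[$1] = 1 }                      # remember every node name seen
        END {
            for (n in names) {
                if (index(n, ".") > 0) {       # n looks like an FQDN
                    split(n, parts, ".")
                    if (parts[1] in names)     # its short form is also a node
                        print parts[1]
                }
            }
        }
    '
}

# Usage on the appliance (assumes kubectl is configured for the local cluster):
#   kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name | find_ghost_nodes
```

An empty result means no node name matches the short form of another node's FQDN. Any name printed is a candidate ghost node and should be compared with the kubectl get nodes -o wide output before anything is removed.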


Environment

VMware vRealize Automation 8.x
VMware Aria Automation 8.x

Resolution

  1. Change the hostname from the short name back to the FQDN:
    hostnamectl set-hostname fqdn
  2. Stop services
    /opt/scripts/deploy.sh --shutdown
  3. Remove the bad / ghost node, replacing shortname with the name of that node:
    kubectl drain shortname --ignore-daemonsets --delete-local-data && kubectl delete node shortname
  4. Restart the kubelet service and check that the bad / ghost node has been removed.
  5. Reboot the VM.
  6. After the VM reboots, confirm that the bad / ghost node has not returned.
  7. Confirm that the original K8s node is now listed in a Ready state.
  8. Start services
    /opt/scripts/deploy.sh
  9. Confirm that the kube-system pods are reporting healthy.
  10. Confirm that the kubelet service is reporting healthy.
  11. Wait for the prelude pods to come up (about 10-15 minutes).
  12. Confirm that the Aria Automation user interface is accessible again.
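
The command portion of the steps above can be sketched as a dry-run helper that only prints the commands for review and executes nothing. The function name print_recovery_plan is hypothetical. The sketch assumes that the short name is the first label of the FQDN and that kubelet is restarted with systemctl restart kubelet (the steps say only to restart kubelet). It passes node as the resource type to kubectl delete, because kubectl delete requires one.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the recovery commands for a given FQDN.
# Nothing is executed; review the output and run the commands by hand.
print_recovery_plan() {
    fqdn=$1
    short=${fqdn%%.*}    # assumption: short name is the first label of the FQDN
    # Reject an empty argument or a name without a domain part.
    if [ -z "$fqdn" ] || [ "$short" = "$fqdn" ]; then
        echo "usage: print_recovery_plan <fqdn>" >&2
        return 1
    fi
    cat <<EOF
hostnamectl set-hostname $fqdn
/opt/scripts/deploy.sh --shutdown
kubectl drain $short --ignore-daemonsets --delete-local-data
kubectl delete node $short
systemctl restart kubelet
kubectl get nodes -o wide
EOF
}

# Example with a placeholder hostname:
#   print_recovery_plan vra01.example.com
```

The reboot, the /opt/scripts/deploy.sh start, and the health checks are left out on purpose, because they depend on the state seen after each earlier step.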