Symptoms:
- etcd.service and kube-apiserver.service will not start, even when restarted with systemctl.
- kubelet cannot connect to the Kubernetes API server (it also fails when restarted), and deploy.sh cannot run.
- The docker service is running, but listing the active containers with docker ps yields no results.
- You may see these repeated failures in the tty (VM console):
  failed to detect default host (could not find default route)
  the server is already initialized as member before, starting as etcd member...
  /health error; no leader (status code 503)
  curl: (22) The requested URL returned error: 503
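These symptoms can be confirmed quickly on an affected node; the following short check uses only the commands already referenced above:
# Both units are expected to report a failed or inactive state
systemctl status etcd.service kube-apiserver.service --no-pager
# docker itself is active, yet no containers are listed
systemctl is-active docker
docker ps
# Recent etcd journal entries should contain the errors quoted above
journalctl -u etcd.service --no-pager -n 50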
Environment:
VMware Aria Automation 8.x
Cause:
This appears to occur in clustered environments where the former etcd leader is offline and a leader election cannot take place.
This may be seen in the systemd journal for etcd using the journalctl command:
Jan 01 08:54:32 vranode.example.com etcd[721]: failed to detect default host (could not find default route)
Jan 01 08:54:32 vranode.example.com etcd[721]: the server is already initialized as member before, starting as etcd member...
If the leader is inaccessible, the other nodes can get stuck waiting in this loop:
Jan 01 08:54:34 vranode.example.com etcd[721]: /health error; no leader (status code 503)
Jan 01 08:54:34 vranode.example.com etcd[721]: curl: (22) The requested URL returned error: 503
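The same health endpoint can be queried manually from each node to check whether any member currently reports a leader. This is a minimal sketch: localhost and the default etcd client port 2379 are assumptions, and TLS options such as --cacert, --cert, and --key may be required in your deployment:
# A healthy member returns {"health":"true"}; with no leader it returns
# a 503 like the one quoted above
curl -sk https://localhost:2379/health
# If etcdctl is available, the IS LEADER column shows which member,
# if any, currently holds leadership
etcdctl endpoint status --endpoints=https://localhost:2379 -w table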
The currently-offline node may log the following error in its journal, indicating that the VM has no connected NIC and that the problem cannot be fixed from within the guest OS:
unexpected command output Device "eth0" does not exist
This issue has been seen to occur at the virtual-hardware level (i.e. the VM's network adapter in vSphere) rather than within the guest OS.
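Before correcting the virtual hardware in vSphere, the missing adapter can be confirmed from inside the guest; the interface name eth0 below is taken from the error above:
# A removed or unrecognized vNIC reproduces the error quoted above
ip link show eth0
# List every adapter currently visible to the guest OS
ip -brief link show
If eth0 is absent, check the VM's network adapter in vSphere (edit the VM's settings and ensure the adapter is present and marked Connected) before continuing with the workaround steps below.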
Workaround:
The following workaround steps may help to resolve the issue:
1. Verify that each node can reach its peers on the required service ports:
curl -kv telnet://<NODE_FQDN>:<PORT_NUMBER>
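To test every peer at once, the same check can be wrapped in a small loop. This is a sketch only: the node names are placeholders and the etcd client/peer ports 2379 and 2380 are assumptions, so substitute the values for your environment:
# Placeholder node FQDNs and assumed etcd ports; replace with your own values
for node in vranode1.example.com vranode2.example.com vranode3.example.com; do
  for port in 2379 2380; do
    # curl -v prints "Connected to ..." on stderr when the TCP connection succeeds
    if curl -skv -m 5 "telnet://${node}:${port}" </dev/null 2>&1 | grep -q "Connected to"; then
      echo "OK:   ${node}:${port}"
    else
      echo "FAIL: ${node}:${port}"
    fi
  done
done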
2. Remove the Docker last-cleanup marker file on the affected node:
rm -f /var/vmware/prelude/docker/last-cleanup
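After removing the file, the services can be restarted and the etcd journal followed to confirm that the member starts and a leader is elected; the unit names are the ones referenced in this article:
systemctl restart etcd.service kube-apiserver.service
journalctl -u etcd.service -f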
If etcd.service and kube-apiserver.service still do not start, please try the following alternative workarounds: