Symptoms:
vracli and kubectl commands returned errors contacting localhost / cluster on port 6443 (k8s):
"The connection to the server vra-k8s.local:6443 was refused - did you specify the right host or port?"
kubectl -n prelude get pods -o wide on another node may show 3 pods scheduled for each service, but the affected node's pods show 0/# containers started and node name <none>
systemctl status docker
systemctl restart docker
failed to start daemon: failed to dial "/run/containerd/containerd.sock": unknown service containerd.services.namespaces.v1.Namespaces: not implemented
dockerd --debug
journalctl -xeu docker
systemctl status containerd
/etc/hosts file looks fine on this node
df -h shows good disk space on all filesystems
Environment:
VMware Aria Automation 8.x
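The symptom checks above can be gathered in one pass on the affected node. The following is a minimal sketch, not a supported tool: the function names has_containerd_namespace_error and collect_docker_diagnostics are hypothetical, and the sketch uses only the commands already named in the symptoms plus standard grep and the --no-pager option of systemctl and journalctl.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical, not part of Aria Automation.

# Return 0 if the text on stdin contains the containerd error quoted in the
# symptoms, non-zero otherwise. -F matches the string literally.
has_containerd_namespace_error() {
    grep -q -F 'containerd.services.namespaces.v1.Namespaces'
}

# Print the state of docker and containerd, the docker journal, and disk
# usage. Intended to be run as root on the affected node.
collect_docker_diagnostics() {
    systemctl status docker --no-pager
    systemctl status containerd --no-pager
    journalctl -xeu docker --no-pager
    df -h
}
```

For example, journalctl -xeu docker --no-pager | has_containerd_namespace_error && echo "containerd namespace error found" reports whether the docker journal on the node carries the same error as this article.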
Cause:
An issue with the Docker service prevents it from starting on this node.
Resolution:
As a precaution, it is best to take a simultaneous non-memory snapshot of all Automation nodes. This can be done in vSphere if the task fails in Aria Lifecycle.
If only one node in the cluster is affected, reboot the affected node only.
There are 2 ways you can do this:
reboot
With no obvious config issue on the system, Docker can be expected to start successfully on reboot.
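After the reboot, it can help to confirm that Docker is running and that the node's pods have started. The following is a minimal sketch: count_unready_pods is a hypothetical helper, and it assumes the default kubectl get pods column layout, in which the second column is READY in the form started/total and the third column is STATUS.

```shell
#!/bin/sh
# Sketch only: count_unready_pods is a hypothetical helper.

# Read `kubectl -n prelude get pods -o wide` output on stdin and print the
# number of pods whose READY column (started/total) is not fully started.
# The header row is skipped, and pods in Completed status are ignored
# because they legitimately report 0 started containers.
count_unready_pods() {
    awk 'NR > 1 && $3 != "Completed" {
             split($2, r, "/")
             if (r[1] != r[2]) n++
         }
         END { print n + 0 }'
}

# Usage on a cluster node (not executed here):
#   systemctl is-active docker
#   kubectl -n prelude get pods -o wide | count_unready_pods
```

A result of 0 suggests every listed pod has all of its containers started. A non-zero count shortly after a reboot may only mean that pods are still starting, so the check is worth repeating after a few minutes.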
If all nodes in the cluster are affected, restart cluster services by running /opt/scripts/deploy.sh. Expect about 30 minutes of downtime.