Docker fails to start with "failed to dial "/run/containerd/containerd.sock"" in TKGI

Article ID: 327474

Products

VMware Tanzu Kubernetes Grid

Issue/Introduction

Symptoms:

Docker and kubelet are failing to start on TKGi nodes.

kubelet fails because Docker is unavailable, and Docker logs the following error in docker.stderr.log:

failed to start daemon: failed to dial "/run/containerd/containerd.sock": failed to dial "/run/containerd/containerd.sock": context deadline exceeded


This issue can occur on TKGi v1.11, v1.12, and v1.13.


Environment

VMware Tanzu Kubernetes Grid Integrated Edition 1.x

Cause

Another Kubernetes workload on the cluster is mounting /run/containerd/containerd.sock. This prevents Docker from mounting that path, so Docker cannot start.

Resolution

This is a known issue that is fixed in TKGi v1.12.5 and v1.13.3.


Workaround:
To work around the issue, remove the /run/containerd/containerd.sock directory on the impacted worker node.

bosh -d service-instance_<GUID> ssh worker/<ID>
rmdir /run/containerd/containerd.sock
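
Because the workaround uses rmdir, the path is expected to be a directory on an affected node rather than a socket. A more cautious variant could check the file type before removing anything. The following is only a sketch, not part of the official workaround: the function name remove_stale_sock_dir is hypothetical, and it assumes it is run with sufficient privileges on the worker.

```shell
#!/bin/sh
# Hypothetical helper: remove the path only if it is a directory,
# and leave it alone if it is already a real socket.
remove_stale_sock_dir() {
    target="$1"
    if [ -S "$target" ]; then
        # A socket here means containerd owns the path; do not touch it.
        echo "socket present, nothing to remove"
        return 0
    fi
    if [ -d "$target" ]; then
        # rmdir only succeeds on an empty directory.
        rmdir "$target" && echo "removed directory"
        return $?
    fi
    echo "nothing at path"
}

# Example (on the affected worker):
#   remove_stale_sock_dir /run/containerd/containerd.sock
```

The socket check comes first so that a healthy node is left unchanged if the function is run there by mistake.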


Confirm that Docker has started successfully:

monit summary
tail -f /var/vcap/sys/log/docker/docker.stderr.log
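
To check a single process rather than reading the whole summary, the monit output can be filtered. This is a rough sketch that assumes monit summary prints lines of the form Process 'name' followed by the status; the helper name process_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `monit summary` output on stdin and print
# the status text for the named process (assumed line format:
#   Process 'docker'    running).
process_status() {
    awk -v name="'$1'" '
        $1 == "Process" && $2 == name {
            $1 = ""; $2 = ""      # drop the "Process" and name fields
            sub(/^ +/, "")        # trim the leading spaces left behind
            print
        }'
}

# Example (on the worker):
#   monit summary | process_status docker
```

If the line format on a given stemcell differs, the awk pattern would need adjusting.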