TKGi cluster VMs failing with "no such host" ETCD lookup errors in /var/vcap/sys/log/ jobs logs

Article ID: 368396

Updated On:

Products

VMware PKS 1.x
VMware Tanzu Kubernetes Grid Integrated (TKGi)

Issue/Introduction

VMs in a cluster enter a failing state, with several BOSH jobs failing.

The failing jobs log errors under /var/vcap/sys/log/ when they try to resolve the etcd FQDNs of the master nodes:

SubChannel #101 grpc: addrConn.createTransport failed to connect to {Addr: "master-0.etcd.<>:2379", ServerName: "master-0.etcd.<>", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup master-0.etcd.<> on <DNS-IP>:53: no such host"
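
To see which hostnames are failing and which DNS server rejected them, the lookup errors can be pulled out of a job log. The following is a minimal sketch, not part of the original article: the function name `find_failed_lookups` is hypothetical, and the pattern assumes the DNS server is logged as an IPv4 address or plain hostname followed by a port, as in the message above.

```python
import re

# Matches the Go resolver error embedded in the gRPC message:
#   lookup <host> on <dns-server>:<port>: no such host
# Assumption: the DNS server is an IPv4 address or hostname (no IPv6 brackets).
LOOKUP_ERROR = re.compile(
    r"lookup (?P<host>\S+) on (?P<server>[^\s:]+):(?P<port>\d+): no such host"
)


def find_failed_lookups(log_text):
    """Return the sorted, de-duplicated (host, dns_server) pairs that failed to resolve."""
    pairs = {(m.group("host"), m.group("server")) for m in LOOKUP_ERROR.finditer(log_text)}
    return sorted(pairs)
```

The function takes the text of a log file as a string, so it can be applied to any job log read from /var/vcap/sys/log/. If every failure points at the same DNS server, that supports the DNS-side cause described below.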

Running nslookup on an affected VM shows connection errors to the bosh-dns server configured in /etc/resolv.conf.
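
The same check can be roughly reproduced in a script. The sketch below is an illustration only: `parse_nameservers` and `can_connect` are hypothetical helper names, and the probe uses a TCP connection to port 53, which only approximates nslookup (it queries over UDP by default) and assumes the DNS server also accepts TCP.

```python
import socket


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf content, in order."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        # Drop comments, which start with '#' or ';'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def can_connect(server, port=53, timeout=2.0):
    """Return True if a TCP connection to server:port succeeds within the timeout."""
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On an affected VM, one would read /etc/resolv.conf, pass its content to `parse_nameservers`, and call `can_connect` for each address returned. A successful TCP connection does not prove that name resolution works, so a failed lookup should still be confirmed with nslookup.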

Cause

This is likely caused by expired bosh-dns leaf certificates in the cluster.
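
One way to confirm an expiry is to print a certificate's end date with `openssl x509 -noout -enddate -in <certificate file>` and compare it with the current time. The sketch below is a hedged illustration: the helper name `is_expired` is hypothetical, and the location of the bosh-dns leaf certificate on the VM is not given in this article, so the certificate path must be supplied by the operator.

```python
import ssl
import time


def is_expired(enddate_line, now=None):
    """Return True if the certificate end date is at or before `now`.

    `enddate_line` is the line printed by `openssl x509 -noout -enddate`,
    for example "notAfter=Jan  5 09:34:43 2018 GMT".
    `now` is seconds since the epoch; it defaults to the current time.
    """
    prefix = "notAfter="
    line = enddate_line.strip()
    if not line.startswith(prefix):
        raise ValueError("expected a line starting with 'notAfter='")
    # ssl.cert_time_to_seconds parses the certificate date format (GMT only).
    not_after = ssl.cert_time_to_seconds(line[len(prefix):])
    if now is None:
        now = time.time()
    return now >= not_after
```

If the function returns True for the bosh-dns leaf certificate, the cause described above applies and the certificate needs to be rotated.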

Resolution

Follow the KB article "Rotate bosh-dns leaf certificates using maestro" to rotate the expired bosh-dns certificates.