TKGI Worker node in failing state in bosh due to csi-node-registrar in "Does not exist" status



Article ID: 386780


Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated (TKGi)

Issue/Introduction

When you SSH into a worker node and run monit summary, csi-node-registrar is reported as "Does not exist":

Process 'containerd'                running
Process 'kubelet'                   running
Process 'kube-proxy'                running
Process 'disk-pressure-watch'       running
Process 'csi-node-registrar'        Does not exist
Process 'csi-node'                  running
Process 'csi-livenessprobe'         running
Process 'blackbox'                  running
Process 'nsx-node-agent'            running
Process 'ovsdb-server'              running
Process 'ovs-vswitchd'              running
Process 'nsx-kube-proxy'            running
Process 'telegraf'                  running
Process 'node_exporter'             running
Process 'bosh-dns'                  running
Process 'bosh-dns-resolvconf'       running
Process 'bosh-dns-healthcheck'      running
Process 'system-metrics-agent'      running
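
On a node with many jobs, it can help to list only the processes that are not running. The following is a minimal sketch, not part of TKGI: not_running is a hypothetical helper name, and it assumes the monit summary output has the same Process 'name' status layout as shown above.

```shell
# Hypothetical helper: read `monit summary` output on stdin and print the
# name of every process whose status is anything other than "running".
not_running() {
  # Splitting on the single quote gives: $2 = process name, $3 = status text.
  awk -F"'" '/^Process / && $3 !~ /^[ \t]*running[ \t]*$/ { print $2 }'
}

# Usage on the worker node (monit typically requires root):
#   monit summary | not_running
```

For the output above, this would print only csi-node-registrar.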

 

The log file /var/vcap/sys/log/csi-node-service/csi-node-driver-registrar.stderr.log contains this error:

I0127 20:07:29.138316  338074 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to fetch node object with name "######-####-####-####-#########". Error: nodes "######-####-####-####-#########" not found,}
E0127 20:07:29.138333  338074 main.go:123] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to fetch node object with name "######-####-####-####-#########". Error: nodes "######-####-####-####-#########" not found, restarting registration container.
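
To confirm that a node shows the same signature, the log can be searched for the failure message. This is a rough sketch; recent_registration_failures is a hypothetical helper name, and the search string is taken from the error line above.

```shell
# Hypothetical helper: print the last N (default 5) registration failures
# recorded in the given log file.
recent_registration_failures() {
  grep -F 'Registration process failed with error' "$1" | tail -n "${2:-5}"
}

# Usage on the worker node:
#   recent_registration_failures /var/vcap/sys/log/csi-node-service/csi-node-driver-registrar.stderr.log
```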

Environment

TKGI: 1.19.x

Cause

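Network disconnects between nodes can cause this condition. As the log above shows, plugin registration fails because the node object for the worker cannot be found.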

Resolution

On the affected worker node, run monit restart csi-node. The csi-node-registrar process should then come back up.
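
The restart and the follow-up check can be combined in a small script. The sketch below is an illustration only: wait_for_running is a hypothetical helper, the retry count and delay are arbitrary defaults, and the MONIT variable exists only so that the command can be substituted for testing.

```shell
# Hypothetical helper: poll `monit summary` until the named process reports
# "running". Returns 0 on success, 1 if it never does.
# Usage: wait_for_running NAME [TRIES] [DELAY_SECONDS]
wait_for_running() {
  name=$1
  tries=${2:-12}
  delay=${3:-5}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if ${MONIT:-monit} summary |
      grep -Eq "^Process '$name'[[:space:]]+running[[:space:]]*\$"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage on the worker node (monit typically requires root):
#   monit restart csi-node && wait_for_running csi-node-registrar
```

A non-zero exit status means csi-node-registrar did not reach the running state within the polling window.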

 

If it does not come back up, open a support request (SR) with the Broadcom support team.