vSphere CSI Driver Auto Integration Failing due to csi-node-registrar

Article ID: 326386

Products

VMware Tanzu Kubernetes Grid Integrated (TKGi)

Issue/Introduction

Symptoms:

Reference: vSphere CSI Driver Integration documentation: https://docs.pivotal.io/tkgi/1-12/vsphere-cns.html#uninstall-csi

After the vSphere CSI Driver Integration is enabled on the TKGI tile and the change is applied successfully, upgrading a cluster fails on a worker node because the csi-node-registrar process is down. The csi-node-registrar stderr log shows the following error:

I0721 08:29:18.240065 10505 main.go:113] Version: v2.1.0-0-g80d42f2
I0721 08:29:18.240719 10505 main.go:137] Attempting to open a gRPC connection with: "/var/vcap/data/kubelet/plugins/csi.vsphere.vmware.com/csi.sock"
I0721 08:29:18.240737 10505 connection.go:153] Connecting to unix:///var/vcap/data/kubelet/plugins/csi.vsphere.vmware.com/csi.sock
I0721 08:29:18.241191 10505 main.go:144] Calling CSI driver to discover driver name
I0721 08:29:18.241213 10505 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginInfo
I0721 08:29:18.241219 10505 connection.go:183] GRPC request: {}
I0721 08:29:18.243326 10505 connection.go:185] GRPC response: {"name":"csi.vsphere.vmware.com","vendor_version":"v2.3.0"}
I0721 08:29:18.243381 10505 connection.go:186] GRPC error: <nil>
I0721 08:29:18.243389 10505 main.go:154] CSI driver name: "csi.vsphere.vmware.com"
I0721 08:29:18.243462 10505 node_register.go:52] Starting Registration Server at: /var/vcap/data/kubelet/plugins_registry/csi.vsphere.vmware.com-reg.sock
I0721 08:29:18.243607 10505 node_register.go:61] Registration Server started at: /var/vcap/data/kubelet/plugins_registry/csi.vsphere.vmware.com-reg.sock
I0721 08:29:18.243660 10505 node_register.go:86] Starting healthz server at HTTP endpoint: :9809
F0721 08:29:18.248887 10505 node_register.go:105] listen tcp :9809: bind: address already in use
goroutine 4 [running]:
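
To check several worker nodes for this symptom, the fatal line can be searched for in the stderr log text. The following is a minimal Python sketch, assuming the log uses the klog-style layout shown above, where a leading `F` marks a fatal entry. The function name `find_bind_failures` and the sample text are illustrative and are not part of any TKGI tooling.

```python
import re

# klog-style prefix: severity letter, MMDD, time, PID, source file:line, then message.
KLOG_LINE = re.compile(
    r"^\s*(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>[\d:.]+)\s+(?P<pid>\d+) "
    r"(?P<source>[^\]]+)\] (?P<message>.*)$"
)

def find_bind_failures(log_text):
    """Return (port, source) tuples for fatal 'address already in use' lines."""
    failures = []
    for line in log_text.splitlines():
        match = KLOG_LINE.match(line)
        if not match or match.group("severity") != "F":
            continue
        bind = re.search(
            r"listen tcp :(\d+): bind: address already in use",
            match.group("message"),
        )
        if bind:
            failures.append((int(bind.group(1)), match.group("source")))
    return failures

# Two lines taken from the log excerpt above.
sample = """\
I0721 08:29:18.243660 10505 node_register.go:86] Starting healthz server at HTTP endpoint: :9809
F0721 08:29:18.248887 10505 node_register.go:105] listen tcp :9809: bind: address already in use
"""
print(find_bind_failures(sample))  # [(9809, 'node_register.go:105')]
```

An empty result means the log text contains no fatal port-bind error in this format; it does not rule out other failures.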


Environment

VMware Tanzu Kubernetes Grid Integrated Edition 1.x

Cause

If a manually installed CSI driver is in use, it occupies port 9809, which is also the default port used by csi-node-registrar. The error "listen tcp :9809: bind: address already in use" is therefore expected: the manual CSI installation conflicts with the automatic CSI installation.
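
One way to confirm the conflict is to test whether the port can still be bound on the affected worker node. The following Python sketch assumes it is run directly on that node and that Python 3 is available there; the helper name `tcp_port_in_use` is hypothetical. It only reports whether the port is taken, not which process holds it.

```python
import errno
import socket

def tcp_port_in_use(port, host=""):
    """Return True if binding a TCP socket on host:port fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # Any other error (for example, permissions) is not a port conflict.
    return False

if __name__ == "__main__":
    # 9809 is the port named in the error message above.
    print("port 9809 in use:", tcp_port_in_use(9809))
```

The probe socket is closed as soon as the check finishes, so the sketch does not itself hold the port.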
 

Resolution

Follow the steps below to work around this issue:

1. Change port 9809 to a different value (such as 9909) in the manual CSI YAML manifest: https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/v2.3.1/manifests/vanilla/vsphere-csi-driver.yaml
Note that the port appears in two places, and both must be updated.
2. Remove the liveness-probe container section from the manual CSI manifest. The section to remove is shown below:
        - name: liveness-probe
          image: quay.io/k8scsi/livenessprobe:v2.2.0
          args:
            - "--v=4"
            - "--csi-address=/csi/csi.sock"
          volumeMounts:
            - name: plugin-dir
              mountPath: /csi
3. Apply the manifest after making the above changes.
4. Switch from the manual CSI installation to the automatic CSI installation by following the guide: https://docs.pivotal.io/tkgi/1-12/vsphere-cns.html#uninstall-csi
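
The port change in step 1 can be sketched as a text substitution that also reports how many occurrences were replaced, which makes it easy to confirm that both places were updated. This is a minimal Python sketch, not an official procedure: the function name `change_port` is hypothetical, and the YAML fragment is an illustrative assumption rather than a copy of the real manifest, so check the actual file for the exact lines.

```python
import re

def change_port(manifest_text, old_port=9809, new_port=9909):
    """Replace standalone occurrences of old_port; return (new_text, count)."""
    # The lookarounds prevent matching the digits inside a longer number.
    pattern = re.compile(rf"(?<!\d){old_port}(?!\d)")
    return pattern.subn(str(new_port), manifest_text)

# Illustrative fragment only (assumed shape, not taken from the manifest).
fragment = """\
args:
  - "--health-port=9809"
ports:
  - containerPort: 9809
    name: healthz
"""

patched, count = change_port(fragment)
print(count)    # 2
print(patched)
```

If the reported count for the real manifest is not 2, review the file by hand before applying it.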

 


Additional Information

Impact/Risks:

The csi-node-registrar process cannot start. The node status shows as "Failing", and the cluster upgrade cannot complete.