TCA 2.1 Management Cluster vmconfig-operator Pod/Service in CrashLoop State with Null Pointer Exception

Article ID: 314263

Products

VMware Telco Cloud Automation

Issue/Introduction

Symptoms:
Nodepool deployments may fail with the error message "invalid memory address or nil pointer dereference" in the vmconfig-operator logs.

Environment

VMware Telco Cloud Automation 2.1.1
VMware Telco Cloud Automation 2.1
VMware Telco Cloud Automation 2.0
VMware Telco Cloud Automation 2.0.1

Cause

During nodepool deployment, TCA validates that the ESXi servers in the cellsite group are properly configured. If any of these ESXi servers is unresponsive, the validation causes the vmconfig-operator to panic.

Example vmconfig-operator log entry:
2022-10-15T19:53:35.581Z INFO esxinfo-plugins.esxinfoStatusHandlerPlugin Received esxi host configuration from vc successfully {"esxinfo": "tca-system/esxi_srv001.corp.domain.net"}
E1015 19:53:35.581226       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
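
To isolate the affected host, one option is to scan the operator log for the esxinfo entry that most recently precedes each panic. The following is a minimal sketch, assuming the log format shown in the example above and assuming that the last esxinfo reference logged before a panic identifies the unresponsive host. The function name find_panic_host is hypothetical and is not part of TCA.

```shell
# Hypothetical helper: reads a vmconfig-operator log on stdin and prints the
# esxinfo reference (namespace/name) last seen before each panic line.
find_panic_host() {
  awk '
    # Remember the most recent "esxinfo": "<namespace>/<name>" value.
    match($0, /"esxinfo": *"[^"]+"/) {
      last = substr($0, RSTART, RLENGTH)
      sub(/^"esxinfo": *"/, "", last)
      sub(/"$/, "", last)
    }
    # On a panic line, report the remembered host, if any.
    /Observed a panic/ && last != "" { print last }
  ' | sort -u
}
```

The function can be fed the operator log, for example by piping the output of kubectl logs -n <namespace> <pod> --previous into it. Because the pod restarts repeatedly, the same host may appear many times in the log; sort -u reduces the output to one line per host.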


Resolution

This issue is resolved by a code fix in TCA 2.2.

Workaround:
After identifying the unresponsive ESXi servers from the vmconfig-operator log, use one of the following two options to recover:

1. Reconnect the unresponsive host, then delete the vmconfig-operator pod on tkg-mgmt. The vmconfig-operator pod is rescheduled and restarted, after which the nodepool deployment can proceed.
2. If the ESXi host connectivity issue cannot be resolved, prevent the host validation by replacing the host profile entry in the host CR with an empty string:
a. Save the existing host CR to a backup file:
kubectl get esxinfo -n tca-system <hostname-fqdn> -o yaml > /tmp/<hostname-fqdn>.backup
b. Open the host CR with the following command, locate the line containing the host profile name, and change its value to empty quotes (""):
kubectl edit esxinfo -n tca-system <hostname-fqdn> -o yaml
c. Restart the vmconfig-operator pod by deleting it:
kubectl delete pod -n <namespace> <pod>
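
The steps of the second option can be wrapped in a small script so that the edit is never attempted without a usable backup. This is a sketch under stated assumptions: the function names and the KUBECTL override variable are hypothetical, and the host profile value is still changed by hand in the editor, because the exact field name in the CR is not given here.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper for steps a and b.
backup_and_edit_esxinfo() {
  host="$1"                          # ESXi host FQDN, as named in the esxinfo CR
  backup="${2:-/tmp/$host.backup}"   # backup file path

  # Step a: save the current CR; stop if the command fails or the file is empty.
  "$KUBECTL" get esxinfo -n tca-system "$host" -o yaml > "$backup" || return 1
  [ -s "$backup" ] || { echo "backup $backup is empty" >&2; return 1; }

  # Step b: open the CR in the editor; set the host profile entry to "" manually.
  "$KUBECTL" edit esxinfo -n tca-system "$host" -o yaml
}

# Hypothetical helper for step c: delete the pod so that it is rescheduled.
restart_operator_pod() {
  "$KUBECTL" delete pod -n "$1" "$2"   # $1 = namespace, $2 = pod name
}
```

Running the functions with KUBECTL=echo prints the commands instead of executing them, which is a cheap way to check the host name and namespace before touching the cluster.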