Installing Local Consumption Interface (LCI) fails with error "Failed to determine ingress IP"

Article ID: 416359


Products

VMware vSphere Kubernetes Service

Issue/Introduction

When trying to install the Local Consumption Interface 1.0.2 service on a Supervisor, the installation fails with the error "Failed to determine ingress IP".

Environment

  • A curl from outside the Supervisor control plane VMs, using the floating IP, returns a "502 Bad Gateway" instead of the expected JSON output. The nginx instance on port 443 of the control plane node has a pre-existing forward for /appplatform[0-9].

    $ curl -k -i https://<Supervisor IP>/appplatform1/plugin.json

    HTTP/1.1 200 Connection established

    HTTP/1.1 502 Bad Gateway
    Server: nginx/1.22.0
    Date: <day>, <date> <time> <timezone>
    Content-Type: text/html
    Content-Length: 157
    Connection: keep-alive

    <html>
    <head><title>502 Bad Gateway</title></head>
    <body>
    <center><h1>502 Bad Gateway</h1></center>
    <hr><center>nginx/1.22.0</center>
    </body>
    </html>
  • # kubectl events --namespace svc-cci-service-domain-c9

    12m Warning VCUIPpluginBackendError VCUIPplugin/cci-ns-plugin backend of vcuPlugin named cci-ns-plugin is not ready
    7m12s (x2410 over 21h) Warning VCUIPpluginBackendError VCUIPplugin/cci-ns-plugin backend of vcuPlugin named cci-ns-plugin is not ready. Reason: timeout
    7m25s (x2410 over 21h) Warning VCUIPpluginBackendError VCUIPplugin/cci-ns-plugin backend of vcuPlugin named cci-ns-plugin is not ready. Reason: Failed to determine ingress IP
  • The vSphere Kubernetes Service plugin is not visible in the vCenter UI under Administration -> Client Plugins.
    The vmware-system-appplatform-operator logs show the following messages:
    I0919 06:07:42.861878 1 deleg.go:130] vcuiplugin "msg"="MasterProxy Role already exists" "name"="masterproxy-tkgs-plugin" "namespace"="svc-tkg-domain-c9"
    I0919 06:07:42.861899 1 deleg.go:130] vcuiplugin "msg"="MasterProxy ServiceAccount already exists" "name"="masterproxy-tkgs-plugin" "namespace"="svc-tkg-domain-c9"
    I0919 06:07:42.861915 1 deleg.go:130] vcuiplugin "msg"="MasterProxy RoleBinding already exists" "name"="masterproxy-tkgs-plugin" "namespace"="svc-tkg-domain-c9"
    I0919 06:07:42.861946 1 deleg.go:130] vcuiplugin "msg"="Nginx DaemonSet already exists" "name"="masterproxy-tkgs-plugin" "namespace"="svc-tkg-domain-c9"
    I0919 06:07:42.861965 1 deleg.go:130] vcuiplugin "msg"="Updating DaemonSet" "name"="masterproxy-tkgs-plugin" "namespace"="svc-tkg-domain-c9"
    I0919 06:07:42.867230 1 deleg.go:130] vcuiplugin "msg"="UI backend service is not ready yet" "error"={}
    E0919 06:07:42.867272 1 controller.go:304] controller-runtime/manager/controller/vcuiplugin-controller "msg"="Reconciler error" "error"="failed to determine ingress IP" "name"="tkgs-plugin" "namespace"="svc-tkg-domain-c9"

Cause

This is a known issue when the Supervisor is deployed with DHCP; it is not seen with Supervisors deployed with static IPs.
In DHCP floating-IP (FIP) mode, the management_network_floating_ip field in the kube-system/wcp-cluster-config ConfigMap is empty. Appplatform reads this field, and the missing value causes the "failed to determine ingress IP" error for the UI plugin.
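The empty field can be confirmed directly from the ConfigMap. Below is a minimal sketch, assuming kubectl access on a Supervisor control plane VM; the helper only parses the ConfigMap YAML, so the live kubectl usage is shown as a comment:

```shell
# Extract the management_network_floating_ip value from ConfigMap YAML on stdin.
# An empty result confirms the DHCP-mode symptom described above.
get_floating_ip() {
  grep -E '^[[:space:]]*management_network_floating_ip:' | awk -F': ' '{print $2}'
}

# Hypothetical usage on a live Supervisor:
#   kubectl get cm wcp-cluster-config -n kube-system -o yaml | get_floating_ip
```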

Resolution

The issue will be resolved in a future release of vCenter/Supervisor.

Workaround:

1. SSH to a control plane VM of the Supervisor.

2. Run kubectl edit cm wcp-cluster-config -n kube-system and set the value of management_network_floating_ip to the Supervisor IP:

  management_network_floating_ip: <supervisor_IP>

3. The change persists in the ConfigMap, and Appplatform also caches it. If the problem is not resolved, check the output after 5 minutes with "kubectl get cm wcp-cluster-config -n kube-system -o yaml | grep management_network_floating_ip" to see whether the value has been reset to empty.
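The persistence check in step 3 can be scripted. Below is a sketch that polls the ConfigMap for about 5 minutes; kubectl access and the 30-second interval are assumptions, so the live loop is shown as a comment and only the testable helper is executable:

```shell
# Succeeds only when the YAML on stdin carries a non-empty floating IP value.
check_value() {
  grep -E '^[[:space:]]*management_network_floating_ip: [^[:space:]]' >/dev/null
}

# Hypothetical polling loop on a live Supervisor:
#   for i in $(seq 1 10); do
#     kubectl get cm wcp-cluster-config -n kube-system -o yaml | check_value \
#       || echo "value was reset to empty at poll $i"
#     sleep 30
#   done
```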

(NOTE: The following alternative is usually not needed and is not the preferred approach.)

If the value does not persist in the ConfigMap, the code in /usr/lib/vmware-wcp/update-controller/wcp_cluster_setting.py can be changed:

        data[VC_TRUST_BUNDLE] = nodeCfgObj.decode_vc_trust_bundle(
            data.get(VC_TRUST_BUNDLE))
        data["management_network_floating_ip"] = "<supervisor_IP>"  ### <== add this line
        logging.debug("setting management ip to <supervisor_IP>")  ### <== add this line

Change the file on all three control plane VMs and run "systemctl restart wcp-sync".
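After applying the workaround, the symptom check from the Environment section should succeed: the plugin endpoint should return JSON rather than the nginx 502 page. A minimal sketch for verifying this, with the Supervisor IP as a placeholder and the live curl usage shown as a comment:

```shell
# Succeeds when the response body on stdin begins with a JSON object,
# i.e. the plugin manifest rather than an HTML error page.
expect_json() {
  head -c1 | grep -q '{'
}

# Hypothetical usage against a live Supervisor:
#   curl -k -s https://<Supervisor IP>/appplatform1/plugin.json | expect_json \
#     && echo "plugin backend is serving JSON again"
```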