How to recover Supervisor and Guest Cluster control plane LB IPs hosted on Avi Load Balancer


Article ID: 390201


Products

VMware Avi Load Balancer

Issue/Introduction

User is unable to connect to the control-plane service with the error: "Error occurred during HTTP request: Get "https://<redacted>/wcp/loginbanner": tls: failed to verify certificate: x509: certificate is valid for <redacted>, not <redacted>"

root@worker1:~# kubectl vsphere login --server=https://<redacted>
ERRO[2025-04-08 14:11:41.262] Error occurred during HTTP request: Get "https://<redacted>/wcp/loginbanner": tls: failed to verify certificate: x509: certificate is valid for <redacted>, not <redacted>
There was an error when trying to connect to the server.
Please check the server URL and try again.

 

The IP addresses for the Supervisor Cluster kube-apiserver, vsphere-csi-controller, and Guest Cluster control plane services have changed.

Environment

vSphere Supervisor v1.28 and lower

AVI-AKO

Cause

This is a known issue in the AVI-AKO build version included with the environment's vCenter and Supervisor cluster.

There are a couple of scenarios that can lead to a change in the IP address of the control-plane services:

  •  If the kube-apiserver-lb-svc service gets impacted, AKO can delete the corresponding configuration on Avi. At a later point, when the service comes back up healthy, AKO is notified to recreate the services on Avi. Because all the virtual services are deleted and recreated by AKO, the IPs for the virtual services (hosting the control-plane services) can change.

  •  In an SDDC-managed infrastructure, if multiple vCenter Servers share a single NSX cloud and a new cluster is created in a new workload domain, existing virtual services on Avi belonging to another cluster can be deleted. This happens because the auto-generated prefix of the cluster name (domain-<>) can end up being the same for two different clusters that are part of different workload domains. This is currently an unsupported configuration; the newly created cluster must be deleted, followed by an AKO restart on the Supervisor (see the command after this list) to recover the virtual services for the old cluster.
    Reference document: https://techdocs.broadcom.com/us/en/vmware-cis/vsphere/vsphere-supervisor/8-0/vsphere-supervisor-concepts-and-planning/supervisor-architecture-and-components/supervisor-networking.html
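
For reference, the AKO restart mentioned in the second scenario is run from the Supervisor cluster context; it is the same command used in Step 10 of the Recovery Workaround below:

      kubectl rollout restart deploy -n vmware-system-ako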

 



Resolution


This issue is resolved in vCenter 8.0u3E and Supervisor cluster version 1.29.7.

 

Recovery Workaround

The IP addresses for the services will need to be reverted to their original values, and the affected virtual services deleted from Avi so that they can be recreated.

Steps to restore the IPs in both VDS and NSX WCP deployments:

    1. Connect to the Supervisor cluster context

    2. Run the below command on the Supervisor cluster to get the currently assigned IPs for the control-plane services:
      kubectl get service -A | awk '/LoadBalancer/ {print $1,$2,$5}' > svcs.txt
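
       The resulting svcs.txt contains one line per LoadBalancer service in the form <namespace> <service-name> <external-IP>. For example (the IPs below are illustrative only):

       kube-system kube-apiserver-lb-svc 10.10.10.2
       <namespace> <cluster-name>-control-plane-service 10.10.10.5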

       

    3. The above "svcs.txt" file will need to be manually corrected to contain the original, expected IPs.
      • Steps to Validate from AVI Web UI:

        1. In the AVI web UI, navigate to Operations

        2. Click on Config Audit Trail on the left

        3. With the magnifying glass, search for the name of the affected service
        4. Locate a recent CONFIG_DELETE event and open its details to find the original IP address
          • The next CONFIG_CREATE event has the details of the current, incorrect IP address


      • Steps to Validate Workload Cluster control-plane-services:
        1. In the Supervisor cluster context, use the below command to get a list of the Control Plane Endpoint IPs for each affected workload cluster:

          kubectl get cluster -o yaml -A | egrep -i "cluster-name|endpoint" -A1

          This endpoint is equivalent to the External IP address of each <cluster-name>-control-plane-service LoadBalancer service for the affected workload cluster(s).

        2. The above output can be compared to the External IP address of the control-plane-service LoadBalancer services for the affected workload cluster(s):
          kubectl get svc -A | grep "control-plane"
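
          As an alternative to the egrep above, the same endpoint IPs can be printed one per line with jsonpath (a sketch, relying on the standard Cluster API spec.controlPlaneEndpoint field):

          kubectl get cluster -A -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.spec.controlPlaneEndpoint.host}{"\n"}{end}'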

           

           

    4. Run the below command to identify and document the number of replicas for the net-operator deployment:
       kubectl get deployments -n vmware-system-netop
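
       Example output (illustrative; note the READY replica count, 2 in this example, for use in Step 11):

       NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
       vmware-system-netop-controller-manager   2/2     2            2           125d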

       

    5. Scale down the netop-controller-manager deployment:
      kubectl scale deployment vmware-system-netop-controller-manager -n vmware-system-netop --replicas=0
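
      To confirm the scale-down has taken effect, verify that no netop controller pods remain running:

      kubectl get pods -n vmware-system-netop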

       

    6. The affected virtual services will need to be deleted from the AVI web UI:
      • Navigate to the Applications tab in AVI web UI in a web browser

      • Switch to Virtual Services view in AVI web UI

      • Check the checkbox for each affected Virtual Service, then click DELETE in the top left

      • In the ensuing pop-up window, select Delete Virtual Services (VS) and referenced objects



    7. In the Supervisor cluster context, create the following script:
      vi correct-svcs.sh
      
      #!/usr/bin/bash
      # Reads each "<namespace> <service-name> <external-IP>" line from the file
      # passed as the first argument (svcs.txt) and patches the matching gateway
      # object so that its empty addresses list carries the original IP again.
      filename="$1"
      while IFS=' ' read -r f1 f2 f3; do
        kubectl get gateway "$f2" -n "$f1" -oyaml | sed 's/addresses: \[\]/addresses:\n    - type: IPAddress\n      value: '"$f3"'/1' | kubectl replace -f -
      done < "$filename"
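
      For reference, the sed expression in the script rewrites the gateway's empty address list into a populated one, roughly as follows (IP illustrative):

      # before:
      addresses: []
      # after:
      addresses:
          - type: IPAddress
            value: 10.10.10.2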

       

    8. Make the above script executable:
      chmod +x correct-svcs.sh

       

    9. Run the above script, which will iterate through the information in the svcs.txt file created in Step 2 to correct the respective gateway Kubernetes objects:
      ./correct-svcs.sh svcs.txt

       

    10. Restart the AVI-AKO pod in the Supervisor cluster:
      kubectl rollout restart deploy -n vmware-system-ako
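
      To confirm the restart has completed, check that the AKO pod is back in a Running state:

      kubectl get pods -n vmware-system-ako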


    11. Scale the net-operator pods back up:
      kubectl scale deployment vmware-system-netop-controller-manager -n vmware-system-netop --replicas=<count from Step 4>

       

    12. Verify that the services are pointed to the correct IPs:
      kubectl get svc -n <virtual service namespace>

      • If the services are not pointed to the correct IPs, verify what may still be holding the IPs and re-run all of the above steps for the specific services that are still showing as incorrect.
      • If the services are pointed to the correct IPs (or you have updated them to the correct IPs) and the issue continues, the IPAM/DNS profiles in Avi may be pointed to the wrong Network Profile. Make sure they are pointed to the right Network Profile.
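      • To compare all services at once, the expected IPs in svcs.txt from Step 2 can be checked against the live External IPs with a small loop (a sketch, reusing the file format from Step 2):

        # Print the expected vs. currently assigned External IP for each service
        while read -r ns name ip; do
          echo "$ns/$name expected=$ip actual=$(kubectl get svc "$name" -n "$ns" -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
        done < svcs.txt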