Upgrading a Supervisor from VCF 9.0 configured with Foundation Load Balancer gets stuck in the Configuring state with error "Timed Out Waiting for LB Service Update"


Article ID: 406096

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • The Supervisor upgrade remains stuck in the Configuring state indefinitely. A message stating "Foundation Load Balancer VMs become unhealthy" is displayed, and the health of the Foundation Load Balancer VMs turns Yellow. On the Administration page under Foundation Load Balancers, the health check may show the error "xDS connection is not ready. Current state: STREAM_NOT_READY".

  • On the Supervisor page, the following error message is displayed:
    Configured Load Balancer fronting the Kubernetes API Server. Timed out waiting for LB service update. This operation is part of the cluster enablement and will be retried.
  • The NetOperator pod logs show the error "network interface not found":

    [YYYY-MM-DDTHH:MM:SS]1 foundationloadbalancerconfig.go:91] "msg"="unable to reconcile FoundationLoadBalancerConfig" "error"="network interface not found" "logger"="controllers.LoadBalancerConfig.supervisor" "name"="Network_Interface_Name"
    [YYYY-MM-DDTHH:MM:SS]1 loadbalancerconfig_controller.go:243] "msg"="Provider Sync failed" "error"="network interface not found"
    [YYYY-MM-DDTHH:MM:SS]1 controller.go:324] "msg"="Reconciler error" "error"="network interface not found"

  • The wcp logs on vCenter Server show that wcp is waiting on the kube-apiserver LB service:

    [YYYY-MM-DDTHH:MM:SS] error wcp [apiserver/manager.go:164] [opID=###############] server closed the connection while watching kube-apiserver-lb-svc service
    [YYYY-MM-DDTHH:MM:SS] error wcp [kubelifecycle/controller.go:1943] [opID=###############] An error occurred fetching the virtual IP: Server closed the connection while watching LB service. This operation is part of the cluster enablement and will be retried.
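To check whether an environment matches these symptoms, the saved logs can be searched for the two signature messages shown above. The following is a minimal sketch, assuming the NetOperator pod log output and the wcp log from vCenter Server (typically /var/log/vmware/wcp/wcpsvc.log) are available as local files. The function name check_flb_symptoms is hypothetical and is not part of any Broadcom tooling.

```shell
# Hypothetical helper: report which known symptom messages appear in the
# given log files. Usage: check_flb_symptoms NETOP_LOG WCP_LOG
# Prints one line per symptom found; returns 0 if at least one was found,
# 1 otherwise.
check_flb_symptoms() {
  local netop_log="$1" wcp_log="$2" found=1

  # NetOperator cannot find the NetworkInterface objects it expects.
  if grep -q 'network interface not found' "$netop_log"; then
    echo "netoperator: network interface not found"
    found=0
  fi

  # wcp on vCenter Server is stuck watching the API server LB service.
  if grep -q 'while watching kube-apiserver-lb-svc service' "$wcp_log"; then
    echo "wcp: waiting on kube-apiserver-lb-svc"
    found=0
  fi

  return "$found"
}
```

For example, running check_flb_symptoms netop.log /var/log/vmware/wcp/wcpsvc.log, where netop.log holds the NetOperator pod log output, prints both lines on an affected Supervisor.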

Environment

  • VMware vSphere Kubernetes Service

Cause

  • The operator responsible for reconciling the Foundation Load Balancer cannot find the NetworkInterface objects it needs to program the control plane. A code change migrated the label selector used to find NetworkInterface objects but did not migrate objects that still carry the old labels. As a result, after the Supervisor is upgraded, the new labels may not be present on these objects.
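One way to see whether a Supervisor is affected is to list the NetworkInterface objects together with their labels and look for objects that lack the new label. This article does not name the label key the operator now selects on, so the sketch below takes it as a parameter. It assumes the NetworkInterface custom resource can be listed with "kubectl get networkinterfaces -A --show-labels" from a Supervisor control plane VM, where LABELS is the last column. The helper name find_unlabeled_interfaces is hypothetical. It only reports and changes nothing; adding the labels is left to the attached script.

```shell
# Hypothetical helper: read "kubectl get ... -A --show-labels" output on
# stdin and print NAMESPACE/NAME for every row whose labels lack KEY.
# Usage: find_unlabeled_interfaces KEY
find_unlabeled_interfaces() {
  local key="$1"
  awk -v key="$key" '
    NR == 1 { next }                     # skip the header row
    {
      n = split($NF, pairs, ",")         # LABELS is the last column: k=v,k2=v2
      has = 0
      for (i = 1; i <= n; i++) {
        split(pairs[i], kv, "=")
        if (kv[1] == key) { has = 1; break }
      }
      if (!has) print $1 "/" $2          # columns 1 and 2 are NAMESPACE and NAME
    }'
}
```

For example: kubectl get networkinterfaces -A --show-labels | find_unlabeled_interfaces '<label-key>', where <label-key> is a placeholder for the key the operator expects. A row showing <none> in the LABELS column is also reported.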

Resolution

  • Broadcom Engineering is aware of the issue, and it will be fixed in the next release of VCF 9.0.

Workaround: Log in to a Supervisor control plane VM and run the attached script (fix-flb-netwokinterface-labels.sh.txt, saved on the VM as fix-flb-networkinterface-labels.sh) to add the missing labels to the NetworkInterface objects.

    • SSH into a Supervisor control plane VM by following the steps in KB article 323407.

    • Copy the attached script (fix-flb-netwokinterface-labels.sh.txt) to the Supervisor control plane VM as /tmp/fix-flb-networkinterface-labels.sh.

    • Make the script executable:
      chmod +x /tmp/fix-flb-networkinterface-labels.sh

    • Run the script:
      /tmp/fix-flb-networkinterface-labels.sh

    • After running the script, the Supervisor upgrade should be unblocked and will complete automatically.
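The attachment is published as fix-flb-netwokinterface-labels.sh.txt, while the commands above run /tmp/fix-flb-networkinterface-labels.sh. The sketch below shows one way to stage the copied file under that name and make it executable in one step. It is an assumption-based convenience, not part of the documented procedure: stage_flb_fix is a hypothetical name, and removing carriage returns is a precaution in case the file was downloaded or edited on Windows, where CRLF line endings would break the script's interpreter line.

```shell
# Hypothetical helper: stage the downloaded attachment as an executable
# script. Usage: stage_flb_fix SOURCE_FILE DEST_FILE
stage_flb_fix() {
  local src="$1" dest="$2"
  # Copy while deleting every carriage return (converts CRLF line endings to LF).
  tr -d '\r' < "$src" > "$dest" || return 1
  # Equivalent to the chmod step above.
  chmod +x "$dest"
}
```

For example: stage_flb_fix /tmp/fix-flb-netwokinterface-labels.sh.txt /tmp/fix-flb-networkinterface-labels.sh, then run /tmp/fix-flb-networkinterface-labels.sh as described above.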

Attachments

fix-flb-netwokinterface-labels.sh.txt