The Envoy service in the Tanzu Kubernetes Cluster (TKC) is unable to obtain an external IP address.
The TKC upgrade was performed while the Envoy service was in a Pending state. During the node rollout phase, an external IP address was temporarily assigned to the Envoy service; however, after the upgrade completed, the Envoy service reverted to the Pending state and the Contour package remained stuck in the Reconciling phase.
The affected TKC is configured to use a static IP address in the cluster YAML.
root@xxxx-xxxx-xxxx-control-plane[ / ]# k get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
tanzu-system-ingress envoy LoadBalancer 10.xxx.xxx.xx <pending> 80:30029/TCP,443:31188/TCP 85d
We can see the endpoints are assigned:
root@ib2-intranet-mgmt-prod-cluster-1-control-plane-gplwz [ /var/log ]# k get endpoints -A
NAMESPACE NAME ENDPOINTS AGE
tanzu-system-ingress envoy 192.xxx.x.x:8443,192.xxx.x.x:8443,192.xxx.x.x:8443 + 3 more... 85d
Describing the service shows the following warning in its events:
root@xxxx-xxxx-xxxx-control-plane [ /var/log ]# k describe svc envoy -n tanzu-system-ingress
Name: envoy
Namespace: tanzu-system-ingress
:
:
:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal UpdatedLoadBalancer 49m (x12 over 4h39m) service-controller Updated load balancer with new hosts
Normal EnsuringLoadBalancer 46m service-controller Ensuring load balancer
Normal EnsuredLoadBalancer 46m service-controller Ensured load balancer
Normal UpdatedLoadBalancer 42m (x2 over 42m) service-controller Updated load balancer with new hosts
Normal Removed 33m avi-kubernetes-operator Removed virtualservice for envoy
Warning FailedToUpdateEndpointSlices 28m (x6 over 28m) endpoint-slice-controller Error updating Endpoint Slices for Service tanzu-system-ingress/envoy: skipping Pod envoy-rkcgd for Service tanzu-system-ingress/envoy: Node xxx-xxxxx-xxxxx-xxxxx-xxxxx Not Found
The Contour container logs collected from the control plane node show the following:
YYYY-MM-DDT07:50:39.742864015Z stderr F time="YYYY-MM-DDT07:50:39Z" level=info msg="received a new address for status.loadBalancer" context=loadBalancerStatusWriter loadbalancer-address=10.XX.XX.XXX
YYYY-MM-DDT07:50:39.743093939Z stderr F time="YYYY-MM-DDT07:50:39Z" level=info msg="received a new address for status.loadBalancer" context=loadBalancerStatusWriter loadbalancer-address=10.XX.XX.XXX
YYYY-MM-DDT07:50:39.743106316Z stderr F time="YYYY-MM-DDT07:50:39Z" level=info msg="received a new address for status.loadBalancer" context=loadBalancerStatusWriter loadbalancer-address=10.XX.XX.XXX
YYYY-MM-DDT07:50:39.743109371Z stderr F time="YYYY-MM-DDT07:50:39Z" level=info msg="received a new address for status.loadBalancer" context=loadBalancerStatusWriter loadbalancer-address= <========== no IP
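To find the moment the address was cleared without reading the whole log, the entries can be filtered for an empty loadbalancer-address value. The following is a minimal sketch: the helper name empty_lb_address_lines is hypothetical, and it assumes loadbalancer-address is the last field on the line, as in the excerpt above.

```shell
# Hypothetical helper: print only the Contour log lines whose
# loadbalancer-address field is empty.
# Assumption: loadbalancer-address is the last field on each line.
# Reads log text on stdin, so it works with live logs or a saved file.
empty_lb_address_lines() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -E 'loadbalancer-address=[[:space:]]*$' || true
}
```

A possible invocation, assuming the Contour deployment is named contour in the tanzu-system-ingress namespace, is: kubectl logs -n tanzu-system-ingress deploy/contour | empty_lb_address_lines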
Based on the CPI (guest cluster cloud provider) logs, the last service update occurred at 07:55:31 due to a node change:
I1216 07:55:31.463812 1 event.go:307] "Event occurred" object="tanzu-system-ingress/envoy" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="UpdatedLoadBalancer" message="Updated load balancer with new hosts"
VMware vCenter Server: 8.X
Tanzu Kubernetes Runtime
Based on the kapp-controller logs, kapp-controller updated the contour app resources (including the envoy service) at YYYY-MM-DDT07:58:54Z. This reconciliation likely reset or cleared the Service.Status.LoadBalancer.Ingress field for the Envoy service.
The relevant kapp-controller log entry is:
{"level":"info","ts":"YYYY-MM-DDT07:58:54Z","logger":"kc.controller.app","msg":"Updating status","request":{"name":"contour","namespace":"default"},"desc":"flushing: flush all"}
As a result, the service transitioned into a Pending state, awaiting reassignment of a load balancer IP.
The Kubernetes Service Controller detected an update to the Service object; however, no synchronization with the Supervisor Service was triggered because there were no changes to attributes within the Service specification (spec). As a result, the Envoy Service status remains in a Pending state.
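Whether the status field is currently empty can be confirmed directly from the Service object. The sketch below is illustrative only: the function name lb_address is hypothetical, and it assumes the load balancer publishes an IP address (not a hostname) in status.loadBalancer.ingress.

```shell
# Hypothetical helper: print the external IP(s) recorded in
# status.loadBalancer.ingress, or "<pending>" when the field is empty.
# Arguments: <namespace> <service name>
lb_address() {
  ns="$1"
  svc="$2"
  # [*] returns an empty string when no ingress entries exist.
  addr=$(kubectl get svc "$svc" -n "$ns" \
    -o jsonpath='{.status.loadBalancer.ingress[*].ip}')
  if [ -n "$addr" ]; then
    printf '%s\n' "$addr"
  else
    printf '<pending>\n'
  fi
}
```

For the service in this article the call would be: lb_address tanzu-system-ingress envoy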
Restart the Cloud Provider Interface (CPI) pod in the guest cluster to trigger a resynchronization between the Supervisor vmservice and the guest cluster service.
Steps:
kubectl get deploy -A | grep guest-cluster-cloud-provider
kubectl rollout restart deploy guest-cluster-cloud-provider -n <cloud provider namespace>
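The restart and the follow-up verification can be combined into one helper. This is a sketch under stated assumptions, not a supported procedure: the function name restart_cpi_and_wait, the retry count, and the delay are all hypothetical, and the CPI namespace must be taken from the output of the first command above.

```shell
# Hypothetical helper: restart the guest cluster CPI deployment, wait for the
# rollout to finish, then poll the Service until an external IP is reported.
# Arguments: <cpi namespace> <service namespace> <service name> [tries] [delay seconds]
restart_cpi_and_wait() {
  cpi_ns="$1"
  svc_ns="$2"
  svc="$3"
  tries="${4:-30}"   # assumed default: 30 attempts
  delay="${5:-10}"   # assumed default: 10 seconds between attempts

  # Send rollout messages to stderr so stdout carries only the address.
  kubectl rollout restart deploy guest-cluster-cloud-provider -n "$cpi_ns" >&2 || return 1
  kubectl rollout status deploy guest-cluster-cloud-provider -n "$cpi_ns" >&2 || return 1

  i=0
  while [ "$i" -lt "$tries" ]; do
    addr=$(kubectl get svc "$svc" -n "$svc_ns" \
      -o jsonpath='{.status.loadBalancer.ingress[*].ip}')
    if [ -n "$addr" ]; then
      printf '%s\n' "$addr"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  # No address appeared within the allotted attempts.
  return 1
}
```

Example: restart_cpi_and_wait "<cloud provider namespace>" tanzu-system-ingress envoy. A non-zero exit status means the service was still Pending when polling stopped.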
If a LoadBalancer service is configured without a static IP address, the load balancer may allocate a new IP address after the namespace is restored from backup, because the request is treated as a new service provisioning event.
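Before a restore, it can be useful to check whether a service requests a fixed address. The sketch below assumes the static address is expressed through the Service's spec.loadBalancerIP field; if the environment sets it elsewhere (for example, only in package values), this check will not detect it. The function name static_lb_ip is hypothetical.

```shell
# Hypothetical helper: report whether a Service requests a fixed address.
# Assumption: the static IP is set in spec.loadBalancerIP on the Service.
# Arguments: <namespace> <service name>
static_lb_ip() {
  ns="$1"
  svc="$2"
  ip=$(kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.loadBalancerIP}')
  if [ -n "$ip" ]; then
    printf 'static: %s\n' "$ip"
  else
    printf 'dynamic\n'
  fi
}
```

A result of dynamic suggests the external IP may change when the service is recreated from backup.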