Autoscaler pod keeps crashing every 10 seconds

Article ID: 380761

Updated On:

Products

VMware vSphere with Tanzu
vSphere with Tanzu
VMware vSphere 7.0 with Tanzu

Issue/Introduction

The Autoscaler pod crashes repeatedly (about every 10 seconds) with the following error:

F0829 13:56:06.574078 1 clusterapi_provider.go:205] could not find preferred version for CAPI group "cluster.x-k8s.io": failed to get ServerGroups: Get "https://##.##.##.##/api?timeout=32s": net/http: TLS handshake timeout
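To tell this failure apart from other autoscaler crashes, the pod log can be checked for the two distinctive fragments of the message above. The following is a minimal sketch, not an official diagnostic: the function name `matches_kb_380761` is hypothetical, and the pod name and namespace in the usage comment are placeholders that depend on your cluster.

```shell
# Hypothetical helper: reads log text on stdin and succeeds only if it
# contains both fragments of the error quoted in this article.
matches_kb_380761() {
  _kb_log=$(cat)
  printf '%s\n' "$_kb_log" | grep -q 'could not find preferred version for CAPI group' &&
    printf '%s\n' "$_kb_log" | grep -q 'TLS handshake timeout'
}

# Example usage (placeholders, adjust to your environment):
#   kubectl logs <autoscaler-pod> -n <namespace> --previous | matches_kb_380761 \
#     && echo "log matches this article"
```

The `--previous` flag is used because the container has already restarted, so the fatal message is in the log of the prior instance.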

Environment

vSphere with Tanzu 8.0.3 with tkg-service v3.0.0 or later

Isolated networks between Supervisor and Guest cluster

 

Cause

The Autoscaler pod cannot reach the Supervisor API server through its floating IP. This IP points to one of the Supervisor Control Plane (CP) nodes, chosen at random.

The CP node receives the packet but tries to respond through the additional NIC. The reply therefore leaves on a different path than the request arrived on (asymmetric routing), and the connection fails.

Resolution

This is a known issue. A workaround is available, but its steps are internal to this KB article and are provided through Broadcom Support.

Fixed in:

vCenter v8.0.3 and tkg-service v3.3.0

Action:

Open a case with Broadcom Support, and an engineer will assist you with the workaround steps.

Additional Information

  1. The workaround remains in place even after the autoscaler version is upgraded.
  2. To remove the workaround after upgrading to a fixed vCenter (VC) or tkg-service version, run the following command:

kubectl patch pkgi autoscaler -n tkg-system --type=json -p='[{"op": "remove", "path": "/metadata/annotations/ext.packaging.carvel.dev~1ytt-paths-from-secret-name.0"}]'
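The `~1` in the patch path is the JSON Pointer (RFC 6901) escape for `/`, so the command above removes the annotation whose key is `ext.packaging.carvel.dev/ytt-paths-from-secret-name.0`. The sketch below shows how that escaping works. The function name `json_pointer_escape` is hypothetical and is not part of kubectl.

```shell
# Hypothetical helper: escape an annotation key for use in a JSON Patch path.
# Per RFC 6901, "~" is replaced with "~0" first, then "/" with "~1".
json_pointer_escape() {
  printf '%s' "$1" | sed -e 's/~/~0/g' -e 's#/#~1#g'
}

# Build the path segment used in the patch command above.
json_pointer_escape 'ext.packaging.carvel.dev/ytt-paths-from-secret-name.0'
```

To check whether the annotation is present before or after the patch, `kubectl get pkgi autoscaler -n tkg-system -o yaml` shows the current `metadata.annotations` of the package install.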