2.1.1 to 2.2.0 TKGm management cluster upgrade to k8s version 1.25.x fails when using Antrea CNI

Article ID: 369894

Updated On:

Products

Tanzu Kubernetes Grid VMware Tanzu Kubernetes Grid

Issue/Introduction

  • Upgrading a TKGm management cluster from 2.1.1 to 2.2.0 fails when the cluster uses Antrea CNI and the target k8s version is 1.25.x, because the bundled Antrea version still depends on the EndpointSlice v1beta1 API, which k8s 1.25 no longer serves.
  • The control plane nodes upgrade as expected but worker nodes fail to upgrade.

Cause

  • TKGm enables Antrea CNI's EndpointSlice feature to support dual-stack clusters.
  • TKGm 2.1.1 includes Antrea 1.7.2, which reconciles EndpointSlice resources at API version v1beta1. That version of the EndpointSlice resource was removed in k8s 1.25 (see the Deprecated API Migration Guide in the Kubernetes documentation).
  • During an upgrade to TKGm 2.2.0 and k8s version 1.25.x, the API server is updated first and stops serving EndpointSlice v1beta1, while the Antrea CNI pods still running on the worker nodes expect to reconcile that resource version.
  • This causes the upgrade to fail.
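
The mismatch described above can be sketched as a small compatibility check. This is only an illustration built from the version facts stated in this article; the table and the function names (minor_version, served_endpointslice_apis, antrea_can_reconcile) are hypothetical and are not part of TKGm, Antrea, or Kubernetes. It assumes Kubernetes major version 1 and is only meaningful for k8s 1.21 and later.

```python
# EndpointSlice API version each Antrea release reconciles (taken from this article).
ANTREA_ENDPOINTSLICE_API = {"1.7.2": "v1beta1", "1.9.0": "v1"}


def minor_version(k8s_version: str) -> int:
    """Return the minor number of a version such as '1.25.x' or 'v1.24.9'."""
    return int(k8s_version.lstrip("v").split(".")[1])


def served_endpointslice_apis(k8s_version: str) -> set:
    """EndpointSlice API versions served by the API server (assumes k8s >= 1.21)."""
    minor = minor_version(k8s_version)
    served = set()
    if minor >= 21:  # v1 is available since k8s 1.21
        served.add("v1")
    if minor < 25:  # v1beta1 was removed in k8s 1.25
        served.add("v1beta1")
    return served


def antrea_can_reconcile(antrea_version: str, k8s_version: str) -> bool:
    """True if the API version Antrea expects is still served by the API server."""
    wanted = ANTREA_ENDPOINTSLICE_API[antrea_version]
    return wanted in served_endpointslice_apis(k8s_version)


# Print the result for the combinations discussed in this article.
for antrea, k8s in [("1.7.2", "1.24.x"), ("1.7.2", "1.25.x"), ("1.9.0", "1.25.x")]:
    print(antrea, k8s, antrea_can_reconcile(antrea, k8s))
```

Only the combination of Antrea 1.7.2 with k8s 1.25.x evaluates to False, which corresponds to the failing upgrade.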

Resolution

  • Upgrading to TKGm 2.2.0 with k8s version 1.23.x or 1.24.x will succeed.
  • This upgrade moves Antrea to version 1.9.0, which reconciles EndpointSlice v1 resources.
  • EndpointSlice v1 has been available since k8s 1.21.
  • To reach k8s 1.25.x, perform a 2-step upgrade: first upgrade to TKGm 2.2.0 with k8s 1.23.x or 1.24.x, then upgrade to 1.25.x.
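
The 2-step path can be expressed as a small planning helper. This is a sketch under the assumptions of this article only: the function names (minor_of, plan_upgrade) and the default intermediate version are hypothetical, and the helper does not call any TKGm tooling. It only returns the order of k8s versions to upgrade through, starting from TKGm 2.1.1.

```python
def minor_of(version: str) -> int:
    """Return the minor number of a version such as '1.25.x' or 'v1.24.9'."""
    return int(version.lstrip("v").split(".")[1])


def plan_upgrade(target_k8s: str, intermediate_k8s: str = "1.24.x") -> list:
    """List the k8s versions to upgrade through, in order, from TKGm 2.1.1."""
    if minor_of(intermediate_k8s) not in (23, 24):
        raise ValueError("intermediate version must be 1.23.x or 1.24.x")
    target_minor = minor_of(target_k8s)
    if target_minor in (23, 24):
        # EndpointSlice v1beta1 is still served, so a direct upgrade succeeds.
        return [target_k8s]
    if target_minor == 25:
        # Step 1 brings Antrea to 1.9.0 (EndpointSlice v1); step 2 is then safe.
        return [intermediate_k8s, target_k8s]
    raise ValueError("target version is not covered by this article")


print(plan_upgrade("1.25.x"))  # ['1.24.x', '1.25.x']
print(plan_upgrade("1.24.x"))  # ['1.24.x']
```

Versions outside 1.23.x to 1.25.x are rejected on purpose, because this article makes no statement about them.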