TKGi upgrade fails on jobs pks-nsx-t-ncp and pks-nsx-t-prepare-master-vm

Article ID: 413539

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

A TKGi cluster upgrade fails with one or more master node VMs in a failing or unknown ( - ) state.

The failed jobs are pks-nsx-t-ncp and pks-nsx-t-prepare-master-vm.

The BOSH CPI and TKGi networking (NCP) are configured for the NSX-T Manager API.
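
To see which master VMs are affected, the BOSH instance listing can be filtered for masters that are not running. This is a minimal sketch, assuming the instance group is named master and that the process state is the second column of the `bosh instances` output. The function name is hypothetical, and ServiceInstance-UID is a placeholder for the cluster's BOSH deployment name.

```shell
# List master instances whose process state is anything other than "running".
# Reads `bosh instances` output on stdin; assumes column 1 is the instance
# name (master/<id>) and column 2 is the process state.
non_running_masters() {
  awk '$1 ~ /^master\// && $2 != "running" { print $1, $2 }'
}

# Usage (ServiceInstance-UID is a placeholder for the deployment name):
#   bosh -d ServiceInstance-UID instances | non_running_masters
```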

The pks-nsx-t-prepare-master-vm log shows:

Current cluster NSX API mode: Policy
Registering client certificate
[GET /trust-management/principal-identities][500] getPrincipalIdentitiesInternalServerError  &{RelatedAPIError:{Details:Client certificate not found in trust store ErrorCode:99 ErrorData:<nil> ErrorMessage:Internal server error has occurred. ModuleName:common-services} RelatedErrors:[]}
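
To confirm that a collected log matches this article, the error text can be searched for directly. This is a sketch only: the function name and the LOG_FILE variable are hypothetical, and the search string is taken verbatim from the log excerpt above.

```shell
# Return 0 if the log text on stdin contains the error signature
# "Client certificate not found in trust store" quoted in this article.
has_trust_store_error() {
  grep -q 'Client certificate not found in trust store'
}

# Usage (LOG_FILE is a hypothetical path to a collected
# pks-nsx-t-prepare-master-vm log):
#   has_trust_store_error < "$LOG_FILE" && echo "log matches this article"
```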

Environment

TKGi 1.21.x

TKGi 1.22.x

Cause

The cause of this problem is still being investigated. Please collect the cluster logs before applying the workaround, and open a support case.
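
One possible way to gather the logs before changing anything is the `bosh logs` command, which downloads logs from the instances of a deployment. The wrapper below is a sketch under assumptions: the function name and the default destination directory are hypothetical, and support may ask for additional or different log bundles.

```shell
# Download logs from all instances of the cluster deployment into a directory.
# $1: BOSH deployment name (placeholder in this article: ServiceInstance-UID)
# $2: destination directory (hypothetical default: ./cluster-logs)
collect_cluster_logs() {
  deployment="$1"
  dest="${2:-./cluster-logs}"
  # Create the destination first, then let bosh write the log archive into it.
  mkdir -p "$dest" && bosh -d "$deployment" logs --dir "$dest"
}

# Usage:
#   collect_cluster_logs ServiceInstance-UID ./cluster-logs
```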

 

Resolution

As a workaround, redeploy the cluster using its BOSH manifest.

Download the cluster manifest, replacing ServiceInstance-UID with the BOSH deployment name of the cluster:

bosh -d ServiceInstance-UID manifest > SI.yml 

Verify that policy-api is set to false in the downloaded manifest (SI.yml).
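
The check can be scripted before redeploying. The sketch below assumes the setting appears in the manifest on a single line in the form `policy-api: false`; the exact key name and layout in a given manifest may differ, and the function name is hypothetical.

```shell
# Print every manifest line that mentions policy-api (with line numbers).
# Returns 0 if all of them are set to false, 1 if any is not,
# and 2 if the setting is not found at all.
check_policy_api() {
  manifest="$1"
  grep -n 'policy-api' "$manifest" || {
    echo "policy-api not found in $manifest" >&2
    return 2
  }
  # Any matching line that is NOT "policy-api: false" fails the check.
  if grep 'policy-api' "$manifest" | grep -qv 'policy-api:[[:space:]]*false'; then
    echo "policy-api is not false on at least one line" >&2
    return 1
  fi
}

# Usage:
#   check_policy_api SI.yml && echo "safe to redeploy"
```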

Redeploy the cluster:

bosh -d ServiceInstance-UID deploy SI.yml

Finish the upgrade, where XXYY is the cluster name:

tkgi upgrade-cluster XXYY