During the deployment of NSX Application Platform (NAPP), the process fails at the "Deploy TKGs - Create/Update Guest Cluster" phase with the following error message: "Failed to get the Tanzu client for instance <instance_name>."

Article ID: 377325

Products

VMware NSX
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

During the deployment of the NSX Application Platform (NAPP) through the NAPP Automation Appliance, the process fails at the "Deploy TKGs - Create/Update Guest Cluster" phase with the following error message:

"Failed to get the Tanzu client for instance <instance_name>."

Symptoms:

Sample log from the NAPP Automation Appliance:

Log in to the CLI of the NAPP Automation Appliance as the "root" user and open the following file:

/var/log/napp-automation/napp-deploy.log

{"timestamp":"2024-09-13T11:46:21Z","level":"debug","function":"WaitForVirtualMachineImages","msg":"Attempt 39: check TKR readiness and compatibility for instance default"}
{"timestamp":"2024-09-13T11:46:21Z","level":"debug","function":"isTKRReadyAndCompatibleHelper","msg":"Starting to find the best compatible TKR image for NSX 4.2.0.1.0.24210154 for instance default"}
{"timestamp":"2024-09-13T11:46:21Z","level":"debug","function":"isTKRReadyAndCompatibleHelper","msg":"Validating against nsxVersion 4.1.2.1.0"}
{"timestamp":"2024-09-13T11:46:21Z","level":"info","function":"isTKRReadyAndCompatibleHelper","msg":"failed to get the tanzu client for instance default, error: tanzukubernetesreleases.run.tanzu.vmware.com \"v1.23.8---vmware .3-tkg.1\" not found"}
{"timestamp":"2024-09-13T11:47:21Z","level":"debug","function":"WaitForVirtualMachineImages","msg":"Attempt 40: check TKR readiness and compatibility for instance default"}
{"timestamp":"2024-09-13T11:47:21Z","level":"debug","function":"isTKRReadyAndCompatibleHelper","msg":"Starting to find the best compatible TKR image for NSX 4.2.0.1.0.24210154 for instance default"}
{"timestamp":"2024-09-13T11:47:21Z","level":"debug","function":"isTKRReadyAndCompatibleHelper","msg":"Validating against nsxVersion 4.1.2.1.0"}
{"timestamp":"2024-09-13T11:47:21Z","level":"info","function":"isTKRReadyAndCompatibleHelper","msg":"failed to get the tanzu client for instance default, error: tanzukubernetesreleases.run.tanzu.vmware.com \"v1.23.8---vmware .3-tkg.1\" not found"}
{"timestamp":"2024-09-13T11:47:21Z","level":"debug","function":"WaitForVirtualMachineImages","msg":"TKR image readiness and compatibility check has failed, max retries exhausted for instance default"}
{"timestamp":"2024-09-13T11:47:21Z","level":"debug","function":"findTKRImageWithLatestVersionAndReady","msg":"Finding the ready/compatible TKR image in order, for instance default, NAPP version 4.2.0-0.0-24124098, NSX version 4.2.0.1.0.24210154"}
{"timestamp":"2024-09-13T11:47:21Z","level":"info","function":"findTKRImageWithLatestVersionAndReady","msg":"failed to get the tanzu client for instance default, error: tanzukubernetesreleases.run.tanzu.vmware.com \"v1.27.6---vmware .1-fips.1-tkg.1\" not found"}
{"timestamp":"2024-09-13T11:47:21Z","level":"warning","function":"WaitForVirtualMachineImage","msg":"failed to get the tanzu client for instance default, error: tanzukubernetesreleases.run.tanzu.vmware.com \"v1.27.6---vmware .1-fips.1-tkg.1\" not found"}
{"timestamp":"2024-09-13T11:47:21Z","level":"info","function":"RunTanzuHandler","msg":"TKG deployment is completed with error. Error was: failed to get the tanzu client for instance default, error: tanzukubernetesreleases.run.tanzu.vmware.com \"v1.27.6---vmware .1-fips.1-tkg.1\" not found"}

The NSX and NAPP versions shown in the log may differ depending on the customer's environment.
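Because each entry in napp-deploy.log is a single JSON object, the release names that the tool failed to find can be pulled out with a short script instead of being read by eye. The following is a minimal sketch, assuming every entry carries its text in a "msg" field as in the sample above; the function name `missing_tkrs` is illustrative and not part of the appliance.

```python
import json
import re

# Matches the quoted release name in messages such as:
#   ... tanzukubernetesreleases.run.tanzu.vmware.com "v1.27.6---..." not found
NOT_FOUND = re.compile(
    r'tanzukubernetesreleases\.run\.tanzu\.vmware\.com "([^"]+)" not found'
)


def missing_tkrs(lines):
    """Return the distinct TKR names reported as not found, in first-seen order."""
    seen = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if not isinstance(entry, dict):
            continue
        match = NOT_FOUND.search(str(entry.get("msg", "")))
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

On the appliance, the function can be fed the open log file, for example `missing_tkrs(open("/var/log/napp-automation/napp-deploy.log"))`, and the returned names compared with the releases present on the Supervisor Cluster.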

Also, log in to the Supervisor Cluster by following these steps:

(1) SSH to vCenter using root credentials.

(2) Run the following command:

/usr/lib/vmware-wcp/decryptK8Pwd.py

(3) Note the IP address and password in the output.

(4) SSH to that IP address as the root user: ssh root@<the_ip_address_from_step3>

(5) Enter the password copied in step 3.

You are now connected to the Supervisor Cluster.

Run the command "kubectl get tkr"

The output is similar to the following:

NAME                              VERSION                        READY   COMPATIBLE   CREATED   UPDATES AVAILABLE
v1.23.8---vmware.3-tkg.1          1.23.8+vmware.3-tkg.1          True    True         120m
v1.26.5---vmware.2-fips.1         1.26.5+vmware.2-fips.1         False   False        120m
v1.27.6---vmware.1-fips.1-tkg.1   1.27.6+vmware.1-fips.1-tkg.1   False   False        120m

In some environments, the output lists only the following releases:

NAME                              VERSION                        READY   COMPATIBLE   CREATED   UPDATES AVAILABLE
v1.23.8---vmware.3-tkg.1          1.23.8+vmware.3-tkg.1          True    True         120m
v1.27.6---vmware.1-fips.1-tkg.1   1.27.6+vmware.1-fips.1-tkg.1   False   False        120m
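To see at a glance which releases are both ready and compatible, the "kubectl get tkr" output can be parsed by splitting each row on whitespace. This is a minimal sketch that assumes the column order shown above (NAME, VERSION, READY, COMPATIBLE, CREATED); the function names are illustrative.

```python
def parse_tkr_table(output):
    """Parse `kubectl get tkr` text into a list of dicts (first five columns only)."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank or short lines and the header row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, version, ready, compatible, created = fields[:5]
        rows.append({
            "name": name,
            "version": version,
            "ready": ready == "True",
            "compatible": compatible == "True",
            "created": created,
        })
    return rows


def usable_tkrs(output):
    """Return the names of releases that are both READY and COMPATIBLE."""
    return [
        row["name"]
        for row in parse_tkr_table(output)
        if row["ready"] and row["compatible"]
    ]
```

For the sample outputs above, only v1.23.8---vmware.3-tkg.1 qualifies, which matches the release kept in the Resolution section.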

Environment

The error is observed in:

NAPP version 4.2.0-0.0-24124098

NSX version 4.2.0.1.0.24210154

NAPP Automation used: 4.2.0.24095980.ova

vCenter Version: 7.0U3l and 7.0U3o

The issue is not limited to this exact set of versions; it can be observed with any combination of them.

Cause

The environment runs an older version of vSphere (7.0.3), which does not support TKR 1.27.

Interoperability can be checked at https://interopmatrix.vmware.com/Interoperability: select "VMware vCenter Server" as the solution, choose the version in use, and compare it against all versions of "Tanzu Kubernetes Releases".

By default, the NAPP automation tool checks TKR 1.23 compatibility only if the NSX version is lower than 4.1.2.2, so with a later NSX version 1.23 is not considered for deployment.

From NAPP Automation Appliance 4.2.0 onward, however, the tool looks for a ready and compatible image in this order: 1.27.6, 1.26.5, 1.23.8.
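The NSX version check described above amounts to comparing dotted version strings numerically rather than as text. The sketch below shows one common way to do that; the appliance's actual comparison logic is not shown in this article, so the helper names and the approach are assumptions for illustration only.

```python
def version_tuple(version):
    """Turn a dotted version such as '4.2.0.1.0.24210154' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_below(version, threshold):
    """Return True if `version` sorts numerically before `threshold`."""
    return version_tuple(version) < version_tuple(threshold)


# Search order stated for NAPP Automation Appliance 4.2.0 and later.
SEARCH_ORDER = ["1.27.6", "1.26.5", "1.23.8"]
```

With this comparison, `is_below("4.2.0.1.0.24210154", "4.1.2.2")` is false for the NSX version in the sample log, while `is_below("4.1.2.1.0", "4.1.2.2")` is true.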

The operation fails because the Tanzu client cannot find the 1.26.5 image, either because the file name is incorrect or because the file is not present at all. The tool looks for "v1.26.5---vmware.2-fips.1-tkg.1", whereas the release on the Supervisor Cluster is named "v1.26.5---vmware.2-fips.1".
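The name mismatch can be illustrated with an exact-match lookup: a release that exists under a slightly different name is reported as not found, just like one that is absent. This is a simplified model for illustration, not the appliance's code; the `classify` helper and the data layout are assumptions, and the release names are taken from the outputs in this article.

```python
def classify(candidates, supervisor_tkrs):
    """Label each candidate name as 'usable', 'not ready or incompatible', or 'not found'.

    supervisor_tkrs maps a TKR name to a (ready, compatible) pair,
    as reported by `kubectl get tkr`.
    """
    result = {}
    for name in candidates:
        if name not in supervisor_tkrs:
            result[name] = "not found"  # exact-name lookup failed
        elif all(supervisor_tkrs[name]):
            result[name] = "usable"
        else:
            result[name] = "not ready or incompatible"
    return result


# Releases as listed on the Supervisor Cluster in the sample output.
supervisor = {
    "v1.23.8---vmware.3-tkg.1": (True, True),
    "v1.26.5---vmware.2-fips.1": (False, False),
    "v1.27.6---vmware.1-fips.1-tkg.1": (False, False),
}

# Names looked up, in the search order 1.27.6, 1.26.5, 1.23.8.
candidates = [
    "v1.27.6---vmware.1-fips.1-tkg.1",
    "v1.26.5---vmware.2-fips.1-tkg.1",  # has a "-tkg.1" suffix the cluster entry lacks
    "v1.23.8---vmware.3-tkg.1",
]

status = classify(candidates, supervisor)
```

In this model the 1.26.5 candidate comes back as "not found" even though a 1.26.5 release is present, because the two names differ by the "-tkg.1" suffix.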

Resolution

1. SSH into the NAPP Automation Appliance VM using root credentials.
2. Remove the 1.27.6 and 1.26.5 entries from /opt/napp/config/settings.json.

After the change, the file looks similar to the following:

{
  "TkrSupportMatrix": [
    {
      "name": "v1.23.8---vmware.3-tkg.1",
      "image": "ob-20953521-tkgs-ova-photon-3-v1.23.8---vmware.3-tkg.1",
      "compatible_napps": [
        "4.1",
        "4.1.2",
        "4.2"
      ],
      "CompatibleNSXs": [
        "4.1.2.1.0"
      ],
      "tools": "kubernetes-tools-1.23.3-00_3.8.0-1.tar.gz"
    }
  ]
}

3. Continue the deployment by clicking the "UPDATE & REDEPLOY" button on the Deploy TKG page of the UI.
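Step 2 can also be scripted rather than edited by hand. The following is a minimal sketch, assuming settings.json has the "TkrSupportMatrix" layout shown above and that any other keys in the file should be left untouched; the function names and the ".bak" backup suffix are illustrative choices, not part of the appliance.

```python
import json
import shutil

SETTINGS_PATH = "/opt/napp/config/settings.json"


def prune_tkr_matrix(settings, unwanted_prefixes=("v1.27.6", "v1.26.5")):
    """Return a copy of settings without TkrSupportMatrix entries
    whose name starts with one of the unwanted prefixes."""
    pruned = dict(settings)  # shallow copy; other top-level keys are preserved
    pruned["TkrSupportMatrix"] = [
        entry
        for entry in settings.get("TkrSupportMatrix", [])
        if not entry.get("name", "").startswith(tuple(unwanted_prefixes))
    ]
    return pruned


def prune_settings_file(path=SETTINGS_PATH):
    """Back up the settings file, then rewrite it without the unwanted entries."""
    shutil.copy2(path, path + ".bak")  # keep a backup before editing
    with open(path) as handle:
        settings = json.load(handle)
    with open(path, "w") as handle:
        json.dump(prune_tkr_matrix(settings), handle, indent=2)
```

Running `prune_settings_file()` as root on the appliance would leave only the 1.23.8 entry in the matrix; the backup copy allows the original file to be restored if needed.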

 

This issue has been observed only intermittently with vCenter 7.0.3; not all 7.0.3 versions exhibit it.

Additional Information

You may see the following errors in napp-deploy.log on the NAPP Automation Appliance:

[ /var/log/napp-automation ]# grep "WaitForVirtualMachineImages" napp-deploy.log

{"function":"WaitForVirtualMachineImages","level":"debug","msg":"Error while fetching TKR image v1.27.6---vmware.1-fips.1-tkg.1. Error is: tanzukubernetesreleases.run.tanzu.vmware.com \"v1.27.6---vmware.1-fips.1-tkg.1\" not found.","time":"2024-10-08T13:29:41Z"}
{"function":"WaitForVirtualMachineImages","level":"debug","msg":"TKR v1.27.6---vmware.1-fips.1-tkg.1 did not show up on supervisor cluster. Max retries exhausted for instance nsx.","time":"2024-10-08T13:29:56Z"}