TKC Stuck in Creating with GPU-enabled VM Class on Supervisor When Using Photon OS

Article ID: 408352

Products

VMware vSphere Kubernetes Service

Issue/Introduction

When deploying a TanzuKubernetesCluster (TKC) with a GPU-enabled VM Class on a vSphere 8 Supervisor, the cluster may remain stuck in a Creating/Provisioning state. The GPU worker VMs power on but initially show GuestBootstrap = Unknown (NoBootstrapStatus), and later report as unhealthy with NodeHealthy=False and the message "Node condition MemoryPressure is True". Non-GPU TKCs deploy successfully, but GPU-enabled TKCs never become Ready.
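
The symptoms described above can be inspected from the Supervisor. The following is a minimal sketch, assuming kubectl is already logged in to the Supervisor context; the helper name show_gpu_tkc_state is hypothetical and the namespace is a placeholder.

```shell
# Hypothetical helper (not part of any VMware tooling): list the TKC,
# its Cluster API Machines, and the backing VirtualMachines in one
# vSphere Namespace. Assumes kubectl points at the Supervisor context.
show_gpu_tkc_state() {
  local namespace="$1"
  # TKC objects and their current phase
  kubectl get tanzukubernetescluster -n "$namespace"
  # Cluster API Machines that back the control plane and worker nodes
  kubectl get machines -n "$namespace"
  # VM Service VirtualMachines and their power state
  kubectl get virtualmachines -n "$namespace"
}

# Usage (placeholders, as in the article):
#   show_gpu_tkc_state <namespace>
#   kubectl describe machine <machine-name> -n <namespace>   # lists the Machine conditions
```

Running kubectl describe against a stuck GPU worker Machine is where a condition such as NodeHealthy=False would be expected to appear.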

Cause

GPU workers are not supported on Photon-based OS images. When a GPU-enabled VM Class is used with Photon OS, bootstrap may partially complete but the node reports MemoryPressure and is marked unhealthy. This prevents the Machines from becoming Ready and blocks the TKC from transitioning to a Ready state. Ubuntu is the supported operating system for GPU-enabled TKCs.
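
To confirm which operating system the existing nodes booted from, the OS image that each node reports can be listed. This is a sketch under assumptions: kubectl must be pointed at the workload cluster (not the Supervisor), the nodes must have registered with the API server, and both helper names are hypothetical.

```shell
# Hypothetical helper: print each node with the OS image it reports.
# Assumes kubectl points at the workload (TKC) cluster context.
list_node_os_images() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,OS_IMAGE:.status.nodeInfo.osImage'
}

# Hypothetical filter: keep only rows that mention Photon (case-insensitive).
filter_photon_nodes() {
  grep -i 'photon'
}

# Usage:
#   list_node_os_images | filter_photon_nodes
```

Any GPU worker that appears in the filtered output is running a Photon image and falls under the unsupported combination described above.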

Resolution

Deploy GPU-enabled TKCs using Ubuntu instead of Photon.

  1. Edit the TKC manifest.
    kubectl edit tkc <tkc> -n <namespace>
  2. Under metadata.annotations, add the key:
    run.tanzu.vmware.com/os-image: ubuntu
  3. Save and close the editor to apply the updated TKC. The Supervisor will select an Ubuntu OSImage based on the specified TKR.
  4. Wait for the TKC to reconcile, then verify that the GPU worker VMs bootstrap successfully, that the nodes report NodeHealthy=True, and that the TKC transitions to Ready.
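
The verification in the last step can be sketched as follows. This is a sketch, not a definitive procedure: the helper names are hypothetical, the annotation key and value are copied verbatim from step 2, the first helper assumes the Supervisor context, and the second assumes the workload cluster context.

```shell
# Hypothetical helper (Supervisor context): succeed only if the TKC
# manifest contains the annotation from step 2.
tkc_has_ubuntu_annotation() {
  local tkc="$1" namespace="$2"
  kubectl get tanzukubernetescluster "$tkc" -n "$namespace" -o yaml \
    | grep -qF 'run.tanzu.vmware.com/os-image: ubuntu'
}

# Hypothetical helper (workload cluster context): print each node with
# the status of its Ready and MemoryPressure conditions.
show_node_conditions() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}Ready={.status.conditions[?(@.type=="Ready")].status}{"\t"}MemoryPressure={.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'
}

# Usage (placeholders, as in the steps above):
#   tkc_has_ubuntu_annotation <tkc> <namespace> && echo "annotation present"
#   kubectl get tkc <tkc> -n <namespace>     # Supervisor: check that the TKC is Ready
#   show_node_conditions                     # workload cluster: node health
```

A healthy GPU worker would be expected to show Ready=True and MemoryPressure=False.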