Gathering support bundle for Private AI Services

Article ID: 408731

Products

VMware Private AI Foundation
VMware Cloud Foundation

Issue/Introduction

When support requests regarding Private AI Services cannot be resolved by analyzing the reported error messages, Broadcom Support Engineers may ask for a support bundle.

In case of problems with the underlying infrastructure for VMware Private AI Services, you may also be asked to provide a support bundle for the Supervisor Control Plane VMs or the VKS Guest Cluster VMs. See the KB article "Gathering logs for VKS".
Note that this requires Virtual Infrastructure Administrator access rights.

The shell commands in this article are expected to be called from a Bash terminal where kubectl is available and the current KUBECONFIG provides access to the PAIS-related Supervisor namespace for which we want to collect a support bundle.
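
The prerequisites above can be checked before starting. The following is a minimal sketch and not part of the official procedure: pais_preflight is a hypothetical helper name, and using permission to create Pods as the access check is an assumption based on the fact that the collection runs as a Pod in the target namespace.

```shell
# Hypothetical helper (not from the KB): checks that kubectl is available and
# that the current KUBECONFIG may create Pods in the given namespace.
pais_preflight() {
  local namespace="$1"

  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH" >&2
    return 1
  fi

  # 'kubectl auth can-i' exits non-zero when the answer is "no".
  if ! kubectl auth can-i create pods -n "${namespace}" >/dev/null 2>&1; then
    echo "Current KUBECONFIG cannot create Pods in namespace '${namespace}'" >&2
    return 1
  fi

  echo "Preflight OK for namespace '${namespace}'"
}
```

Once the target namespace is known, it could be called as pais_preflight "${PAIS_SUPPORT_NAMESPACE}".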

Environment

VMware Cloud Foundation 9.0 with VMware Private AI Supervisor Service

Resolution

A VMware Private AI Services (PAIS) setup consists of one Supervisor Namespace running the PAIS Supervisor Service and one or more (tenant) namespaces in which the PAIS service is installed for different users.

The PAIS support bundle is collected per namespace, and the collection runs in a workload (Pod) inside that namespace.

PRIVACY NOTE
If you are a Virtual Infrastructure Administrator and have been asked by a PAIS Tenant Namespace Owner to collect a bundle as described by this KB, make sure that you send the bundle directly to Broadcom Support and don't share it with the Tenant Namespace Owner.
Sharing this bundle with a Tenant Namespace Owner might raise privacy concerns for other tenants and should only happen with full awareness of your company's privacy policies.

To collect a PAIS support bundle

  1. Download and store the pais-support.yml file attached to this article. It contains the Pod definition and configuration for the support bundle collection process.
  2. Determine the namespace for which you need to collect a support bundle - a Broadcom Support Engineer can provide help here.

    • If your issue is specific to a tenant namespace, this is your target namespace. If you are not sure about the exact name, the following command lists all PAIS tenant namespaces that you have access to:
      kubectl get paisconfigurations -A --no-headers -o custom-columns=":metadata.namespace"
    • If the Broadcom Support Engineer determines that your issue is within the PAIS Supervisor Service controller they will request a bundle for the PAIS Supervisor Service namespace. You must be a Virtual Infrastructure Administrator or hold the "Namespace Owner" role for the PAIS Supervisor Service namespace. Use the following command to find the namespace name.
      kubectl get ns --no-headers -l "pais.vmware.com/component=supervisor-service" -o custom-columns=":metadata.name"
    • Once you have identified the target namespace, store it in a shell variable.
      export PAIS_SUPPORT_NAMESPACE=target-pais-tenant-namespace
  3. Determine the container image used for the pais-support Pod. This can vary depending on whether you have an air-gapped setup.
    Note that the pais-support.yml file attached to this KB article uses the Internet-accessible image reference to the latest released version of PAIS.

    • If you don't have an air-gapped setup, you don't need to do anything. Use pais-support.yml as-is and the image will be fetched on demand.

    • Otherwise, you need to find the internal reference to the pais-controller image used by your PAIS setup, as well as the name of the correct ImagePullSecret for the target namespace.
      Both were configured during the PAIS installation.

    • If you are a Virtual Infrastructure Administrator or Namespace Owner of the PAIS Supervisor namespace, you need to inspect the controller-manager deployment to find the image and the name of the ImagePullSecret.
      pais_supervisor_namespace=$(kubectl get ns --no-headers -l "pais.vmware.com/component=supervisor-service" -o custom-columns=":metadata.name")
      echo "Looking into PAIS Supervisor Namespace '${pais_supervisor_namespace}'"
      
      export PAIS_SUPPORT_IMAGE=$(kubectl get deploy/controller-manager -n "${pais_supervisor_namespace}" -o 'jsonpath={.spec.template.spec.containers[0].image}')
      echo "Using PAIS Support Image from '${PAIS_SUPPORT_IMAGE}'"
      export PAIS_SUPPORT_IMAGE_PULL_SECRET=$(kubectl get deploy/controller-manager -n "${pais_supervisor_namespace}" -o 'jsonpath={.spec.template.spec.imagePullSecrets[0].name}')
      echo "Using PAIS Support ImagePullSecret '${PAIS_SUPPORT_IMAGE_PULL_SECRET}'"
    • If you are a Namespace Owner for a PAIS Tenant namespace, you can extract the image and the name of the ImagePullSecret from the support-bundle-config ConfigMap that is applied in that namespace by the PAISConfiguration.
      export PAIS_SUPPORT_IMAGE=$(kubectl -n "${PAIS_SUPPORT_NAMESPACE}" get configmap -l "pais.vmware.com/component=support-bundle-config" -o 'jsonpath={.items[0].data.image}')
      echo "Using PAIS Support Image from '${PAIS_SUPPORT_IMAGE}'"
      export PAIS_SUPPORT_IMAGE_PULL_SECRET=$(kubectl -n "${PAIS_SUPPORT_NAMESPACE}" get configmap -l "pais.vmware.com/component=support-bundle-config" -o 'jsonpath={.items[0].data.imagePullSecretName}')
      echo "Using PAIS Support ImagePullSecret '${PAIS_SUPPORT_IMAGE_PULL_SECRET}'"
  4. If necessary, update pais-support.yml with the image and ImagePullSecret from step 3.

    • You could do this manually by:

      • Locating the container named pais-support-collect-bundle and updating its image: property to the value of ${PAIS_SUPPORT_IMAGE} from step 3, as in
        ...
          containers:
            - name: pais-support-collect-bundle
              image: ## Copy the value of ${PAIS_SUPPORT_IMAGE}
        ...
      • Locating the pais-support Pod and updating its spec.imagePullSecrets with an entry whose name matches the value of ${PAIS_SUPPORT_IMAGE_PULL_SECRET} from step 3, as in
        ...
        spec:
          imagePullSecrets:
          - name: ## Copy the value of ${PAIS_SUPPORT_IMAGE_PULL_SECRET}
        ...
    • Or, if you have the Go-based yq, run the following:
      yq e "select(.kind == \"Pod\") .spec.containers[0].image = \"${PAIS_SUPPORT_IMAGE}\" | \
            select(.kind == \"Pod\") .spec.imagePullSecrets = [{ \"name\": \"${PAIS_SUPPORT_IMAGE_PULL_SECRET}\" }]" \
        -i pais-support.yml
  5. Create the pais-support Pod to start a support bundle collection.
    kubectl -n "${PAIS_SUPPORT_NAMESPACE}" create -f pais-support.yml

    If any of the resources in pais-support.yml already exist, a previous support bundle collection was started and has not been completed or cleaned up.

    • To complete the job, follow through the next steps.
    • If you need to clean up an old job, delete the resources as described in step 11 and then repeat the current step.

  6. Ensure the Pod is created, and the collection is running.
    Wait for this command to complete successfully. A timeout of 2 minutes should be enough for collection to start.
    kubectl -n "${PAIS_SUPPORT_NAMESPACE}" wait --for=jsonpath='{.status.phase}'='Running' pod/pais-support --timeout=120s

    If this command fails:

    • With Error from server (NotFound): pods "pais-support" not found, then make sure the create command succeeded and try again in a few seconds.
    • With error: timed out waiting for the condition on, then check for errors in the Pod's status and events. The describe command is useful for this.
      kubectl -n "${PAIS_SUPPORT_NAMESPACE}" describe pod pais-support

      If necessary, contact a Broadcom Support Engineer and share the output of the describe command for further troubleshooting.

  7. Wait for the collection to complete. When the collection completes, the Pod is marked as Ready. This usually takes a few minutes.
    kubectl -n "${PAIS_SUPPORT_NAMESPACE}" wait --for=condition=Ready pod/pais-support --timeout=30m
  8. Copy the collected support bundle from the Pod. The path to the support bundle will be stored as text in the /data/pais-support-bundle-ready file on the Pod. Use it to copy the support bundle from the pais-support Pod to your current folder.
    pais_support_bundle_path=$(kubectl -n "${PAIS_SUPPORT_NAMESPACE}" exec pod/pais-support -- cat /data/pais-support-bundle-ready)
    
    kubectl -n "${PAIS_SUPPORT_NAMESPACE}" cp "pais-support:${pais_support_bundle_path}" "./$(basename "${pais_support_bundle_path}")"
  9. The support bundle file is just an archive of text files describing Kubernetes resources and logs. If your company policies require it, run any necessary tools on the file to inspect or redact it.
  10. Upload the collected support bundle. For upload instructions, see the KB article "Uploading files to cases on the Broadcom Support Portal".
  11. Delete the pais-support Pod. After you have copied the support bundle from the pais-support Pod and sent it to Broadcom Support, please clean up all leftover resources.
    kubectl -n "${PAIS_SUPPORT_NAMESPACE}" delete -f pais-support.yml
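
Steps 5 through 8 can be chained in a small wrapper so that a failure in any step stops the run. This is a minimal sketch under stated assumptions, not an official script: collect_pais_bundle is a hypothetical function name, and sending kubectl's progress output to stderr is a choice made here so that only the local bundle path is printed on stdout. It does not upload the bundle or clean up (steps 10 and 11).

```shell
# Hypothetical wrapper around steps 5-8 of this article.
# Usage: collect_pais_bundle <namespace> [manifest]
collect_pais_bundle() {
  local namespace="$1" manifest="${2:-pais-support.yml}"
  local bundle_path

  # Step 5: create the resources; fails if an earlier collection was not cleaned up.
  kubectl -n "${namespace}" create -f "${manifest}" >&2 || return 1

  # Step 6: wait for the Pod to start; on failure, print the describe output.
  kubectl -n "${namespace}" wait --for=jsonpath='{.status.phase}'='Running' \
    pod/pais-support --timeout=120s >&2 || {
    kubectl -n "${namespace}" describe pod pais-support >&2
    return 1
  }

  # Step 7: the Pod becomes Ready once the bundle has been created.
  kubectl -n "${namespace}" wait --for=condition=Ready pod/pais-support --timeout=30m >&2 || return 1

  # Step 8: read the bundle path from the Pod and copy the file locally.
  bundle_path=$(kubectl -n "${namespace}" exec pod/pais-support -- cat /data/pais-support-bundle-ready) || return 1
  [ -n "${bundle_path}" ] || { echo "Bundle path is empty" >&2; return 1; }
  kubectl -n "${namespace}" cp "pais-support:${bundle_path}" "./$(basename "${bundle_path}")" >&2 || return 1

  # Print the local file name for the caller.
  echo "./$(basename "${bundle_path}")"
}
```

A call such as bundle_file=$(collect_pais_bundle "${PAIS_SUPPORT_NAMESPACE}") would leave the local file name in bundle_file for the review and upload steps.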

Additional Information

What is collected by the support bundle

The support bundle collects

  • The list of all installed Kubernetes API resources
  • The YAML descriptions of selected Kubernetes resources (as described in the config.yml part inside pais-support.yml). Note: Confidential data in Secrets is not captured.
  • Pod logs for PAIS pods in the target namespace
  • Statistics and error reports from Indexing jobs
  • VKS cluster API resources and Pod logs
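
If company policy calls for a review before upload, the collected files can be searched for sensitive strings. The sketch below is only an illustration: it assumes the bundle is a tar archive (optionally compressed), which this article does not confirm, and both the helper name scan_bundle and the search pattern are hypothetical.

```shell
# Hypothetical helper: lists the files inside a bundle that match a pattern.
# ASSUMPTION: the bundle is a tar archive that 'tar -xf' can read.
# Usage: scan_bundle <bundle-file> <extended-regex>
scan_bundle() {
  local bundle="$1" pattern="$2" workdir

  workdir=$(mktemp -d) || return 1
  tar -xf "${bundle}" -C "${workdir}" || { rm -rf "${workdir}"; return 1; }

  # -r: recurse, -l: print file names only, -i: ignore case, -E: extended regex
  (cd "${workdir}" && grep -rliE -- "${pattern}" . | sort)

  rm -rf "${workdir}"
}
```

For example, scan_bundle ./bundle-file 'password|token' prints the relative paths of matching files and prints nothing if there are no matches; bundle-file stands for whatever file was copied from the Pod.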

What does pais-support.yml do

The pais-support.yml manifest defines all the necessary Kubernetes resources to run the support bundle collection process within a target namespace. When applied, it creates the following resources:

  • ConfigMap (support-bundle-config): Holds the config.yml file that determines what will be collected. The default config.yml collects non-confidential data and this is all that is usually required by Broadcom Support.
  • ServiceAccount (pais-support-service-account): A dedicated service account for the collection Pod to interact with the Kubernetes API.
  • Role (pais-support-role): Grants the necessary read-only permissions (get, list) for the ServiceAccount to access various Kubernetes resources.
  • RoleBinding (pais-support-role-binding): Links the pais-support-role to the pais-support-service-account, applying the defined permissions to the collection Pod.
  • Pod (pais-support): The workload that runs the collection process. It uses the specified ServiceAccount, mounts a temporary volume to store the bundle, and has a long-running readinessProbe that signals completion (Pod becomes Ready) only when the support bundle is successfully created.
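
After the resources have been created, the effective permissions of the collection ServiceAccount can be spot-checked with kubectl auth can-i and impersonation. This is a hedged sketch: check_pais_support_rbac is a hypothetical helper, pods is used only as an example resource (the actual resource list is defined in the Role inside pais-support.yml), and impersonating a ServiceAccount requires that your own account is allowed to do so.

```shell
# Hypothetical helper: reports whether the collection ServiceAccount may
# get, list, or delete Pods in the given namespace.
# ASSUMPTION: 'pods' is one of the resources covered by pais-support-role.
check_pais_support_rbac() {
  local namespace="$1" verb
  local subject="system:serviceaccount:${namespace}:pais-support-service-account"

  for verb in get list delete; do
    # 'kubectl auth can-i' exits 0 for "yes" and non-zero for "no".
    if kubectl -n "${namespace}" auth can-i "${verb}" pods --as="${subject}" >/dev/null 2>&1; then
      echo "${verb} pods: allowed"
    else
      echo "${verb} pods: denied"
    fi
  done
}
```

If the Role is read-only as described, get and list would be expected to be reported as allowed and delete as denied.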

Attachments

pais-support.yml