Gathering support bundle for Private AI Services

Article ID: 408731

Updated On:

Products

VCF Private AI Services, VMware Cloud Foundation

Issue/Introduction

When support requests regarding Private AI Services cannot be resolved by analyzing the reported error messages, Broadcom Support Engineers may ask for a support bundle.

If the problem lies in the underlying infrastructure for VMware Private AI Services, you may also be asked to provide a support bundle for the Supervisor Control Plane VMs or the VKS Guest Cluster VMs. See the KB article Gathering logs for VKS.
Note that this requires Virtual Infrastructure Administrator access rights.

The shell commands in this article are expected to be run from a Bash terminal where kubectl is available and the current KUBECONFIG provides access to the PAIS-related Supervisor namespace for which you want to collect a support bundle.
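As a minimal sketch of how these prerequisites could be checked up front, the helper below verifies that kubectl is resolvable and that the current context is allowed to create Pods in the target namespace. The function name pais_preflight is hypothetical and not part of any VMware tool, and "may create Pods" is only an assumed proxy for the access the collection needs (the manual method also creates a ConfigMap, ServiceAccount, Role and RoleBinding).

```shell
# Sketch: preflight check before collecting a PAIS support bundle.
# pais_preflight is a hypothetical helper name, not part of any VMware tool.
pais_preflight() {
  local ns="$1"
  if [ -z "${ns}" ]; then
    echo "usage: pais_preflight <namespace>" >&2
    return 64
  fi
  # kubectl must be resolvable in the current shell.
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH" >&2
    return 1
  fi
  # With --quiet, 'kubectl auth can-i' prints nothing and answers via its exit code.
  # A non-zero exit can also mean that the cluster is unreachable.
  if ! kubectl auth can-i create pods --namespace "${ns}" --quiet; then
    echo "cannot confirm permission to create Pods in '${ns}'" >&2
    return 2
  fi
  echo "preflight ok for namespace '${ns}'"
}

# Usage (namespace name is a placeholder):
#   pais_preflight my-pais-tenant-namespace
```

A non-zero return code only says that something needs a closer look; it does not distinguish between missing permissions and an invalid or expired KUBECONFIG.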

Environment

VMware Cloud Foundation 9.0 with VMware Private AI Supervisor Service

Resolution

A VMware Private AI Services (PAIS) setup consists of one Supervisor Namespace running the PAIS Supervisor Service and one or more (tenant) namespaces in which the PAIS service is installed for different users.

The PAIS support bundle is collected per namespace, and the collection runs in a workload (Pod) inside that namespace.

PRIVACY NOTE
If you are a Virtual Infrastructure Administrator and have been asked by a PAIS Tenant Namespace Owner to collect a bundle as described in this KB, make sure that you send the bundle directly to Broadcom Support and do not share it with the Tenant Namespace Owner.
Sharing this bundle with a Tenant Namespace Owner might raise privacy concerns for other tenants and should only happen with full awareness of, and in line with, the privacy policies of your company.

 

Method 1: Using VCF PAIS CLI (Recommended)

The VCF PAIS CLI provides a simplified command-line interface to collect support bundles. This method automates the collection process and handles all the necessary steps.

Prerequisites:

  • The vcf CLI tool must be installed and configured with access to your VCF environment
  • The pais plugin must be installed for the vcf CLI
  • Your KUBECONFIG must be configured to access the target namespace

Steps:

  1. Determine the namespace for which you need to collect a support bundle - a Broadcom Support Engineer can provide help here.

    • If your issue is specific to a tenant namespace, this is your target namespace.

    • If the Broadcom Support Engineer determines that your issue is within the PAIS Supervisor Service controller, they will request a bundle for the PAIS Supervisor Service namespace. You must be a [VI Admin] or hold the "Namespace Owner" role for the PAIS Supervisor Service namespace.

  2. Run the support bundle collection command:

    • For the PAIS Supervisor Service namespace:
      vcf pais support collect-bundle
    • For a PAIS tenant namespace:
      vcf pais support collect-bundle --namespace <target-namespace>

    Command options:

    • --namespace or -n: Specify the target namespace (default: auto-detects PAIS Supervisor Service namespace)
    • --output or -o: Specify the output path for the support bundle tarball (default: ./support-bundle-<namespace>-<timestamp>.tgz)
    • --config or -c: Path to a custom config.yml file (default: uses the default config from the plugin)
    • --timeout or -t: Timeout for bundle collection (default: 5 hours)
    • --try-resume: Try to resume from an existing support bundle pod if one exists from a previous run

    Example commands:

    # Collect bundle from a specific tenant namespace
    vcf pais support collect-bundle -n my-pais-tenant-namespace
    
    # Collect bundle with custom output path
    vcf pais support collect-bundle -n pais-system -o ./my-bundle.tgz
    
    # Collect bundle with custom configuration
    vcf pais support collect-bundle -n my-namespace -c ./custom-config.yml
    
    # Collect bundle and resume from existing pod if available
    vcf pais support collect-bundle -n my-namespace --try-resume

     

  3. The CLI will automatically:

    • Verify the namespace exists
    • Determine the correct container image and image pull secret
    • Deploy the support bundle collector pod
    • Wait for the collection to complete
    • Download the support bundle to your local machine
    • Clean up the temporary resources
  4. Upload the collected support bundle. For more information, see Uploading files to cases on the Broadcom Support Portal.
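Before uploading, it can be worth confirming that the downloaded file really is a readable tarball. The sketch below wraps the collect-bundle command with only the -n and -o options listed above and then lists the archive with tar. The function name pais_collect_and_verify is hypothetical, and the assumption that the bundle is a gzip-compressed tar archive is inferred from the .tgz default file name.

```shell
# Sketch: run the CLI collection, then verify that the result is a readable .tgz.
# pais_collect_and_verify is a hypothetical helper name.
pais_collect_and_verify() {
  local ns="$1" out="$2"
  if [ -z "${ns}" ] || [ -z "${out}" ]; then
    echo "usage: pais_collect_and_verify <namespace> <output.tgz>" >&2
    return 64
  fi
  # CLI progress output goes to stderr so that stdout carries only the bundle path.
  vcf pais support collect-bundle -n "${ns}" -o "${out}" >&2 || return 1
  # Listing the archive succeeds only for a readable gzip-compressed tar file.
  if ! tar -tzf "${out}" >/dev/null 2>&1; then
    echo "'${out}' is not a readable tarball" >&2
    return 2
  fi
  echo "${out}"
}

# Usage (names are placeholders):
#   pais_collect_and_verify my-pais-tenant-namespace ./my-bundle.tgz
```

If an earlier run was interrupted, the --try-resume option described above may be a better fit than starting a fresh collection.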

Method 2: Manual collection using kubectl

If you prefer to use the manual method or if the CLI is not available, follow these steps:

  1. Download and store the pais-support.yml file. It contains the Pod definition and configuration for the support bundle collection process.
    The latest pais-support.yml attachment can be found here: pais-support.yml

  2. Determine the namespace for which you need to collect a support bundle - a Broadcom Support Engineer can provide help here.

    • If your issue is specific to a tenant namespace, this is your target namespace. If you are not sure about the exact name, the following command lists all PAIS tenant namespaces that you have access to.

      kubectl get paisconfigurations -A --no-headers -o custom-columns=":metadata.namespace"
      
    • If the Broadcom Support Engineer determines that your issue is within the PAIS Supervisor Service controller, they will request a bundle for the PAIS Supervisor Service namespace. You must be a [VI Admin] or hold the "Namespace Owner" role for the PAIS Supervisor Service namespace. Use the following command to find the namespace name.

      kubectl get ns --no-headers -l "pais.vmware.com/component=supervisor-service" -o custom-columns=":metadata.name"
      
    • Once you have identified the target namespace, store it in a shell variable.

      export PAIS_SUPPORT_NAMESPACE=target-pais-tenant-namespace
      
  3. Determine the container image and imagePullSecret used for the pais-support Pod.

    You will need to find the internal reference to the pais-controller image used by your PAIS setup, as well as the name of the correct ImagePullSecret for the target namespace. Both would have been properly updated during the PAIS installation.

    • If you (being a VI Admin or Namespace Owner) want to collect a bundle for the PAIS Supervisor namespace, inspect the controller-manager deployment to find the image and the name of the ImagePullSecret.

      export PAIS_SUPPORT_IMAGE=$(kubectl get deploy/controller-manager -n "${PAIS_SUPPORT_NAMESPACE}" -o 'jsonpath={.spec.template.spec.containers[0].image}')
      echo "Using PAIS Support Image from PAIS_SUPPORT_IMAGE='${PAIS_SUPPORT_IMAGE}'"
      export PAIS_SUPPORT_IMAGE_PULL_SECRET=$(kubectl get deploy/controller-manager -n "${PAIS_SUPPORT_NAMESPACE}" -o 'jsonpath={.spec.template.spec.imagePullSecrets[0].name}')
      echo "Using PAIS Support ImagePullSecret PAIS_SUPPORT_IMAGE_PULL_SECRET='${PAIS_SUPPORT_IMAGE_PULL_SECRET}'"

       

    • If you (being a Namespace Owner) want to collect a bundle for a PAIS Tenant namespace, you can extract the image and the name of the ImagePullSecret from the support-bundle-config ConfigMap that is applied in that namespace by the PAISConfiguration.

      export PAIS_SUPPORT_IMAGE=$(kubectl -n "${PAIS_SUPPORT_NAMESPACE}" get configmap -l "pais.vmware.com/component=support-bundle-config" -o 'jsonpath={.items[0].data.image}')
      echo "Using PAIS Support Image from PAIS_SUPPORT_IMAGE='${PAIS_SUPPORT_IMAGE}'"
      export PAIS_SUPPORT_IMAGE_PULL_SECRET=$(kubectl -n "${PAIS_SUPPORT_NAMESPACE}" get configmap -l "pais.vmware.com/component=support-bundle-config" -o 'jsonpath={.items[0].data.imagePullSecretName}')
      echo "Using PAIS Support ImagePullSecret PAIS_SUPPORT_IMAGE_PULL_SECRET='${PAIS_SUPPORT_IMAGE_PULL_SECRET}'"
      
  4. Update pais-support.yml with the image and ImagePullSecret from step 3. First inspect the output from step 3 to make sure that PAIS_SUPPORT_IMAGE and PAIS_SUPPORT_IMAGE_PULL_SECRET are set to a non-empty image reference and secret name.

    • You could do this manually by:

      • locating the container named pais-support-collect-bundle and updating its image: property to the value of ${PAIS_SUPPORT_IMAGE} from step 3, as in
        ...
          containers:
            - name: pais-support-collect-bundle
              image: ## Copy the value of ${PAIS_SUPPORT_IMAGE}
        ...
        
         
      • locating the pais-support Pod and updating its spec.imagePullSecrets with an entry whose name matches the value of ${PAIS_SUPPORT_IMAGE_PULL_SECRET} from step 3, as in
        ...
        spec:
          imagePullSecrets:
          - name: ## Copy the value of ${PAIS_SUPPORT_IMAGE_PULL_SECRET}
        ...
        
         
    • Or, if you have the Go-based yq installed, run the following:

      yq e "select(.kind == \"Pod\") .spec.containers[0].image |= \"${PAIS_SUPPORT_IMAGE}\" | \
            select(.kind == \"Pod\") .spec.imagePullSecrets |= [{ \"name\": \"${PAIS_SUPPORT_IMAGE_PULL_SECRET}\" }]" \
        -i pais-support.yml
      

      Note: This will flatten the config.yml part of the pais-support.yml file, so if you want to modify what is collected, do so before the yq call.

  5. Create the pais-support Pod to start a support bundle collection.

    kubectl -n "${PAIS_SUPPORT_NAMESPACE}" create -f pais-support.yml
    

    If any of the resources in pais-support.yml already exist, a previous support bundle collection job was started and has not been completed or cleaned up.

    • To complete the job follow through the next steps.
    • If you need to clean up an old job, delete the resources as described in step 11 and then repeat the current step.
  6. Ensure the Pod is created and the collection is running. Wait for this command to complete successfully. A timeout of 2 minutes should be enough for collection to start.

    kubectl -n "${PAIS_SUPPORT_NAMESPACE}" wait --for=jsonpath='{.status.phase}'='Running' pod/pais-support --timeout=120s
    

    If this command fails:

    • with Error from server (NotFound): pods "pais-support" not found, then make sure the create command succeeded and try again in a few seconds
    • with error: timed out waiting for the condition on, then you can check for errors in the Pod's status and events. The describe command is useful for this.
      kubectl -n "${PAIS_SUPPORT_NAMESPACE}" describe pod pais-support
      
       
      If necessary, contact a Broadcom Support Engineer and share the output of the describe command for further troubleshooting.
  7. Wait for the collection to complete. The Pod is marked as Ready only when the collection has completed, which usually takes a few minutes. Do not continue until this command returns successfully.

    kubectl -n "${PAIS_SUPPORT_NAMESPACE}" wait --for=condition=Ready pod/pais-support --timeout=30m
    
  8. Copy the collected support bundle from the Pod. The path to the support bundle will be stored as text in the /data/pais-support-bundle-ready file on the Pod. Use it to copy the support bundle from the pais-support Pod to your current folder.

    pais_support_bundle_path=$(kubectl -n "${PAIS_SUPPORT_NAMESPACE}" exec pod/pais-support -- cat /data/pais-support-bundle-ready)
    kubectl -n "${PAIS_SUPPORT_NAMESPACE}" cp "pais-support:${pais_support_bundle_path}" "./$(basename "${pais_support_bundle_path}")"
    
  9. The support bundle file is an archive of text files describing Kubernetes resources and logs. If your company policies require it, run any necessary tools on the file to inspect or redact it.

  10. Upload the collected support bundle. For more information, see Uploading files to cases on the Broadcom Support Portal.
  11. Delete the pais-support Pod. After you have copied the support bundle from the pais-support Pod and sent it to Broadcom Support, please clean up all leftover resources.

    kubectl -n "${PAIS_SUPPORT_NAMESPACE}" delete -f pais-support.yml
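Steps 5 through 8 can be chained into one helper, sketched below under the assumption that pais-support.yml has already been updated as described in step 4. The function name pais_manual_collect and its return codes are hypothetical; the kubectl invocations are the ones shown in the steps above. Cleanup (step 11) is deliberately left out so that the Pod stays available if the copy needs to be retried.

```shell
# Sketch: create the resources, wait for the Pod, and copy the bundle (steps 5 to 8).
# pais_manual_collect is a hypothetical helper name; return codes are illustrative.
# Progress output goes to stderr so that stdout carries only the local bundle path.
pais_manual_collect() {
  local ns="$1" manifest="${2:-pais-support.yml}" bundle_path local_path
  # Step 5: create the collection resources.
  kubectl -n "${ns}" create -f "${manifest}" >&2 || return 1
  # Step 6: wait for the Pod to run; on failure, print its status and events.
  if ! kubectl -n "${ns}" wait --for=jsonpath='{.status.phase}'='Running' \
      pod/pais-support --timeout=120s >&2; then
    kubectl -n "${ns}" describe pod pais-support >&2
    return 2
  fi
  # Step 7: the Pod becomes Ready only once the bundle has been created.
  kubectl -n "${ns}" wait --for=condition=Ready pod/pais-support --timeout=30m >&2 || return 3
  # Step 8: read the bundle path from the marker file, then copy the bundle.
  bundle_path=$(kubectl -n "${ns}" exec pod/pais-support -- cat /data/pais-support-bundle-ready) || return 4
  local_path="./$(basename "${bundle_path}")"
  kubectl -n "${ns}" cp "pais-support:${bundle_path}" "${local_path}" >&2 || return 5
  echo "${local_path}"
}

# Usage:
#   bundle=$(pais_manual_collect "${PAIS_SUPPORT_NAMESPACE}") && echo "Bundle saved to ${bundle}"
```

After the bundle has been uploaded, run the delete command from step 11 to remove the leftover resources.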
    

 

 

Additional Information

The latest pais-support.yml attachment can be found here: pais-support.yml

What is collected by the support bundle

The support bundle collects:

  • The list of all installed Kubernetes API resources
  • The YAML descriptions of selected Kubernetes resources (as described in the config.yml part inside pais-support.yml). Note: Confidential data in Secrets is not captured.
  • Pod logs for PAIS pods in the target namespace
  • Statistics and error reports from Indexing jobs
  • VKS cluster API resources and Pod logs
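Because the bundle is an archive of text files, its contents can be reviewed locally before it is sent. The following sketch unpacks a bundle into a temporary directory and lists the files that match a caller-supplied pattern. The function name pais_scan_bundle and the example pattern are hypothetical, the gzip-compressed tar format is an assumption, and a match list is no substitute for whatever review your company policies require.

```shell
# Sketch: list the files inside a bundle that match a pattern.
# pais_scan_bundle is a hypothetical helper name.
pais_scan_bundle() {
  local bundle="$1" pattern="$2" workdir
  workdir=$(mktemp -d) || return 1
  # Unpack into a throwaway directory; remove it again if extraction fails.
  if ! tar -xzf "${bundle}" -C "${workdir}"; then
    rm -rf "${workdir}"
    return 1
  fi
  # -r recursive, -l file names only, -i case-insensitive, -E extended regex.
  # grep exits 1 when nothing matches; that is treated as an empty result.
  (cd "${workdir}" && { grep -rliE -- "${pattern}" . || true; }) | sort
  rm -rf "${workdir}"
}

# Usage (file name and pattern are placeholders):
#   pais_scan_bundle ./my-bundle.tgz 'password|token'
```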

What does pais-support.yml do

The pais-support.yml manifest defines all the necessary Kubernetes resources to run the support bundle collection process within a target namespace. When applied, it creates the following resources:

  • ConfigMap (support-bundle-config): Holds the config.yml file that determines what will be collected. The default config.yml collects non-confidential data and this is all that is usually required by Broadcom Support.
  • ServiceAccount (pais-support-service-account): A dedicated service account for the collection Pod to interact with the Kubernetes API.
  • Role (pais-support-role): Grants the necessary read-only permissions (get, list) for the ServiceAccount to access various Kubernetes resources. Collecting the indexing information additionally requires create and get on pods/exec, because it is gathered by a command that is executed in the pais-api Pod.
  • RoleBinding (pais-support-role-binding): Links the pais-support-role to the pais-support-service-account, applying the defined permissions to the collection Pod.
  • Pod (pais-support): The workload that runs the collection process. It uses the specified ServiceAccount, mounts a temporary volume to store the bundle, and has a long-running readinessProbe that signals completion (Pod becomes Ready) only when the support bundle is successfully created.
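To see whether any of these resources are still present from an earlier run, a check along the following lines may help. It is a sketch only: the function name pais_support_leftovers is hypothetical, the resource names are the ones listed above, and a resource that you are not permitted to read is reported the same way as one that does not exist.

```shell
# Sketch: print the pais-support resources that currently exist in a namespace.
# pais_support_leftovers is a hypothetical helper name.
pais_support_leftovers() {
  local ns="$1" res
  for res in \
    configmap/support-bundle-config \
    serviceaccount/pais-support-service-account \
    role/pais-support-role \
    rolebinding/pais-support-role-binding \
    pod/pais-support
  do
    # 'kubectl get' exits non-zero when the resource is missing or not readable.
    if kubectl -n "${ns}" get "${res}" >/dev/null 2>&1; then
      echo "${res}"
    fi
  done
}

# Usage:
#   pais_support_leftovers "${PAIS_SUPPORT_NAMESPACE}"
```

An empty result suggests that a new collection can be started; any listed resource points to a run that still needs to be completed or cleaned up with the delete command.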

References