Private AI Foundation (PAIF) - reconciling zone reservation: the cpu reservation value specified in the config spec )('0') is invalid

Article ID: 436503

Products

VCF Private AI Services

Issue/Introduction

Unable to install, activate or deploy Private AI Foundation (PAIF) through PAIF QuickStart.

 

While connected to the vCenter Appliance shell, the following symptoms are observed:

  • Note the Supervisor ID using the command below, which prompts for administrator credentials:
    dcli> com vmware vcenter namespacemanagement supervisors summary list
    
    items:
       - supervisor: <supervisor-ID>

     

  • For the affected Supervisor intended to run PAIF workloads, the zone bindings list reports an ERROR status with a message similar to the following:
    dcli> com vmware vcenter namespacemanagement supervisors zones bindings list --supervisor <supervisor-ID>
    - zone: <cluster-ID>
         marked_for_removal: False
         resource_allocation:
            vm_reservations:
             - reserved_vm_class: <custom vmclass>
                 count: #
    
    messages:
      - severity: INFO
            details:
           error reconciling Zone reservation; the failed operation will be retried: the cpu reservation value specified in the config spec )('0') is invalid 
           type: MANAGEMENT
           status: ERROR

     

  • The <custom vmclass> shown in the zone bindings list output above is configured with a CPU reservation of 0.
    • The same issue can occur when the memory reservation is 0:
      error reconciling Zone reservation; the failed operation will be retried: the memory reservation value specified in the config spec )('0') is invalid 


  • You are unable to edit the noted <custom vmclass> to update its configuration because it is currently in use by the Supervisor.
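
The symptom check above can be sketched in Python for cases where the zone bindings output has been saved to a file or pasted from the shell. This is a minimal sketch, not a supported tool: the function name is hypothetical, and the pattern is derived only from the error text quoted in this article, so the wording in other releases may differ.

```python
import re

# Pattern derived from the error text quoted in this article (assumption:
# other releases may word the message differently).
RECONCILE_ERROR = re.compile(
    r"error reconciling zone reservation;.*?"
    r"the (cpu|memory) reservation value specified in the config spec"
    r"\s*\S*\s*is invalid",
    re.IGNORECASE,
)


def find_invalid_reservations(dcli_output: str) -> list[str]:
    """Return the resources (cpu/memory) reported as invalid, in the order found."""
    return [m.group(1).lower() for m in RECONCILE_ERROR.finditer(dcli_output)]
```

An empty list means the text contains no reservation error of this form; it does not prove the zone binding is healthy.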

 

Environment

VMware Private AI Foundation (PAIF)

vCenter 9.0.2

Cause

This is caused by a product limitation: guardrails prevent the creation of a PAIF cluster associated with a reserved vmclass that does not have reservations defined for CPU or memory.

PAIF does not allow for updating the configuration of a reserved vmclass that is being used by a Supervisor cluster.
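
The guardrail can be illustrated with a small pre-check run against a planned vmclass configuration before it is created. This is a sketch under stated assumptions: the dictionary keys (`cpu_reservation`, `cpu_limit`, `memory_reservation`, `memory_limit`) are hypothetical names chosen for illustration and are not fields of any VMware API. It combines the non-zero requirement described here with the Resolution's requirement that reservation and limit match.

```python
def reservation_problems(vm_class: dict) -> list[str]:
    """List reasons a planned reserved vmclass would likely be rejected.

    The key names are hypothetical; adapt them to however the planned
    configuration is recorded in your environment.
    """
    problems = []
    for resource in ("cpu", "memory"):
        reservation = vm_class.get(f"{resource}_reservation", 0)
        limit = vm_class.get(f"{resource}_limit", 0)
        if reservation <= 0:
            # A reservation of 0 is the condition behind the reconcile error.
            problems.append(f"{resource} reservation must be non-zero")
        elif reservation != limit:
            # Reservation and limit are expected to hold the same value.
            problems.append(f"{resource} reservation and limit must match")
    return problems
```

An empty result only means the planned values satisfy these two rules; it does not check uniqueness of the resource configuration or the presence of a DirectPath Profile.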

Resolution

Configure a reserved vmclass appropriately for use with PAIF.

  1. Create a new reserved vmclass.
    • IMPORTANT: Ensure that each vmclass has a unique resource configuration to avoid duplicate vmclass issues in the environment.
      The duplicate vmclass behavior is a known bug that VMware by Broadcom engineering is currently working on.

  2. Ensure that there is a corresponding DirectPath Profile for the reserved vmclass under Policies and Profiles in the vSphere Client web UI.
  3. For CPU and for MEM, set the reservation and the limit to the same non-zero value.
    • "For a VM class with GPU reservation, enter the same non-zero values for reservation and limit for the required CPU resource and for the required memory resources."

  4. Associate the reserved vmclass with the zone for the Supervisor cluster intended to run PAIF workloads:
    1. Connect to the vCenter appliance with administrator credentials for dcli access.

    2. Retrieve the Supervisor ID:
      dcli> com vmware vcenter namespacemanagement supervisors summary list

       

    3. Note down the zone ID for the specified Supervisor ID:
      dcli> com vmware vcenter namespacemanagement supervisors zones bindings list --supervisor <supervisor-ID>


    4. Update the reserved_vm_class for the Supervisor, using the Supervisor ID and zone ID from the previous steps:
      dcli> com vmware vcenter namespacemanagement supervisors zones bindings update --supervisor <supervisor-id> --resource-allocation-vm-reservations '[{"reserved_vm_class": "<reserved vmclass name>", "count": <count #>}]' --zone <zone-id>

      This step replaces the incorrectly configured vmclass with the newly created vmclass.
      For multiple reserved vmclass entries, use a value in the following form:

      '[{"reserved_vm_class": "<vmclass A>", "count": <count #>}, {"reserved_vm_class": "<vmclass B>", "count": <count #>}]'

       

  5. Confirm that the zone bindings list now reports a READY status without any errors:
    dcli> com vmware vcenter namespacemanagement supervisors zones bindings list --supervisor <supervisor-ID>

     

  6. If there is no DirectPath Profile associated with the above reserved_vm_class, the below error message will be returned:
    error reconciling zone reservation; the failed operation will be retried: no user-created directpath profile exists for the accelerator device in the VM configSpec

    See Step 2 of this KB.
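
The JSON value passed to --resource-allocation-vm-reservations in Step 4 is easy to mistype by hand, particularly with several vmclass entries. The sketch below generates it from a mapping of vmclass name to count. The function name and the example vmclass names are hypothetical; only the `reserved_vm_class` and `count` keys come from the command shown in this article.

```python
import json


def build_vm_reservations(reservations: dict[str, int]) -> str:
    """Build the JSON list for --resource-allocation-vm-reservations.

    `reservations` maps a reserved vmclass name to its count. The output
    mirrors the structure shown in Step 4 of this article.
    """
    entries = []
    for name, count in reservations.items():
        if not name:
            raise ValueError("vmclass name must not be empty")
        if not isinstance(count, int) or isinstance(count, bool):
            raise ValueError(f"count for {name!r} must be an integer")
        entries.append({"reserved_vm_class": name, "count": count})
    return json.dumps(entries)


if __name__ == "__main__":
    # Hypothetical vmclass names, for illustration only.
    print(build_vm_reservations({"gpu-class-a": 2, "gpu-class-b": 1}))
```

Wrap the printed value in single quotes when pasting it into the dcli command, as shown in Step 4.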

Additional Information

Private AI Foundation (PAIF) Documentation:

Deploying Private AI Foundation with Nvidia - Configure vGPU based on VM Classes for AI Workloads

Deploying Private AI Foundation with Nvidia - Setting up VCF Automation Organization for VMware Private AI Foundation with Nvidia