Tanzu Hub Foundation Vending failing on Extract Files step due to OOM condition on graphql-rest-provider Deployment

Article ID: 439871

Updated On:

Products

VMware Tanzu Platform - Hub

Issue/Introduction

  • When you attempt to create a new Foundation from Tanzu Hub 10.4, the "Extract files from a downloaded file into remote storage" step is retried until it fails.



  • The Tanzu Hub is deployed in Evaluation Mode.
  • From a BOSH SSH session on the registry VM, the graphql-rest-provider-service pods show restarts:

    bosh -d hub-<ID> ssh registry/0
    kubectl get pods -n tanzusm | grep graphql-rest-provider-service

    Example:

    registry:~$ kubectl get pods -n tanzusm | grep rest-provider-service
    graphql-rest-provider-service-<POD_ID>   1/1     Running     5          82m
    graphql-rest-provider-service-<POD_ID>   1/1     Running     9          82m


  • The graphql-rest-provider-service pod logs show errors like:

    16:56:04.988Z [thread='task-73' user='' org='' trace=''] ERROR com.vmware.ensemble.rest.document.service.DocumentFileService - Error while uploading file for documentId and path = ########-####-####-####-2f3b46aa887e, ########-####-####-####-07d16a40aa6c/BUILD_ARTIFACTS/########-####-####-####-2f3b46aa887e. Error: Failed to upload file to S3
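
To confirm that the restarts are caused by memory exhaustion rather than another failure, the last termination reason of each pod can be inspected. The following is a minimal sketch, assuming it is run from the registry VM where kubectl is already configured. The helper name last_termination_reasons is hypothetical and is not part of any Tanzu tooling. Kubernetes reports the reason OOMKilled for a container that was killed for exceeding its memory limit.

```shell
# Sketch: print the last termination reason for each
# graphql-rest-provider-service pod in the given namespace.
# The function name is hypothetical; the namespace defaults to tanzusm.
last_termination_reasons() {
  ns="${1:-tanzusm}"
  # "-o name" prints entries such as pod/<name>, which kubectl get accepts as-is.
  for pod in $(kubectl get pods -n "$ns" -o name | grep graphql-rest-provider-service); do
    reason=$(kubectl get -n "$ns" "$pod" \
      -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}')
    # Pods that have never been terminated have no reason; print "none" for them.
    printf '%s\t%s\n' "$pod" "${reason:-none}"
  done
}
```

Calling `last_termination_reasons tanzusm` prints one line per pod. A value of OOMKilled next to a pod would be consistent with the memory condition described in this article.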


Environment

Tanzu Hub 10.4 release running in Evaluation Mode.

Cause

In Evaluation Mode, the graphql-rest-provider-service deployment is configured with a limited amount of memory per pod. During foundation vending, files are buffered in memory while they are uploaded to the S3 storage location, and this buffering can exhaust the pod's memory and cause the extraction step to fail.
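
The memory settings currently applied to the deployment can be read directly from its pod template. This is a sketch only: the helper name show_memory_settings is hypothetical, and it assumes the container memory limit is set under resources.limits.memory and that the heap size is carried in the JVM_HEAP environment variable shown in the Workaround section.

```shell
# Sketch: for each container in the deployment, print its name,
# its memory limit, and the value of the JVM_HEAP environment variable.
# The function name is hypothetical; the namespace defaults to tanzusm.
show_memory_settings() {
  ns="${1:-tanzusm}"
  kubectl get deploy -n "$ns" graphql-rest-provider-service \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.resources.limits.memory}{"\t"}{.env[?(@.name=="JVM_HEAP")].value}{"\n"}{end}'
}
```

Comparing the JVM_HEAP value against the container memory limit shows how much headroom is left for non-heap memory such as upload buffers.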

Resolution

Improvements to the memory buffering in the graphql-rest-provider-service application are included in Tanzu Hub 10.4.2, the second patch release of Tanzu Hub 10.4.

Workaround

  1. Pause the package installs that manage the graphql-rest-provider-service deployment, so that the manual edit in the next step is not reverted:

    # kctrl package installed list -n tanzusm
    # kctrl package installed pause -i sm -n tanzusm 
    # kctrl package installed pause -i ensemble-helm -n tanzusm 

  2. Once both package installs are paused, edit the graphql-rest-provider-service deployment to decrease JVM_HEAP to 2g and set '-XX:InitiatingHeapOccupancyPercent=45'. This temporarily allows foundation vending to complete:

    # kubectl edit deploy -n tanzusm graphql-rest-provider-service

    Under spec.template.spec.containers, find the JVM_HEAP and JAVA_OPTS entries in the env section. Change the JVM_HEAP value from 3g to 2g, and change -XX:InitiatingHeapOccupancyPercent from 65 to 45:

    containers:
    - env:
      - name: JVM_HEAP
        value: 3g                     #--------> Decrease to 2g
      - name: SSL_KEY_PASSWORD

      - name: JAVA_OPTS
        value: -XX:+UseG1GC -XX:+PrintGCDetails -XX:G1HeapRegionSize=2m -XX:+UseStringDeduplication
          -XX:InitiatingHeapOccupancyPercent=65               #--------> Change to 45

  3. Once the new pods are deployed for the graphql-rest-provider-service, attempt the Foundation Vending again.
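
Step 2 can also be applied without an interactive editor. The following is a hedged sketch, not the documented procedure. The helper names set_ihop and apply_heap_workaround are hypothetical. It assumes the deployment has a single container (kubectl set env applies to all containers unless one is selected with -c) and that JAVA_OPTS already contains an -XX:InitiatingHeapOccupancyPercent flag.

```shell
# Pure helper (hypothetical name): replace the value of
# -XX:InitiatingHeapOccupancyPercent in a JAVA_OPTS string.
# $1 = current JAVA_OPTS value, $2 = new percentage.
set_ihop() {
  printf '%s\n' "$1" |
    sed -E "s/-XX:InitiatingHeapOccupancyPercent=[0-9]+/-XX:InitiatingHeapOccupancyPercent=$2/"
}

# Sketch: lower JVM_HEAP to 2g, set the occupancy percent to 45,
# then wait for the replacement pods to become ready.
apply_heap_workaround() {
  ns="${1:-tanzusm}"
  deploy="deploy/graphql-rest-provider-service"
  # Assumption: the first container is the one that carries JAVA_OPTS.
  current=$(kubectl get -n "$ns" "$deploy" \
    -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="JAVA_OPTS")].value}')
  # Refuse to continue rather than overwrite JAVA_OPTS with an empty value.
  [ -n "$current" ] || { echo "JAVA_OPTS not found; edit the deployment manually" >&2; return 1; }
  kubectl set env -n "$ns" "$deploy" JVM_HEAP=2g "JAVA_OPTS=$(set_ihop "$current" 45)"
  kubectl rollout status -n "$ns" "$deploy" --timeout=10m
}
```

Run `apply_heap_workaround tanzusm` only after the package installs from step 1 are paused; otherwise the change is likely to be reverted on the next reconciliation.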