VCF Automation patching from version 9.0.1 to 9.0.2 fails during synthetic check with error LCMVMSP10035

Article ID: 438093

Products

VCF Automation

Issue/Introduction

VCF Automation patching from version 9.0.1 to 9.0.2 fails during the validation phase of the synthetic check with error code LCMVMSP10035:

 Error Code : 'LCMVMSP10035', Retry : 'true', Causing Properties : '{ CAUSE :: skipSyntheticCheck ===  }'
com.vmware.vrealize.lcm.vmsp.common.exception.ValidateSyntheticCheckerException: Synthetic check failed. Please refer to Broadcom Knowledge Base Article https://knowledge.broadcom.com/external/article/389510 for remediation details.,"platform-statefulsets : 1 of 7 resources are not ok: logging-operator-fluentd: wrong resource state: InProgress - Ready: 0/6;"

The logging-operator-fluentd pods remain in a 1/2 Running state because their readiness probe fails. Check the pod state by running the following command:
kubectl get pods -n vmsp-platform | grep fluentd
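The pod check can also be scripted to print only the Fluentd pods that are not fully ready. This is a minimal sketch, not a Broadcom-provided tool. The helper name list_unready_fluentd is hypothetical. It assumes the pods run in the vmsp-platform namespace, as in the commands under Resolution, and that the second column of the default kubectl get pods output is READY.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of VCF Automation): print Fluentd pods whose
# READY column shows fewer ready containers than total, e.g. "1/2".
# Assumption: the pods live in the vmsp-platform namespace unless one is passed.
list_unready_fluentd() {
  local ns="${1:-vmsp-platform}"
  # --no-headers drops the NAME/READY/STATUS header row;
  # awk splits READY ("1/2") and prints the pod name when the two halves differ.
  kubectl get pods -n "$ns" --no-headers \
    | awk '$1 ~ /fluentd/ { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}
```

Run list_unready_fluentd, or list_unready_fluentd <namespace>. An empty result means every Fluentd pod reports all of its containers ready.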

Environment

VCF Automation 9.x

Cause

Fluentd pods fail readiness probes because the number of accumulated log buffer files exceeds the configured threshold (bufferFileNumberMax). Slow log ingestion at the VMware Aria Operations for Logs destination prevents timely flushes, causing the logging-operator-fluentd statefulset to remain in an "InProgress" state during the upgrade's synthetic health check.

Resolution

Note: This KB article bypasses the problem but does not fix it. The issue may recur because of slow log ingestion into Aria Operations for Logs, so the log forwarding issues should be addressed after the upgrade.

To bypass this issue, use one of the following workarounds.

Workaround 1: Increase the Fluentd readiness probe threshold so that the pods can transition to a ready state:

  1. SSH to the VCF Automation node.
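  2. Check the number of accumulated buffer files in each non-ready Fluentd pod by running the following commands (replace logging-operator-fluentd-0 with the name of each non-ready pod). The ls command runs inside the pod shell:
    kubectl exec -it -n vmsp-platform logging-operator-fluentd-0 -- bash
    ls -1 /buffers | wc -l
    exit

    Note: If the command returns a value above 20000, raise the readiness probe threshold (bufferFileNumberMax) to 50000 as described in the following steps.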
  3. Get the name of the release template:
    kubectl get rt -n vmsp-platform | grep logging-operator
  4. Edit the release template, using the name returned in step 3 (logging-operator-4.8.0.37 is shown as an example):
    kubectl edit rt logging-operator-4.8.0.37 -n vmsp-platform
  5. Change the value of bufferFileNumberMax from 10000 to 50000, then save and exit the editor with :wq!
  6. Delete the Fluentd pods so that they reinitialize (repeat for each Fluentd pod):
    kubectl delete pod -n vmsp-platform logging-operator-fluentd-0
  7. After the Fluentd pods are healthy, re-initiate the 9.0.2 patch from VCF Operations Fleet Manager. The synthetic check should then pass.
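Steps 2 and 3 can be combined into a small script that compares each pod's buffer file count against the 20000 threshold and prints the release template name to edit. This is a hedged sketch with hypothetical helper names (buffer_count, report_buffers, logging_rt_name). It assumes the vmsp-platform namespace and the /buffers directory used in Workaround 2. The bufferFileNumberMax change itself is still made by hand with kubectl edit, because the release template layout is not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for Workaround 1, steps 2 and 3 (not a Broadcom tool).
# Assumptions: namespace vmsp-platform, buffer files under /buffers in the pod.
NS="${NS:-vmsp-platform}"
THRESHOLD=20000   # threshold from the note in step 2

# Print the number of files in /buffers for one pod.
buffer_count() {
  # tr strips the newline and any padding some wc builds add
  kubectl exec -n "$NS" "$1" -- bash -c 'ls -1 /buffers | wc -l' | tr -d '[:space:]'
}

# For each pod name given, print "<pod> <count> <verdict>".
report_buffers() {
  local pod count
  for pod in "$@"; do
    count="$(buffer_count "$pod")"
    # Guard against an empty or non-numeric result (e.g. exec failed)
    case "$count" in
      ''|*[!0-9]*) echo "$pod ? error"; continue ;;
    esac
    if [ "$count" -gt "$THRESHOLD" ]; then
      echo "$pod $count raise-bufferFileNumberMax"
    else
      echo "$pod $count ok"
    fi
  done
}

# Print the logging-operator release template name (input for step 4).
logging_rt_name() {
  kubectl get rt -n "$NS" --no-headers -o custom-columns=NAME:.metadata.name \
    | grep '^logging-operator'
}
```

For example, report_buffers logging-operator-fluentd-0 logging-operator-fluentd-1 prints one verdict line per pod, and logging_rt_name prints the name to pass to kubectl edit rt.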

Workaround 2: Clear the accumulated buffers to allow the pods to transition to a ready state:
Note: Clearing the accumulated buffers discards the old (buffered) logs. Proceed with this workaround only if that loss is acceptable.

  1. SSH to the VCF Automation node.
  2. Run the following commands to delete the buffered (old) logs and restart the pod (repeat for each non-ready Fluentd pod):
    kubectl exec -it -n vmsp-platform logging-operator-fluentd-0 -- bash -c "find /buffers -type f -delete"
    kubectl delete pod -n vmsp-platform logging-operator-fluentd-0
  3. After the Fluentd pods are healthy, re-initiate the 9.0.2 patch from VCF Operations Fleet Manager. The synthetic check should then pass.
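When several pods are affected, Workaround 2 and the wait for healthy pods can be sketched as below. This is a minimal, hypothetical script (clear_and_restart and wait_ready are illustrative names) that assumes the vmsp-platform namespace. The 60 attempts and 10-second polling interval are arbitrary defaults. clear_and_restart permanently deletes buffered logs, so run it only after accepting the loss described in the note for this workaround.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for Workaround 2 (not a Broadcom tool).
# WARNING: clear_and_restart permanently deletes buffered logs.
# Assumption: the Fluentd pods run in the vmsp-platform namespace.
NS="${NS:-vmsp-platform}"

# Delete the buffer files in each named pod, then delete the pod so the
# statefulset controller re-creates it.
clear_and_restart() {
  local pod
  for pod in "$@"; do
    kubectl exec -n "$NS" "$pod" -- bash -c "find /buffers -type f -delete"
    kubectl delete pod -n "$NS" "$pod"
  done
}

# Poll a pod until STATUS is Running and READY shows all containers ready
# (e.g. "2/2"). Returns 0 when ready, 1 after the given attempts (default 60).
wait_ready() {
  local pod="$1" tries="${2:-60}" line ready status
  while [ "$tries" -gt 0 ]; do
    line="$(kubectl get pod -n "$NS" "$pod" --no-headers 2>/dev/null)"
    ready="$(printf '%s\n' "$line" | awk '{print $2}')"
    status="$(printf '%s\n' "$line" | awk '{print $3}')"
    # A Terminating or not-yet-created pod never counts as ready
    if [ "$status" = "Running" ] && [ "${ready%/*}" = "${ready#*/}" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "${POLL_SECONDS:-10}"   # default: poll every 10 seconds
  done
  return 1
}
```

For example, clear_and_restart logging-operator-fluentd-0 && wait_ready logging-operator-fluentd-0 returns once the re-created pod is Running with all containers ready. The patch can then be re-initiated.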