vapi-endpoint status changed from green to yellow for a short time

Products

VMware vCenter Server 7.x and 8.x

Issue/Introduction

The vapi-endpoint status changes from green to yellow, then quickly recovers to green.

The "VMware vAPI Endpoint Service Health Alarm" might also be triggered.

Environment

vCenter Server 7.X
vCenter Server 8.X

Cause

The vapi-endpoint service failed to query the hvc service and set its health status to YELLOW. When the query failure is temporary, the health status recovers to GREEN.

# /var/log/vmware/vapi/endpoint/endpoint.log

YYYY-MM-DDThh:mm:ss.###Z | INFO  | state-manager1            | CollectedHealthStatusProviderImpl | Computed health status is GREEN.
:::
YYYY-MM-DDThh:mm:ss.###Z | WARN  | state-manager1            | ApiInterfacesFactory           | Retrieving interfaces for service fcc1501a-2b17-4d8d-a41b-94923ad1a184\com.vmware.vcenter.hvc.vapi has failed.
com.vmware.vapi.internal.core.abort.RequestAbortedException: Http request aborted.
        at com.vmware.vapi.internal.protocol.common.Util$1.onAbort(Util.java:105) ~[vapi-runtime.jar:?]
        at com.vmware.vapi.internal.core.abort.AbortHandleImpl.abort(AbortHandleImpl.java:45) ~[vapi-runtime.jar:?]
        at com.vmware.vapi.endpoint.api.TimedApiProvider.lambda$invoke$0(TimedApiProvider.java:58) ~[vapi-endpoint-1.0.0.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_351]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_351]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_351]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_351]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_351]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_351]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_351]
:::
YYYY-MM-DDThh:mm:ss.###Z | INFO  | state-manager1            | CollectedHealthStatusProviderImpl | Computed health status is YELLOW.
:::
YYYY-MM-DDThh:mm:ss.###Z | INFO  | state-manager1            | CollectedHealthStatusProviderImpl | Computed health status is GREEN.
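
To locate these status changes in a large log, the transitions can be pulled out with a small filter. The following is a minimal sketch, not an official tool: the function name health_transitions is hypothetical, and it assumes every status line follows the "Computed health status is <STATUS>." format shown in the excerpt above.

```shell
#!/bin/sh
# Print each health-status change recorded in a vapi-endpoint log.
# Assumption: status lines end with "Computed health status is <STATUS>."
health_transitions() {
    # $1: path to endpoint.log
    grep 'Computed health status is' "$1" |
    awk '{
        status = $NF                  # last field, e.g. "YELLOW."
        sub(/\.$/, "", status)        # drop the trailing period
        if (prev != "" && status != prev)
            print $1, prev, "->", status   # timestamp, old status, new status
        prev = status
    }'
}

# Example call (log path taken from the excerpt above):
# health_transitions /var/log/vmware/vapi/endpoint/endpoint.log
```

Each output line gives the timestamp of the first log entry with the new status, which can then be compared with the Full GC timestamps of the hvc service.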

During a Full GC, the hvc service is unresponsive, so queries from the vapi-endpoint service fail.

The Full GC duration can be checked in /var/log/vmware/hvc/vmware-hvc-gc.log.#. In the following example, the Full GC took about 42.75 seconds (real=42.75 secs).

YYYY-MM-DDThh:mm:ss.###+0000: 56549926.551: [Full GC (Ergonomics) YYYY-MM-DDThh:mm:ss.###+0000: 56549928.467: [SoftReference, 275 refs, 0.0000657 secs]YYYY-MM-DDThh:mm:ss.###+0000: 56549928.467: [WeakReference, 1849 refs, 0.0001588 secs]YYYY-MM-DDThh:mm:ss.###+0000: 56549928.468: [FinalReference, 466 refs, 0.2653113 secs]YYYY-MM-DDThh:mm:ss.###+0000: 56549928.733: [PhantomReference, 1 refs, 14 refs, 0.0000222 secs]YYYY-MM-DDThh:mm:ss.###+0000: 56549928.733: [JNI Weak Reference, 0.0000248 secs][PSYoungGen: 704K->0K(3584K)] [ParOldGen: 35965K->25869K(36864K)] 36669K->25869K(40448K), [Metaspace: 76553K->76270K(92160K)], 42.7521333 secs] [Times: user=0.15 sys=0.47, real=42.75 secs]
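
Long Full GC pauses can be filtered out of the GC log rather than read by eye. The sketch below assumes the JDK 8 log format shown above, where each Full GC entry is a single line ending in "real=<seconds> secs". The function name long_full_gc and the default threshold of 10 seconds are illustrative choices, not values from VMware.

```shell
#!/bin/sh
# Print Full GC entries whose wall-clock ("real=") time meets a threshold.
# Assumption: each Full GC entry is one line containing "real=<seconds>".
long_full_gc() {
    # $1: GC log file
    # $2: threshold in seconds (defaults to 10, an arbitrary example value)
    awk -v limit="${2:-10}" '
        /Full GC/ && match($0, /real=[0-9.]+/) {
            secs = substr($0, RSTART + 5, RLENGTH - 5)   # digits after "real="
            ts = $1
            sub(/:$/, "", ts)                            # drop the trailing colon
            if (secs + 0 >= limit + 0)
                print ts, "real=" secs "s"
        }' "$1"
}

# Example call; replace # with the number of the log file to inspect:
# long_full_gc /var/log/vmware/hvc/vmware-hvc-gc.log.# 10
```

A Full GC timestamp that falls shortly before a GREEN to YELLOW change in endpoint.log supports the cause described above.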

Resolution

A Full GC of the hvc service cannot be avoided, but there are two ways to mitigate this issue.

# Option 1

If the hvc service has been running for a long time and Full GC happens frequently, restarting the hvc service resets the heap allocation and reduces the Full GC frequency.

service-control --stop hvc; service-control --start hvc
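
To judge whether Full GC happens frequently, before or after a restart, the events can be counted per day. This is a rough sketch under two assumptions: each GC log line starts with a YYYY-MM-DD date, as in the log format used by the hvc GC log, and the function name full_gc_per_day is hypothetical.

```shell
#!/bin/sh
# Count Full GC events per calendar day across one or more GC log files.
# Assumption: every log line begins with a YYYY-MM-DD timestamp.
full_gc_per_day() {
    # $@: one or more GC log files
    grep -h 'Full GC' "$@" |
    cut -c1-10 |               # keep the leading YYYY-MM-DD
    sort | uniq -c |
    awk '{ print $2, $1 }'     # date, then number of Full GC events
}

# Example call; the glob over rotated log files is an assumption:
# full_gc_per_day /var/log/vmware/hvc/vmware-hvc-gc.log.*
```

What counts as "frequently" is a judgment call. Comparing the daily counts from before and after the restart shows whether the restart helped.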

# Option 2

If Full GC still happens frequently after restarting the hvc service, increasing the heap allocation for the hvc service can be effective.

Manually increase the heap memory