Analytics status changed from green to red on vCenter Server Appliance

Products

VMware vCenter Server

Issue/Introduction

The analytics service health status changed from green to red, then returned to green about 30 minutes later. The following events were recorded:

[2025-05-10T13:20:34.315271Z] [vim.event.HealthStatusChangedEvent] [info] [Vmonuser] [] [768532] [analytics status changed from green to red]
[2025-05-10T13:51:46.918099Z] [vim.event.HealthStatusChangedEvent] [info] [Vmonuser] [] [768609] [analytics status changed from red to green]

Environment

vCenter Server Appliance 8

Cause

The health check that vmon runs against the analytics service timed out. vmon marked the service as unhealthy and, after a later health check also timed out, restarted it.

vmon.log
2025-05-10T13:20:01.238Z In(05) host-2324 <analytics> Running the API Health command as user analytics
...
2025-05-10T13:20:33.825Z Wa(03) host-2324 <analytics> Service api-health command's stderr: Exception while retrieving health xml from url http://localhost:15080/analytics/healthstatus. Exception: timed out
...
2025-05-10T13:20:33.847Z Wa(03) host-2324 <analytics> Health of service failed. Health data: {"localizable_msgs": [{"id": "com.vmware.vmon.svc_health_fail", "default_message": "Failed to retrieve service health.", "args": []}], "_service_name": "analytics", "_trigger_threaddump_on_failure": 0}
...
2025-05-10T13:20:33.847Z In(05) host-2324 <event-pub> Constructed command: /usr/bin/python /usr/lib/vmware-vmon/vmonEventPublisher.py --eventdata analytics,UNHEALTHY,HEALTHY,1
...
2025-05-10T13:51:25.592Z Wa(03) host-2324 <analytics> Service api-health command's stderr: Exception while retrieving health xml from url http://localhost:15080/analytics/healthstatus. Exception: timed out
2025-05-10T13:51:25.615Z Wa(03) host-2324 <analytics> Health of service failed. Health data: {"localizable_msgs": [{"id": "com.vmware.vmon.svc_health_fail", "default_message": "Failed to retrieve service health.", "args": []}], "_service_name": "analytics", "_trigger_threaddump_on_failure": 0}
2025-05-10T13:51:25.615Z In(05) host-2324 <analytics> Recover from service api health check failure. Fail count 1
2025-05-10T13:51:25.615Z In(05) host-2324 <analytics> Restarting service.
2025-05-10T13:51:25.615Z Wa(03) host-2324 <analytics> Found empty StopSignal parameter in config file. Defaulting to SIGTERM
2025-05-10T13:51:27.880Z Wa(03) host-2324 <analytics> Service exited. Exit code 143
2025-05-10T13:51:27.880Z In(05) host-2324 <analytics-prestart> Constructed command: /usr/lib/vmware-analytics/scripts/pre-start.sh /usr/sbin/cloudvm-ram-size -J vmware-analytics -O /storage/vmware-vmon/analytics.start.cmd
2025-05-10T13:51:28.154Z In(05) host-2324 <analytics> Service pre-start command completed successfully.

- At that time, the analytics service was running out of Java heap memory.

analytics.log

2025-05-10T12:58:47.921Z phProdLogDrainerTaskExecutor-4 WARN org.bouncycastle.jsse.provider.ProvTrustManagerFactorySpi Skipped default trust store java.lang.OutOfMemoryError: Java heap space

analytics-runtime.log.stderr

Picked up JAVA_TOOL_OPTIONS: -Xms32M -Xmx128M -Dcom.sun.org.apache.xml.internal.security.ignoreLineBreaks=true -Dorg.apache.xml.security.ignoreLineBreaks=true
Exception in thread "phProdLogDrainerTaskExecutor-1" java.lang.OutOfMemoryError: Java heap space
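
To confirm the same condition on another appliance, the two logs above can be searched for the heap error. The following is a minimal sketch, not an official procedure: the function name count_heap_oom is hypothetical, and the log directory in the usage comment is an assumption (this article does not state where the files are stored).

```shell
# Hypothetical helper: count Java heap OutOfMemoryError lines per log file.
# Prints "<file>: <count>" for each readable file.
# Returns 0 if at least one match was found, 1 otherwise.
count_heap_oom() {
    found=1
    for f in "$@"; do
        # Skip files that do not exist or cannot be read.
        [ -r "$f" ] || continue
        # -F matches the string literally; grep -c exits non-zero on 0 matches.
        n=$(grep -F -c 'java.lang.OutOfMemoryError: Java heap space' "$f" || true)
        printf '%s: %s\n' "$f" "$n"
        if [ "$n" -gt 0 ]; then
            found=0
        fi
    done
    return "$found"
}

# Usage (directory is an assumption; adjust to where the logs actually are):
# count_heap_oom /var/log/vmware/analytics/analytics.log \
#                /var/log/vmware/analytics/analytics-runtime.log.stderr
```

A non-zero count with timestamps shortly before the red status event would be consistent with the cause described here.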

- Cause of the Java heap out-of-memory condition

Connection attempts to vcsa.vmware.com failed repeatedly (name resolution errors, as in the log entry below), which caused Java heap usage to increase gradually.

analytics.log

2025-05-10T13:04:08.042Z phProdLogDrainerTaskExecutor-7 ERROR ph.phservice.push.telemetry.DefaultTelemetryLevelService Unexpected error during telemetry level retrieval for CollectorAgent: {collectorId:vcenter-all.vpxd.vclscrx.8_0u3, collectorInstanceId:ph-vpxd-ac645c35-eebc-4983-9969-ee2066e50251} java.util.concurrent.CompletionException: com.vmware.ph.phservice.common.manifest.ManifestContentProvider$ManifestException: com.vmware.ph.client.api.exceptions.PhClientConnectionException: java.net.UnknownHostException: vcsa.vmware.com: Temporary failure in name resolution

Resolution

Check connectivity to vcsa.vmware.com from the vCenter Server Appliance.

1. Connect to the vCenter Server Appliance through SSH or the console.
2. Test connectivity with the following command:
# curl -v https://vcsa.vmware.com
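
Because the error in analytics.log is a name resolution failure, it may help to check DNS separately from HTTPS reachability. The following is a minimal sketch under assumptions: the function name check_endpoint is hypothetical, and it assumes that getent and curl are available in the appliance shell.

```shell
# Hypothetical helper: check name resolution first, then HTTPS reachability.
# Returns 1 if the name does not resolve, 2 if it resolves but the
# HTTPS request fails, 0 if both steps succeed.
check_endpoint() {
    host=$1
    # getent uses the system resolver configuration (hosts file and DNS).
    if ! getent hosts "$host" >/dev/null 2>&1; then
        echo "DNS: cannot resolve $host"
        return 1
    fi
    echo "DNS: $host resolves"
    # --max-time bounds the whole request so the check cannot hang.
    # Without --fail, curl exits 0 for any HTTP response, including 4xx/5xx,
    # so this step only tests that the server can be reached.
    if ! curl --silent --show-error --output /dev/null --max-time 15 "https://$host"; then
        echo "HTTPS: request to $host failed"
        return 2
    fi
    echo "HTTPS: $host reachable"
}

# Usage:
# check_endpoint vcsa.vmware.com
```

A return value of 1 corresponds to the "Temporary failure in name resolution" error shown in the Cause section and points to the DNS configuration rather than to a firewall.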

If vCenter Server is in an environment that cannot connect to the Internet, check the Customer Experience Improvement Program (CEIP) settings and disable CEIP.