When running kubectl get pods -n prelude, you'll notice that all pods in Aria Automation restart intermittently, and the following warning appears in the service logs: "Possible too long JVM pause: ### milliseconds"

/var/log/services-logs/prelude/tango-blueprint-service-app/file-logs/tango-blueprint-service-app.log:

####-##-## ##:##:##.#### INFO tango-blueprint [host='tango-blueprint-service-app-<service_id>' thread='tcp-disco-srvr-[:47500]-#3%embedded%-#24%embedded%' user='' org='' blueprint='' project='' deployment='' request='' flow='' task='' tile='' resourceName='' operation='' trace=''] org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - TCP discovery accepted incoming connection [rmtAddr=/<ip_address>, rmtPort=53653]
####-##-## ##:##:##.#### WARN tango-blueprint [host='tango-blueprint-service-app-<service_id>' thread='jvm-pause-detector-worker' user='' org='' blueprint='' project='' deployment='' request='' flow='' task='' tile='' resourceName='' operation='' trace=''] org.apache.ignite.internal.IgniteKernal%embedded - Possible too long JVM pause: 607 milliseconds.
####-##-## ##:##:##.#### INFO tango-blueprint [host='tango-blueprint-service-app-<service_id>' thread='tcp-disco-srvr-[:47500]-#3%embedded%-#24%embedded%' user='' org='' blueprint='' project='' deployment='' request='' flow='' task='' tile='' resourceName='' operation='' trace=''] org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi - TCP discovery spawning a new thread for connection [rmtAddr=/<ip_address>, rmtPort=53653]
####-##-## ##:##:##.#### WARN tango-blueprint [host='tango-blueprint-service-app-<service_id>' thread='Notification listener' user='' org='' blueprint='' project='' deployment='' request='' flow='' task='' tile='' resourceName='' operation='' trace=''] com.####.####.####.ProxyConnection - ####Pool-1 - Connection org.postgresql.jdbc.PgConnection@#### marked as broken because of SQLSTATE(08006), ErrorCode(0)
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.

/var/log/services-logs/prelude/catalog-service-app/file-logs/catalog-service-app.log:

####-##-## ##:##:##.#### WARN catalog-service-app [host='catalog-service-app-<service_id>' thread='jvm-pause-detector-worker' user='' org='' trace=''] o.a.i.internal.IgniteKernal%embedded - Possible too long JVM pause: 638 milliseconds.
####-##-## ##:##:##.#### WARN catalog-service-app [host='catalog-service-app-<service_id>' thread='scheduling-3' user='' org='' trace='###############################'] c.v.s.c.c.r.i.SlowRequestInterceptor - Slow API call GET 'http://<POD_Name>:4242/event-broker/api/runnable/types/catalog-service.runnable/poll/100' with response 200 OK took 1321 ms.
####-##-## ##:##:##.#### WARN catalog-service-app [host='catalog-service-app-<service_id>' thread='jvm-pause-detector-worker' user='' org='' trace=''] o.a.i.internal.IgniteKernal%embedded - Possible too long JVM pause: 1042 milliseconds.
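To confirm the symptom, the pod restart counts and the frequency of the JVM pause warnings can be checked directly on an appliance. The commands below are a minimal sketch assuming root SSH access to an Aria Automation node; they only reuse the namespace and log paths shown above.

# Look for pods with a growing RESTARTS count
kubectl get pods -n prelude

# Count the JVM pause warnings in the affected service logs
grep -c "Possible too long JVM pause" /var/log/services-logs/prelude/tango-blueprint-service-app/file-logs/tango-blueprint-service-app.log
grep -c "Possible too long JVM pause" /var/log/services-logs/prelude/catalog-service-app/file-logs/catalog-service-app.log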
Network latency between the Aria Automation cluster nodes is greater than 5 milliseconds, which surpasses the maximum allowed latency of 5 ms between each cluster node. For more information, refer to the system requirements.

root@<AriaAutomationNode01_FQDN> ping <AriaAutomationNode02_FQDN>
Pinging <AriaAutomationNode02_FQDN> [##.##.##.##] with 32 bytes of data:
Reply from ##.##.##.##: bytes=32 time=15ms TTL=110
Reply from ##.##.##.##: bytes=32 time=51ms TTL=110
Reply from ##.##.##.##: bytes=32 time=54ms TTL=110
Reply from ##.##.##.##: bytes=32 time=39ms TTL=110

Ping statistics for ##.##.##.##:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 15ms, Maximum = 54ms, Average = 51ms
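The same measurement can be taken from the appliances themselves. The snippet below is a sketch that uses the Linux ping on a node and assumes a standard three-node cluster (the peer FQDNs are placeholders); the min/avg/max summary printed at the end can be compared directly against the 5 ms requirement.

# From each node, probe the other cluster members
ping -c 20 <AriaAutomationNode02_FQDN>
ping -c 20 <AriaAutomationNode03_FQDN>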
Aria Automation 8.x
The issue occurs due to high system stun times on the Aria Automation nodes when insufficient compute resources (CPU or memory) are allocated to the Aria Automation appliances. This leads to intermittent pod restarts and degraded service availability.
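As a quick check of resource pressure, standard Linux tools on each appliance show whether CPU and memory are exhausted. This is an illustrative sketch only; the allocated values should be compared against the sizing documented in the system requirements for your deployment size.

# CPU count, load average, and memory headroom on the appliance
nproc
uptime
free -h

# Sustained swapping or a load average well above the CPU count indicates an undersized appliance
vmstat 5 5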
To resolve the issue, address the hardware compute resource (CPU and memory) limitations within the vSphere cluster hosting the Aria Automation appliances. Ensuring the vSphere cluster has adequate compute resources prevents pod restarts and maintains stable operation of the Aria Automation services.
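After the appliances have been resized in vSphere, the fix can be verified from any node. The commands below are a sketch that reuses the checks from the symptoms above.

# Confirm the new allocation is visible inside the guest
nproc
free -g

# All pods should return to Running with restart counts that no longer increase
kubectl get pods -n prelude

# The timestamps of the most recent JVM pause warnings should stop advancing
grep "Possible too long JVM pause" /var/log/services-logs/prelude/catalog-service-app/file-logs/catalog-service-app.log | tail -n 5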