NAPP is in a degraded state, and the Health API check returns:
{"name": "druid-historical","readyReplica": 0,"reason": "Back-off restarting failed container;Container image \"projects.registry.vmware.com/nsx_application_platform/clustering/third-party/druid@sha256:e0184c88019ff461652fa1583d68342b88aa3728f5b3992ebc8981c5f9666905\" already present on machine;Readiness probe failed: Get \"https://192.xx.xx.xx:8283/status/health\": dial tcp 192.xx.xx.xx:8283: connect: connection refused;Liveness probe failed: Get \"https://192.xx.xx.xx:8283/status/health\": dial tcp 192.xx.xx.xx:8283: connect: connection refused;Back-off restarting failed container;Container image \"projects.registry.vmware.com/nsx_application_platform/clustering/third-party/druid@sha256:e0184c88019ff461652fa1583d68342b88aa3728f5b3992ebc8981c5f9666905\" already present on machine;Liveness probe failed: Get \"https://192.xx.xx.xx:8283/status/health\": dial tcp 192.xx.xx.xx:8283: connect: connection refused;Readiness probe failed: Get \"https://192.xx.xx.xx:8283/status/health\": dial tcp 192.xx.xx.xx:8283: connect: connection refused;","status": "DOWN","totalReplica": 2
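The reason field above repeats the same handful of messages several times. As a minimal sketch, assuming the Health API response has been saved as a JSON list of objects with the fields shown above (that list shape and the function name `summarize_down_services` are assumptions for illustration, not part of the product), the following Python reduces each DOWN entry to its unique reasons. The sample input is an abbreviated version of the output above.

```python
import json


def summarize_down_services(entries):
    """Return (name, ready, total, unique_reasons) for each entry whose status is DOWN."""
    summary = []
    for entry in entries:
        if entry.get("status") != "DOWN":
            continue
        # The reason string is ';'-separated and repeats messages;
        # keep only the first occurrence of each, in order.
        reasons = []
        for part in entry.get("reason", "").split(";"):
            part = part.strip()
            if part and part not in reasons:
                reasons.append(part)
        summary.append(
            (entry["name"], entry.get("readyReplica"), entry.get("totalReplica"), reasons)
        )
    return summary


# Abbreviated sample based on the output above (assumed to be wrapped in a list).
sample = json.loads(
    '[{"name": "druid-historical", "readyReplica": 0, '
    '"reason": "Back-off restarting failed container;'
    'Readiness probe failed: connection refused;'
    'Back-off restarting failed container;", '
    '"status": "DOWN", "totalReplica": 2}]'
)

for name, ready, total, reasons in summarize_down_services(sample):
    print(f"{name}: {ready}/{total} replicas ready")
    for reason in reasons:
        print(f"  - {reason}")
```

For the output shown above, this leaves four distinct reasons: the back-off restart, the image-already-present notice, and the failed readiness and liveness probes.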
The druid-historical logs show:
2024-05-24T10:37:52,671 INFO [Segment-Load-Startup-0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment[13610/27455][active_flow_2023-12-20T01:00:00.000Z_2023-12-20T02:00:00.000Z_2023-12-20T01:00:30.439Z]
2024-05-24T10:37:52,673 INFO [Segment-Load-Startup-0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment[13611/27455][active_flow_2024-03-10T15:00:00.000Z_2024-03-10T16:00:00.000Z_2024-03-10T15:00:45.290Z_2]
2024-05-24T10:37:52,687 INFO [Segment-Load-Startup-0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment[13612/27455][active_flow_2024-01-14T11:00:00.000Z_2024-01-14T12:00:00.000Z_2024-01-14T11:00:15.586Z_1]
2024-05-24T10:37:52,700 INFO [Segment-Load-Startup-0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment[13613/27455][active_flow_2024-01-25T07:00:00.000Z_2024-01-25T08:00:00.000Z_2024-01-25T07:00:15.711Z_2]
2024-05-24T10:37:52,717 INFO [Segment-Load-Startup-0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment[13614/27455][active_flow_2024-04-07T08:00:00.000Z_2024-04-07T09:00:00.000Z_2024-04-07T08:00:15.583Z_3]
2024-05-24T10:37:52,733 INFO [Segment-Load-Startup-0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment[13615/27455][active_flow_2024-01-07T13:00:00.000Z_2024-01-07T14:00:00.000Z_2024-01-07T13:00:30.617Z_2]
The logs also show the following, ending with the pod terminating on an out-of-memory error:
2024-05-24T09:43:42,365 INFO [NamespaceExtractionCacheManager-1] org.apache.druid.server.lookup.namespace.JdbcCacheGenerator - Finished loading 40 values (10298 bytes) for [namespace [JdbcExtractionNamespace{connectorConfig=DbConnectorConfig{createTables=true, connectURI='jdbc:postgresql://postgresql-ha-pgpool:5432/pace?ssl=true&usessl=true&sslmode=prefer&socketTimeout=6000&connectTimeout=6000', user='postgres', passwordProvider=org.apache.druid.metadata.DefaultPasswordProvider, dbcpProperties=null}, table='normalizedgroupconfig', keyColumn='managerid', valueColumn='metainfo', tsColumn='null', filter='null', pollPeriod=PT30S, maxHeapPercentage=10}] : org.apache.druid.server.lookup.namespace.cache.CacheScheduler$EntryImpl@73d11bc2] in 37,744,631,597 ns
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /data/dump/druid/historical ...
Unable to create /data/dump/druid/historical: No such file or directory
Terminating due to java.lang.OutOfMemoryError: Java heap space
NSX 3.2 and NAPP 4.0.1
The Java heap space error in the Druid historical pods was caused by the increasing load on each pod, which led to a memory leak.
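To check whether a saved druid-historical log shows the same signature, the sketch below scans log text for the last `Loading segment[n/total]` progress line and for the `java.lang.OutOfMemoryError: Java heap space` message. It is a rough illustration only: the function name `scan_historical_log` is hypothetical, and how the log text is collected is outside its scope. The sample lines are taken from the excerpts above.

```python
import re

# Matches the progress counter in lines such as "Loading segment[13615/27455][...]".
SEGMENT_RE = re.compile(r"Loading segment\[(\d+)/(\d+)\]")
OOM_MARKER = "java.lang.OutOfMemoryError: Java heap space"


def scan_historical_log(text):
    """Report the last segment-load progress seen and whether a heap-space OOM appears."""
    last_progress = None
    oom_seen = False
    for line in text.splitlines():
        match = SEGMENT_RE.search(line)
        if match:
            last_progress = (int(match.group(1)), int(match.group(2)))
        if OOM_MARKER in line:
            oom_seen = True
    return {"last_segment_progress": last_progress, "oom_seen": oom_seen}


sample_log = """\
2024-05-24T10:37:52,717 INFO [Segment-Load-Startup-0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment[13614/27455][active_flow_2024-04-07T08:00:00.000Z_2024-04-07T09:00:00.000Z_2024-04-07T08:00:15.583Z_3]
2024-05-24T10:37:52,733 INFO [Segment-Load-Startup-0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment[13615/27455][active_flow_2024-01-07T13:00:00.000Z_2024-01-07T14:00:00.000Z_2024-01-07T13:00:30.617Z_2]
Terminating due to java.lang.OutOfMemoryError: Java heap space
"""

result = scan_historical_log(sample_log)
print(result)
```

A result with `oom_seen` set to `True` and a progress count below the total indicates that the pod ran out of heap before it finished loading its segments, which matches the restart loop reported by the Health API.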
Please contact Broadcom Support for further assistance.