The Data Aggregator service stopped working unexpectedly.
A review of the /opt/IMDataAggregator/apache-karaf/data/log/karaf.log file shows the following errors, even though the Data Repository systems are up and running:
ERROR | st:HOST1 | 2024-07-13T11:06:11,238 | shutdown | ase.heartbeat.DBStateManagerImpl 747 | ommon.core.services.impl | | DB heartbeat to host HOST1 failed. Query attempt took 0:00:20.312
org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](100176) Failed to connect to host HOST1 on port 5433. Reason: Failed to establish a connection to the primary server or any backup address.
ERROR | anager-thread-10 | 2024-07-13T11:06:11,240 | shutdown | ces.shutdown.ShutdownManagerImpl 131 | ommon.core.services.impl | | Shutting down the data aggregator.It was detected that no data repository nodes were contactable. The uncontactable hosts are:[HOST1, HOST2, HOST3]
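When karaf.log is large, it can help to pull out only the heartbeat failures and the list of uncontactable hosts. The following is a minimal sketch that matches the two message texts shown above. The function name and the regular expressions are illustrative assumptions, not part of the product, and the message wording may differ between releases.

```python
import re
import sys

# Patterns based on the two ERROR messages quoted in this article (assumed wording).
HEARTBEAT_FAIL = re.compile(r"DB heartbeat to host (\S+) failed")
UNCONTACTABLE = re.compile(r"The uncontactable hosts are:\s*\[([^\]]*)\]")


def summarize_karaf_errors(lines):
    """Return (heartbeat_failures, uncontactable_hosts) found in ERROR lines.

    heartbeat_failures maps a host name to the number of failed heartbeats;
    uncontactable_hosts is the host list from the last shutdown message seen.
    """
    heartbeat_failures = {}
    uncontactable_hosts = []
    for line in lines:
        # Stack-trace lines and non-ERROR entries are skipped.
        if not line.startswith("ERROR"):
            continue
        match = HEARTBEAT_FAIL.search(line)
        if match:
            host = match.group(1)
            heartbeat_failures[host] = heartbeat_failures.get(host, 0) + 1
        match = UNCONTACTABLE.search(line)
        if match:
            uncontactable_hosts = [
                h.strip() for h in match.group(1).split(",") if h.strip()
            ]
    return heartbeat_failures, uncontactable_hosts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_karaf.py <path to karaf.log>
    with open(sys.argv[1], errors="replace") as log_file:
        failures, hosts = summarize_karaf_errors(log_file)
    print("Heartbeat failures per host:", failures)
    print("Uncontactable hosts:", hosts)
```

If every Data Repository node appears in the uncontactable list at the same moment, as in the log above, a cause shared by all nodes (network or name resolution) is more likely than a failure of the database itself.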
Version: Any
Component: Data Aggregator
The /var/log/messages file on the Data Aggregator host shows many of the following messages around the time the issue occurred:
2024-07-13T11:01:46.839673+02:00 HOSTNAME adclient[1928]: WARN <bg-MAIN:ageBindings> dns.resolver DNS server xxx.xxx.xxx.xxx is down. (ErrCode: 62) : Timer expired
2024-07-13T11:01:47.045415+02:00 HOSTNAME adclient[1928]: WARN <bg-MAIN:ageBindings> dns.resolver DNS server xxx.xxx.xxx.xxx is down. (ErrCode: 113) : No route to host
2024-07-13T11:01:48.046754+02:00 HOSTNAME adclient[1928]: WARN <bg-MAIN:ageBindings> dns.resolver DNS server xxx.xxx.xxx.xxx is down. (ErrCode: 62) : Timer expired
2024-07-13T11:01:49.048401+02:00 HOSTNAME adclient[1928]: WARN <bg-MAIN:ageBindings> dns.resolver DNS server xxx.xxx.xxx.xxx is down. (err: timeout during DNS lookup)
2024-07-13T11:01:50.050096+02:00 HOSTNAME adclient[1928]: WARN <bg-MAIN:ageBindings> dns.resolver DNS server xxx.xxx.xxx.xxx is down. (err: timeout during DNS lookup)
None. The issue was with the DNS servers, not with the Data Aggregator or the Data Repository.
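To tell a name-resolution problem apart from a database that is actually unreachable, a check along the following lines can be run from the Data Aggregator host. This is a sketch under stated assumptions: the function name, the result labels, and the 5-second timeout are hypothetical, port 5433 is taken from the error message above, and the host names are passed on the command line.

```python
import socket
import sys


def check_host(hostname, port=5433, timeout=5.0):
    """Return 'dns_failure', 'connect_failure', or 'ok' for one host."""
    try:
        # Name resolution step; raises socket.gaierror when the name cannot be resolved.
        addresses = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                # A plain TCP connect only; no database login is attempted.
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            # Try the next resolved address, if any.
            continue
    return "connect_failure"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_repo_hosts.py HOST1 HOST2 HOST3
    for name in sys.argv[1:]:
        print(name, check_host(name))
```

A result of dns_failure for every Data Repository host while the database nodes themselves are running points at the DNS servers, which matches the adclient warnings in /var/log/messages.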