The Enforce console shows one or more of your Detection Servers stuck in the "Starting" state.
The main trace of the restarts in the server logs is the following set of entries in BoxMonitor.log, which appear for all subservices and repeat every 16 minutes.
"Jun 17, 2024 3:30:05 PM com.vontu.util.process.ChildProcessProxy destroy
INFO: Process DSD is terminated.
Jun 17, 2024 3:30:05 PM com.vontu.boxmonitor.ProcessWatcher processWentDown
INFO: Process DetectionServerDatabase went down with exit code 1.
Jun 17, 2024 3:30:07 PM com.vontu.boxmonitor.CheckHeartbeatsTask run
SEVERE: (BOXMONITOR.14) Process [DetectionServerDatabase] has not responded for 16 minute(s) 0 ms
Jun 17, 2024 3:30:07 PM com.vontu.logging.LocalLogWriter write
WARNING: Restarted DetectionServerDatabase. DetectionServerDatabase was restarted because it wasn't responding."
"Jun 17, 2024 3:30:07 PM com.vontu.util.process.ChildProcessProxy destroy
INFO: Process AG is terminated.
Jun 17, 2024 3:30:07 PM com.vontu.boxmonitor.ProcessWatcher processWentDown
INFO: Process EndpointServer went down with exit code 1.
Jun 17, 2024 3:30:09 PM com.vontu.boxmonitor.CheckHeartbeatsTask run
SEVERE: (BOXMONITOR.14) Process [EndpointServer] has not responded for 16 minute(s) 0 ms
Jun 17, 2024 3:30:09 PM com.vontu.logging.LocalLogWriter write
WARNING: Restarted EndpointServer. EndpointServer was restarted because it wasn't responding."
"Jun 17, 2024 3:30:09 PM com.vontu.util.process.ChildProcessProxy destroy
INFO: Process IW is terminated.
Jun 17, 2024 3:30:09 PM com.vontu.boxmonitor.ProcessWatcher processWentDown
INFO: Process IncidentWriter went down with exit code 1.
Jun 17, 2024 3:30:11 PM com.vontu.boxmonitor.CheckHeartbeatsTask run
SEVERE: (BOXMONITOR.14) Process [IncidentWriter] has not responded for 16 minute(s) 0 ms
Jun 17, 2024 3:30:11 PM com.vontu.logging.LocalLogWriter write
WARNING: Restarted IncidentWriter. IncidentWriter was restarted because it wasn't responding."
"Jun 17, 2024 3:30:11 PM com.vontu.util.process.ChildProcessProxy destroy
INFO: Process FR is terminated.
Jun 17, 2024 3:30:12 PM com.vontu.boxmonitor.ProcessWatcher processWentDown
INFO: Process FileReader went down with exit code 1.
Jun 17, 2024 3:30:14 PM com.vontu.boxmonitor.CheckHeartbeatsTask run
SEVERE: (BOXMONITOR.14) Process [FileReader] has not responded for 16 minute(s) 0 ms
Jun 17, 2024 3:30:14 PM com.vontu.logging.LocalLogWriter write
WARNING: Restarted FileReader. FileReader was restarted because it wasn't responding."
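To confirm the pattern across a large BoxMonitor.log, the BOXMONITOR.14 entries can be collected per process and the gaps between them measured. The following is a minimal sketch, not a Symantec tool: the function names are made up for illustration, it assumes the two-line log layout shown in the excerpts above (a timestamp line followed by the message line), and it assumes English month and AM/PM names. It does not handle the single-line "23/Apr/24:..." layout.

```python
import re
from collections import defaultdict
from datetime import datetime

# Timestamp line, e.g. 'Jun 17, 2024 3:30:07 PM com.vontu...' (a leading quote is tolerated).
HEADER = re.compile(r'^"?(\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\b')
# Message line carrying the heartbeat timeout code and the process name.
TIMEOUT = re.compile(r"\(BOXMONITOR\.14\) Process \[(\w+)\]")


def heartbeat_timeouts(lines):
    """Map each process name to the timestamps of its BOXMONITOR.14 entries."""
    found = defaultdict(list)
    stamp = None
    for line in lines:
        header = HEADER.match(line)
        if header:
            # Remember the most recent timestamp; the message follows on the next line.
            stamp = datetime.strptime(header.group(1), "%b %d, %Y %I:%M:%S %p")
            continue
        hit = TIMEOUT.search(line)
        if hit and stamp is not None:
            found[hit.group(1)].append(stamp)
    return dict(found)


def gaps_in_minutes(stamps):
    """Minutes between consecutive timestamps of one process."""
    return [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]
```

Pass the lines of BoxMonitor.log to heartbeat_timeouts and feed each resulting list to gaps_in_minutes. Gaps close to 16 minutes for every child process match the behavior described here.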
Apply the following logging changes on the affected Detection Server, in each of the following logging properties files:
FileReaderLogging.properties, DetectionServerDatabaseLogging.properties, IncidentWriterLogging.properties, MonitorLogging.properties, AggregatorLogging.properties
Change .level = INFO to .level = FINEST
Change java.util.logging.FileHandler.level = INFO to java.util.logging.FileHandler.level = FINEST
Change java.util.logging.FileHandler.limit = 5000000 to java.util.logging.FileHandler.limit = 50000000 and change java.util.logging.FileHandler.count = 8 to java.util.logging.FileHandler.count = 10
On Enforce, click the Recycle link next to the Status on the Detection Server detail page.
Wait 45 minutes, then gather all Enforce and Detection Server logs for us to review.
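The property changes listed above have to be repeated in five files, so they can be scripted. The following is a minimal sketch under stated assumptions: the function name apply_debug_logging is hypothetical, the keys and target values are the ones given in this article, and each setting is assumed to sit on its own "key = value" line. Commented-out lines and other keys are left untouched.

```python
import re

# Target values taken from the steps in this article.
CHANGES = {
    ".level": "FINEST",
    "java.util.logging.FileHandler.level": "FINEST",
    "java.util.logging.FileHandler.limit": "50000000",
    "java.util.logging.FileHandler.count": "10",
}

# indentation, key, separator, value; lines starting with '#' or '!' are comments.
LINE = re.compile(r"^(\s*)([^#!\s=:][^=:\s]*)(\s*[=:]\s*)(.*)$")


def apply_debug_logging(text):
    """Return the properties text with the four logging settings raised."""
    out = []
    for line in text.splitlines():
        m = LINE.match(line)
        # Compare the whole key so that e.g. 'com.vontu.level' is not mistaken for '.level'.
        if m and m.group(2) in CHANGES:
            line = f"{m.group(1)}{m.group(2)}{m.group(3)}{CHANGES[m.group(2)]}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Reading and writing the files is left to the caller. Keep a copy of each original file so the INFO levels can be restored once the logs have been collected.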
From the logs we can clearly see that BoxMonitor is starting the heartbeat listeners and that the child processes are sending their heartbeat datagrams to BoxMonitor. Some excerpts:
BoxMonitor:
Jun 28, 2024 12:00:52 PM com.vontu.boxmonitor.HeartbeatListener startListener
FINE: Starting heartbeat listener on port 12806.
Jun 28, 2024 12:00:52 PM com.vontu.boxmonitor.HeartbeatListener startListener
FINE: Starting heartbeat listener on port 12805.
Jun 28, 2024 12:00:52 PM com.vontu.boxmonitor.HeartbeatListener startListener
FINE: Starting heartbeat listener on port 12802.
Jun 28, 2024 12:00:52 PM com.vontu.boxmonitor.HeartbeatListener startListener
FINE: Starting heartbeat listener on port 12800.
Aggregator:
28.6.2024 12:34:11 com.vontu.boxmonitor.HeartbeatListener reportLocalHeartbeat
FINER: Sent the heartbeat datagram to port 12805.
What was missing in this case is the following in the BoxMonitor logs (taken from a healthy server):
DLP 15.8
DLP 16.0
DLP 16.0.1
DLP 16.0.2
From the BoxMonitor perspective, there is no heartbeat response from the subcomponents, which is why all the child services are restarted every 16 minutes:
23/Apr/24:01:38:49:947+0200 [SEVERE] (BOXMONITOR.14) Process [DetectionServerDatabase] has not responded for 16 minute(s) 0 ms
23/Apr/24:01:38:52:088+0200 [SEVERE] (BOXMONITOR.14) Process [EndpointServer] has not responded for 16 minute(s) 0 ms
23/Apr/24:01:38:54:213+0200 [SEVERE] (BOXMONITOR.14) Process [IncidentWriter] has not responded for 16 minute(s) 0 ms
23/Apr/24:01:38:57:150+0200 [SEVERE] (BOXMONITOR.14) Process [FileReader] has not responded for 16 minute(s) 0 ms
Check for the presence of UDP traffic to the BoxMonitor process on the loopback adapter. The heartbeat listener binds to ports 12801 - 12806 to listen for the respective child service heartbeat datagrams.
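As a quick sanity check before taking a packet capture, loopback UDP delivery on the host can be tested in isolation. The following is a minimal sketch, not part of the product: the function name and payload are made up, and it deliberately binds to a free port chosen by the operating system instead of the BoxMonitor ports, which are already in use.

```python
import socket


def loopback_udp_roundtrip(payload=b"heartbeat-test", timeout=2.0):
    """Send one UDP datagram to ourselves over 127.0.0.1; True if it arrives intact."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        receiver.settimeout(timeout)
        try:
            sender.sendto(payload, receiver.getsockname())
            data, _ = receiver.recvfrom(1024)
        except OSError:
            # Covers a receive timeout as well as a send rejected by the host.
            return False
        return data == payload


if __name__ == "__main__":
    print("loopback UDP delivered:", loopback_udp_roundtrip())
```

A False result suggests that loopback UDP is blocked on the host in general. A True result does not clear the heartbeat ports themselves, because filtering software may act per port or per process, so the netstat and capture checks are still needed.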
In a healthy environment you should see the following on the Detection Server when you run "netstat -aon | findstr :128":
... and in the Wireshark capture on the loopback adapter.
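When comparing several servers, the saved netstat output can be filtered for the heartbeat ports programmatically. The following is a minimal sketch: the function name udp_listeners is hypothetical, and it assumes the usual four-column Windows layout for UDP rows in "netstat -aon" output (protocol, local address, foreign address, PID). The default range follows the port range stated above; the BoxMonitor excerpt in this article also shows a listener on 12800, so widen the range if needed.

```python
def udp_listeners(netstat_text, first_port=12801, last_port=12806):
    """Return {port: pid} for UDP rows whose local port lies in the given range."""
    found = {}
    for line in netstat_text.splitlines():
        parts = line.split()
        # UDP rows have no state column: protocol, local address, foreign address, PID.
        if len(parts) < 4 or parts[0].upper() != "UDP":
            continue
        # Split on the last colon so that IPv6 addresses such as [::1]:12801 also work.
        port_text = parts[1].rsplit(":", 1)[-1]
        if not port_text.isdigit():
            continue
        port = int(port_text)
        if first_port <= port <= last_port:
            found[port] = parts[-1]
    return found
```

Ports that are missing from the result have no bound listener. The PID values can be compared with the BoxMonitor process ID to confirm which process owns the listeners.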