Logmon alerting delay request

Article ID: 418277

Updated On:

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

We have set up a logmon probe to monitor a log, keying on service state changes between Inactive and Active. My colleagues want a 5-minute delay factored in: if the service reports an Inactive state, they want to wait 5 minutes before alerting, in case it returns to the Active state. That is proving tricky. We have tried using the variables in the probe and have tried handling the issue through the NAS, but neither approach is working.

Two scenarios can occur.

In the first, the service stops completely on the primary cluster, and that needs to be reported. In the second, the service appears to be down for a few seconds during a database backup, which is why a 5-minute delay in alerting is wanted.

Environment

  • DX UIM 23.4 CU2

Cause

  • logmon configuration guidance

Resolution

A custom solution is provided that uses the logmon probe to read the log (a log extract was used for testing) and generate an alarm after a configurable time period, based on state changes.

The logmon profile parses the log for the service state, e.g., 'Inactive' versus 'Active'.

A particular department requested a 5-minute delay before the alarm is generated: if the service reports an Inactive state, they want to wait 5 minutes before alerting, in case it goes back into the Active state.

This solution addresses two scenarios:

  1. The service stops completely on the primary cluster, which needs to be reported.
  2. The service appears to be down for a few seconds during a database backup, so a 5-minute delay before alerting is wanted in this case.

If the service becomes INACTIVE, whether from a real outage or from an SQL database backup that takes it down for only a few seconds, and it is still INACTIVE with no clearing ACTIVE entry once the delay has elapsed (305 seconds in validate.bat), an alarm is generated.

Log extract example:

'<service_name>' changed state to INACTIVE

'<service_name>' changed state to ACTIVE

logmon: 2 watcher profiles:

  • '<service_name>' Inactive
  • '<service_name>' Active
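Each watcher profile keys on one of the two states in the log line. As a rough illustration of that matching only (this is not logmon configuration syntax; the attached logmon_example.cfg holds the real profiles), here is a minimal Python sketch. The regular expression, the function name `match_state`, and the service name 'MyService' are assumptions made for the example.

```python
import re

# Hypothetical pattern modelled on the log extract above: a quoted service
# name followed by "changed state to INACTIVE" or "changed state to ACTIVE".
STATE_CHANGE = re.compile(
    r"'(?P<service>[^']+)' changed state to (?P<state>INACTIVE|ACTIVE)"
)


def match_state(line):
    """Return (service, state) for a state-change line, or None if no match."""
    m = STATE_CHANGE.search(line)
    if m is None:
        return None
    return m.group("service"), m.group("state").lower()


if __name__ == "__main__":
    # 'MyService' is a placeholder; substitute the real service name.
    samples = (
        "'MyService' changed state to INACTIVE",
        "'MyService' changed state to ACTIVE",
        "unrelated log line",
    )
    for line in samples:
        print(match_state(line))
```

The lower-cased state returned here corresponds to the value that validate.bat later compares against "active" and "inactive".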

Additional Information

The scripts and files are attached. Store them in the logmon folder.

update_status.bat:

@"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" -Command "Add-Content -Path 'current_state.txt' -Value ('%~1-' + [int][double]::Parse((Get-Date -UFormat %%s)))"
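The one-liner above appends a record of the form `<state>-<epoch seconds>` to current_state.txt, where `<state>` is the first argument passed to the script. The sketch below shows that record format and how validate.bat splits it on '-'. It assumes the argument is the word "active" or "inactive" (the values validate.bat compares against); the function names are hypothetical.

```python
def format_record(state, epoch_seconds):
    """Build a record in the '<state>-<epoch seconds>' form."""
    return f"{state}-{int(epoch_seconds)}"


def parse_record(line):
    """Split a record into (state, timestamp, tag); tag is '' when absent.

    Assumes a well-formed record with at least a state and a timestamp.
    """
    parts = line.strip().split("-")
    state = parts[0]
    timestamp = int(parts[1])
    tag = parts[2] if len(parts) > 2 else ""
    return state, timestamp, tag


if __name__ == "__main__":
    record = format_record("inactive", 1700000000)
    print(record)
    print(parse_record(record))
    print(parse_record("active-1700000010-alarm_cleared"))
```

The optional third field is the `alarm_cleared` tag that validate.bat writes once it has reported the service as active.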

validate.bat:

@echo off
setlocal enabledelayedexpansion
 
set "file=current_state.txt"
set "lastLine="
set "prevLine="
 
:: Get the current Unix timestamp (seconds since epoch)
for /f %%T in ('C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -nologo -noprofile -command "[int][double]::Parse((Get-Date -UFormat %%s))"') do (
    set "curr_time=%%T"
)
 
:: If the state file doesn't exist, assume service is active
if not exist "%file%" (
    echo Service is active
    goto :eof
)
 
:: Read the last two non-empty lines from the file
for /f "usebackq delims=" %%A in (`C:\Windows\System32\findstr.exe /r /v "^$" "%file%"`) do (
    set "prevLine=!lastLine!"
    set "lastLine=%%A"
)
 
:: If the file has no non-empty lines, assume service is active
if not defined lastLine (
    echo Service is active
    goto :eof
)
 
:: Parse the last line into status, timestamp, and tag
for /f "tokens=1,2,3 delims=-" %%A in ("!lastLine!") do (
    set "lastStatus=%%A"
    set "lastTime=%%B"
    set "lastTag=%%C"
)
 
:: If the last status is "active"
if /i "%lastStatus%"=="active" (
    rem If already validated, exit silently
    rem (rem is used inside parenthesized blocks because :: can break them)
    if /i "%lastTag%"=="alarm_cleared" (
        exit /b
    ) else (
        rem Otherwise, echo active and mark it as validated
        echo service is active
        goto :appendValidated
    )
)
 
:: If the last status is "inactive"
if /i "%lastStatus%"=="inactive" (
    rem Calculate how long the service has been inactive
    set /a diff=!curr_time! - !lastTime!
    rem If inactive for more than 305 seconds, raise an alarm
    if !diff! gtr 305 (
        echo Service is in-active, raise an alarm
    )
)
 
goto :eof
 
:appendValidated
:: Append a new line that repeats the last entry with an alarm_cleared tag, so later runs exit silently
set "newLine=%lastStatus%-%lastTime%-alarm_cleared"
C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -Command "Add-Content -Path \"current_state.txt\" -Value \"%newLine%\""
goto :eof

To test, you can lower the threshold in validate.bat (the 305 in "if !diff! gtr 305"), e.g., to 125 seconds.
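To try different thresholds without waiting on a live system, the decision rule in validate.bat can be modelled offline. The following is a minimal Python sketch, not part of the attached solution: the function `validate` and its parameters are hypothetical, and it assumes well-formed records in the `<state>-<epoch seconds>[-alarm_cleared]` form used in current_state.txt.

```python
THRESHOLD_SECONDS = 305  # same value as validate.bat; lower it (e.g. 125) to test


def validate(lines, now, threshold=THRESHOLD_SECONDS):
    """Model validate.bat: return (message, records); message may be None."""
    records = [ln.strip() for ln in lines if ln.strip()]
    if not records:
        return "Service is active", records

    parts = records[-1].split("-")
    status = parts[0].lower()
    timestamp = int(parts[1])
    tag = parts[2].lower() if len(parts) > 2 else ""

    if status == "active":
        if tag == "alarm_cleared":
            return None, records  # already reported, stay silent
        tagged = f"{parts[0]}-{parts[1]}-alarm_cleared"
        return "service is active", records + [tagged]

    if status == "inactive" and now - timestamp > threshold:
        return "Service is in-active, raise an alarm", records

    return None, records


if __name__ == "__main__":
    # Scenario 2: brief blip during a backup, service recovers after 10 seconds.
    print(validate(["inactive-1000", "active-1010"], now=1400))
    # Scenario 1: service stays down; checked before and after the threshold.
    print(validate(["inactive-1000"], now=1200))
    print(validate(["inactive-1000"], now=1306))
    # Same outage with the shorter test threshold.
    print(validate(["inactive-1000"], now=1126, threshold=125))
```

In this model the alarm message appears only once the inactive entry is strictly older than the threshold, which matches the `gtr` comparison in the batch script.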


 

 

Attachments

logmon_example.cfg
update_status.bat
validate.bat