WatchWorx.notify only sends a notification the first time the RMI Server process is killed

Article ID: 421562

Products

CA Automic Applications Manager (AM)

Issue/Introduction

When testing the WatchWorx.notify script, killing the RmiServer process results in a notification being sent, and the RmiServer process is restarted by the watchworx process as expected.

However, if the RmiServer process is killed a second time, it is restarted but no notification is sent. 

Note that this unexpected behavior occurs when specifically killing (kill -9) the RmiServer process twice in a row. If the RmiServer and/or AgentService process goes down by other, more natural means, this behavior may or may not occur.

Environment

Applications Manager 9.4+

Cause

Under investigation 

Resolution

More information on WatchWorx.notify can be found in the documentation topic "Sending Process Restart Notifications with WatchWorx.notify".

The cause of this behavior and a possible fix are still under investigation.

Workaround:

The watchworx process needs to be killed and then restarted (startso watchworx).
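Done by hand, the workaround comes down to two commands: kill the watchworx process, then run startso watchworx. The sketch below wraps those two steps in a function with a dry-run switch so the sequence can be reviewed before anything is actually killed. The names DRY_RUN, run, and restart_watchworx are hypothetical helpers introduced for this illustration, and 12345 is a placeholder PID; none of them are part of Applications Manager.

```shell
#!/bin/bash

# DRY_RUN is a hypothetical switch for this sketch (not an Applications Manager setting).
# With the default of 1, commands are only printed. Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY RUN: $*"
    else
        "$@"
    fi
}

# $1 is the PID of this instance's watchworx process (for example, taken from ps -ef).
restart_watchworx() {
    run kill -9 "$1"
    run startso watchworx
}

# 12345 is a placeholder PID used only for illustration.
restart_watchworx 12345
```

With DRY_RUN left at its default, the call only prints the kill and startso commands it would run.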

It is possible to modify the WatchWorx.notify script to include some code to grep for your specific instance's watchworx process ID, kill it, and then restart it.

Below is an example script to do this. It is recommended to test the script in a non-production environment before implementing it in a production environment.

#!/bin/bash

# This script notifies the administrator that an Applications Manager process went down.
echo "$(date) Applications Manager process $1 of type $2. WatchWorx message: $3." | mail -s "<Email Subject line>" <your email.com>

# Define the service name for clarity and easy modification
SERVICE_NAME="watchworx"

echo "---------------------------------------------------"
echo "Attempting to kill and restart '$SERVICE_NAME' for the current instance..."
echo "---------------------------------------------------"

# --- 0. Validate AW_HOME Environment Variable ---
# It's crucial that $AW_HOME is set correctly in the environment
# where this script is executed. This variable uniquely identifies
# the application instance we intend to manage.
if [ -z "$AW_HOME" ]; then
    echo "ERROR: \$AW_HOME environment variable is not set."
    echo "Cannot determine the specific instance's '$SERVICE_NAME' process to target."
    echo "Please ensure \$AW_HOME is set before running this script."
    exit 1
fi

# Construct the full, unique path to the watchworx executable for *this* instance.
# This makes the target precise to the current application instance, preventing
# interference with other instances that might have their own $AW_HOME.
THIS_INSTANCE_WATCHWORX_EXECUTABLE="$AW_HOME/c/$SERVICE_NAME"
echo "Targeting '$SERVICE_NAME' process at path: $THIS_INSTANCE_WATCHWORX_EXECUTABLE"

# --- 1. Find and Kill Existing PID(s) for This Instance ---
# - ps -ef: Lists all running processes in full format.
# - grep "$THIS_INSTANCE_WATCHWORX_EXECUTABLE": Filters for lines containing this exact path.
#   This ensures we only target the watchworx process associated with the current $AW_HOME.
# - grep -v grep: Excludes the 'grep' command itself from the results, avoiding self-matching.
# - awk '{print $2}': Extracts the second column, which is the Process ID (PID), from all matching lines.
PIDS=$(ps -ef | grep "$THIS_INSTANCE_WATCHWORX_EXECUTABLE" | grep -v grep | awk '{print $2}')

# Check if any PIDs were found for this specific instance
if [ -z "$PIDS" ]; then
    echo "No '$SERVICE_NAME' processes found running for this instance ($AW_HOME)."
    echo "Proceeding directly to start it."
else
    echo "Found '$SERVICE_NAME' processes for this instance with PIDs: $PIDS"
    # Iterate over each PID found and kill it individually.
    # While typically one main 'watchworx' process runs per instance,
    # this loop handles cases where multiple processes might spawn from
    # the same executable path for a single instance.
    for PID_TO_KILL in $PIDS; do
        echo "Killing process $PID_TO_KILL with 'kill -9'..."
        # We redirect stderr and stdout to /dev/null for kill, as it might
        # complain if the process is already gone (e.g., if killed by another
        # script or race condition), which isn't an error for our logic.
        kill -9 "$PID_TO_KILL" &> /dev/null

        # Give the system a moment to terminate the process
        sleep 1

        # Verify if the process is truly dead. If it's still there, it's a warning.
        if ps -p "$PID_TO_KILL" &> /dev/null; then
            echo "WARNING: Process $PID_TO_KILL is still running after kill -9. Manual intervention might be required."
        else
            echo "Process $PID_TO_KILL for '$SERVICE_NAME' (instance $AW_HOME) killed successfully."
        fi
    done
    # Give a bit more time after all kills if multiple processes were terminated
    sleep 2
fi

# --- 2. Start the watchworx service for this instance ---
# The 'startso watchworx' command is expected to operate within the context
# of the current environment's $AW_HOME, thereby starting the service
# specifically for this instance.
echo "Starting '$SERVICE_NAME' for this instance using 'startso $SERVICE_NAME'..."
startso "$SERVICE_NAME"

echo "---------------------------------------------------"
echo "'$SERVICE_NAME' restart attempt complete for instance $AW_HOME."
echo "---------------------------------------------------"
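Before running the script in a non-production environment, it may help to check which processes its PID lookup (step 1) would select. The sketch below applies the same ps/grep/awk filter to canned input instead of the live process table. The function name find_instance_pids and the install paths /opt/am1 and /opt/am2 are made up for this illustration. Unlike the script above, it uses grep -F so the path is matched as a literal string rather than as a regular expression.

```shell
#!/bin/bash

# Hypothetical helper: reads "ps -ef" style lines on stdin and prints the PID
# (second column) of every line that contains the given executable path.
find_instance_pids() {
    local executable_path="$1"
    # grep -F matches the path literally; grep -v grep drops any grep command lines.
    grep -F -- "$executable_path" | grep -v grep | awk '{print $2}'
}

# Canned sample input. The users, PIDs, and paths are invented for illustration.
sample_ps_output='am1   4242     1  0 10:00 ?     00:00:01 /opt/am1/c/watchworx
am2   5151     1  0 10:00 ?     00:00:01 /opt/am2/c/watchworx
am1   6161  4000  0 10:05 pts/0 00:00:00 grep /opt/am1/c/watchworx'

# Only the watchworx process of the /opt/am1 instance is selected.
printf '%s\n' "$sample_ps_output" | find_instance_pids "/opt/am1/c/watchworx"
```

On a live system, the same check would be ps -ef | find_instance_pids "$AW_HOME/c/watchworx", which lists the PIDs without killing anything.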