Always test connectivity between the AS and DS hosts before trying to resolve this issue with the steps below, because those steps will have little or no effect if connectivity is the problem. One way to test connectivity is to use the OpenSSL instance that NCM installs on both the AS and DS hosts to test the connection directly from a command line session on each host. To open a direct connection between the hosts outside of NCM, run the following command in a command line session on the AS host (targeting each DS host) and on each DS host (targeting the AS host):
openssl s_client -connect {target host ip}:443 -CApath {NCM home path}/conf/CA/
A successful connection usually produces fairly verbose output describing the connection request and the certificate validation. Among the lines returned, a successful connection will contain output similar to the following two lines:
...
CONNECTED(00000003)
...
SSL handshake has read 2726 bytes and written 383 bytes
...
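The check above can be scripted so that it exits on its own and reports a simple pass or fail. The sketch below is a minimal illustration under stated assumptions, not part of NCM: the function name `check_s_client_output` is hypothetical, it looks for the two marker lines shown above, and it additionally assumes the standard OpenSSL `Verify return code: 0 (ok)` line indicates that the certificate chain was validated against the CA path.

```shell
# Hypothetical helper (not part of NCM): read openssl s_client output on
# stdin and report whether the connection, the TLS handshake, and the
# certificate verification all appear to have succeeded.
check_s_client_output() {
  out=$(cat)

  # Was a TCP connection established?
  case "$out" in
    *"CONNECTED("*) ;;
    *) echo "FAIL: no CONNECTED line"; return 1 ;;
  esac

  # Did the SSL/TLS handshake complete?
  case "$out" in
    *"SSL handshake has read"*) ;;
    *) echo "FAIL: SSL handshake did not complete"; return 1 ;;
  esac

  # Assumption: a verify return code of 0 means the peer certificate
  # validated against the CA directory passed with -CApath.
  case "$out" in
    *"Verify return code: 0 ("*) echo "OK"; return 0 ;;
    *) echo "FAIL: non-zero or missing verify return code"; return 1 ;;
  esac
}
```

For example, after running `source /etc/voyence.conf` and assuming `$VOYENCE_HOME` is the NCM home path (192.0.2.10 stands in for the target host IP): `openssl s_client -connect 192.0.2.10:443 -CApath "$VOYENCE_HOME/conf/CA/" </dev/null 2>&1 | check_s_client_output`. Redirecting stdin from `/dev/null` makes `s_client` close the session after the handshake instead of waiting for input.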
If the connection fails, the last line of the output will usually report a non-zero failure code. In that case, troubleshoot connectivity before proceeding with the steps below; they may be unnecessary once connectivity is restored.
If the connection succeeds, or if the issue persists after connectivity is restored, the command files that NCM uses to communicate between the AS and the DS are most likely genuinely out of sync. Out-of-sync command files must be cleared from the instance so that new jobs can create a fresh set of properly synchronized command files that flow normally between the AS and DS hosts.
To clear NCM command files from the instance, do as follows:
Instance-Wide Preliminary Steps:
1. Log in as 'root' to a separate Linux shell session on each NCM Device Server (DS) host in the instance, on the Application Server (AS) host, and on the controldb host (if it is separate from the AS host). Keep all of these sessions open at the same time for the rest of this procedure.
2. In the AS host shell session and in every DS host shell session opened in Step 1, run the following command to set the NCM-related shell session variables:
source /etc/voyence.conf
3. Stop NCM services on all Device Server (DS) hosts so that they do not receive or process any new SNMP traps from managed devices during this procedure:
/etc/init.d/vcmaster stop
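The clean-up commands later in this procedure change into directories under `$VOYENCE_HOME`, so it is worth confirming in each shell that sourcing `/etc/voyence.conf` actually set that variable. The sketch below assumes only that `/etc/voyence.conf` defines `VOYENCE_HOME` (the later `cd $VOYENCE_HOME/...` steps rely on it); the function name `require_voyence_home` is hypothetical.

```shell
# Hypothetical guard: succeed only if VOYENCE_HOME is set and names an
# existing directory. If it were empty, a later "cd $VOYENCE_HOME/..." would
# fail and the following find commands would run in whatever directory the
# shell happened to be in.
require_voyence_home() {
  if [ -z "${VOYENCE_HOME:-}" ]; then
    echo "VOYENCE_HOME is not set; run: source /etc/voyence.conf" >&2
    return 1
  fi
  if [ ! -d "$VOYENCE_HOME" ]; then
    echo "VOYENCE_HOME ($VOYENCE_HOME) is not a directory" >&2
    return 1
  fi
  echo "VOYENCE_HOME=$VOYENCE_HOME"
}
```

Running `require_voyence_home` in each shell right after sourcing the file should print the NCM home path; any error message means the variable needs to be set before continuing.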
Application Server:
4. Log into the NCM Client Application as 'sysadmin'.
5. Cancel all currently running jobs from all users, including the system user (e.g., pull jobs scheduled in response to SNMP device configuration state change traps), as well as any pending jobs scheduled to start during the maintenance window declared for this procedure. Note: You do not need to cancel recurring series jobs; however, you must verify that none of them will create child jobs during this procedure.
6. Log out of the NCM Client Application.
7. Switch to the controldb shell opened in Step 1 (this may be the AS shell if the controldb resides on the AS host) and log into the controldb with the following command (Note: the current controldb password is required):
su - pgdba -c 'psql voyencedb voyence'
8. Run the following query to confirm that no jobs are in a running state:
SELECT
status,
count(*)
FROM cm_job
WHERE status LIKE '%running'
GROUP BY status;
If the query returns any rows (that is, any status with a count greater than zero), run the following statement to cancel the remaining jobs that failed to cancel in Step 5:
UPDATE cm_job
SET status = 'enum.taskStatus.canceled'
WHERE status LIKE '%running';
9. Log out of the controldb with the following command:
\q
10. Switch to the AS shell opened in Step 1 and run the following command to stop services on the AS host:
/etc/init.d/vcmaster stop
11. Run the following commands in sequence in the AS shell to clear all command files on the AS host:
cd $VOYENCE_HOME/data/appserver/pops
find . -name "acmd_*xml" -exec rm -f {} \;
find . -name "cmd_*xml" -exec rm -f {} \;
find . -name "status_*" -exec rm -f {} \;
Device Server:
12. Switch to each DS shell in turn and run the following commands in sequence to clear all command files on that DS host. (Note: If the AS host is also a DS because the Combination Server option was selected at installation, run these commands on the AS host as well.)
cd $VOYENCE_HOME/data/devserver/syssync
find . -name "acmd_*xml" -exec rm -f {} \;
find . -name "cmd_*xml" -exec rm -f {} \;
find . -name "status_*" -exec rm -f {} \;
13. Restart NCM services on all NCM hosts by running the following command in each shell opened in Step 1, first on the AS host and then on each DS host:
/etc/init.d/vcmaster start
14. Confirm that jobs are running again: log into the NCM Client from your workstation and run a "Test Credentials" job against one device on each device server in the instance.
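The restart order described above (the AS host first, then each DS host) can be captured in a small script if key-based root SSH access between the hosts is available. That access is an assumption; the procedure itself uses separate interactive shells. The function name `start_ncm_in_order`, the `DRY_RUN` variable, and the host names in the example are hypothetical, and by default the sketch only prints the commands it would run.

```shell
# Hypothetical sketch: start NCM services on the AS host first, then on each
# DS host. Assumes key-based root SSH to every host.
# With DRY_RUN=1 (the default) it only prints the commands.
start_ncm_in_order() {
  as_host=$1; shift
  for host in "$as_host" "$@"; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "ssh root@$host /etc/init.d/vcmaster start"
    else
      # Stop at the first host that fails to start; in particular, no DS host
      # is started if the AS start fails.
      ssh "root@$host" /etc/init.d/vcmaster start || return 1
    fi
  done
}
```

For example, `start_ncm_in_order as01 ds01 ds02` previews the commands, and `DRY_RUN=0 start_ncm_in_order as01 ds01 ds02` runs them. Stopping at the first failure keeps the DS hosts from coming up while the AS is down.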