Smarts SAM: Failover Manager Troubleshooting; How to Resolve Domain Unknown Status

Article ID: 332132

Products

VMware Smart Assurance

Issue/Introduction

Symptoms:


This document provides information for troubleshooting the Failover Manager in two scenarios:

1.   When you add an Active and Standby domain pair to the failover.conf file and the Failover Manager environment.
2.   When both the Active and Backup servers of a domain are down at the same time, and the Failover Manager cannot register the domain properly after both are brought back up.

In these scenarios the Failover Manager reports an UNKNOWN status and role for the domains newly added to it, or for the domain pair whose Active and Backup servers were down at the same time.


Example:

Status of Monitored Entities:
|--------- Process ---------|- P-Status -|----- Role -----|----------- Host -----------|- H-Status -|
TEST-REMOTESITES-APM-A ....  UNKNOWN..... UNKNOWN......... Host99.................... UP..........
TEST-REMOTESITES-APM-B ....  UNKNOWN..... UNKNOWN......... Host98.................... UP..........
TEST-APM-A ................  UP.......... ACTIVE.......... Host99.................... UP..........
TEST-APM-B ................  UP.......... STANDBY......... Host98.................... UP..........
TEST-APOI-A ...............  UP.......... ACTIVE.......... Host99.................... UP..........
TEST-APOI-B ...............  DOWN........ STANDBY......... Host98.................... UP..........
TEST-SAM-A ................  UP.......... ACTIVE.......... Host99.................... UP..........
TEST-SAM-B ................  DOWN........ STANDBY......... Host98.................... UP..........
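In a long status listing, the affected domains can be picked out programmatically. The Python sketch below is an illustration only: it assumes the five-column, dot-padded layout shown in the example, and `parse_status` and `unknown_entities` are hypothetical helpers, not part of Smarts. How you capture the status text (copy and paste, a saved file, and so on) is left open.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    """One row of the "Status of Monitored Entities" table."""
    process: str
    p_status: str
    role: str
    host: str
    h_status: str


# Assumed layout: five whitespace-separated columns, each value
# right-padded with '.' characters, as in the example output.
ROW = re.compile(
    r"^(?P<process>\S+)\s*\.*\s+"
    r"(?P<p_status>[A-Z]+)\.*\s+"
    r"(?P<role>[A-Z]+)\.*\s+"
    r"(?P<host>\S+?)\.*\s+"
    r"(?P<h_status>[A-Z]+)\.*\s*$"
)


def parse_status(text: str) -> list[Entity]:
    """Parse every line that matches the row layout; headers and banners are skipped."""
    entities = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            entities.append(Entity(**match.groupdict()))
    return entities


def unknown_entities(text: str) -> list[str]:
    """Return the process names whose P-Status or Role is UNKNOWN."""
    return [
        e.process
        for e in parse_status(text)
        if "UNKNOWN" in (e.p_status, e.role)
    ]


SAMPLE = """\
Status of Monitored Entities:
|--------- Process ---------|- P-Status -|----- Role -----|----------- Host -----------|- H-Status -|
TEST-REMOTESITES-APM-A ....  UNKNOWN..... UNKNOWN......... Host99.................... UP..........
TEST-APM-A ........  UP.......... ACTIVE.......... Host99.................... UP..........
TEST-APOI-B .......  DOWN........ STANDBY......... Host98.................... UP..........
"""

print(unknown_entities(SAMPLE))  # ['TEST-REMOTESITES-APM-A']
```

The lazy host pattern stops at the first run of padding dots followed by whitespace, so host names that contain dots (for example fully qualified names) are still captured whole.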

Environment

VMware Smart Assurance - SMARTS

Cause

The Failover Manager requires that every domain pair (Site A and Site B) listed in the failover.conf file be up and running before the Failover Manager is started or restarted; otherwise it cannot correctly determine which domain is Active and which is Standby. The problem arises when both Site A and Site B of a pair are down at that time.
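Because every pair must be up before the Failover Manager starts, a quick pre-restart check can help. This hypothetical Python sketch takes a mapping of process name to P-Status and reports every pair in which either side is not UP. It assumes that Site A and Site B processes are named with `-A` and `-B` suffixes, as in the example output; adjust the pairing rule to your own naming convention.

```python
from collections import defaultdict


def pairs_not_ready(statuses: dict[str, str]) -> dict[str, dict[str, str]]:
    """Return the domain pairs in which Site A or Site B is not UP.

    `statuses` maps a process name to its P-Status. Pairs are formed by
    stripping an assumed "-A"/"-B" suffix; a side that is missing from
    `statuses` also counts as not ready.
    """
    pairs: dict[str, dict[str, str]] = defaultdict(dict)
    for name, status in statuses.items():
        base, sep, side = name.rpartition("-")
        if sep and side in ("A", "B"):
            pairs[base][side] = status
    return {
        base: sides
        for base, sides in pairs.items()
        if any(sides.get(site) != "UP" for site in ("A", "B"))
    }


statuses = {
    "TEST-APM-A": "UP",
    "TEST-APM-B": "UP",
    "TEST-APOI-A": "UP",
    "TEST-APOI-B": "DOWN",
}
print(pairs_not_ready(statuses))  # {'TEST-APOI': {'A': 'UP', 'B': 'DOWN'}}
```

An empty result means every listed pair reported UP on both sides at the time the statuses were collected.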

Resolution

Follow these steps when both the Active and Standby servers for a domain were not running when the Failover Manager was started or restarted:

1.   Make sure that both Site A and Site B for the domain are up and running.

2.   Stop the Failover Manager service, and confirm that it has stopped.

3.   Rename the Failover Manager repository (.rps) file.

4.   Restart the Failover Manager domain.
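Step 3 is normally done by hand. As a sketch, the Python function below renames the file to a timestamped name instead of deleting it, so the original can be restored if needed. The function name and the `.<timestamp>.old` naming scheme are assumptions, the repository location depends on your installation, and the function should only be run after step 2 has confirmed that the service is stopped.

```python
import time
from pathlib import Path


def rename_repository(rps_path: str) -> Path:
    """Rename a repository (.rps) file to <name>.<timestamp>.old.

    The file is kept rather than deleted so it can be restored later.
    Call this only after the Failover Manager service is confirmed stopped.
    """
    src = Path(rps_path)
    if not src.is_file():
        raise FileNotFoundError(f"repository file not found: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.{stamp}.old")
    if dst.exists():
        raise FileExistsError(f"refusing to overwrite {dst}")
    src.rename(dst)
    return dst
```

For example, `rename_repository("/path/to/failover-manager.rps")`, where the path is a placeholder for your Failover Manager repository file.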

The Failover Manager should then be able to connect to, and determine the status of, the Active and Backup servers for the domain pair.

If this does not resolve the issue, review the Failover Actions log and the Failover Manager domain log; these logs should indicate the cause. Check failover.conf for the correct host name, domain server name, and port, and compare these against the domain service startup commands to confirm that the settings match. Check the ssh settings, and check the log files for any host key verification failures. Finally, check the network and firewall settings to determine whether they are causing communication failures.
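Two of these checks lend themselves to a quick script. The Python sketch below uses hypothetical helper names: it searches log text for ssh host key verification failures and tests whether a TCP port on a host accepts connections, which can help separate a network or firewall problem from a configuration problem. The hosts and ports to test are the ones configured in your failover.conf, and log file locations depend on your installation.

```python
import re
import socket

# Matches both "Host key verification failed" (the OpenSSH wording) and
# "... failure(s)", case-insensitively.
HOST_KEY_FAILURE = re.compile(r"host key verification fail", re.IGNORECASE)


def host_key_failures(log_text: str) -> list[str]:
    """Return the log lines that mention an ssh host key verification failure."""
    return [line for line in log_text.splitlines() if HOST_KEY_FAILURE.search(line)]


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

For example, `port_reachable("Host98", 1234)`, where `1234` is a placeholder for the domain port listed in failover.conf. A `False` result from the Failover Manager host points toward a network, firewall, or domain startup problem rather than a Failover Manager setting.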