Smarts SAM: Global SAM does not synchronize events consistently



Article ID: 332060



Products

VMware Smart Assurance

Issue/Introduction

Symptoms:


Global SAM does not synchronize events consistently
  • A single Global SAM has other SAM AGG domains feeding into it.
  • The event flow from the AGG domains to the Global SAM is inconsistent.
  • Discrepancies of hundreds of alerts are common between a single regional AGG and the alarms from that AGG that appear in the Global Console notification log. To confirm this, connect to the regional AGG and then to the Global SAM and compare the two side by side.
  • So far, the only way to resolve this is to reconfigure at the DXA level. A regular reconfigure does not resolve the issue, and neither does a forced topology sync.
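The side-by-side comparison described above can be partly automated. The following is a minimal sketch, assuming the notification names have already been exported from the regional AGG and from the Global SAM into two lists; the function name and the sample notification names are hypothetical and are not part of any Smarts API.

```python
def missing_in_global(agg_names, global_names):
    """Return the notifications present in the AGG but absent from the Global SAM."""
    # The Global SAM also holds notifications from other AGG domains,
    # so only the AGG -> Global direction is checked here.
    return sorted(set(agg_names) - set(global_names))


# Made-up sample data, for illustration only
agg_sample = [
    "NOTIFICATION-Router_r1_Down",
    "NOTIFICATION-Switch_s2_Down",
    "NOTIFICATION-Host_h3_Unresponsive",
]
global_sample = ["NOTIFICATION-Router_r1_Down"]

missing = missing_in_global(agg_sample, global_sample)
print(len(missing), "notification(s) from the AGG are missing in the Global SAM")
for name in missing:
    print("  ", name)
```

A non-empty result for an AGG whose link is currently up would be consistent with the symptom described in this article.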
Examples of periods when the AGG was disconnected from PRES:
January 22, 2016 10:49:28 PM GMT+00:00 to January 22, 2016 10:59:46 PM GMT+00:00 ~ 10 mins
January 23, 2016 9:42:41 PM GMT+00:00 to January 23, 2016 9:43:06 PM GMT+00:00 ~ 25 secs
January 24, 2016 8:48:54 PM GMT+00:00 to January 24, 2016 8:49:58 PM GMT+00:00 ~ 1 min
January 25, 2016 8:31:49 PM GMT+00:00 to January 25, 2016 8:36:54 PM GMT+00:00 ~ 5 mins
January 25, 2016 8:45:55 PM GMT+00:00 to January 25, 2016 8:46:56 PM GMT+00:00 ~ 1 min
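The exact length of each disconnect window can be computed from the timestamps. This is a small sketch using only the Python standard library; the helper names are hypothetical, and it assumes every timestamp carries the same "GMT+00:00" label as the samples above.

```python
from datetime import datetime

FMT = "%B %d, %Y %I:%M:%S %p"


def parse_ts(text):
    # Drop the zone label; all samples in this article are in GMT+00:00
    return datetime.strptime(text.replace(" GMT+00:00", "").strip(), FMT)


def window_seconds(line):
    """Return the length in seconds of a '<start> to <end>' window."""
    start, end = line.split(" to ")
    return int((parse_ts(end) - parse_ts(start)).total_seconds())


windows = [
    "January 22, 2016 10:49:28 PM GMT+00:00 to January 22, 2016 10:59:46 PM GMT+00:00",
    "January 23, 2016 9:42:41 PM GMT+00:00 to January 23, 2016 9:43:06 PM GMT+00:00",
    "January 24, 2016 8:48:54 PM GMT+00:00 to January 24, 2016 8:49:58 PM GMT+00:00",
    "January 25, 2016 8:31:49 PM GMT+00:00 to January 25, 2016 8:36:54 PM GMT+00:00",
    "January 25, 2016 8:45:55 PM GMT+00:00 to January 25, 2016 8:46:56 PM GMT+00:00",
]

durations = [window_seconds(w) for w in windows]
print(durations)  # [618, 25, 64, 305, 61]
```

All five windows are short, ranging from under half a minute to about ten minutes.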


Environment

VMware Smart Assurance - SMARTS

Cause

The root cause lies in the code that handles a DISCONNECT followed by a CONNECT between the domain managers. When a disconnect occurs, a set of bookkeeping operations is performed. If a CONNECT arrives while these operations are still in progress, the notifications end up in the wrong state.
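The kind of race described here can be illustrated with a toy model. This is not the Smarts code: the class, the method names, and the state names ACTIVE and SUSPENDED are invented for illustration, and the only assumption carried over from the article is that a CONNECT is handled before the DISCONNECT bookkeeping has finished.

```python
class ToyPresentationSam:
    """Hypothetical stand-in for the Presentation SAM's view of one AGG."""

    def __init__(self, names):
        self.state = {name: "ACTIVE" for name in names}
        self.pending = []  # bookkeeping steps that have not run yet

    def on_disconnect(self):
        # Bookkeeping: every notification from this AGG must be suspended
        self.pending = list(self.state)

    def run_bookkeeping(self, max_steps):
        # Execute at most max_steps of the queued bookkeeping operations
        for _ in range(min(max_steps, len(self.pending))):
            self.state[self.pending.pop(0)] = "SUSPENDED"

    def on_connect(self):
        # Naive handler: reactivates whatever is suspended right now,
        # without waiting for the remaining bookkeeping to finish
        for name in list(self.state):
            if self.state[name] == "SUSPENDED":
                self.state[name] = "ACTIVE"


sam = ToyPresentationSam(["n1", "n2", "n3"])
sam.on_disconnect()
sam.run_bookkeeping(1)   # only n1 is processed before the link returns
sam.on_connect()         # CONNECT arrives mid-bookkeeping
sam.run_bookkeeping(10)  # leftover bookkeeping now suspends n2 and n3
print(sam.state)  # {'n1': 'ACTIVE', 'n2': 'SUSPENDED', 'n3': 'SUSPENDED'}
```

In this model the link is up again, yet two of the three notifications are left suspended, which is the sort of inconsistent end state the cause statement describes.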

Resolution

This issue is fixed in Smarts Service Assurance Manager 9.4.1 Patch 20. The following excerpt is from the Smarts 9.4.1 Cumulative Readme, 302-002-926 Rev 01, released 01/17/2017, page 27:
IS-5643, IS-5655
77068086, 77047430
Notification synchronization issue between Presentation and Aggregate SAM. During a short disconnect between SAM domains, the hyper notif driver(s) would consistently try to pull notifications from Aggregate SAM servers, which would result in notifications going out of sync between the SAM domains. To avoid this scenario, the hyper notif driver(s) are explicitly stopped when DISCONNECT is received for an Aggregate SAM server and restarted when a subsequent CONNECT is received.
Files changed:
ics/ics-adapters/ics-event-driver.asl
ics/ics-adapters/ics-nl-processing.asl
Fixed in: 9.4.1.20
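The behavior the readme describes, stopping the driver on DISCONNECT and restarting it on CONNECT, can be sketched as a toy model. This is an illustration only and does not reflect the contents of the .asl files listed above; the class and method names are hypothetical.

```python
class ToyNotifDriver:
    """Hypothetical stand-in for a hyper notif driver attached to one AGG."""

    def __init__(self):
        self.running = True
        self.pull_attempts = 0

    def poll(self):
        # A pull from the Aggregate SAM is attempted only while running
        if self.running:
            self.pull_attempts += 1

    def on_disconnect(self):
        self.running = False  # explicitly stopped when DISCONNECT is received

    def on_connect(self):
        self.running = True   # restarted on the subsequent CONNECT


driver = ToyNotifDriver()
driver.poll()           # link up: one pull
driver.on_disconnect()
driver.poll()           # link down: no pull is attempted
driver.poll()
driver.on_connect()
driver.poll()           # link restored: pulls resume
print(driver.pull_attempts)  # 2
```

In this model no pull is attempted while the link is down, which matches the stated intent of the fix.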


Additional Information

Workaround:
Changing the smoothing interval, either through the GUI or from the command line, forces a domain sync without restarting the domain or the event driver.
 
--GUI (SAM Console or Global/Web console):

 
--CLI (<INCHARGE-SA>/smarts/bin):
dmctl -s INCHARGE-SA put ICS_DomainTypeConfiguration::ICS-DomainType-INCHARGE-AM-PM-SUITE::SmoothingInterval 65

NOTE: The value 65 is only an example. Change the interval to a value different from the current one, for example from 65 to 64, or from 64 back to 65.
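If the workaround has to be applied repeatedly, the command can be generated by a small wrapper. The following is a sketch, assuming dmctl is on the PATH and that the caller already knows the current smoothing interval; the function names and the dry_run parameter are hypothetical, and only the dmctl put command shown above is taken from this article.

```python
import subprocess

ATTR = (
    "ICS_DomainTypeConfiguration::"
    "ICS-DomainType-INCHARGE-AM-PM-SUITE::SmoothingInterval"
)


def next_interval(current):
    # Alternate between the two values mentioned in the note
    return 64 if current == 65 else 65


def build_put_command(server, value):
    # Same command as the CLI example, split into an argument list
    return ["dmctl", "-s", server, "put", ATTR, str(value)]


def toggle_smoothing_interval(server, current, dry_run=True):
    """Build (and optionally run) the dmctl command that changes the interval."""
    cmd = build_put_command(server, next_interval(current))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises if dmctl exits non-zero
    return cmd


# Dry run: print the command without contacting any server
cmd = toggle_smoothing_interval("INCHARGE-SA", current=64)
print(" ".join(cmd))
```

With dry_run left at its default, the script only prints the command, so it can be reviewed before being run against a live domain.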