A customer noticed that the "alert state" was High, which concerned them because they had not noticed it before. The replication jobs had a high number of failed objects.
Alert State: High
In the database is a table called Evt_NS_Hierarchy_Alert that keeps track of the alert status for each replication job. It tracks each replication job based upon SourceNS guid.
When the hierarchy topology view opens in the console, the stored procedure spGetServerAlertState runs for each node listed in the view.
DECLARE @AlertState__auto AS int;
EXECUTE spGetServerAlertState @ServerGuid='ca3aa222-f786-48bf-b93e-fa1940165b47', @AlertState=@AlertState__auto OUTPUT
SELECT MAX(AlertState) FROM Evt_NS_Hierarchy_Alert ha
INNER JOIN HierarchyNode hn ON ((ha.[DestinationNS] = hn.[ParentGuid] AND ha.[SourceNS] = hn.[ChildGuid]) OR
(ha.[DestinationNS] = hn.[ChildGuid] AND ha.[SourceNS] = hn.[ParentGuid]))
WHERE (ha.[SourceNS] = 'ca3aa222-f786-48bf-b93e-fa1940165b47') AND
(ha.[Latest] = 1) AND
(ha.[ValidUntilDate] >= GETUTCDATE())
spGetServerAlertState pulls the MAX alert state from all replication jobs that ran within the last 24 hours. (ValidUntilDate is always exactly 24 hours from the time the alert was created in the Evt_NS_Hierarchy_Alert table.)
This max alert state is what is displayed by the indicator in the console.
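The selection logic of the query above can be modelled in a few lines of Python. This is only a rough sketch for reasoning about the behaviour, not the product's implementation: the AlertRow class and the server_alert_state function are hypothetical names, the rows are held in memory rather than read from Evt_NS_Hierarchy_Alert, and the join to HierarchyNode is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass
class AlertRow:
    """Hypothetical in-memory stand-in for one Evt_NS_Hierarchy_Alert row."""
    source_ns: str
    alert_state: int
    latest: bool
    valid_until: datetime


def server_alert_state(rows: Iterable[AlertRow], server_guid: str,
                       now: datetime) -> Optional[int]:
    """Return the highest alert state among latest, unexpired rows for one source."""
    states = [
        r.alert_state
        for r in rows
        if r.source_ns == server_guid and r.latest and r.valid_until >= now
    ]
    # SQL MAX() over zero rows yields NULL; None plays that role here.
    return max(states) if states else None
```

Under these assumptions, a single row with a High state is enough to make the node show High for as long as that row is flagged as latest and has not passed its ValidUntilDate.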
There are four alert states (Low, Medium, High and Critical), plus an Unknown state.
/// The alert state is unknown.
Unknown = 0,
/// Low-priority information that might not be important to the user. For example,
/// replication has been completed between two servers in a Hierarchy structure.
AlertLow = 1,
/// Medium-priority information that does not need to be conveyed to the user
/// immediately. For example, several items could not be replicated between two
/// servers in a Hierarchy structure. 1% or less of items could not be replicated
AlertMedium = 2,
/// Important information that should be conveyed to the user as soon as possible.
/// For example, 10% or more of items could not be replicated between two servers
/// in a Hierarchy structure.
AlertHigh = 3,
/// Critical information that should be conveyed to the user immediately. For
/// example a link could not be established between two servers within a Hierarchy
AlertCritical = 4
The status alerts for medium and high are calculated based upon the following code.
failedReplicationPercentage = (status.FailedReplicationCount * 100) / status.TotalReplicationCount;
if (failedReplicationPercentage >= 10) HierarchyAlertState.AlertHigh,
if (failedReplicationPercentage >= 1) HierarchyAlertState.AlertMedium,
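To make the thresholds easier to experiment with, the excerpt above can be restated as a small Python function. Treat this as a sketch under assumptions: the classify function is a hypothetical name, the counts are assumed to be integers (so the division truncates), and the handling of an empty job and of anything below 1% is a guess, because the excerpt does not show those branches.

```python
from enum import IntEnum


class HierarchyAlertState(IntEnum):
    """Values copied from the enumeration quoted earlier in this article."""
    UNKNOWN = 0
    ALERT_LOW = 1
    ALERT_MEDIUM = 2
    ALERT_HIGH = 3
    ALERT_CRITICAL = 4


def classify(failed_count: int, total_count: int) -> HierarchyAlertState:
    """Map failed/total replication counts to an alert state."""
    if total_count <= 0:
        # Assumption: the excerpt does not show how an empty job is handled.
        return HierarchyAlertState.UNKNOWN
    # Integer division, assuming the product code works on integer counts.
    failed_pct = (failed_count * 100) // total_count
    if failed_pct >= 10:
        return HierarchyAlertState.ALERT_HIGH
    if failed_pct >= 1:
        return HierarchyAlertState.ALERT_MEDIUM
    # Assumption: anything under 1% is reported as Low.
    return HierarchyAlertState.ALERT_LOW
```

With truncating division, 99 failures out of 1000 items evaluates to 9 and is classified as Medium rather than High. Critical is not reachable from this calculation; according to the enumeration comments it relates to link failures between servers.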
The problem is that, under most circumstances, a High alert state will be displayed, because the ratio of failed items to total items replicated is almost always high for a differential job, and differential jobs usually run daily. Most of the failures occur because the items do not support replication: either the item itself, its class GUID, or the item's product does not support replication.
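A short arithmetic sketch shows why the same set of permanently failing items looks harmless in a full job but alarming in a differential one. All counts below are made up for illustration, the failed_percentage helper is a hypothetical name, and the totals are assumed to include the failed items.

```python
def failed_percentage(failed: int, total: int) -> int:
    """Same formula as the product excerpt, with truncating integer division."""
    return (failed * 100) // total


# Hypothetical: 50 items can never replicate and fail on every run.
PERMANENT_FAILURES = 50

# Hypothetical job sizes: a full job touches everything, a daily
# differential job touches only what changed plus the items that keep failing.
full_job_pct = failed_percentage(PERMANENT_FAILURES, 20000)
differential_pct = failed_percentage(PERMANENT_FAILURES, 120)

print(full_job_pct, differential_pct)
```

With these illustrative numbers the full job stays under the 1% Medium threshold, while the differential job is well past the 10% High threshold even though nothing new has gone wrong.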
Examples of products that don't support replication are CMDB, Workflow, and Altiris Connector.
Items that have been deleted but still have an entry in the ReplicationItemDependencyCache table will continue to attempt to replicate. Patch Management files are another source of failed replication items. The Patch Management Import Data Replication for Windows replication rule replicates the configured updates down the hierarchy. The dependencies associated with these updates, including the associated file GUIDs, are inserted into the ReplicationItemDependencyCache table, but these file GUIDs are configured as non-replicable because the files are actually pulled down as part of the PMImport process on the child SMPs.
Whenever replication runs, whether Patch Management updates need to replicate or not, their dependencies will attempt to replicate but will fail because of their non-replicable attributes.
Etrack 3218123 was logged against this issue: if an alert status indicator is provided, the customer needs a way to interpret the data and remediate if necessary. Otherwise, the indicator will just cause unnecessary support calls. Some work has already been done on this in 7.5.
ITMS 7.1 SP2
ITMS 7.1 SP2 MP1