VMware Smart Assurance Watch4Net/M&R: Missing historic Events in ElasticSearch

Article ID: 345360

Products

VMware

Issue/Introduction

Symptoms:

The event counts in Smarts and Watch4Net do not match: historic events are missing in ElasticSearch.


Environment

VMware Smart Assurance - Watch4Net/M&R

Cause

When the Event Processing Manager (EPM) service bulk-posts archived events from MySQL to ElasticSearch, it assigns the same document ID to every event in a batch, although each event should have a unique document ID. Because an event indexed under an existing document ID replaces the document already stored under that ID, the historic events are missing from the ElasticSearch database.
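The effect can be illustrated with a small Python model. This is only a sketch: a dictionary stands in for an ElasticSearch index, on the assumption that an index action whose _id already exists replaces the stored document. The function name bulk_index and the shortened event documents are hypothetical; the Id and _id values are the ones that appear in the EPM log excerpts in this article.

```python
def bulk_index(index, actions):
    """Apply (doc_id, document) index actions to a dict that models an index.

    Assumption: as with an ElasticSearch "index" action, a document whose _id
    already exists is replaced rather than stored a second time.
    """
    for doc_id, document in actions:
        index[doc_id] = document
    return index


# Two archived events, shortened to two fields each (hypothetical sample data).
events = [
    {"Id": 6922125531357870769, "EventName": "HighUtilization"},
    {"Id": 6928259410293539086, "EventName": "Down"},
]

# Faulty behaviour: every event in the batch is posted with the same _id.
shared_id = "488502733593714688"
broken = bulk_index({}, [(shared_id, e) for e in events])

# Expected behaviour: each event is posted under its own unique Id.
fixed = bulk_index({}, [(str(e["Id"]), e) for e in events])

print(len(broken), len(fixed))  # prints "1 2"
```

With the shared _id, only the last event of the batch remains in the modelled index; with per-event IDs, both events are kept.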

The following snippet from the EPM logs (log level FINEST), captured while bulk posting to ElasticSearch, shows the issue: the document ID (_id) is the same for both events in the batch, although their Id values differ.

Log location: APG_HOME/APG/Event-Processing/Event-Processing-Manager/<EPM-Instance>/logs/

FINE     -- [2021-02-12 12:06:58 IST] -- Wire::wire(): http-outgoing-1 >> "{"index":{"_index":"smarts-1613001600000","_type":"alert","_id":"488502733593714688"}}[\n]"
FINE     -- [2021-02-12 12:06:58 IST] -- Wire::wire(): http-outgoing-1 >> "{"Id":6922125531357870769,"Name":"NOTIFICATION-FileSystem__Performance__HostResources__RedHat_I-FileSystem__Performance__HostResources__RedHat-FS-w4n2.lab.local/1_HighUtilization","OpenedAt":1611682942000,"Source":"INCHARGE-SA-PRES","ClassName":"FileSystem_Performance_HostResources_RedHat","InstanceName":"I-FileSystem_Performance_HostResources_RedHat-FS-w4n2.lab.local/1","EventName":"HighUtilization","ClassDisplayName":"FileSystem","InstanceDisplayName":"FS-w4n2.lab.local/1 [/]","EventDisplayName":"HighUtilization","ElementClassName":"Host","ElementName":"w4n2.lab.local","SourceDomainName":"INCHARGE-AM-PM","SourceEventType":"EVENT","Active":false,"ClosedAt":1613045528000,"Duration":1362586000,"LastChangedAt":1613045828000,"IsRoot":true,"IsProblem":false,"Acknowledged":true,"EventType":"DURABLE","Category":"Resource","EventText":"Indicates that the utilization of the file System exceeds the MaxUtilizationPct.","Severity":2,"Impact":0,"Certainty":100.0,"InMaintenance":false,"TroubleTicketID":"","Owner":"SYSTEM","UpdatedAt":1613110966000,"UserDefined1":"","UserDefined2":"","UserDefined3":"","UserDefined4":"","UserDefined5":"","UserDefined6":"","UserDefined7":"","UserDefined8":"","UserDefined9":"","UserDefined10":"","UserDefined11":"","UserDefined12":"","UserDefined13":"","UserDefined14":"","UserDefined15":"","UserDefined16":"","UserDefined17":"","UserDefined18":"","UserDefined19":"","UserDefined20":""}[\n]"

FINE     -- [2021-02-12 12:06:58 IST] -- Wire::wire(): http-outgoing-1 >> "{"index":{"_index":"smarts-1613001600000","_type":"alert","_id":"488502733593714688"}}[\n]"
FINE     -- [2021-02-12 12:06:58 IST] -- Wire::wire(): http-outgoing-1 >> "{"Id":6928259410293539086,"Name":"NOTIFICATION-Host_Host1_Down","OpenedAt":1613111097000,"Source":"INCHARGE-SA-PRES","ClassName":"Host","InstanceName":"Host1","EventName":"Down","ClassDisplayName":"Host","InstanceDisplayName":"Host1","EventDisplayName":"Down","ElementClassName":"","ElementName":"","SourceDomainName":"INCHARGE-AM-PM","SourceEventType":"UNKNOWN","Active":false,"ClosedAt":1613111097000,"Duration":0,"LastChangedAt":1613111407000,"IsRoot":true,"IsProblem":false,"Acknowledged":true,"EventType":"MOMENTARY","Category":"","EventText":"","Severity":1,"Impact":0,"Certainty":100.0,"InMaintenance":false,"TroubleTicketID":"","Owner":"SYSTEM","UpdatedAt":1613111414000,"UserDefined1":"","UserDefined2":"","UserDefined3":"","UserDefined4":"","UserDefined5":"","UserDefined6":"","UserDefined7":"","UserDefined8":"","UserDefined9":"","UserDefined10":"","UserDefined11":"","UserDefined12":"","UserDefined13":"","UserDefined14":"","UserDefined15":"","UserDefined16":"","UserDefined17":"","UserDefined18":"","UserDefined19":"","UserDefined20":""}[\n]"
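To check a larger log for the same pattern, a short Python script along these lines may help. It is a sketch that assumes the wire-log format shown above; the function name duplicate_ids and the regular expression are illustrative and not part of Watch4Net.

```python
import re
from collections import Counter

# Matches the _id of a bulk "index" action line as it appears in the wire log.
ACTION_ID = re.compile(r'\{"index":\{.*?"_id":"([^"]+)"\}\}')


def duplicate_ids(log_lines):
    """Return {_id: count} for every _id used by more than one index action."""
    counts = Counter()
    for line in log_lines:
        match = ACTION_ID.search(line)
        if match:
            counts[match.group(1)] += 1
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


# The two action lines from the excerpt above (they are identical).
action_line = (
    'FINE     -- [2021-02-12 12:06:58 IST] -- Wire::wire(): http-outgoing-1 >> '
    '"{"index":{"_index":"smarts-1613001600000","_type":"alert",'
    '"_id":"488502733593714688"}}[\\n]"'
)
sample = [action_line, action_line]

print(duplicate_ids(sample))  # prints {'488502733593714688': 2}
```

Any iterable of lines can be passed in, for example an open log file from the EPM logs directory. An empty result means no _id was reused in the lines examined.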

Resolution

As of February 2021, this is a known issue in Watch4Net (W4N) versions up to 7.0u8. A fix is being considered for upcoming Watch4Net releases.

Follow the steps below to resolve this issue:

1) Ensure that the Event Log Processor (ELP) is installed for the desired Event Processing Manager (EPM) instance:
APG_HOME/APG/bin/manage-modules.sh list installed | grep -i event-log | grep -i <EPM_INSTANCE_NAME>

Example:
[root@w4n conf]# /opt/APG/bin/manage-modules.sh list installed | grep -i event-log | grep -i smarts
* event-log-processor                 smarts                 : Event-Processing Event-Log-Processor                 v1.7u2    r60124   linux-x64
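If this check needs to be scripted, the listing can also be parsed in Python. The sketch below assumes the column layout shown in the example output (module name first, instance name second); the function name elp_installed is hypothetical, and the check is stricter than the grep above because it compares whole column values.

```python
def elp_installed(listing, instance):
    """Return True if the listing shows event-log-processor for the instance."""
    for line in listing.splitlines():
        # Drop the leading "* " marker, then split the columns on whitespace.
        fields = line.lstrip("* ").split()
        if (
            len(fields) >= 2
            and fields[0].lower() == "event-log-processor"
            and fields[1].lower() == instance.lower()
        ):
            return True
    return False


# Output line copied from the example above.
sample = (
    "* event-log-processor                 smarts                 : "
    "Event-Processing Event-Log-Processor                 v1.7u2    r60124   linux-x64"
)

print(elp_installed(sample, "smarts"))  # prints True
```

The listing text itself would come from running manage-modules.sh list installed, for example through subprocess.run with capture_output=True and text=True.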


2) Edit APG_HOME/APG/Event-Processing/Event-Processing-Manager/<EPM_INSTANCE_NAME>/conf/processing.xml so that the FLOW-ID and EVENT-VALIDATOR processing elements appear as follows. In the config attribute of both elements, replace <EPM_INSTANCE_NAME> with the name of the EPM instance.

<processing-element enabled="true" name="FLOW-ID" config="Event-Log-Processor/<EPM_INSTANCE_NAME>/conf/event-log.xml" data="ES-WRITER" />

<processing-element name="EVENT-VALIDATOR" config="Event-Processing-Utils/<EPM_INSTANCE_NAME>/conf/rpe-event-validator.xml" type="EventValidator" data="FLOW-ID" />
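After editing, the wiring of the two elements can be sanity-checked with a few lines of Python. This is a sketch under stated assumptions: the root element name processing-manager in the sample is hypothetical, the instance name smarts is taken from the example in step 1, and the data attribute is assumed to hold one or more space-separated targets. The function name check_chain is illustrative.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{uri}name' down to 'name'."""
    return tag.rsplit("}", 1)[-1]


def check_chain(xml_text):
    """Return a list of problems found in the processing-element wiring."""
    root = ET.fromstring(xml_text)
    elements = {
        el.get("name"): el
        for el in root.iter()
        if local(el.tag) == "processing-element"
    }
    # EVENT-VALIDATOR must feed FLOW-ID, and FLOW-ID must feed ES-WRITER.
    expected = {"EVENT-VALIDATOR": "FLOW-ID", "FLOW-ID": "ES-WRITER"}
    problems = []
    for name, target in expected.items():
        el = elements.get(name)
        if el is None:
            problems.append(f"{name}: element missing")
        elif target not in el.get("data", "").split():
            problems.append(f"{name}: data does not include {target}")
    return problems


# Hypothetical minimal file; a real processing.xml contains more elements.
sample = """<processing-manager>
  <processing-element enabled="true" name="FLOW-ID"
      config="Event-Log-Processor/smarts/conf/event-log.xml" data="ES-WRITER" />
  <processing-element name="EVENT-VALIDATOR"
      config="Event-Processing-Utils/smarts/conf/rpe-event-validator.xml"
      type="EventValidator" data="FLOW-ID" />
</processing-manager>"""

print(check_chain(sample))  # prints []
```

An empty list means both elements are present and chained as described in this step; it does not validate the rest of the file.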

 

3) Back up the file APG_HOME/APG/Event-Processing/Event-Log-Processor/<EPM_INSTANCE_NAME>/conf/event-log.xml, then replace its entire content with the content below and restart EPM.

-------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!--
* Copyright (c) 2017, EMC Corporation.
* All Rights Reserved.

* This software contains the intellectual property of EMC Corporation
* or is licensed to EMC Corporation from third parties.
* Use of this software and the intellectual property contained therein
* is expressly limited to the terms and conditions of the License
* Agreement under which it is provided by or on behalf of EMC.
-->
<!--
***********************************************************************************************
* This file has been auto-generated from SolutionPack code and should not be edited manually. *
* Any manual changes in this file can potentially be lost. *
* Edit this file only on formal recommendations from EMC. *
***********************************************************************************************
-->
<rules xmlns="http://www.watch4net.com/Events/EventLogProcessor" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.watch4net.com/Events/EventLogProcessor EventLogProcessor.xsd">
<!--
<format name="Unique ID" pattern="{0}" to="@Id" type="message">
<field name="Id" type="LONG" />
</format>
-->
<copy name="CopyID" field="Id" to="@Id" />
<!-- Simply forward events to the data stream without modifying them. -->
<forward stream="data" />
</rules>

-------------------------------------------------------------------
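The backup-and-replace part of this step could be scripted as in the following Python sketch. The function name backup_and_replace and the timestamped .bak naming are assumptions made for illustration; the demonstration runs against a temporary file with placeholder content rather than the real event-log.xml, and restarting EPM is not covered.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_and_replace(path, new_content):
    """Copy path to a timestamped .bak file, then overwrite path with new_content."""
    path = Path(path)
    stamp = time.strftime("%Y%m%d%H%M%S")
    backup = path.with_name(f"{path.name}.{stamp}.bak")
    shutil.copy2(path, backup)  # copy2 also keeps timestamps and permission bits
    path.write_text(new_content, encoding="utf-8")
    return backup


# Demonstration on a throwaway file (placeholder XML, not the real rules).
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "event-log.xml"
    target.write_text("<rules>old</rules>", encoding="utf-8")
    saved = backup_and_replace(target, "<rules>new</rules>")
    old_copy = saved.read_text(encoding="utf-8")
    current = target.read_text(encoding="utf-8")

print(old_copy, current)  # prints "<rules>old</rules> <rules>new</rules>"
```

For the real file, new_content would be the XML shown between the dashed lines above.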

To verify the fix, pick a device and compare its count of closed notifications in Smarts with the count in the W4N events report for the same device.

The following snippet from the EPM logs (log level FINEST) shows a bulk post to ElasticSearch after the fix is applied. The document ID (_id) is now the same as the event's Id in the events_live MySQL table, which is unique for each event.

Log location: APG_HOME/APG/Event-Processing/Event-Processing-Manager/<EPM-Instance>/logs/

FINE     -- [2021-02-12 12:57:53 IST] -- Wire::wire(): http-outgoing-2 >> "{"index":{"_index":"smarts-1613001600000","_type":"alert","_id":"6928275235198043986"}}[\n]"
FINE     -- [2021-02-12 12:57:53 IST] -- Wire::wire(): http-outgoing-2 >> "{"Id":6928275235198043986,"Name":"NOTIFICATION-Host_Host41_Down","OpenedAt":1613114782000,"Source":"INCHARGE-SA-PRES","ClassName":"Host","InstanceName":"Host41","EventName":"Down","ClassDisplayName":"Host","InstanceDisplayName":"Host41","EventDisplayName":"Down","ElementClassName":"","ElementName":"","SourceDomainName":"INCHARGE-AM-PM","SourceEventType":"UNKNOWN","Active":false,"ClosedAt":1613114782000,"Duration":0,"LastChangedAt":1613114792000,"IsRoot":true,"IsProblem":false,"Acknowledged":false,"EventType":"MOMENTARY","Category":"","EventText":"","Severity":1,"Impact":0,"Certainty":100.0,"InMaintenance":false,"TroubleTicketID":"","Owner":"","UpdatedAt":1613114799000,"UserDefined1":"","UserDefined2":"","UserDefined3":"","UserDefined4":"","UserDefined5":"","UserDefined6":"","UserDefined7":"","UserDefined8":"","UserDefined9":"","UserDefined10":"","UserDefined11":"","UserDefined12":"","UserDefined13":"","UserDefined14":"","UserDefined15":"","UserDefined16":"","UserDefined17":"","UserDefined18":"","UserDefined19":"","UserDefined20":""}[\n]"

FINE     -- [2021-02-12 12:57:53 IST] -- Wire::wire(): http-outgoing-2 >> "{"index":{"_index":"smarts-1613001600000","_type":"alert","_id":"6928275239005354345"}}[\n]"
FINE     -- [2021-02-12 12:57:53 IST] -- Wire::wire(): http-outgoing-2 >> "{"Id":6928275239005354345,"Name":"NOTIFICATION-Host_Host42_Down","OpenedAt":1613114783000,"Source":"INCHARGE-SA-PRES","ClassName":"Host","InstanceName":"Host42","EventName":"Down","ClassDisplayName":"Host","InstanceDisplayName":"Host42","EventDisplayName":"Down","ElementClassName":"","ElementName":"","SourceDomainName":"INCHARGE-AM-PM","SourceEventType":"UNKNOWN","Active":false,"ClosedAt":1613114783000,"Duration":0,"LastChangedAt":1613114793000,"IsRoot":true,"IsProblem":false,"Acknowledged":false,"EventType":"MOMENTARY","Category":"","EventText":"","Severity":1,"Impact":0,"Certainty":100.0,"InMaintenance":false,"TroubleTicketID":"","Owner":"","UpdatedAt":1613114800000,"UserDefined1":"","UserDefined2":"","UserDefined3":"","UserDefined4":"","UserDefined5":"","UserDefined6":"","UserDefined7":"","UserDefined8":"","UserDefined9":"","UserDefined10":"","UserDefined11":"","UserDefined12":"","UserDefined13":"","UserDefined14":"","UserDefined15":"","UserDefined16":"","UserDefined17":"","UserDefined18":"","UserDefined19":"","UserDefined20":""}[\n]"
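The same comparison can be automated for a whole log. The Python sketch below assumes the wire-log format shown in this article, where each index action line is followed by the line carrying its event document; the names payload, mismatched_ids and wire are hypothetical, and the sample documents are shortened to two fields.

```python
import json

PAYLOAD_START = '>> "'
PAYLOAD_END = '[\\n]"'  # the literal characters [\n]" that end each payload


def payload(line):
    """Return the JSON object carried by an outgoing wire-log line, or None."""
    start = line.find(PAYLOAD_START)
    end = line.rfind(PAYLOAD_END)
    if start == -1 or end == -1:
        return None
    try:
        obj = json.loads(line[start + len(PAYLOAD_START):end])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def mismatched_ids(log_lines):
    """Pair each index action with the next document; return differing (_id, Id)."""
    mismatches = []
    pending_id = None
    for line in log_lines:
        obj = payload(line)
        if obj is None:
            continue
        if "index" in obj:
            pending_id = obj["index"].get("_id")
        elif pending_id is not None:
            if str(obj.get("Id")) != pending_id:
                mismatches.append((pending_id, obj.get("Id")))
            pending_id = None
    return mismatches


def wire(body):
    """Build a sample log line around a JSON body (test data helper)."""
    return "FINE     -- [...] -- Wire::wire(): http-outgoing-2 >> " + '"' + body + '[\\n]"'


after_fix = [
    wire('{"index":{"_index":"smarts-1613001600000","_type":"alert","_id":"6928275235198043986"}}'),
    wire('{"Id":6928275235198043986,"Name":"NOTIFICATION-Host_Host41_Down"}'),
]
before_fix = [
    wire('{"index":{"_index":"smarts-1613001600000","_type":"alert","_id":"488502733593714688"}}'),
    wire('{"Id":6922125531357870769,"EventName":"HighUtilization"}'),
]

print(mismatched_ids(after_fix))   # prints []
print(mismatched_ids(before_fix))  # prints [('488502733593714688', 6922125531357870769)]
```

An empty result for a post-fix log indicates that every document was posted under its own Id.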