Message Gateway probe stops processing for no reason


Article ID: 140166



Products

DX Infrastructure Management NIMSOFT PROBES

Issue/Introduction

In several customer environments, the messagegtw (Message Gateway) probe stopped sending some messages.

The following excerpts from the messagegtw log relate to the occurrence. In the first environment, errors are reported:

Oct 25 09:41:03:386 [/generic_rundeck, messagegtw] UimMessageListener.processIncomingMessages:233:                Dropping invalid message:     custom_1=, webhook=generic_rundeck, rundeck_company=rundeck.scger.corp, arrival=1571989250, custom_2=, pri=1, subject=rundeck, custom_5=/opt/WebSphere8.5.5/AppServer/profiles, origin=<ORIGIN>, hop=0, prid=cdm, source=<IPADRESS>, nimts=1571989263, robot=<ROBOTNAME>, nimid=ST14249130-76703, hostname=<HOSTNAME>, rundeck_job_id=6ed924b4-6ea1-4517-b7e3-a6bf2ed3dacc, supp_key=disk//opt/WebSphere8.5.5/AppServer/profiles, tz_offset=-7200, domain=<DOMAINNAME>, hop0=<HUBNAME>, suppression=y+000000000#disk//opt/WebSphere8.5.5/AppServer/profiles, alarmid=ST14249130-76703        udata: subsys=Disk, nimid=ST14249130-76703, visible=1, level=4, alarmid=ST14249130-76703, message=Average (1 samples) disk free on /opt/WebSphere8.5.5/AppServer/profiles is now 0%, which is below the error threshold (5%) out of total size 9.7 GB @Action:

port: Sistemas WEB ES @Tags: #RUNDECK:6ed924b4-6ea1-4517-b7e3-a6bf2ed3dacc#

(1) error, failed to load rules: IO Error: Broken pipe

                at com.ca.uim.agileops.gateways.common.nisdb.TNT2MetricIdResolver.resolveByProbe(TNT2MetricIdResolver.java:134)

                at com.ca.uim.agileops.metricutil.CMDBInformationCache.get(CMDBInformationCache.java:46)

                at com.ca.uim.agileops.gateways.common.MessageConverter.convert(MessageConverter.java:104)

                at com.ca.uim.agileops.gateways.common.UimMessageListener.processIncomingMessages(UimMessageListener.java:231)

                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 

In the second environment, the log shows gaps of several hours between consecutive entries:

oct 24 18:14:29:469 [/generic_rundeck, messagegtw] UimMessageListener.processIncomingMessages:249:    Listener /generic_rundeck successfully took 1 messages off the queue

oct 25 08:33:00:912 [/generic_rundeck, messagegtw] UimMessageListener.processIncomingMessages:210:    received data...

 

oct 25 15:01:18:085 [/generic_rundeck, messagegtw] UimMessageListener.processIncomingMessages:249:    Listener /generic_rundeck successfully took 1 messages off the queue

oct 26 00:07:23:262 [/generic_rundeck, messagegtw] UimMessageListener.processIncomingMessages:210:    received data...

oct 26 00:07:23:502 [/generic_rundeck, messagegtw] DefaultHttpclient.sendHttpRequest:106:                sending request without authentication
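Gaps like the ones above are easy to miss in a long log file. The following is a minimal sketch, not part of the product, that scans log lines for pauses longer than a threshold. It assumes the timestamp layout seen in the excerpts (`oct 24 18:14:29:469`). The function names `parse_timestamp` and `find_gaps`, the one-hour threshold, and the year are illustrative choices; the log lines carry no year, so it has to be supplied.

```python
import re
from datetime import datetime, timedelta

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# Timestamp prefix as seen in the excerpts: "oct 24 18:14:29:469"
# (month, day, hh:mm:ss, then milliseconds after a colon).
TS_RE = re.compile(
    r"^\s*([A-Za-z]{3}) +(\d{1,2}) (\d{2}):(\d{2}):(\d{2}):(\d{3})\b")


def parse_timestamp(line, year):
    """Return the datetime at the start of a log line, or None if there is none."""
    match = TS_RE.match(line)
    if match is None or match.group(1).lower() not in MONTHS:
        return None
    month, day, hour, minute, second, millis = match.groups()
    return datetime(year, MONTHS[month.lower()], int(day),
                    int(hour), int(minute), int(second), int(millis) * 1000)


def find_gaps(lines, threshold=timedelta(hours=1), year=2019):
    """Return (previous, current, delta) for consecutive entries further apart than threshold.

    year is an assumption supplied by the caller because the log omits it.
    Lines without a timestamp (stack traces, wrapped text) are skipped.
    """
    gaps = []
    previous = None
    for line in lines:
        current = parse_timestamp(line, year)
        if current is None:
            continue
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


# Log lines taken from the second environment (whitespace shortened).
SAMPLE = [
    "oct 24 18:14:29:469 [/generic_rundeck, messagegtw] UimMessageListener.processIncomingMessages:249: Listener /generic_rundeck successfully took 1 messages off the queue",
    "oct 25 08:33:00:912 [/generic_rundeck, messagegtw] UimMessageListener.processIncomingMessages:210: received data...",
    "oct 25 15:01:18:085 [/generic_rundeck, messagegtw] UimMessageListener.processIncomingMessages:249: Listener /generic_rundeck successfully took 1 messages off the queue",
    "oct 26 00:07:23:262 [/generic_rundeck, messagegtw] UimMessageListener.processIncomingMessages:210: received data...",
    "oct 26 00:07:23:502 [/generic_rundeck, messagegtw] DefaultHttpclient.sendHttpRequest:106: sending request without authentication",
]

for start, end, delta in find_gaps(SAMPLE):
    print(f"gap of {delta} between {start} and {end}")
```

To check a real log, pass the lines of the file instead of `SAMPLE`, for example `find_gaps(open(path, encoding="utf-8", errors="replace"))`, where `path` points at the messagegtw log file. A gap only shows that nothing was logged in that interval; whether messages were pending during it has to be confirmed separately.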

Environment

Environment 1:

UIM version 8.5.1 SP1 running on Windows Server 2008 R2 Ent, UMP version 8.5.1 GA running on Linux RedHat6, DB - Oracle

message gateway - 1.28

controller - 7.80HF21

wasp - 8.51

nas - 8.56HF5


Environment 2:

UIM version 8.5.1 SP1 running on Windows Server 2012 R2, UMP version 8.5.1 GA, DB - SQL SE2016SP1

message gateway - 1.28

controller - 7.80HF21

wasp - 8.51

nas - 8.56HF5

Resolution

Change the Java memory settings of the messagegtw probe as follows:

from:

<startup>
   <opt>
      java_opts = -server -XX:ErrorFile=./hs_err_pid.log
      java_mem_max = -Xmx1024m
      java_mem_init = -Xms64m
   </opt>
</startup>

to:

<startup>
   <opt>
      java_opts = -server -XX:ErrorFile=./hs_err_pid.log
      java_mem_max = -Xmx2g
      java_mem_init = -Xms1g
   </opt>
</startup>
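To make the size of this change explicit, the short sketch below converts the JVM heap flags into bytes. The helper name `jvm_size_to_bytes` is illustrative and not part of the probe; it only handles the `-Xms`/`-Xmx` forms with the `k`, `m`, and `g` suffixes.

```python
import re

# Binary multipliers used by the JVM for heap size suffixes.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def jvm_size_to_bytes(flag):
    """Convert a flag such as "-Xmx1024m" or "-Xms1g" to a byte count."""
    match = re.fullmatch(r"-Xm[sx](\d+)([kKmMgG]?)", flag.strip())
    if match is None:
        raise ValueError(f"not a -Xms/-Xmx flag: {flag!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


# Values before and after the change described above.
settings = {
    "java_mem_max": ("-Xmx1024m", "-Xmx2g"),
    "java_mem_init": ("-Xms64m", "-Xms1g"),
}
for key, (before, after) in settings.items():
    before_mib = jvm_size_to_bytes(before) // 1024 ** 2
    after_mib = jvm_size_to_bytes(after) // 1024 ** 2
    print(f"{key}: {before_mib} MiB -> {after_mib} MiB")
```

In other words, the maximum heap doubles from 1024 MiB to 2048 MiB, and the initial heap grows from 64 MiB to 1024 MiB, so the host needs at least that much free memory for the probe.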

 

Change the bulk size of the listener in the messagegtw probe from 10 to 100:

from:

<listeners>
   <generic_rundeck>
      uim_queuename = rundeck
      attach_to_queue = true
      send_exclusive = true
      alert_on_failure = true
      bulk_size = 10

to:

<listeners>
   <generic_rundeck>
      uim_queuename = rundeck
      attach_to_queue = true
      send_exclusive = true
      alert_on_failure = true
      bulk_size = 100
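Before restarting, it can help to confirm that the edited values are what the configuration file now contains. The sketch below is a hypothetical checker, not a Nimsoft tool: it understands only the syntax visible in the fragments above (`<section>`, `</section>`, and `key = value` lines) and tolerates sections that are not closed in an excerpt. The function name `read_settings` and the `section/key` path notation are illustrative assumptions.

```python
import re

SECTION_OPEN = re.compile(r"^<([^/<>]+)>$")
SECTION_CLOSE = re.compile(r"^</([^<>]+)>$")


def read_settings(text):
    """Return {"section/subsection/key": "value"} for the key = value lines in text."""
    path = []       # stack of currently open section names
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        closing = SECTION_CLOSE.match(line)
        if closing:
            # Pop only when the closing tag matches the innermost open section.
            if path and path[-1] == closing.group(1):
                path.pop()
            continue
        opening = SECTION_OPEN.match(line)
        if opening:
            path.append(opening.group(1))
            continue
        if "=" in line:
            # Split on the first "=" only, so values may contain "=" themselves.
            key, value = line.split("=", 1)
            settings["/".join(path + [key.strip()])] = value.strip()
    return settings


# Fragments as they should look after both changes in this article.
EDITED = """
<startup>
   <opt>
      java_opts = -server -XX:ErrorFile=./hs_err_pid.log
      java_mem_max = -Xmx2g
      java_mem_init = -Xms1g
   </opt>
</startup>
<listeners>
   <generic_rundeck>
      uim_queuename = rundeck
      attach_to_queue = true
      send_exclusive = true
      alert_on_failure = true
      bulk_size = 100
"""

EXPECTED = {
    "startup/opt/java_mem_max": "-Xmx2g",
    "startup/opt/java_mem_init": "-Xms1g",
    "listeners/generic_rundeck/bulk_size": "100",
}

found = read_settings(EDITED)
for key, wanted in EXPECTED.items():
    status = "ok" if found.get(key) == wanted else "MISMATCH"
    print(f"{status}: {key} = {found.get(key)!r}")
```

For a real check, replace `EDITED` with the contents of the probe's configuration file. The listener name `generic_rundeck` comes from the environments described here and will differ elsewhere.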

Restart the robot service.

It is also suggested to update the robot and hub to version 7.93 and check whether that resolves the issue.

http://support.nimsoft.com/Files/Archive/00001/hub-7_93.zip

http://support.nimsoft.com/Files/Archive/00055/robot_update-7_93.zip