DX UIM tool outage – no access to IM or OC


Article ID: 397390

Updated On:

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

  • Today we experienced an outage of the UIM tool: both the Infrastructure Manager and the Operator Console became unresponsive. To regain access, we had to restart the service directly on the server.
  • The Infrastructure Manager (IM), the hub, and the OC all seem to have crashed, so we would like to perform a root cause analysis to find out what happened.
  • Could you help us identify the root cause of this incident?

Environment

  • UIM 23.4 CU2

Cause

  • IM was not at the correct version and needed to be updated.
  • The wasp Java memory configuration set the maximum memory far higher than wasp needed.

Resolution

  1. Update IM to version 23.4.2 on the Primary hub and on all machines where UIM admins use IM.

    To do this, navigate to http://<primary_hub>/uimhome and download the IM installer. Right-click the installer, select "Run As Administrator", upgrade each IM instance, and then restart the system.

  2. Maximum memory settings of 16 GB and 40 GB are far more than wasp needs, both according to the logs and in general. All wasp instances were set to a 40 GB maximum, which is far too much and wastes memory on each machine. Check the wasp.log of each wasp instance and note the Used versus Available memory.

  3. Set the minimum and maximum wasp Java memory appropriately.

    The last wasp instance we checked in the IM list for each Operator Console (OC) showed 8 GB used, so in that case set the min to 8 GB and the max to 10 GB. Check each system individually, because usage will differ.

    Then let the machines run for 24 hours and check the wasp memory utilization again. You may also want to monitor the OC java process on each wasp machine with the processes probe, matching on 'name plus command line' to differentiate the processes, so that you can see the trend in wasp Java memory utilization over time.
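    As a rough aid for the check in step 2, the Memory Status lines can be extracted from each wasp.log with a short script. The following is a minimal sketch in Python and is not part of UIM. It assumes the line format shown in the wasp.log excerpt in this article, and the function name is hypothetical. In that excerpt the line is logged at DEBUG level, so it may only appear when the wasp log level is high enough.

```python
import re

# Matches the "Memory Status" line format quoted from wasp.log in this article.
MEMORY_STATUS = re.compile(
    r"Memory Status: Max Limit: (\d+)MB, Allocated: (\d+)MB, "
    r"Free: (\d+)MB, Used: (\d+)MB"
)


def summarize_wasp_memory(lines):
    """Return the number of samples, peak Used, and largest Max Limit (in MB)."""
    samples = 0
    peak_used_mb = 0
    max_limit_mb = 0
    for line in lines:
        match = MEMORY_STATUS.search(line)
        if match is None:
            continue  # not a Memory Status line
        limit, _allocated, _free, used = (int(value) for value in match.groups())
        samples += 1
        peak_used_mb = max(peak_used_mb, used)
        max_limit_mb = max(max_limit_mb, limit)
    return {
        "samples": samples,
        "peak_used_mb": peak_used_mb,
        "max_limit_mb": max_limit_mb,
    }
```

    The function accepts any iterable of log lines, for example an open wasp.log file. Comparing the peak Used value with the Max Limit on each system gives a starting point for choosing the min and max values; it does not replace watching the trend over time.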

Additional Information

hub/robot:

Windows events showed no hub.exe or controller.exe errors.

IM crash:

Faulting application name: NimBUSManager.exe, version: 2.0.4.6, time stamp: 0x66b5fc90
    Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
    Exception code: 0xc0000005
    Fault offset: 0x00000000
    Faulting process id: 0xba3c
    Faulting application start time: 0x01dbc412dd77d653
    Faulting application path: C:\Program Files (x86)\Nimsoft\bin\NimBUSManager.exe
    Faulting module path: unknown
    Report Id: fd57a01d-84fa-4212-8ec9-18fa4056fe7c
    Faulting package full name: 
    Faulting package-relative application ID:
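
Exception code 0xc0000005 is the Windows access violation status. The "Faulting application start time" field is a Windows FILETIME value, which counts 100-nanosecond intervals since 1601-01-01 UTC. To line the IM crash up with the wasp.log timestamps, the value can be converted to a readable time. This is a minimal sketch in Python with a hypothetical helper name. The result is in UTC, and the time zone of the wasp.log timestamps is not stated in this article, so any comparison has to account for that.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since this instant.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(filetime):
    """Convert a Windows FILETIME integer to a timezone-aware UTC datetime."""
    # Ten 100-ns ticks make one microsecond; sub-microsecond ticks are dropped.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


# Value copied from the "Faulting application start time" field above.
im_start = filetime_to_utc(0x01DBC412DD77D653)
print(im_start.isoformat())
```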

wasp.log:

may 12 13:39:15:820 DEBUG [Catalina-utility-3, com.nimsoft.nimbus.probe.service.wasp.WaspLifecycleListener] Memory Status: Max Limit: 37211MB, Allocated: 37211MB, Free: 34023MB, Used: 3188MB

The wasp probe did generate multiple warnings about a potential memory leak in web applications such as the dashboard portlet and the ump_operatorconsole portlet.

Examples shown below:

may 12 13:40:48:011 WARN  [Catalina-utility-4, org.apache.catalina.loader.WebappClassLoaderBase] The web application [dashboard] appears to have started a thread named [AlarmCacheUpdateThread] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 sun.misc.Unsafe.park(Native Method)

may 12 13:40:49:512 WARN  [Catalina-utility-4, org.apache.catalina.loader.WebappClassLoaderBase] The web application [operatorconsole_portlet] appears to have started a thread named [DefaultQuartzScheduler_Worker-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:

may 12 13:40:50:574 WARN  [Catalina-utility-4, org.apache.catalina.loader.WebappClassLoaderBase] The web application [samlsso] appears to have started a thread named [Timer-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:

may 12 13:40:51:308 WARN  [Catalina-utility-4, org.apache.catalina.loader.WebappClassLoaderBase] The web application [mps] appears to have started a thread named [FileWatchdog] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:

may 12 13:40:49:527 WARN  [Catalina-utility-4, org.apache.catalina.loader.WebappClassLoaderBase] The web application [operatorconsole_portlet] appears to have started a thread named [DefaultQuartzScheduler_QuartzSchedulerThread] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
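
Tomcat writes these WebappClassLoaderBase warnings while a web application is being stopped, so they typically cluster around a wasp shutdown or restart. To see which web applications and threads are involved across a whole wasp.log, the warnings can be grouped with a short script. This is a minimal sketch in Python, assuming the warning text matches the examples above; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches the Tomcat warning text shown in the wasp.log examples above.
LEAK_WARNING = re.compile(
    r"The web application \[([^\]]+)\] appears to have started a thread "
    r"named \[([^\]]+)\] but has failed to stop it"
)


def threads_left_running(lines):
    """Map each web application name to the sorted thread names it left running."""
    threads_by_app = defaultdict(set)
    for line in lines:
        match = LEAK_WARNING.search(line)
        if match:
            app, thread = match.groups()
            threads_by_app[app].add(thread)
    return {app: sorted(threads) for app, threads in threads_by_app.items()}
```

As with any log summary, the result only shows which threads were still alive at stop time; it does not by itself confirm that the heap was exhausted.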