The following is a list of common issues related to the DX OI integration with UIM
DX Operational Intelligence 2x, SaaS
Root cause:
The default log level is 1-Error.
Solution:
You may change the log level using the UI. The log levels 5-Trace & 4-Debug provide very detailed information and should be used only for troubleshooting purposes, as they may degrade performance on a production system.
The default logsize is 10240 (10 MB); you may update it from the Raw Configure setup/logsize option.
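In Raw Configure, these values live under the setup section; an illustrative fragment of the probe configuration file showing the defaults described above (the loglevel key name follows the usual UIM convention and should be verified against your probe):

```
<setup>
   loglevel = 1
   logsize = 10240
</setup>
```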
Root cause:
The issue is related to the PPM probe, which helps in rendering the configuration UI.
Solution:
Check the version of the PPM probe and make sure it matches the UIM release version. Verify the logs. Restart the PPM probe.
The issue may also occur if there is a connectivity problem between the PPM robot (typically the primary hub) and the robot hosting the oi_connector probe (if different from the PPM robot). You may check connectivity using telnet or any other tool.
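As an alternative to telnet, the reachability check can be scripted; a minimal sketch in Python, assuming the default UIM hub port 48002 (the host name is a placeholder; adjust both for your environment):

```python
import socket

def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (hypothetical host): run this from the PPM robot against the robot
# hosting the oi_connector probe. 48002 is the default UIM hub port.
# print(check_port("oi-connector-robot.example.com", 48002))
```

A False result points to a firewall, routing, or service issue between the two robots.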
Root cause:
The issue is related to the PPM probe, which helps in rendering the configuration UI.
Solution:
Check the version of the adapter JAR file (oi_connector-adapter-*.jar) within the PPM probe folder. Delete the JAR from the PPM folder (Nimsoft\probes\service\ppm\adapterlib). Delete the files in the \Nimsoft\probes\service\ppm\cache\AttributionClient folder. Restart the oi_connector probe.
If the issue remains, back up the oi_connector.cfg file, delete the oi_connector probe, redeploy it, and restore the configuration file. Restart the probe.
Root cause:
For customers integrating on-premise UIM with SaaS OI, these options do not work.
Solution:
These options are not applicable when integrating with SaaS OI. However, we recommend verifying the endpoint URL & Tenant ID.
Root cause:
The username & password fields are mandatory due to UI validation.
Solution:
Provide dummy values for the proxy username & password. Open the Raw Configure option of the probe and remove the values of the username (resource/properties/proxy_user) & password (resource/properties/proxy_password) keys. Save the configuration. The probe will restart and use the proxy without authentication.
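After saving, the two keys should be present but empty; an illustrative fragment of oi_connector.cfg (the section layout follows the usual UIM .cfg convention and should be verified against your file):

```
<resource>
   <properties>
      proxy_user =
      proxy_password =
   </properties>
</resource>
```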
Root cause:
Network issues may prevent the probe from connecting to Jarvis & NASS.
Solution:
From the robot on which the probe is deployed (typically the primary hub), verify network connectivity to the Jarvis & NASS hosts. Check whether a firewall or proxy is causing the connection issue.
Root cause:
Some internal processing errors may prevent the probe from posting data. This issue typically occurs when there is a problem connecting to the queues.
Solution:
Check for exceptions towards the end of the connector logs to identify the issue. If the logs contain any of the messages below, restarting the probe resolves the issue.
*Got NimException connecting to queue group_info*
*Unable to open a client session for 127.0.0.1:48002*
*Unable to open a client session for 127.0.0.1:48002: Connection refused: connect*
We recommend monitoring the logs using Logmon and restarting the probe when any of the above messages are logged.
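The Logmon match can be prototyped with a short script that scans the connector log for the messages above; a minimal sketch in Python (the actual restart action is environment-specific and not shown):

```python
# Known queue-connection error messages that warrant a probe restart.
RESTART_MARKERS = (
    "Got NimException connecting to queue group_info",
    "Unable to open a client session for",
)

def needs_restart(log_path: str) -> bool:
    """Return True if any known queue-connection error appears in the log file."""
    with open(log_path, encoding="utf-8", errors="replace") as f:
        return any(marker in line for line in f for marker in RESTART_MARKERS)
```

In practice you would point this at the oi_connector log file and trigger a restart (or an alarm) when it returns True, mirroring what a Logmon profile would do.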
Root cause:
The probe is not configured properly.
Solution:
Open the probe configuration and verify the probe list includes the required probes. Make sure your group selection includes all groups, or leave it empty.
If no group is selected, the probe sends data for all groups in the system; this also covers any new groups created after the initial configuration.
Root cause:
The volume of metrics/alarms/inventory generated in the environment is too large for the default configuration to handle.
Solution:
The default configuration of the probe requires a maximum of 2 GB of heap space, which works fine for an inventory of up to 10,000 servers. Increase the memory by 2 GB for every additional 10,000 inventory items.
Open the probe's Raw Configure and update the -Xms1g -Xmx2g parameters in the startup/options section.
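The sizing rule above (2 GB of heap per 10,000 inventory items, with a 2 GB default) can be expressed as a small helper; a sketch, assuming linear scaling:

```python
import math

def recommended_xmx_gb(inventory_items: int) -> int:
    """Recommended -Xmx heap size in GB: 2 GB per 10,000 inventory items, minimum 2 GB."""
    return max(2, math.ceil(inventory_items / 10_000) * 2)

# e.g. an inventory of 25,000 servers -> 6, i.e. set -Xmx6g in startup/options.
```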
Root cause:
The issue is related to cache handling of the probe.
Solution:
The probe may not be able to close all alarms due to processing exceptions.
However, alarm reconciliation addresses these issues if it is configured to run at specified intervals.
Root cause:
The configuration is not optimal for the incoming QOS/alarm message volume.
Solution:
We recommend updating configuration parameters. For detailed information, refer to the link.
Root cause:
The issue occurs when the device lookup for the alarm fails.
Solution:
Check if the following query returns data.
SELECT CD.* FROM NAS_ALARMS NA
INNER JOIN CM_CONFIGURATION_ITEM_METRIC CIM ON CIM.ci_metric_id = NA.met_id
INNER JOIN CM_CONFIGURATION_ITEM CI ON CI.ci_id = CIM.ci_id
INNER JOIN CM_DEVICE CD ON CD.dev_id = CI.dev_id
WHERE NA.met_id = 'xxxxxx'
If the query returns no data, there is an issue with discovery, which requires investigation from the discovery side.