Troubleshooting UIM Robot-Hub connectivity or communication issues

Article ID: 199577

Products

CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
NIMSOFT PROBES
DX Unified Infrastructure Management (Nimsoft / UIM)
Unified Infrastructure Management for Mainframe
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

We are facing communication errors on 15 Windows servers where robots are installed. We are also unable to install probes on those servers or connect to these robots via IM->Tools->Connect. Some robots have their service running but send robot inactive alarms and appear red in the Infrastructure Manager/Admin Console.

Environment

Release : 9.0.2 or higher

Component : UIM - ROBOT

Resolution

Check whether anything changed on the date when the robots first started exhibiting connectivity/communication issues, e.g., an upgrade of the robot, a change in IP address, a change in configuration, networking/routing changes, security changes, etc.

Run the hostname command on the robot machine to make sure it displays the correct/expected hostname

Test whether you can ping the robot from the hub and vice versa. Note that ping is sometimes not allowed for security reasons, so a failed ping on its own does not prove a connectivity problem.

Check whether each host can resolve the other host's name:

   From the hub, nslookup <robot_hostname>

   From the robot, nslookup <hub_hostname>
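Where nslookup is not available, the same forward lookup can be sketched in Python. This is a minimal sketch and not part of UIM: `resolve_host` is a hypothetical helper name, and it uses the operating system resolver (which also consults the local hosts file), so its answer can differ from nslookup, which queries the DNS server directly.

```python
import socket


def resolve_host(hostname):
    """Return the sorted, unique IP addresses the OS resolver reports for hostname.

    Returns an empty list when the name cannot be resolved.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        # gaierror: the resolver could not find the name.
        # UnicodeError: the name is malformed (for example, an empty label).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Run `resolve_host("<robot_hostname>")` on the hub and `resolve_host("<hub_hostname>")` on the robot. An empty list, or an address that differs from the one expected, points to a name resolution problem.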

See if you can connect successfully to and from the hub and robot via telnet, e.g.,

   telnet TO the robot FROM the hub on port 48000
   telnet TO the hub FROM the robot on port 48002

If telnet is disabled, enable it if possible, or use the PuTTY utility if there is no other option.
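If neither telnet nor PuTTY can be used, a plain TCP connection attempt gives the same yes/no answer. The following is a minimal sketch, assuming Python is available on the machine; `can_connect` is a hypothetical helper name, and the 5 second timeout is an arbitrary choice.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

From the hub, `can_connect("<robot_hostname>", 48000)` mirrors the telnet test to the robot. From the robot, `can_connect("<hub_hostname>", 48002)` mirrors the test to the hub. Like telnet, this only tests TCP; it says nothing about UDP.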

On the robots, check whether the following robot probes show as green and show ports AND PIDs:
   
   - controller, hdb, and spooler

Are the controller, hdb, and spooler processes running? If not, it's possible that they are either being blocked or the robot installation was not 100% successful.

On Windows, check that the Nimsoft Robot Watcher service is running. In the IM, the controller, hdb, and spooler should each show a port and PID on the robot.

On UNIX/Linux, to check the robot processes:

   - ps -ef | grep nim should display those 3 processes

# ps -ef|grep nim

root       5910      1  0 13:57 ?        00:00:00 ./nimbus /opt/nimsoft
root       5937   5910  0 13:57 ?        00:00:01 nimbus(controller)
root       6652   5937  0 14:13 ?        00:00:00 nimbus(spooler)
root       6654   5937  0 14:13 ?        00:00:00 nimbus(hdb)

On the robot, if only the controller is running and not the hdb and spooler, then it's possible that either the installation did not complete or there is a local firewall enabled and blocking the ports/protocols.
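The process check above can be automated when many robots need to be inspected. The sketch below is illustrative only: `missing_robot_processes` is a hypothetical helper, and it assumes the process names appear as `nimbus(<name>)` exactly as in the sample ps output shown earlier.

```python
import re

# The three robot processes the article expects to see.
EXPECTED = ("controller", "spooler", "hdb")


def missing_robot_processes(ps_output):
    """Return the expected robot processes that do not appear in ps output."""
    # Process names appear as nimbus(<name>) in the command column.
    running = set(re.findall(r"nimbus\((\w+)\)", ps_output))
    return [name for name in EXPECTED if name not in running]

# To feed it live data on UNIX/Linux, capture the ps output first, e.g.:
#   import subprocess
#   text = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
#   print(missing_robot_processes(text))
```

A result listing `spooler` and `hdb` but not `controller` matches the case described above, where only the controller is running.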

Run a tracert (Windows) / traceroute (Linux/UNIX) command to see if the robot has a network route/path to follow to reach the hub.
 
 - From hub to robot and robot to hub, does it successfully complete?

Check for any crashes or problematic events in the Windows event logs (Application/System) on each machine, for example crash dumps or anti-virus blocking. Note that some AV software generates events that are not categorized as Error; they may be categorized as Informational yet still reflect a problem, e.g., a blocked process/subprocess. CB/Carbon Black is notorious for this.

Does the Global/Local Anti-Virus configuration contain a full exclusion for all Nimsoft Programs?

Install the Infrastructure Manager on the hub that the problematic robot reports to, then login to see if the robot displays in the IM with no issues (green status). If it does, then the issue may simply be that you cannot access the robot from the location where your IM is installed, e.g., laptop over a VPN connection.

Was there any new/additional security software installed back on <date> when the issue started occurring?

Check to make sure that no local (or intermediate-remote) firewall is enabled/blocking any TCP or UDP traffic to/from the hub and robot.

- Use the service iptables status command (RHEL 6) or the firewalld commands (RHEL 7 and later)

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/security_guide/sec-viewing_current_status_and_settings_of_firewalld

Check with your security team to make sure no Intrusion Prevention System (IPS) / Intrusion Detection System (IDS) is interfering with the connections. Note that this may be a factor given different regions/locations based on where the firewalls are located and/or how they are configured.

Also, if firewall(s) are enabled, check with your firewall team to make sure the proper rules are in place to ALLOW connectivity between the robots and their hubs and/or any related hub-to-hub connections (or tunnels via port 48003).

To make your analysis of communication/connectivity between robot<->hub complete, you have to check the firewall.

Windows firewall (Check firewall status)

RHEL 6
iptables

service iptables status
service iptables stop
iptables -F (flush the rules)


RHEL 7, 8
firewalld

firewall-cmd --state
systemctl stop firewalld

https://www.tecmint.com/start-stop-disable-enable-firewalld-iptables-firewall/

AIX

- Check commands online for the given AIX OS Version

Solaris

- Check commands online for the given Solaris OS Version

UIM (Nimsoft) Protocols for all components are TCP except for controller, hub, and spooler, which also require UDP.  UDP broadcast is used for the discovery of the hub, spooler, and controller components. All other core communications are done via TCP.

Firewall Port Reference Help doc:
https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/unified-infrastructure-management/9-0-2/installing/pre-installation-planning/firewall-port-reference.html

For deeper troubleshooting purposes, set the hub loglevel to 6 and controller loglevel to 5, and logsize to 100000, then restart and check the logs.

Quick Checklist:

- Check that the robot.cfg has the correct hub and robot info/configuration.
- Check to make sure that the robot.cfg has not been corrupted or truncated
- ping successful? (Hub<->robot)
- tracert successful? (Hub<->robot)
- telnet TO the Hub from the Robot on port 48002 to see if it succeeds consistently without any intermittent failures.
- telnet TO the robot FROM the hub on port 48000. If this fails while the test to the hub on port 48002 succeeds, it appears that communication from hub TO robot on port 48000 is being filtered/blocked.
- Check for any related communication/connectivity errors in the local robot's controller.log.
- Verify via netstat that the robot is LISTENING on port 48000 so that the hub can contact the robot on that port.

robot listens on port 48000, for example:

netstat -an|findstr "48000"
TCP    0.0.0.0:48000          0.0.0.0:0              LISTENING
UDP    0.0.0.0:48000          *:*

hub listens on port 48002, for example:

netstat -an|findstr "48002"
  TCP    0.0.0.0:48002          0.0.0.0:0              LISTENING
  TCP    10.xx.xxx.120:48002    10.xx.xxx.120:49201    ESTABLISHED
  TCP    10.xx.xxx.120:48002    10.xx.xxx.120:49213    ESTABLISHED
  ...
  ...
  TCP    10.74.240.120:65509    10.74.240.120:48002    ESTABLISHED
  UDP    0.0.0.0:48002          *:*
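Note that findstr "48002" also matches lines where 48002 is the foreign port, so the presence of a match does not by itself prove that the machine is listening. A minimal sketch of a stricter check follows. `is_listening` is a hypothetical helper, and it assumes the Windows netstat -an column layout shown above (Proto, Local Address, Foreign Address, State); Linux netstat output uses a different layout and is not handled.

```python
def is_listening(netstat_output, port):
    """Return True if a TCP line has a local address on `port` in LISTENING state."""
    suffix = ":" + str(port)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Windows layout: Proto, Local Address, Foreign Address, State.
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            if fields[1].endswith(suffix) and fields[3] == "LISTENING":
                return True
    return False
```

Pass it the captured text of netstat -an, with port 48000 on a robot or 48002 on a hub.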

- On Windows systems, check the Application/System event log for crashes and/or anti-virus interference. Note again that events generated from AV software may not necessarily be categorized as an Error - instead, blocked process/programs may be categorized as an Informational event.

- Check latest robot version/hotfix release notes for similar issues that may have been fixed.

- As an example, for any robots currently running robot v9.20 and exhibiting crashes of the controller: if one or more of the servers in this set shows crashes in the Windows Application or System event log, upgrade the robot using robot_update.

Hotfix site: https://support.broadcom.com/external/content/release-announcements/CA-Unified-Infrastructure-Management-Hotfix-Index/7233

Robot package:

robot_update_9.20_HF20.zip

Release notes:

robot_update_9.20_HF20.txt

- If telnet to/from Hub<->robot fails, the underlying issue could be related to hosts/subnets or IP address ranges being allowed/disallowed, network packet filtering, network routing, or intermediate firewalls. Note that both protocols, TCP AND UDP, must be allowed bidirectionally for Hub<->robot communication. That is a requirement.

Note also that, in general, hubs/robots with Anti-Virus installed need to have an exception created for ALL Nimsoft Programs on the local hub or robot machine before installation.

- Also check your AV log carefully to be sure nothing is being blocked
- If the Robot<->Hub communication issue(s) persist after checking all of the factors mentioned above, the next step is a network test: run a Wireshark trace while you are trying to reach the robot on port 48000 or the hub on port 48002. All robots listen on their default port 48000. All hubs listen on their default port 48002.
- Security team/firewall team resources should check security software/firewall logs while running the telnet test between robot and Hub.
- The network team should check the route when applicable (tracert from hub to robot, and vice versa, should complete successfully) and check communication between hub and robot using a verbose Wireshark trace. For example, a high number of TCP errors such as Out-of-Order packets, Duplicate ACKs, and Retransmissions may indicate network issues.

Additional Information

When a robot is installed in the DMZ and the hub is outside the DMZ, it may be worthwhile for the network team to run a Wireshark trace on the robot to see if it's sending/receiving packets to/from the hub, and also on the hub to view the current traffic. You can also ask the firewall administrator to check the firewall logs to see what traffic, if any, to/from the robot is being blocked when it's sent to the hub.

Robots installed in a DMZ

You could use security rules (allow the hub outside the DMZ to communicate with the robots on port 48000, or limit communication to specific source:destination pairs), or install a UIM hub within the DMZ and establish a tunnel to a hub ‘outside’ the DMZ, using either the default tunnel port 48003 (recommended) or port 443 if the security team prefers it. The tunnel could run from the hub inside the DMZ to a remote hub or back to the Primary Hub itself.

OS-specific notes

AIX security software/firewalls

lsfilt: Lists the filter rules present in the table. Each rule is assigned a number when it is created, and this command shows that number.

https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/m_commands/mkfilt.html 

http://unixswing.blogspot.com/2019/03/sample-firewall-in-aix.html

In one case, telnet TO the hub on port 48002 from the DMZ robots worked fine, but telnet in the other direction, to the robot on port 48000, failed.

The cause in that case was security software (Illumio Adaptive Security) installed on the machines in the DMZ, which prevented incoming connections because the port was not 'whitelisted.' Once access was granted, communication was bi-directional between the robots/agents and the hubs.
