Robot 7.70 crashing repeatedly / won't start

Article ID: 34334

Updated On:

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

After upgrading to robot 7.70, the robot may fail to start or may crash repeatedly.

With the loglevel set to 5, a startup sequence such as the following appears repeatedly in controller.log:
May 19 10:02:11:875 [47439732251776] Controller: expire/gen_key KrsfYDyfIha8EP/xf5W0Yp6HEHoAIFtzjl5lfV+tENE=
May 19 10:02:11:875 [47439732251776] Controller: expire/VerifyCrc - open crc file and compare
May 19 10:02:11:875 [47439732251776] Controller: expire/VerifyCrc - found: 1
May 19 10:02:11:876 [47439732251776] Controller: expire/fetch_expire - ok; read
May 19 10:02:11:876 [47439732251776] Controller: expire/fetch_expire - done
May 19 10:02:11:876 [47439732251776] Controller: Change directory to /opt/nimsoft
May 19 10:02:11:876 [47439732251776] Controller: validating character encoding of config file: /opt/nimsoft/robot/controller.cfg
May 19 10:02:11:876 [47439732251776] Controller: nimCharsetValidateFile: /opt/nimsoft/robot/controller.cfg: no target charset
May 19 10:02:11:876 [47439732251776] Controller: nimSessionServerStrict - host 141.202.231.116, port = 48000
May 19 10:02:11:876 [47439732251776] Controller: SSL - skipping SSL server setup - this is a hub or robot and ssl_mode is 0 (off)
May 19 10:02:19:879 [47320437931136] Controller: MyPutEnv NIM_QOS_SOURCE=rhel511
May 19 10:02:19:879 [47320437931136] Controller:    NIM_QOS_SOURCE=rhel511
May 19 10:02:19:879 [47320437931136] Controller: --------------------------------------------------------------------------------------------------------
May 19 10:02:19:879 [47320437931136] Controller: ----- Robot controller 7.70 [Build 7.70.2507, Mar 18 2015] started -----
May 19 10:02:19:879 [47320437931136] Controller:  Name   = rhel511, IP = 141.202.231.116, Port = 48000
May 19 10:02:19:879 [47320437931136] Controller:  OS     = UNIX / Linux / Linux 2.6.18-398.el5 #1 SMP Tue Aug 12 06:26:17 EDT 2014 x86_64
May 19 10:02:19:879 [47320437931136] Controller:  Domain = PMIDomain
May 19 10:02:19:879 [47320437931136] Controller:  Primary HUB = /PMIDomain/RHEL511/rhel511 141.202.231.116
May 19 10:02:19:879 [47320437931136] Controller:  Loglevel = 5, Logfile = controller.log
May 19 10:02:19:879 [47320437931136] Controller:  System Uptime QoS = no
May 19 10:02:19:880 [47320437931136] Controller:  major=LINUX minor=LINUX_25_64
May 19 10:02:19:880 [47320437931136] Controller:  Robot device ID = DE150A169045B469699506A60DE63CFD9
May 19 10:02:19:880 [47320437931136] Controller: MyPutEnv NIM_DEVICE_ID=DE150A169045B469699506A60DE63CFD9
May 19 10:02:19:880 [47320437931136] Controller:    NIM_DEVICE_ID=DE150A169045B469699506A60DE63CFD9
May 19 10:02:19:880 [47320437931136] Controller: ciOpen - cache path: /opt/nimsoft/niscache
May 19 10:02:19:880 [47320437931136] Controller: ciOpen - initializing global CI cache
May 19 10:02:19:880 [47320437931136] Controller: ciSave - saving CI   [C58C100214057E3ECA3BC3DFD478BE221]
May 19 10:02:19:880 [47320437931136] Controller: ciSaveMetric - saving MET [MABC9412FEF7CF2FFA51F5827179A75D3] 10.2:5
May 19 10:02:19:880 [47320437931136] Controller: ciClose - [C58C100214057E3ECA3BC3DFD478BE221]
May 19 10:02:19:880 [47320437931136] Controller:  Robot state metric ID = MABC9412FEF7CF2FFA51F5827179A75D3
May 19 10:02:19:880 [47320437931136] Controller:  Running as user root (0)
May 19 10:02:19:880 [47320437931136] Controller: -----
May 19 10:02:19:880 [47320437931136] Controller: restored values from /opt/nimsoft/robot/robot_env.sds
May 19 10:02:19:880 [47320437931136] Controller: Controller - fetch expire information
May 19 10:02:19:880 [47320437931136] Controller: expire/fetch_expire - find full path to expire.cfg
May 19 10:02:19:880 [47320437931136] Controller: expire/fetch_expire - allocate space for /opt/nimsoft/robot/expire.cfg
May 19 10:02:19:880 [47320437931136] Controller: expire/fetch_expire - check access to /opt/nimsoft/robot/expire.cfg
May 19 10:02:19:880 [47320437931136] Controller: expire/fetch_expire - CRC check
May 19 10:02:19:880 [47320437931136] Controller: expire/VerifyCrc - /opt/nimsoft/robot/expire.cfg, /opt/nimsoft/robot/expire.crc
May 19 10:02:19:880 [47320437931136] Controller: expire/VerifyCrc - generate checksum based on robot name
May 19 10:02:19:880 [47320437931136] Controller: expire/gen_key /opt/nimsoft/robot/expire.cfg
May 19 10:02:19:880 [47320437931136] Controller: expire/gen_key 4sG4eY1GAAU/SXpBQ7WMzw==
May 19 10:02:19:880 [47320437931136] Controller: expire/gen_key rhel511
May 19 10:02:19:880 [47320437931136] Controller: expire/gen_key KrsfYDyfIha8EP/xf5W0Yp6HEHoAIFtzjl5lfV+tENE=
May 19 10:02:19:880 [47320437931136] Controller: expire/VerifyCrc - open crc file and compare
May 19 10:02:19:880 [47320437931136] Controller: expire/VerifyCrc - found: 1
May 19 10:02:19:880 [47320437931136] Controller: expire/fetch_expire - ok; read
May 19 10:02:19:880 [47320437931136] Controller: expire/fetch_expire - done
May 19 10:02:19:880 [47320437931136] Controller: Change directory to /opt/nimsoft
May 19 10:02:19:880 [47320437931136] Controller: validating character encoding of config file: /opt/nimsoft/robot/controller.cfg
May 19 10:02:19:880 [47320437931136] Controller: nimCharsetValidateFile: /opt/nimsoft/robot/controller.cfg: no target charset
May 19 10:02:19:880 [47320437931136] Controller: nimSessionServerStrict - host 141.202.231.116, port = 48000
May 19 10:02:19:880 [47320437931136] Controller: SSL - skipping SSL server setup - this is a hub or robot and ssl_mode is 0 (off)


On a Linux system, /var/log/messages will display the following:
May 19 10:02:19 localhost kernel: controller[9247] general protection rip:358ee78ca0 rsp:7fff90033378 error:0
May 19 10:02:27 localhost kernel: controller[9248] general protection rip:358ee78ca0 rsp:7fff6e3fe148 error:0
May 19 10:02:35 localhost kernel: controller[9249] general protection rip:358ee78ca0 rsp:7fffe54ac6b8 error:0
May 19 10:02:43 localhost kernel: controller[9250] general protection rip:358ee78ca0 rsp:7fff106a6ba8 error:0
May 19 10:02:51 localhost kernel: controller[9252] general protection rip:358ee78ca0 rsp:7fff2bbd2328 error:0
May 19 10:02:59 localhost kernel: controller[9253] general protection rip:358ee78ca0 rsp:7fff6b788408 error:0
May 19 10:03:07 localhost kernel: controller[9254] general protection rip:358ee78ca0 rsp:7fffead8d9e8 error:0
May 19 10:03:15 localhost kernel: controller[9256] general protection rip:358ee78ca0 rsp:7fff80ea65f8 error:0
May 19 10:03:23 localhost kernel: controller[9257] general protection rip:358ee78ca0 rsp:7fffd0dea298 error:0
May 19 10:03:31 localhost kernel: controller[9258] general protection rip:358ee78ca0 rsp:7fff812322a8 error:0
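
The entries above show a new controller process ID faulting roughly every eight seconds, which is the signature of the restart loop. As a rough aid for spotting the same pattern on another system, the following Python sketch extracts the controller general protection entries from syslog-style lines and computes the gaps between them. The function names and the `year` parameter are hypothetical helpers (syslog lines carry no year, so a placeholder is supplied), and the regular expression assumes the exact line layout shown in this article.

```python
import re
from datetime import datetime

# Matches lines such as:
# May 19 10:02:19 localhost kernel: controller[9247] general protection rip:...
GPF_LINE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: controller\[(\d+)\] general protection"
)


def controller_faults(lines, year=2015):
    """Return (timestamp, pid) tuples for controller general protection faults.

    `year` is a placeholder assumption; it only matters for building a
    complete datetime, not for the intervals between entries on one day.
    """
    faults = []
    for line in lines:
        match = GPF_LINE.match(line)
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            faults.append((stamp, int(match.group(2))))
    return faults


def restart_intervals(faults):
    """Seconds elapsed between consecutive faults."""
    return [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(faults, faults[1:])
    ]


if __name__ == "__main__":
    # Reading /var/log/messages usually requires root privileges.
    with open("/var/log/messages", encoding="utf-8", errors="replace") as handle:
        found = controller_faults(handle)
    print(len(found), "controller faults; intervals (s):", restart_intervals(found))
```

A steady run of short, near-identical intervals with a different PID on each line suggests the controller is being restarted and crashing each time, as in the excerpt above.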


This behavior occurs when strict_ip_binding is set to "yes" in robot.cfg while the loglevel is set to 4 or higher.

To work around the issue, either set the loglevel to 3 or lower, or set strict_ip_binding to "no". The robot will then be able to start.
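
To check a robot for the triggering combination before or after an upgrade, a small script can scan the configuration text. The following is a minimal sketch, assuming robot.cfg stores settings as `key = value` lines inside tagged sections such as `<controller>`, and that it sits at /opt/nimsoft/robot/robot.cfg (inferred from the paths in the log above, not confirmed here). The function names are hypothetical, and a missing key is treated as "not set", which is also an assumption.

```python
def parse_flat_settings(cfg_text):
    """Collect `key = value` pairs, skipping section tags such as <controller>.

    Assumption: section nesting is irrelevant for these two keys, so all
    pairs are flattened into one dictionary (later duplicates win).
    """
    settings = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("<") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip()
    return settings


def is_affected(cfg_text):
    """True when strict_ip_binding is 'yes' and loglevel is 4 or higher."""
    settings = parse_flat_settings(cfg_text)
    strict = settings.get("strict_ip_binding", "no").lower() == "yes"
    try:
        loglevel = int(settings.get("loglevel", "0"))
    except ValueError:
        loglevel = 0  # unreadable value: treat as not affected
    return strict and loglevel >= 4


if __name__ == "__main__":
    path = "/opt/nimsoft/robot/robot.cfg"  # assumed location
    with open(path, encoding="utf-8", errors="replace") as handle:
        if is_affected(handle.read()):
            print("Affected: lower loglevel to 3 or set strict_ip_binding = no")
        else:
            print("Not affected by this combination")
```

The script only reports the condition; it does not modify the file, so the change itself is still made by editing the configuration as described above.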

This will be corrected in a future release.

Environment

Release:
Component: UIMROB