Post upgrade to 20.4 mon_config_service probe gets a pid but no port


Article ID: 273135



Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

After upgrading to 20.4, the mon_config_service probe gets a PID but no port. Redeploying from scratch after deleting the probe and its folder makes no difference; the probe will not start up.

McsProbeConfig.getConfigValue:306: Failed to retrieve config key: timed/max_payload_size value from the mon_config_service probe

The config key does exist. Once this point is reached, nothing else is written to the log and the probe just sits there with a PID and no port.

[main, mon_config_service] java.lang.NumberFormatException: For input string: ""

Environment

  • Release: 20.4 CU8
  • Database: Oracle 19c
  • mon_config_service 20.45
  • robot 9.39
  • hub 9.39

Cause

mon_config_service could not get a port and logged a somewhat misleading error. The probe was finding (or defaulting to) the wrong, old hub. As a result it could not finish its startup routine: it was unable to get its configuration and fetch the max_payload_size parameter value, even though the key was present in the cfg. Sample log extracts:

[main, mon_config_service] RetryNimRequest.retryOnCommSessionError:99: Detected session error sending request. Received status (4) on response (for sendRcv) for cmd = 'probe_config_get' name = 'mon_config_service' Code 4

and

[main, mon_config_service] McsProbeConfig.getConfigKeyValue:306: Failed to retrieve config key: /timed/max_payload_size value from the mon_config_service probe
[main, mon_config_service] java.lang.NumberFormatException: For input string: ""
 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
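
The three log signatures above can be searched for together when checking whether a probe log shows this same pattern. The following is a minimal Python sketch, not a vendor tool: the function names `scan_log` and `matches_article_pattern` are hypothetical, the regular expressions are built only from the extracts quoted in this article, and a match suggests (but does not prove) that the probe is talking to the wrong hub.

```python
import re

# Signatures taken from the log extracts quoted in this article.
SIGNATURES = {
    "session_error": re.compile(r"Received status \(4\).*cmd = 'probe_config_get'"),
    "config_key_failure": re.compile(r"Failed to retrieve config key: /?timed/max_payload_size"),
    "empty_number": re.compile(r'NumberFormatException: For input string: ""'),
}


def scan_log(lines):
    """Return {signature_name: [1-based line numbers]} for matching lines."""
    hits = {name: [] for name in SIGNATURES}
    for number, line in enumerate(lines, start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(number)
    return hits


def matches_article_pattern(hits):
    """True only when all three signatures were found at least once."""
    return all(hits[name] for name in SIGNATURES)


# Sample input: the log lines quoted in this article.
SAMPLE = [
    "[main, mon_config_service] RetryNimRequest.retryOnCommSessionError:99: "
    "Detected session error sending request. Received status (4) on response "
    "(for sendRcv) for cmd = 'probe_config_get' name = 'mon_config_service' Code 4",
    "[main, mon_config_service] McsProbeConfig.getConfigKeyValue:306: Failed to "
    "retrieve config key: /timed/max_payload_size value from the mon_config_service probe",
    '[main, mon_config_service] java.lang.NumberFormatException: For input string: ""',
]

if __name__ == "__main__":
    result = scan_log(SAMPLE)
    print(result)
    print("matches article pattern:", matches_article_pattern(result))
```

To check a real log, pass an open file object to `scan_log` in place of `SAMPLE`; the location of the mon_config_service log file depends on the installation and is not assumed here.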

Resolution

The mon_config_service log, written as the probe started up and recognized the hubs in the environment, showed that the actual Primary hub was not identified and that the probe was defaulting to a different hub's NimBUS address. As a result, when it ran the probe_config_get callback, it could not obtain the max_payload_size parameter and its value.

Steps to resolve:

1. Deleted the niscache folder on the Primary hub robot to make sure there were no leftover entries for the old/defunct Primary hub (which was still being displayed within the IM navigation window).

2. Shut down the old/defunct hub-robot service (it turned red in the IM navigation window but was still displayed).

3. Restarted the Primary hub robot.

4. Checked the 'Name Services' tab on the Primary hub and found that there was still an entry for the old/defunct hub, so we deleted it and clicked Apply; the hub was then restarted again.

5. Right-clicked the old/defunct hub in the Hubs tab in the IM GUI and removed it.

6. Right-clicked the old/defunct hub robot in the IM navigation window and chose Remove, then did the same for the hub.

If the Name Services tab still contains an entry for an old hub, that entry can prevent the old hub (or old Primary hub) reference from being completely removed using Infrastructure Manager (IM).

Note that this could also cause a probe, e.g., mon_config_service, to not obtain a port if it is finding/defaulting to the wrong, defunct hub.

Finally, once these cleanup tasks were finished, we redeployed mon_config_service v20.45 from the local archive on the Primary hub (it was the only probe not obtaining a port and not starting up). It then obtained a port and started up normally, because it found the proper Primary hub NimBUS address and was able to fetch the config via the probe_config_get callback.