
Alarm policies are not being updated due to errors in multi OC environment


Article ID: 211758



Products

- DX Unified Infrastructure Management (Nimsoft / UIM)
- Unified Infrastructure Management for Mainframe
- CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

- When multiple policy management nodes are running in a multi-OC environment, only one node can process the alarm policies. The other nodes fail and write exceptions to their log files.
 
- The following errors appear in policy_management.log:
2021-03-26 16:11:37,730 DEBUG com.ca.uim.policy.management.events.service.HeartBeatService:registerThisNode:243 [Timer-0]   - Registering the policy node to [email protected]://#.#.#.#:443/adminconsoleapp
2021-03-26 16:11:37,743 ERROR com.ca.uim.policy.management.events.service.HeartBeatService:registerThisNode:307 [Timer-0]   - Registration of this policy node to master failed, javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
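
To check each OC for this symptom, the log can be scanned for the registration failure shown above. The following is a minimal sketch, not a supported tool: the function name `find_registration_failures` is hypothetical, and the pattern is derived only from the sample ERROR line in this article, so other probe versions may word the message differently.

```python
import re

# Pattern derived from the sample ERROR line in this article (assumption:
# other versions of policy_management_ws may log different wording).
FAILURE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR .*"
    r"Registration of this policy node to master failed, (?P<reason>.*)$"
)


def find_registration_failures(lines):
    """Return (timestamp, exception class) pairs for failed registrations."""
    failures = []
    for line in lines:
        match = FAILURE.match(line.strip())
        if match:
            # Keep only the leading exception class name from the reason text.
            exception = match.group("reason").split(":", 1)[0].strip()
            failures.append((match.group("ts"), exception))
    return failures


# Example use (the log path depends on the installation):
# with open("policy_management.log") as fh:
#     for ts, exc in find_registration_failures(fh):
#         print(ts, exc)
```

An `SSLHandshakeException` in the result points to the certificate problem described under Cause.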

Environment

Release : 20.3

Component : UIM - ALARM POLICY

Cause

- This can happen when HA mode is configured for policy_management_ws but the HTTPS path to adminconsoleapp is not defined.

Resolution

The attached patch (policy_management_ws_0.27T1.zip) fixes the issue, allowing multiple nodes to process the policies.

This fix is also included as policy_management_ws_0.27HF2 in the UIM 20.3.3 Sept Patch.

Instructions:

- Import the zip into the local archive.
- Deploy the imported package to each OC robot.
- Open Raw Configure for the wasp probe on the primary.
- Verify that the following keys are set under webapps/adminconsoleapp/custom/uncrypted (entries are case sensitive):

ha_mode = HA
no_failed_attempts = 1
heartbeat_interval_min = 5
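
Because the entries are case sensitive, a quick comparison can catch a mistyped key or value. The sketch below assumes the three keys have been copied out of Raw Configure as plain `key = value` text. It does not read the wasp configuration itself, and the names `parse_keys` and `check_ha_settings` are hypothetical.

```python
# Expected values as listed in this article.
EXPECTED = {
    "ha_mode": "HA",
    "no_failed_attempts": "1",
    "heartbeat_interval_min": "5",
}


def parse_keys(text):
    """Parse 'key = value' lines into a dict; lines without '=' are ignored."""
    settings = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def check_ha_settings(text, expected=EXPECTED):
    """Return keys that are missing (None) or differ; comparison is case sensitive."""
    found = parse_keys(text)
    return {k: found.get(k) for k, v in expected.items() if found.get(k) != v}
```

An empty result means all three keys match. For example, a pasted value of `ha_mode = ha` is reported because it differs in case.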

On each of the OC servers perform the following:

- Open Raw Configure for the wasp probe.
- Add the following key under webapps/policy_management_ws/custom/uncrypted:

controller_url = http(s)://<system.domain_name>/adminconsoleapp/

Note: When https is used, a valid certificate must be in place for the AC machine; otherwise, registration fails. In that case, try changing this entry to http, or remove it completely, to establish the HA connection.

- Delete the following key if it is present:

webapps/policy_management_ws/custom/uncrypted/policy_processing
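
Before saving the controller_url value described above, its shape can be checked with a small script. This is a minimal sketch under stated assumptions: the function names are hypothetical, the host `uim.example.com` in the usage comment is a placeholder, and the check only inspects the URL text. It does not contact the server or validate the certificate.

```python
from urllib.parse import urlparse


def check_controller_url(url):
    """Return a list of problems found in a controller_url value (empty if none)."""
    problems = []
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        problems.append("scheme must be http or https")
    if not parts.hostname:
        problems.append("host name is missing")
    # Accept the path with or without the trailing slash.
    if parts.path.rstrip("/") != "/adminconsoleapp":
        problems.append("path should be /adminconsoleapp/")
    return problems


def needs_valid_certificate(url):
    """True for https URLs, where the Note above requires a valid certificate."""
    return urlparse(url).scheme == "https"


# Example use with a placeholder host:
# print(check_controller_url("https://uim.example.com/adminconsoleapp/"))
```

If `needs_valid_certificate` returns true and registration still fails with the SSL error from the Issue section, the fallback to http described in the Note applies.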

Once this is completed, entries like the following should start to appear in policy_management.log on each OC:

2021-11-18 11:02:01,692 INFO  com.ca.uim.policy.management.config.PolicyManagementConfig:readNimConfig:167 [Timer-1]   - Config file successfully loaded.
2021-11-18 11:02:01,692 DEBUG com.ca.uim.policy.management.events.service.HeartBeatService:registerThisNode:282 [Timer-1]   - Registering the policy node to [email protected]://<system.domain_name>/adminconsoleapp/
2021-11-18 11:02:01,712 DEBUG com.ca.uim.policy.management.events.service.HeartBeatService:registerThisNode:312 [Timer-1]   - Successfully contacted the node master at http://<system.domain_name>/adminconsoleapp/ Response={"heartbeat_interval_min":5,"processing_mode":"HA","hubIP":"10.xx.xx.xx","primary_policy_node":"10.xx.xxx.x","no_failed_attempts":1}
2021-11-18 11:02:01,713 DEBUG com.ca.uim.policy.management.events.service.HeartBeatService:setPrimary:190 [Timer-1]   - Setting policy processing mode to true
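
The "Successfully contacted the node master" line carries the settings returned by the master as JSON, which makes it easy to confirm that a node sees HA mode. The sketch below pulls that payload out of a log line. The function name `parse_master_response` is hypothetical, and the line format is assumed from the sample above only.

```python
import json

# Marker text taken from the sample DEBUG line in this article.
MARKER = "Successfully contacted the node master at "


def parse_master_response(line):
    """Return (master_url, response_dict) from a success line, or None if no match."""
    if MARKER not in line:
        return None
    rest = line.split(MARKER, 1)[1]
    url, sep, payload = rest.partition(" Response=")
    if not sep:
        return None
    return url, json.loads(payload)


# Example use (the log path depends on the installation):
# with open("policy_management.log") as fh:
#     for line in fh:
#         parsed = parse_master_response(line)
#         if parsed:
#             print(parsed[0], parsed[1].get("processing_mode"))
```

A `processing_mode` of `HA` in the response matches the `ha_mode = HA` setting made on the primary.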

Additional Information

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/unified-infrastructure-management/20-3/configuring-and-viewing-monitoring-data/manage-alarms-with-centralized-alarm-policies.html#concept.dita_f16e8c44518bae1bd89925d2bc51fa0a323a025d_PolicyManagementinHighAvailabilityMode

Attachments

policy_management_ws_0.27T1 (1)_1627059881884.zip