Recommended stop/start order for OS patching the Performance Management environment

Products

CA Infrastructure Management, CA Performance Management - Usage and Administration, DX NetOps

Issue/Introduction

Patching is essential to the security and stability of a DX NetOps Performance Management system. This Knowledge Base (KB) article describes best practices for patching its components in a systematic order, so that risk is reduced and the system keeps functioning during monthly Linux patching.

The goal is a structured plan for updating Performance Management (PM) servers, with emphasis on the order in which they are patched. Because servers can be staged so that patches run sequentially, the question is which sequence works best. The main concern is the correct staging of the Vertica database (the Data Repository), Data Aggregator (DA), and NetOps Portal servers. Data Collectors (DCs) may not require strict sequencing, but a deliberate approach to patching them is still wanted.

This article therefore gives the recommended order for bringing the environment down and back up during regularly scheduled OS-level patching. Following it helps preserve system integrity, minimize disruption, and sustain performance across the DX NetOps Performance Management system.

Environment

All supported Performance Management releases

Cause

These practices are needed to ensure that the system recovers cleanly after patching, to preserve the integrity of the Data Repository, to maintain service reliability, and to prevent future issues.

Resolution

Recommended Steps for the Patching Process:

1. Data Aggregator (FT):

Objective: Ensure clean shutdown of the Data Aggregator during patching.

Steps:

Log in to the data aggregator host as the root or sudo user.

Activate Maintenance Mode on each Data Aggregator:

<installation_directory>/scripts/dadaemon maintenance

Confirm Maintenance Mode activation on both data aggregators before proceeding.

If the Data Aggregator is not fault tolerant (non-FT):

Stop the service:

service dadaemon stop

Patch the server, then start the service:

service dadaemon start
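
The choice between the two Data Aggregator paths above can be sketched as a small helper that only prints the command to run before patching. This is a minimal illustration, not part of the product: the function name and the mode labels `ft` and `standalone` are hypothetical, and the installation directory must be supplied by the operator.

```shell
#!/bin/sh
# Sketch only: print the pre-patch command for a Data Aggregator host.
# da_pre_patch_command and the mode labels are hypothetical names.

# Usage: da_pre_patch_command <ft|standalone> <installation_directory>
da_pre_patch_command() {
    mode=$1
    install_dir=$2
    case "$mode" in
        ft)
            # Fault-tolerant pair: put the Data Aggregator into maintenance mode.
            printf '%s/scripts/dadaemon maintenance\n' "$install_dir"
            ;;
        standalone)
            # Non-fault-tolerant Data Aggregator: stop the service.
            printf 'service dadaemon stop\n'
            ;;
        *)
            printf 'unknown mode: %s\n' "$mode" >&2
            return 1
            ;;
    esac
}

# Example: show the command for a fault-tolerant host, leaving the path as a placeholder.
da_pre_patch_command ft "<installation_directory>"
```

Printing the command instead of running it keeps the sketch safe to try on any machine; the operator still runs the printed command as root or with sudo.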

2. Data Repository - Node at a Time:

Objective: Safeguard database integrity and availability during patching.

Steps:

Back up the database first, following the provided guidelines.

Stop Vertica on the node being patched:

su - <dradminaccount> -c "/opt/vertica/bin/admintools -t stop_node -s <hostname>"

After patching the server, restart the Vertica node:

su - <dradminaccount> -c "/opt/vertica/bin/admintools -t restart_node -s <hostname> -d <DBNAME> -p <DBPASSWORD> -i"

Repeat this for each node, one at a time. Each node must be started manually once its host is back up.
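
The node-at-a-time sequence above can be written out as a reviewable plan before the maintenance window. The sketch below executes nothing: it prints the admintools commands from this article for each host in turn and leaves the angle-bracket placeholders for the operator to fill in. The function name and the example host names are hypothetical.

```shell
#!/bin/sh
# Sketch only: print a node-at-a-time plan for the Data Repository hosts.
# Nothing is executed. dr_node_plan and the host names below are hypothetical;
# the admintools commands are the ones given in this article.

# Usage: dr_node_plan <host> [<host> ...]
dr_node_plan() {
    for host in "$@"; do
        printf '# --- %s ---\n' "$host"
        printf 'su - <dradminaccount> -c "/opt/vertica/bin/admintools -t stop_node -s %s"\n' "$host"
        printf '# patch %s and wait for the host to come back up, then:\n' "$host"
        printf 'su - <dradminaccount> -c "/opt/vertica/bin/admintools -t restart_node -s %s -d <DBNAME> -p <DBPASSWORD> -i"\n' "$host"
    done
}

# Example: a three-node cluster, handled one node at a time.
dr_node_plan dr-node1 dr-node2 dr-node3
```

Each host gets its stop, patch, and restart lines grouped together, which makes it harder to stop a second node before the first has been restarted.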

3. Data Collector:

Objective: Ensure continuous data collection post-patching.

Steps:

Stop the data collector service:

service dcmd stop

Proceed with server patching.

Restart the data collector service:

service dcmd start

Verify the status:

service dcmd status
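
A service can take a moment to come up after the start command, so the status check may be worth repeating rather than running once. The following is a minimal sketch of a retry helper. `wait_until` is a hypothetical name, and the usage line assumes that `service dcmd status` exits with status 0 once the collector is running, which this article does not confirm.

```shell
#!/bin/sh
# Sketch only: retry a check until it succeeds or the attempts run out.
# wait_until is a hypothetical helper, not part of the product.

# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    try=1
    while [ "$try" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Sleep only between attempts, not after the last one.
        if [ "$try" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        try=$((try + 1))
    done
    return 1
}

# Possible usage after "service dcmd start" (assumes exit status 0 means running):
# wait_until 10 6 service dcmd status && echo "dcmd is running"
```

The attempt count and delay are illustrative values; choose them to suit how long the collector normally takes to start in your environment.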

4. NetOps Portal:

Objective: Maintain service continuity during patching.

Steps:

Back up the Portal database as per the guidelines.

Stop all services:

service caperfcenter_console stop && service caperfcenter_devicemanager stop && service caperfcenter_eventmanager stop && service caperfcenter_sso stop && service mysql stop

Patch the server and then start the services:

service mysql start && service caperfcenter_sso start && service caperfcenter_eventmanager start && service caperfcenter_devicemanager start && service caperfcenter_console start
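
The two commands above are mirror images: the services stop in the reverse of the order in which they start, with MySQL stopped last and started first. A minimal sketch of that relationship follows. It prints the plan without running anything; the service names are the ones used in this article, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: derive the Portal stop order by reversing the start order.
# reverse_words and portal_plan are hypothetical helper names.

# Start order used in this article: MySQL first, console last.
PORTAL_START_ORDER="mysql caperfcenter_sso caperfcenter_eventmanager caperfcenter_devicemanager caperfcenter_console"

# Print the given words in reverse order on one line.
reverse_words() {
    reversed=""
    for word in "$@"; do
        reversed="$word${reversed:+ $reversed}"
    done
    printf '%s\n' "$reversed"
}

# Usage: portal_plan <start|stop>
# Prints one "service <name> <action>" line per service; nothing is executed.
portal_plan() {
    action=$1
    if [ "$action" = "stop" ]; then
        # Word splitting of the list is intended here.
        order=$(reverse_words $PORTAL_START_ORDER)
    else
        order=$PORTAL_START_ORDER
    fi
    for svc in $order; do
        printf 'service %s %s\n' "$svc" "$action"
    done
}

portal_plan stop
```

Keeping a single start-order list and deriving the stop order from it avoids the two sequences drifting apart if a service is added later.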

These steps provide a structured way to reduce risk and preserve system reliability, security, and performance during and after patching.