Cleaning up excess ROS containers in a CA Performance Manager (CAPM) Data Repository cluster

Article ID: 35310

Products

CA Infrastructure Management, CA Performance Management - Usage and Administration, CA Performance Management - Data Polling

Issue/Introduction

Question(s):

Where does the mergeout_partitions.sh script come from? Why would it be used? How is it used?

I have many more ROS containers on some nodes versus others. How can they be cleaned up?

 

NOTE: Do NOT run this script in a 2.7 or newer installation of CA Performance Manager (CAPM). Doing so can cause unwanted performance impacts on the Vertica database. If excessive ROS container counts are seen in CAPM releases 2.7 or newer, please open a new case with the Support team for investigation.

 

Problem:

I have observed these errors in the Data Aggregator karaf.log file:

Caused by: java.sql.SQLException: [Vertica][VJDBC](5065) ERROR: Too many ROS containers exist for the following projections:

<<<more specific info per error instance cut here for brevity>>>

This script helps clean up the excess ROS containers that trigger the appearance of that error.
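
To see which projections are accumulating the most containers, a query along these lines can be run from a vsql prompt. This is a minimal sketch against the standard Vertica storage_containers system table; the LIMIT value is arbitrary:

select node_name, projection_name, count(*) as ros_count from storage_containers where storage_type ilike '%ROS%' group by node_name, projection_name order by ros_count desc limit 20;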

 

Additional information:

The script was first referenced in the CAMM 2.5 wiki docs:

https://docops.ca.com/display/CAPM250/Upgrade%20the%20Data%20Repository

In short, the way CAMM pushes large sets of data to the system can cause ROS container mergeout operations to back up. When this occurs, stability and performance problems may be seen on the Data Repository cluster involved. To alleviate this, CAMM shipped with this script as part of the install: after installing CAMM, you configured a cron job to run the script on a regular basis to prevent these problems from occurring.
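
To gauge whether mergeout operations are currently backed up, a query such as the following can be run from vsql. This is a sketch that assumes the Vertica tuple_mover_operations system table is available in your release:

select node_name, count(*) as active_mergeouts from tuple_mover_operations where operation_name = 'Mergeout' and is_executing group by node_name;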

Can this script be used to achieve the same result on a non-CAMM CAPM environment? Yes, it can.

 

Common questions and answers when using this script on a CAPM system without CAMM, where the script is not present by default

Q1 - Since the user/password info is given when launching it, it appears the script can be owned and launched by any user that has access to it and the ability to write the specified log file. Is that correct? It doesn't have to be run as the root or dradmin user?

A1 - Correct, because the script is provided user access details at run time, it can be owned by and launched as any user on the system.

 

Q2 - Does the script require special permissions aside from having execute rights as the user that owns it?

A2 - No, but one caveat is to ensure the user that owns and runs the script is allowed to write its log file to the location specified.
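
A quick way to confirm this before scheduling the script is a simple write test as that user. A minimal sketch, assuming dradmin will run the script and /tmp/logfile.txt is the intended log location (adjust both to your environment):

su - dradmin -c "touch /tmp/logfile.txt && echo writable || echo not writable"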

 

Q3 - If using it on a lab system where the time of day it is launched is not a concern, is it launched any differently than via the cron job? Would the command be:

./mergeout_partitions.sh <da_user> <da_pass> <output_logfile> > /dev/null 2>&1

A3 - Yes, that is correct.

 

Q4 - Where can I get the identify_top_partitions.sql referenced in the comments of the script? Anywhere aside from a CAMM 2.5 or newer install?

A4 - That file was used by an old version of the script, only with CAMM and only up until CAMM 2.5. Newer CAMM versions of the script, and the versions for CAPM without CAMM present, no longer reference that file. The command referencing the .sql file was removed because it is no longer needed; the script now stands alone.

 

Q5 - How many containers will the script touch at one time, per run of the script?

A5 - By default, mergeout_partitions.sh is configured to operate on up to 500 table projections with each run. To increase or decrease that value, modify the MAX_TABLES_TO_MERGE setting in the mergeout_partitions.sh script. Be aware of the possible impact on system resources that increasing this limit may have; increase it with caution.
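
For example, the limit could be lowered on a resource-constrained system with an edit like the following. This sketch assumes the setting appears in the script as a plain MAX_TABLES_TO_MERGE=500 shell assignment; verify the actual line before editing:

cp mergeout_partitions.sh mergeout_partitions.sh.bak
sed -i 's/^MAX_TABLES_TO_MERGE=.*/MAX_TABLES_TO_MERGE=250/' mergeout_partitions.sh
grep MAX_TABLES_TO_MERGE mergeout_partitions.sh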

 

Q6 - Does the script need to be run on one node in a multi-node cluster, or can it be run on any node?

A6 - It can be run on any node in the cluster.

 

Q7 - How many ROS containers do my nodes have? Do I need to run the script?

A7 - To determine ROS container counts before running the script, and to get a sense of how many runs it may take to complete, run the following query at the vsql prompt of the Data Repository DB server.

select node_name, count(*) from storage_containers where storage_type ilike '%ROS%' group by node_name;

The simplest way to do so is:

  1. Log in to the server as the dradmin or equivalent user.
  2. Launch the adminTools UI from the /opt/vertica/bin directory (default install location) with the command "./adminTools".
  3. Choose the Connect to Database option; when prompted for a password, specify the same one that would be used to stop/start the DB from the adminTools UI.
  4. Run the query above at the vsql prompt.

An example of the output would be:

select node_name, count(*) from storage_containers where storage_type ilike '%ROS%' group by node_name;

 node_name | count
-----------+-------
 node0001  | 14991
 node0002  | 15282
 node0003  | 14949

In this example we can see that node 1 and node 3 have a 42 container difference, while node 2 has 291 more containers than node 1 and 333 more than node 3. In this instance it is worth running the script, since node 2 has that many more containers than the other nodes.

Any difference approaching 500 or more ROS containers from node to node is indication it may be worth running the script.
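
If vsql is available on the PATH, the same check can be run non-interactively instead of through the adminTools UI. A minimal sketch, assuming the default install location and the dradmin database user; substitute your own credentials:

/opt/vertica/bin/vsql -U dradmin -w <password> -c "select node_name, count(*) from storage_containers where storage_type ilike '%ROS%' group by node_name;"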

Lastly, this script is unlikely to be needed in CAPM releases 2.7 or newer. The version of Vertica installed with CAPM 2.7 does a much better job keeping current with ROS container mergeout operations, so the script should no longer be needed after that.

 

How to use the script?

NOTES/TIPS:

- We recommend setting up a scheduled cron job to run the script at night; it is best run during hours when fewer users are on the system.

- The script may need to be run multiple times to complete its task due to the 500-projection-per-run limit. The limit is in place to restrict the time the script spends consuming resources on the server.

- To determine whether the script is running, or to check its progress, use the tail command on the log file specified by the output_logfile parameter. The output is similar to the following example:

Found 500 projections to merge 

Processing Table 1 - nrm_qos_cos_contract_rate_super_seg_b1

Processing Table 2 - reach_rate_super_seg_b0

Processing Table 3 - nrm_mpls_segment_out_rate_super_seg_b1

Processing Table 4 - nrm_mpls_segment_in_rate_super_seg_b1
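
To follow that output live while the script runs, assuming the log was written to /tmp/logfile.txt as in the examples later in this article:

tail -f /tmp/logfile.txt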

 

Configure the script:

  1. Download the script mergeout_partitions.sh.
  2. Place it on the node it will be run on, preferably in the home directory of the user that will own and run it.
  3. Ensure the user that will run the script owns it, and that its permissions allow execution by that user (see the example below).
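
For example, assuming the dradmin user will own and run the script from the home directory shown in this article's examples (/dradmin), and that a dradmin group exists:

chown dradmin:dradmin /dradmin/mergeout_partitions.sh
chmod u+x /dradmin/mergeout_partitions.sh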

 

Manually launch the script:

Run the script manually, on demand and without scheduling a cron job, with the following command:

./mergeout_partitions.sh <da_user> <da_pass> <output_logfile> > /dev/null 2>&1

The values given to the script are:

  • da_user: The database user that the Data Aggregator uses to connect to the database
  • da_pass: The password of the database user
  • output_logfile: The logfile to write status information to. If no path is given, the logfile is created in the same directory where the script resides.

NOTE: If the da_user and da_pass are not known, check the following file on the DA host:

$DA_HOME/IMDataAggregator/apache-karaf-2.3.0/etc/dbconnection.cfg

Default /opt install home location:

/opt/IMDataAggregator/apache-karaf-2.3.0/etc/dbconnection.cfg
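
If the exact property names in that file are not known, a broad grep will surface anything user- or password-like:

grep -i -E 'user|pass' /opt/IMDataAggregator/apache-karaf-2.3.0/etc/dbconnection.cfg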

If we were to run this manually on a system set up per the defaults, it might look like this:

./mergeout_partitions.sh dauser dbpass /tmp/logfile.txt > /dev/null 2>&1

That would run the tool from the directory where it resides and write a log file called logfile.txt to the /tmp directory.

 

Schedule the script run:

Set up a cron job to schedule the script's execution for the user that owns it, on the node where it will run. Use the following example as a template for your cron job if the goal is to schedule the script:

00 03 * * * <db_admin_user_home_directory>/mergeout_partitions.sh <da_user> <da_pass> <output_logfile> > /dev/null 2>&1

The values given to the script are:

  • db_admin_user_home_directory: The home directory of the Vertica admin user (often, per the docs, this is the user named 'dradmin')
  • da_user: The database user that the Data Aggregator uses to connect to the database
  • da_pass: The password of the database user
  • output_logfile: The logfile to write status information to. If no path is given, the logfile is created in the same directory where the script resides.

NOTE: If the da_user and da_pass are not known, check the following file on the DA host:

$DA_HOME/IMDataAggregator/apache-karaf-2.3.0/etc/dbconnection.cfg

Default /opt install home location:

/opt/IMDataAggregator/apache-karaf-2.3.0/etc/dbconnection.cfg

If we were to run this on a system set up per the defaults it might look like this:

00 03 * * * /dradmin/mergeout_partitions.sh dauser dbpass /tmp/logfile.txt > /dev/null 2>&1

That would run the tool out of the dradmin user's home directory and write a log file called logfile.txt to the /tmp directory.
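
To install the entry, run crontab -e as the user that owns the script and add the line above. To confirm it is in place afterward:

crontab -l | grep mergeout_partitions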

Environment

Release: IMDAGG99000-2.5-Infrastructure Management-Data Aggregator
Component:

Attachments

1558534178297TEC1573702.zip