
More than a dozen Redbacks each have ports that have stopped polling


Article ID: 112641


Updated On:

Products

CA Infrastructure Management
CA Performance Management - Usage and Administration

Issue/Introduction

We've found a number of Redback devices that have each stopped polling an interface. For some of them, stopping and restarting polling fixed the issue, and the interface began polling and showing data again. This all started around the end of July for these devices. No changes were made to the system.

Cause

The problem is related to the Vertica MoveOutMaxAgeTime setting, which controls how Vertica moves data from WOS (in-memory writes) to ROS (files on disk). With the setting at 0, Vertica cached only a small number of writes in memory before flushing them to disk. The memory cache is 2GB and can hold many writes, but the 0 setting left most of it unused. With the setting at 240, Vertica waits longer before moving WOS to disk, so it batches bigger writes to the same table, which creates fewer ROS containers on disk. We should hit the "Too many ROS containers" error far less often (hopefully never). It can still happen if WOS fills up and takes too long to move to disk while many small writes to the same table push that table's ROS file count over 1024. At that point, writes to the table start failing with "Too many ROS containers".
Changing the setting only fixes future writes. It does not go back and repair devices or polling configurations that are already broken. To resolve those, we need to restart the DA, run REST requests, or stop/start polling, depending on the issue.
Other polling issues unrelated to ROS count can still occur and may also require a stop/start of polling.
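The batching effect described above can be illustrated with a toy calculation. This is only a sketch under simplifying assumptions, not a model of Vertica's actual behavior: it assumes one small write per second to a single table, treats the setting as a maximum age in seconds (the article does not state the unit), treats 0 as "flush after every write", counts one ROS container per flush, and ignores any background consolidation of containers. The function name ros_containers is hypothetical.

```shell
# Toy model: how many ROS containers a steady stream of writes would create.
# Assumptions (not from the article): one write per second, one container
# per flush, the setting is a maximum age in seconds, no consolidation.
ros_containers() {
  duration=$1   # seconds of continuous writes
  max_age=$2    # hypothetical stand-in for the configured value
  # Model a value of 0 as a flush after every single write.
  if [ "$max_age" -lt 1 ]; then
    max_age=1
  fi
  # One flush (one container) per max_age window, rounded up.
  echo $(( (duration + max_age - 1) / max_age ))
}

# One hour of writes under each setting.
echo "setting 0:   $(ros_containers 3600 0) containers"
echo "setting 240: $(ros_containers 3600 240) containers"
```

In this toy model an hour of writes yields 3600 containers at 0, well past the 1024 limit, but only 15 at 240.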
 

Environment

CAPM 3.x

Resolution

Check the MoveOutMaxAgeTime value in vsql and change it if needed:
cd /opt/Vertica/bin
./vsql -Udradmin -wdbpass (your admin user and password may vary)
vertica=> select get_config_parameter('MoveOutMaxAgeTime');
If the value is below 240, set it to 240:
select set_config_parameter('MoveOutMaxAgeTime', 240);
Once that's done, stop and restart polling on the affected interfaces to get them collecting data again.
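The check-then-set steps above could be wrapped in a small script along these lines. This is a sketch, not a supported tool: the vsql path, user, and password are the example values from this article and may differ on your system, and the function names and the VSQL, DB_USER, and DB_PASS variables are hypothetical. It uses the standard vsql options -A (unaligned output), -t (rows only), and -c (run one command).

```shell
# Example values from the article; override them for your environment.
VSQL="${VSQL:-/opt/Vertica/bin/vsql}"
DB_USER="${DB_USER:-dradmin}"
DB_PASS="${DB_PASS:-dbpass}"
TARGET=240

# Succeeds (exit 0) when the current value is below the target.
needs_raise() {
  [ "$1" -lt "$2" ]
}

# Prints the current MoveOutMaxAgeTime value as a bare number.
current_moveout_age() {
  "$VSQL" -U"$DB_USER" -w"$DB_PASS" -A -t \
    -c "select get_config_parameter('MoveOutMaxAgeTime');"
}

# Raises the value to TARGET only if it is currently lower.
raise_moveout_age() {
  current=$(current_moveout_age) || return 1
  # Refuse to continue if the query did not return a plain number.
  case "$current" in
    ''|*[!0-9]*) echo "unexpected value: $current" >&2; return 1 ;;
  esac
  if needs_raise "$current" "$TARGET"; then
    "$VSQL" -U"$DB_USER" -w"$DB_PASS" -A -t \
      -c "select set_config_parameter('MoveOutMaxAgeTime', $TARGET);"
  else
    echo "MoveOutMaxAgeTime is already $current; no change made"
  fi
}
```

After sourcing the file, run raise_moveout_age on the database host. The script only changes the setting; polling on the affected interfaces still has to be stopped and restarted afterward.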