telemetry-collector pre-start script times out and fails during Apply Changes in Operations Manager

Article ID: 293866

Products

Operations Manager

Issue/Introduction

During an Apply Changes or deployment of the telemetry tile, the pre-start script for the telemetry-collector job fails with a timeout error. The timeout occurs when the script requests the Operations Manager (Ops Manager) API endpoint /api/v0/installations to collect data.

In /var/vcap/sys/log/telemetry-collector/pre-start.stderr.log, you will see the following error message:
Error: Failed collecting from Operations Manager: Failed retrieving ops_manager installations: Failed GET /api/v0/installations: failed submitting request: Get "https://ops.xxx/api/v0/installations": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
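
To confirm that a failed pre-start matches this article, the log can be searched for the error above. The following is a minimal sketch, not an official tool: the function name has_installations_timeout is hypothetical, and it assumes the failing endpoint and the timeout text appear on a single log line, as in the message shown here.

```shell
# Hypothetical helper: exits 0 when the given log file contains the
# /api/v0/installations timeout error described in this article.
has_installations_timeout() {
  # The pattern requires the failing endpoint and the Go client timeout
  # text to appear on the same line (assumption based on the sample error).
  grep -q 'Failed GET /api/v0/installations.*Client\.Timeout exceeded while awaiting headers' "$1"
}
```

On the telemetry-collector VM, it could be called as has_installations_timeout /var/vcap/sys/log/telemetry-collector/pre-start.stderr.log and the exit status checked. A different error in that log points to a different cause, and the workaround below may not apply.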

The request fails when Ops Manager cannot process and return the installation logs before the telemetry-collector request times out.

By default, Ops Manager does not automatically prune previous installation logs. All installation log records are saved within Ops Manager, so in environments where hundreds or thousands of deployments have been executed, the request from telemetry-collector can time out because of the payload size.

Environment

Product Version: Other

Resolution

The Telemetry tile development team plans to address the timeout issue by allowing operators to adjust the timeout values for telemetry-collector. In the meantime, to allow Apply Changes to succeed on the telemetry tile, you can prune the installation logs from Ops Manager by following the workaround below.

The workaround uses the Operations Manager interactive Ruby console to access its database and remove all installation logs older than a chosen cutoff date. Before performing the steps below, make sure to take a backup of the Operations Manager VM.


For Operations Manager version 2.4 and later

1. SSH into the Operations Manager VM and then sudo to root:
sudo su -

2. Change directories:
cd /home/tempest-web/tempest/web

3. Start the Rails console (an interactive Ruby session) using the following command.

Note: Replace $INFRASTRUCTURE in the command with the infrastructure you are using (vsphere, aws, azure, gcp, or openstack). For example, on vSphere, set TEMPEST_INFRASTRUCTURE='vsphere':
RAILS_ENV='production' TEMPEST_INFRASTRUCTURE=$INFRASTRUCTURE TEMPEST_WEB_DIR='/home/tempest-web' DATA_ROOT='/var/tempest' LOG_DIR='/var/log/opsmanager' SECRET_KEY_BASE='secret' su tempest-web --command 'bundle exec rails console'

4. Create a variable specifying the cutoff date for the logs to keep. In this example, installation logs that are no more than a year old are kept.
prune_age = 1.year.ago

5. Run this command to prune the table which contains the installation logs:
Tempest::Install.where(Tempest::Install.arel_table[:created_at].lt(prune_age)).destroy_all

6. At this stage, installation logs that are more than a year old have been permanently removed from Operations Manager. You can proceed with rerunning Apply Changes on the telemetry tile.
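
Because the command in step 3 depends on the infrastructure value being substituted correctly, it can help to validate that value before launching the console. The sketch below is an illustration only: the function name console_command is hypothetical, and the accepted values are just the five listed in the note in step 3, which may not be exhaustive for every Ops Manager version. It prints the command rather than running it.

```shell
# Hypothetical helper: prints the step 3 console command for one of the
# infrastructures named in this article, or fails for any other value.
console_command() {
  case "$1" in
    vsphere|aws|azure|gcp|openstack) ;;
    *)
      echo "unsupported infrastructure: '$1'" >&2
      return 1
      ;;
  esac
  # All other settings are copied unchanged from step 3.
  printf "RAILS_ENV='production' TEMPEST_INFRASTRUCTURE='%s' TEMPEST_WEB_DIR='/home/tempest-web' DATA_ROOT='/var/tempest' LOG_DIR='/var/log/opsmanager' SECRET_KEY_BASE='secret' su tempest-web --command 'bundle exec rails console'\n" "$1"
}
```

For example, console_command vsphere prints the full command with TEMPEST_INFRASTRUCTURE='vsphere', which can then be reviewed and run from /home/tempest-web/tempest/web as root.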