EM running TIM Collection Service randomly fails to retrieve defects from TIM with "Connection reset" in EM log.

Article ID: 6638

Products

APP PERF MANAGEMENT, CA Application Performance Management Agent (APM / Wily / Introscope), CUSTOMER EXPERIENCE MANAGER, INTROSCOPE

Issue/Introduction

EM running TIM Collection Service randomly fails to retrieve defects from TIM.

This message is visible in the EM log:

3/21/17 03:40:02.034 AM JST [ERROR] [TimPollThreadPool.Thread5] [Manager.com.timestock.tess.services.tim.TimIo] Error retrieving URL http://xxx.xxx.xxx.xxx:80/ca/apm/tim/mod_python/timfiles/listDefects: Connection reset

The TIM httpd access_log (/etc/httpd/logs/access_log) shows a corresponding successful GET (HTTP status 200) about one second before the error:

[21/Mar/2017:03:40:01 +0900] "GET /ca/apm/tim/mod_python/timfiles/listDefects HTTP/1.1" 200 3
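To confirm that an EM error and a TIM access_log entry belong to the same poll, their timestamps can be compared. The following is a minimal Python sketch, not part of the product. The function names are hypothetical, and it assumes that "JST" in the EM log means UTC+09:00 and that both logs use the formats shown above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "JST" in the EM log is a fixed UTC+09:00 offset.
JST = timezone(timedelta(hours=9))


def parse_em_timestamp(line: str) -> datetime:
    """Parse the leading timestamp of an EM log line (hypothetical helper)."""
    date_part, time_part, ampm = line.split()[:3]
    naive = datetime.strptime(
        f"{date_part} {time_part} {ampm}", "%m/%d/%y %I:%M:%S.%f %p"
    )
    return naive.replace(tzinfo=JST)


def parse_access_timestamp(line: str) -> datetime:
    """Parse the bracketed timestamp of an httpd access_log line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


em_line = (
    "3/21/17 03:40:02.034 AM JST [ERROR] [TimPollThreadPool.Thread5] "
    "[Manager.com.timestock.tess.services.tim.TimIo] Error retrieving URL "
    "http://xxx.xxx.xxx.xxx:80/ca/apm/tim/mod_python/timfiles/listDefects: "
    "Connection reset"
)
access_line = (
    '[21/Mar/2017:03:40:01 +0900] '
    '"GET /ca/apm/tim/mod_python/timfiles/listDefects HTTP/1.1" 200 3'
)

# Seconds between the logged GET on the TIM and the error on the EM.
gap_seconds = (
    parse_em_timestamp(em_line) - parse_access_timestamp(access_line)
).total_seconds()
print(f"{gap_seconds:.3f}")  # prints 1.034
```

A small positive gap like this one suggests that the request reached the TIM and that the connection was lost while the response was outstanding.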

Cause

Linux logrotate had been enabled for the httpd logs on a daily basis, and the /etc/logrotate.d/httpd file included a postrotate script that restarts the httpd process:

postrotate /sbin/service httpd graceful > /dev/null 2>/dev/null || true

The access_log shows that the web server handled the request successfully. Before the data could be returned to the EM, however, the httpd process was recycled by logrotate, which closed the socket prematurely.
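To check whether a TIM host has the same configuration, the postrotate script can be inspected for an httpd restart. The sketch below is a hedged Python illustration. The helper names are hypothetical, and the sample stanza is an assumed layout built around the command quoted above, not a copy of any particular system's file.

```python
import re
from typing import List


def postrotate_commands(config_text: str) -> List[str]:
    """Return the shell lines between 'postrotate' and 'endscript'."""
    commands = []
    inside = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == "postrotate":
            inside = True
        elif line == "endscript":
            inside = False
        elif inside and line and not line.startswith("#"):
            commands.append(line)
    return commands


def restarts_httpd(command: str) -> bool:
    """True if the command appears to restart or reload httpd."""
    pattern = r"\bhttpd\b.*\b(graceful|restart|reload)\b"
    return re.search(pattern, command) is not None


# Illustrative stanza (assumed layout); on a real host, read the text
# from /etc/logrotate.d/httpd instead.
SAMPLE_CONFIG = """\
/var/log/httpd/*log {
    daily
    missingok
    sharedscripts
    postrotate
        /sbin/service httpd graceful > /dev/null 2>/dev/null || true
    endscript
}
"""

hits = [c for c in postrotate_commands(SAMPLE_CONFIG) if restarts_httpd(c)]
for command in hits:
    print("postrotate restarts httpd:", command)
```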

Environment

APM 9.x, 10.x

Resolution

The problem occurs only occasionally, because the httpd restart must coincide exactly with an EM data polling request, which is normally very short. The default EM polling interval is 5 seconds for defects and events and 7 seconds for btstats (see the tess-default.properties file for details).
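As a rough illustration of why the overlap is rare, the chance that a single restart lands inside an in-flight poll can be estimated as the request duration divided by the polling interval. This Python sketch assumes the restart instant is effectively random relative to the polling cycle. The 0.05-second request duration is a hypothetical value chosen for the example, not a measured one.

```python
def overlap_probability(request_seconds: float, poll_interval_seconds: float) -> float:
    """Estimate the chance that one restart hits an in-flight poll.

    Assumes one request per interval and a restart instant that is
    uniformly random relative to the polling cycle.
    """
    return min(1.0, request_seconds / poll_interval_seconds)


# Hypothetical request duration; measure the real value on your system.
HYPOTHETICAL_REQUEST_SECONDS = 0.05

# Default intervals from the article: 5 s (defects, events), 7 s (btstats).
defects_chance = overlap_probability(HYPOTHETICAL_REQUEST_SECONDS, 5.0)
btstats_chance = overlap_probability(HYPOTHETICAL_REQUEST_SECONDS, 7.0)

print(f"defects/events poll: {defects_chance:.2%} per restart")
print(f"btstats poll: {btstats_chance:.2%} per restart")
```

With one logrotate restart per day, figures of this size would mean the error appears only once in many days, which matches the intermittent behavior described above.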