Temperature Alarm

Article ID: 167863

Products

XOS

Issue/Introduction

The chassis occasionally raises a temperature alarm, but when checked, the temperature in the server room is found to be at the correct level.


CBS# show alarms active
Active Alarms Summary:

Source Critical Major Minor
------ -------- ----- -----
cp1           0     0     1
Total         0     0     1

CBS# show alarms active minor
Active Alarms Summary:

Source Critical Major Minor
------ -------- ----- -----
cp1           0     0     1
Total         0     0     1


* indicates an alarm that can be cleared with the 'clear alarms' CLI command

Minor:

ID   Date           Source Description
--   ----           ------ -----------
2079 Dec 5 08:10:26 cp1    Intake temperature (41C)

Resolution

Intake temperature fluctuates, so an alarm can be triggered even when the server room as a whole is at the correct temperature.

The external temperature should be verified at the time of an alarm to ensure that the alarm has not been caused by a change in room temperature.

Check the log files and look for patterns where an alarm is raised and then is cleared shortly afterward. This might happen at certain times during the day.

If a pattern is found, it may indicate that the alarm is legitimate. The reason for the higher intake temperature would then need to be investigated. Depending on the layout of the rack where the chassis is installed, the high intake temperature may be traced to a cause that is external to the Crossbeam chassis.
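One way to look for a time-of-day pattern is to count raised alarms per hour. The Python sketch below is only an illustration: it assumes the cbsalarmlogrd line layout seen in this article's example logs (AlarmID, timestamp, severity, source, alarm name, and description, separated by "|"). The function name alarms_per_hour and the SAMPLE_LINES list are hypothetical, and the location of the log file on the system is not assumed here.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, taken from the example logs in this article:
#   ... cbsalarmlogrd: AlarmID <id> | <timestamp> | <severity> | <source> | ...
ALARM_RE = re.compile(r"cbsalarmlogrd: AlarmID \d+ \| ([^|]+?) \| (\w+) \|")


def alarms_per_hour(lines):
    """Count raised alarms (any severity other than 'clear') per hour of day."""
    counts = Counter()
    for line in lines:
        match = ALARM_RE.search(line)
        if match is None:
            continue  # not a cbsalarmlogrd alarm line
        timestamp, severity = match.groups()
        if severity == "clear":
            continue  # only count the moment an alarm is raised
        raised_at = datetime.strptime(timestamp, "%a %b %d %H:%M:%S %Y")
        counts[raised_at.hour] += 1
    return counts


# Illustrative input: three alarm lines copied from the example logs.
SAMPLE_LINES = [
    "Dec 8 23:50:43 wheatstone cbsalarmlogrd: AlarmID 2191 | Thu Dec 8 23:50:43 2011 | minor | ap7 | intakeAirTemperatureExceeded | Intake temperature (41C)",
    "Dec 8 23:51:03 wheatstone cbsalarmlogrd: AlarmID 2192 | Thu Dec 8 23:51:03 2011 | clear | ap7 | intakeAirTemperatureExceeded | Intake temperature (40C) | CorrelationID 2191",
    "Dec 8 23:53:23 wheatstone cbsalarmlogrd: AlarmID 2193 | Thu Dec 8 23:53:23 2011 | minor | ap7 | intakeAirTemperatureExceeded | Intake temperature (41C)",
]

if __name__ == "__main__":
    for hour, count in sorted(alarms_per_hour(SAMPLE_LINES).items()):
        print(f"{hour:02d}:00  {count} alarm(s) raised")
```

If the counts cluster around the same hours over several days of logs, that points to a recurring external cause rather than random fluctuation.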

Below are example logs showing such temperature fluctuations. The alarm is raised and then cleared:

###
# OCCURRED
Dec 8 23:50:43 wheatstone cbshmonitord: [N] Violation (s=1, alarm) occurred 3 times: module:9, item:2202 (H_ID_IN_TEMP), time:"Thu Dec 8 23:50:23 2011", value: 41, norm:0-40, minor:0-42, major:0-44
Dec 8 23:50:43 wheatstone cbsalarmlogrd: AlarmID 2191 | Thu Dec 8 23:50:43 2011 | minor | ap7 | intakeAirTemperatureExceeded | Intake temperature (41C)
Dec 8 23:50:45 wheatstone cbshmonitord: [I] chassis fault counters: 2 0 0

# CLEARED
Dec 8 23:51:03 wheatstone cbshmonitord: [N] Threshold violation is clear: module:9, item:2202 (H_ID_IN_TEMP), alarm, value: 40, time:"Thu Dec 8 23:51:03 2011"
Dec 8 23:51:03 wheatstone cbsalarmlogrd: AlarmID 2192 | Thu Dec 8 23:51:03 2011 | clear | ap7 | intakeAirTemperatureExceeded | Intake temperature (40C) | CorrelationID 2191
Dec 8 23:51:05 wheatstone cbshmonitord: [I] chassis fault counters: 1 0 0

# OCCURRED
Dec 8 23:53:23 wheatstone cbshmonitord: [N] Violation (s=1, alarm) occurred 3 times: module:9, item:2202 (H_ID_IN_TEMP), time:"Thu Dec 8 23:53:03 2011", value: 41, norm:0-40, minor:0-42, major:0-44
Dec 8 23:53:23 wheatstone cbsalarmlogrd: AlarmID 2193 | Thu Dec 8 23:53:23 2011 | minor | ap7 | intakeAirTemperatureExceeded | Intake temperature (41C)
Dec 8 23:53:25 wheatstone cbshmonitord: [I] chassis fault counters: 2 0 0

# CLEARED
Dec 8 23:55:43 wheatstone cbshmonitord: [N] Threshold violation is clear: module:9, item:2202 (H_ID_IN_TEMP), alarm, value: 40, time:"Thu Dec 8 23:55:43 2011"
Dec 8 23:55:43 wheatstone cbsalarmlogrd: AlarmID 2194 | Thu Dec 8 23:55:43 2011 | clear | ap7 | intakeAirTemperatureExceeded | Intake temperature (40C) | CorrelationID 2193
Dec 8 23:55:45 wheatstone cbshmonitord: [I] chassis fault counters: 1 0 0
###

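As shown above, the alarm is cleared shortly after it is raised (after 20 seconds in the first case and after 2 minutes 20 seconds in the second). The reading moves only between 41C and 40C, which is the upper limit of the normal range (norm:0-40). Also, the alarm does not stay active continuously even during those periods when it is being raised. This suggests that the temperature fluctuates around the threshold and occasionally exceeds it.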
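To measure how long each alarm stays active, the raise and clear entries can be paired through the CorrelationID field that appears on the clear lines. The following Python sketch assumes the cbsalarmlogrd line layout from the example logs above. The names parse_alarm_line, alarm_durations, and SAMPLE_LOG are hypothetical helpers for illustration, not part of XOS.

```python
import re
from datetime import datetime

# Assumed layout (from the example logs in this article):
#   AlarmID <id> | <timestamp> | <severity> | <source> | <name> | <text> [| CorrelationID <id>]
ALARM_RE = re.compile(r"cbsalarmlogrd: AlarmID (\d+) \| (.+)$")


def parse_alarm_line(line):
    """Return a dict for a cbsalarmlogrd alarm line, or None for any other line."""
    match = ALARM_RE.search(line)
    if match is None:
        return None
    fields = [field.strip() for field in match.group(2).split("|")]
    if len(fields) < 5:
        return None
    record = {
        "id": int(match.group(1)),
        "time": datetime.strptime(fields[0], "%a %b %d %H:%M:%S %Y"),
        "severity": fields[1],
        "correlation_id": None,
    }
    for field in fields[5:]:
        if field.startswith("CorrelationID"):
            record["correlation_id"] = int(field.split()[1])
    return record


def alarm_durations(lines):
    """Pair each 'clear' entry with the alarm it references.

    Returns a list of (alarm_id, seconds_active) tuples in clear order.
    """
    raised = {}
    durations = []
    for line in lines:
        record = parse_alarm_line(line)
        if record is None:
            continue
        if record["severity"] == "clear":
            start = raised.pop(record["correlation_id"], None)
            if start is not None:
                active = (record["time"] - start["time"]).total_seconds()
                durations.append((start["id"], active))
        else:
            raised[record["id"]] = record
    return durations


# Illustrative input: the four cbsalarmlogrd lines from the example logs.
SAMPLE_LOG = """\
Dec 8 23:50:43 wheatstone cbsalarmlogrd: AlarmID 2191 | Thu Dec 8 23:50:43 2011 | minor | ap7 | intakeAirTemperatureExceeded | Intake temperature (41C)
Dec 8 23:51:03 wheatstone cbsalarmlogrd: AlarmID 2192 | Thu Dec 8 23:51:03 2011 | clear | ap7 | intakeAirTemperatureExceeded | Intake temperature (40C) | CorrelationID 2191
Dec 8 23:53:23 wheatstone cbsalarmlogrd: AlarmID 2193 | Thu Dec 8 23:53:23 2011 | minor | ap7 | intakeAirTemperatureExceeded | Intake temperature (41C)
Dec 8 23:55:43 wheatstone cbsalarmlogrd: AlarmID 2194 | Thu Dec 8 23:55:43 2011 | clear | ap7 | intakeAirTemperatureExceeded | Intake temperature (40C) | CorrelationID 2193
"""

if __name__ == "__main__":
    for alarm_id, seconds in alarm_durations(SAMPLE_LOG.splitlines()):
        print(f"alarm {alarm_id}: active for {seconds:.0f} s")
```

For the sample lines, alarm 2191 is active for 20 seconds and alarm 2193 for 140 seconds. Short, repeated durations like these are consistent with a reading that hovers around the threshold.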


If you see patterns such as these, check the logs on other nearby systems for similar events. An increase in temperature is sometimes caused by a higher load.