Changing the wait time before an APM is reset

Article ID: 167992

Products

XOS

Issue/Introduction

The CPM monitors the state of every module in the system using HealthCheck (HC) polls. If a module does not respond to these polls, the cbssysctrld process on the CPM resets the slot/module.
X-Series Module HealthCheck Polls:

HealthCheck polls are originated by the CPM's cbshmonitord daemon. There are three types of polls:
  • Fast polls - sent once per second; they query rapidly changing information such as insertion and removal of blades, alarm LED conditions, etc.
  • Medium polls - sent every 10 seconds; they query other information (link states, temperatures).
  • Slow polls - sent every 30 seconds; they query more slowly changing information on each blade (CPU utilization, voltage fluctuation, power and fan status, etc.).
Responses to HC polls are originated by the cbshagentd daemon on every blade, which runs in user space (i.e., it is not kernel prioritized).
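
The three poll intervals above can be illustrated with a short Python sketch. It is not XOS code: the function names are hypothetical, and it assumes, purely for illustration, that all three timers start together at second 0.

```python
# Poll intervals in seconds, taken from the list above.
POLL_INTERVALS = {"fast": 1, "medium": 10, "slow": 30}


def polls_due(second):
    """Return the poll types that would fire at the given whole second.

    Assumption for this sketch only: all timers start at second 0.
    """
    return [name for name, interval in POLL_INTERVALS.items()
            if second % interval == 0]


def polls_sent(window_seconds):
    """Count how many polls of each type are sent in seconds 1..window_seconds."""
    return {name: window_seconds // interval
            for name, interval in POLL_INTERVALS.items()}


if __name__ == "__main__":
    print(polls_due(10))   # fast and medium polls coincide here
    print(polls_sent(60))  # poll counts over one minute
```

Under these assumptions, a blade answers 60 fast polls, 6 medium polls, and 2 slow polls per minute.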

Missed Heartbeats or HealthCheck Polls:
  • If a blade misses 2 seconds of heartbeats (8) on each path and also misses 3 HC polls, the cbssysctrld daemon resets the non-responding module.
  • If a blade misses 60 seconds of HC polls but is receiving heartbeats, the cbssysctrld daemon resets the module.
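
The two reset conditions above can be expressed as a small decision function. This is a minimal Python sketch of the rules as stated in this article, not the cbssysctrld implementation. The `BladeStatus` class, its field names, and `should_reset` are hypothetical, and the sketch does not model how vg-reset-wait-time changes the timing.

```python
from dataclasses import dataclass

# Thresholds taken from the two rules above.
HEARTBEAT_LIMIT_SECONDS = 2
MISSED_POLL_LIMIT = 3
HC_SILENCE_LIMIT_SECONDS = 60


@dataclass
class BladeStatus:
    # Hypothetical fields, named for this sketch only.
    heartbeat_silence_per_path: tuple  # seconds without heartbeats, one entry per path
    missed_hc_polls: int               # consecutive HC polls with no response
    hc_silence_seconds: float          # seconds since the last HC poll response


def should_reset(status):
    """Return True if either reset rule from the article is met."""
    # Rule 1: heartbeats lost on every path AND at least 3 HC polls missed.
    heartbeats_lost = all(
        silence >= HEARTBEAT_LIMIT_SECONDS
        for silence in status.heartbeat_silence_per_path
    )
    if heartbeats_lost and status.missed_hc_polls >= MISSED_POLL_LIMIT:
        return True
    # Rule 2: 60 seconds without HC responses, even if heartbeats still arrive.
    return status.hc_silence_seconds >= HC_SILENCE_LIMIT_SECONDS


if __name__ == "__main__":
    # Heartbeats still arriving on one path, so rule 1 does not apply.
    print(should_reset(BladeStatus((2.5, 0.1), 3, 3.0)))
```

The first rule requires the heartbeat loss on every path, so a blade that is still reachable on one path is only reset through the slower 60-second rule.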

Resolution

n/a

Workaround

Configure vg-reset-wait-time under the vap-group at the CLI to delay the reset of a VAP that is CPU bound/busy and fails to send heartbeats for a period of time.
Configurable values range from 0 to 60 seconds; the default is 5 seconds.
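
When configuration is generated by a script, the value can be checked against the documented range before it is applied. The following Python sketch is a hypothetical helper, not part of XOS. It only validates the 0 to 60 second range and renders the vg-reset-wait-time line in the form used in this article's configuration example.

```python
# Range and default as documented in this article.
MIN_WAIT_SECONDS = 0
MAX_WAIT_SECONDS = 60
DEFAULT_WAIT_SECONDS = 5


def render_reset_wait_line(seconds=DEFAULT_WAIT_SECONDS):
    """Validate the wait time and return the vap-group configuration line.

    Hypothetical helper for illustration; the CLI itself enforces the range.
    """
    # Reject booleans explicitly, since bool is a subclass of int in Python.
    if isinstance(seconds, bool) or not isinstance(seconds, int):
        raise TypeError("vg-reset-wait-time must be a whole number of seconds")
    if not MIN_WAIT_SECONDS <= seconds <= MAX_WAIT_SECONDS:
        raise ValueError(
            f"vg-reset-wait-time must be between {MIN_WAIT_SECONDS} "
            f"and {MAX_WAIT_SECONDS} seconds, got {seconds}"
        )
    # Two-space indent matches the nesting under the vap-group line.
    return f"  vg-reset-wait-time {seconds}"


if __name__ == "__main__":
    print(render_reset_wait_line(60))
```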


Configuration example:

#
vap-group fw xslinux_v5_64
  vap-count 2
  max-load-count 2
  ap-list ap3 ap4
  load-balance-vap-list 1 2 3 4 5 6 7 8 9 10
  ip-forwarding
  vg-reset-wait-time 60   <<<<<<
  ip-flow-rule fw_lb
    action load-balance
    activate

 

This command is available in XOS 9.5.x and later.