Network Processor Module failed to pass the boot up phase

Article ID: 170137

Products

NPM

Issue/Introduction

The Network Processor Module (NPM), which handles the data traffic passing through the X-Series chassis, fails to complete the boot-up phase. Reseating or reloading the module does not resolve the issue.

Cause

The issue appears to be that the NPM cannot load the current configuration within the default NPM reload timeout of 300 seconds. If the customer has a fairly large interface configuration, the NPM boot process may take longer than that. Each NPM must fetch its configuration from the configuration manager running on the CPM and then process it. Both steps take longer as the configuration grows, resulting in longer startup times.

Environment

1. Check /var/log/messages on the Control Processor Module (CPM) for Network Processor Module (NPM) log entries similar to the following, which show the modules failing to load within 300 seconds:

Sep 13 09:13:56 npm2 cbsnpmcfgd[571]: [I] [npm2 X.X.X.x] Flushing NpmModuleConfigTable...
Sep 13 09:13:56 npm2 cbsnpmcfgd[571]: [I] [npm2 X.X.X.x] 04 CA_STATE_INIT_CONFIG_BEGIN
Sep 13 09:13:56 npm2 cbsnpmcfgd[571]: [I] [npm2 X.X.X.x] This slot (# 2) is up.
Sep 13 09:13:56 npm2 cbsnpmcfgd[571]: [I] [npm2 X.X.X.x] Flushing NpmFpmStateTable...
Sep 13 09:13:56 npm2 cbsnpmcfgd[571]: [I] [npm2 X.X.X.x] Flushing NpmGlobalTrafficCleaningTable...
Sep 13 09:13:56 npm2 cbsnpmcfgd[571]: [I] [npm2 X.X.X.x] NPM in slot 2 shows physical port count of 12
..
Sep 13 09:13:56 npm2 cbsnpmcfgd[571]: [I] [npm2 X.X.X.x] 05 CA_STATE_INIT_CONFIG_END
Sep 13 09:13:56 npm2 cbsnpmcfgd[571]: [I] [npm2 X.X.X.x] 06 CA_STATE_NORMAL_RUNNING_MODE

Subsequent NPMs failed to boot:
Sep 13 17:38:41 Host1 cbssysctrld: [W] Slot 1 failed to load within allotted 300 secs. Substate=2
Sep 13 17:38:41 Host1 cbssysctrld: [W] Slot 2 failed to load within allotted 300 secs. Substate=2

2. Check the number of configured circuits and their associated interfaces:

# show running-config

i. Check the number of circuits and their associated interfaces.
Example: more than 200 circuits.
ii. If a group interface is configured, check how many circuits are configured per group.
Example: group interface CircuitX (interfaces 1/11, 1/12, 2/11, 2/12) accommodates most of the data circuits.
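
The log check in step 1 can be sketched as a small Python script. This is only an illustration, not a vendor tool: the function name `find_load_timeouts` and the regular expression are assumptions derived from the two `cbssysctrld` warning lines quoted above, and other XOS releases may word the message differently.

```python
import re
from typing import Iterable, List, Tuple

# Pattern assumed from the cbssysctrld warnings quoted in step 1.
TIMEOUT_RE = re.compile(
    r"cbssysctrld: \[W\] Slot (\d+) failed to load within allotted "
    r"(\d+) secs\. Substate=(\d+)"
)


def find_load_timeouts(lines: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (slot, allotted_secs, substate) for every timeout warning."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            slot, secs, substate = (int(g) for g in match.groups())
            hits.append((slot, secs, substate))
    return hits


# Sample lines copied from the log excerpt in step 1.
SAMPLE = [
    "Sep 13 09:13:56 npm2 cbsnpmcfgd[571]: [I] [npm2 X.X.X.x] 06 CA_STATE_NORMAL_RUNNING_MODE",
    "Sep 13 17:38:41 Host1 cbssysctrld: [W] Slot 1 failed to load within allotted 300 secs. Substate=2",
    "Sep 13 17:38:41 Host1 cbssysctrld: [W] Slot 2 failed to load within allotted 300 secs. Substate=2",
]

for slot, secs, substate in find_load_timeouts(SAMPLE):
    print(f"Slot {slot} exceeded the {secs}-second timeout (substate {substate})")
```

To scan a real log, pass an open file handle for a copy of /var/log/messages to `find_load_timeouts` instead of `SAMPLE`.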

Resolution

As a rule of thumb, increase the NPM reload timeout once the number of circuits exceeds 200. This option is configurable, as documented in the XOS command reference guide. The default NPM reload timeout is 300 seconds. The change below increases the timeout to 500 seconds to accommodate more than 200 circuits bound to the group interface:

1. Change the NPM reload timeout value to 500 seconds from the XOS CLI:
# configure np-reload-timeout 500

2. Attempt to boot each NPM in maintenance mode first. Run the commands below one at a time, waiting until each module is brought up:
# configure module 1 maintenance
# configure module 2 maintenance (if there are 2 NPMs)

3. Check that the modules are able to boot, then return them to normal mode:
# configure module 1 enable
# configure module 2 enable (if there are 2 NPMs)
# wr

4. If the timeout value from step 1 does not help, the customer may try increasing it further.