APM does not reboot in Standby state


Article ID: 168105



Products

XOS

Issue/Introduction

This article describes an issue in which an APM cannot boot into the Standby state after an outage, and explains how to replace a corrupted APM standby boot image.

A spare APM does not boot into the Standby state after a power outage or other unexpected halt. The messages file reports "no export entry" and, after several boot attempts, the APM goes into the Down Permanently state, as shown below.

Messages:
Apr 28 13:37:01 jcyl1 dhcpd: BOOTREQUEST from 00:03:d2:00:01:0c via eth0
Apr 28 13:37:01 jcyl1 dhcpd: BOOTREPLY for 1.1.1.42 to Spare_12 (00:03:d2:00:01:0c) via eth0
Apr 28 13:37:01 jcyl1 cbstftpd[4374]: tftpd: trying to get file: /tftpboot/rdImage.nb2uni
Apr 28 13:37:03 jcyl1 cbstftpd 7.1.2 [Aug 8 2006 20:48:24] () @(#) Copyright (c) 1983 Regents of the University of California. All rights reserved.
Apr 28 13:37:23 jcyl1 dhcpd: BOOTREQUEST from 00:03:d2:00:01:0c via eth0
Apr 28 13:37:23 jcyl1 dhcpd: BOOTREPLY for 1.1.1.42 to Spare_12 (00:03:d2:00:01:0c) via eth0
Apr 28 13:37:24 jcyl1 rpc.mountd: refused mount request from 1.1.1.42 for /tftpboot/Spare_12 (/): no export entry

Apr 28 13:39:01 jcyl1 dhcpd: BOOTREQUEST from 00:03:d2:00:01:0c via eth0
Apr 28 13:39:01 jcyl1 dhcpd: BOOTREPLY for 1.1.1.42 to Spare_12 (00:03:d2:00:01:0c) via eth0
Apr 28 13:39:02 jcyl1 cbstftpd[4531]: tftpd: trying to get file: /tftpboot/rdImage.nb2uni
Apr 28 13:39:03 jcyl1 cbstftpd 7.1.2 [Aug 8 2006 20:48:24] () @(#) Copyright (c) 1983 Regents of the University of California. All rights reserved.
Apr 28 13:39:23 jcyl1 dhcpd: BOOTREQUEST from 00:03:d2:00:01:0c via eth0
Apr 28 13:39:23 jcyl1 dhcpd: BOOTREPLY for 1.1.1.42 to Spare_12 (00:03:d2:00:01:0c) via eth0
Apr 28 13:39:24 jcyl1 rpc.mountd: refused mount request from 1.1.1.42 for /tftpboot/Spare_12 (/): no export entry

Apr 28 13:41:01 jcyl1 dhcpd: BOOTREQUEST from 00:03:d2:00:01:0c via eth0
Apr 28 13:41:01 jcyl1 dhcpd: BOOTREPLY for 1.1.1.42 to Spare_12 (00:03:d2:00:01:0c) via eth0
Apr 28 13:41:02 jcyl1 cbstftpd[4587]: tftpd: trying to get file: /tftpboot/rdImage.nb2uni
Apr 28 13:41:03 jcyl1 cbstftpd 7.1.2 [Aug 8 2006 20:48:24] () @(#) Copyright (c) 1983 Regents of the University of California. All rights reserved.
Apr 28 13:41:23 jcyl1 dhcpd: BOOTREQUEST from 00:03:d2:00:01:0c via eth0
Apr 28 13:41:23 jcyl1 dhcpd: BOOTREPLY for 1.1.1.42 to Spare_12 (00:03:d2:00:01:0c) via eth0
Apr 28 13:41:24 jcyl1 rpc.mountd: refused mount request from 1.1.1.42 for /tftpboot/Spare_12 (/): no export entry

Apr 28 13:43:01 jcyl1 dhcpd: BOOTREPLY for 1.1.1.42 to Spare_12 (00:03:d2:00:01:0c) via eth0
Apr 28 13:43:02 jcyl1 cbstftpd[4637]: tftpd: trying to get file: /tftpboot/rdImage.nb2uni
Apr 28 13:43:03 jcyl1 cbstftpd 7.1.2 [Aug 8 2006 20:48:24] () @(#) Copyright (c) 1983 Regents of the University of California. All rights reserved.
Apr 28 13:43:23 jcyl1 dhcpd: BOOTREQUEST from 00:03:d2:00:01:0c via eth0
Apr 28 13:43:23 jcyl1 dhcpd: BOOTREPLY for 1.1.1.42 to Spare_12 (00:03:d2:00:01:0c) via eth0
Apr 28 13:43:24 jcyl1 rpc.mountd: refused mount request from 1.1.1.42 for /tftpboot/Spare_12 (/): no export entry

Apr 28 13:44:54 jcyl1 cbssysctrld[4105]: [W] SlotTmoExp Slot 12 started 120 sec ago. Still no heartbeat
Apr 28 13:44:54 jcyl1 cbssysctrld[4105]: [E] Fault Slot 12 is down PERMANENTLY
Apr 28 13:44:54 jcyl1 cbssysctrld[4105]: [I] Stopping slot 12
Apr 28 13:44:54 jcyl1 cbssysctrld[4105]: [I] excessive failures on slot 12, retry limit exceeded, disabling permanently.
Apr 28 13:44:54 jcyl1 cbssysctrld[4105]: [E] Fault Slot 12 is down PERMANENTLY
Apr 28 13:44:54 jcyl1 cbssysctrld[4105]: [I] Stopping slot 12
Apr 28 13:44:54 jcyl1 cbssysctrld[4105]: [I] APM slot 12 (VAP 0) state change: loading -> down
Apr 28 13:44:54 jcyl1 cbsalarmmond[4028]: [I] Module slot 12 New State Down
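
To check whether a system shows the same signature, the messages file can be scanned for the refused mount requests and the permanent-down fault. The following is a minimal sketch, not a vendor tool: the function name scan_standby_failure is hypothetical, and the log path in the usage comment (/var/log/messages) is an assumption, since the article does not state where the messages file is stored.

```shell
#!/bin/sh
# Hypothetical helper: summarize the failure signature in a syslog-style file.
# Usage (log path is an assumption): scan_standby_failure /var/log/messages
scan_standby_failure() {
    logfile=$1
    # Count mount refusals logged by rpc.mountd ("no export entry").
    # grep -c exits non-zero when the count is 0, so the status is ignored.
    refused=$(grep -c 'rpc.mountd: refused mount request.*no export entry' "$logfile" || true)
    # Count permanent-down faults logged by cbssysctrld.
    down=$(grep -c 'cbssysctrld.*is down PERMANENTLY' "$logfile" || true)
    echo "refused_mounts=$refused permanent_down=$down"
}
```

Non-zero values for both counters suggest that the slot failed in the way described in this article.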



Cause

When a standby APM reboots, it tries to boot from /tftpboot/rdImage.nb2uni. If this file is corrupt, the APM cannot successfully boot as a standby APM.

If the APM cannot boot, cbssysctrld attempts to restart it. After several failed attempts, the APM goes into the Down Permanently state.

Notes:

If the APM is associated with a VAP group, it boots normally because it uses /tftpboot/vap-group_N/boot/bzImage.nb to boot.

You can view the /etc/dhcpd.conf file to confirm APM details. In the case of a standby APM, the /etc/dhcpd.conf file includes the following information:

# VAP 0 to apm slot 5 state loading
host Spare_5 {
hardware ethernet 00:03:d2:00:01:05;
fixed-address 1.1.1.35;
filename "/tftpboot/rdImage.nb";
option host-name "Spare_5";
}

Description:

VAP 0 to apm slot 5 state loading:   Indicates that the APM does not belong to a VAP group (VAP 0) and that its state is "loading".
host Spare_5:     The identifier for the dhcpd entry. It must be unique for each DHCP client.
hardware ethernet:    The MAC address of the APM. 00:03:d2 is the Crossbeam reserved prefix; 00:01:05 is 00:<sysid>:<slot number>.
fixed-address:     The IP address, which is built as follows: 1.1.<sysid>.[33-42], where 33 is slot 3, 34 is slot 4 (when populated with an APM), 35 is slot 5, ..., 41 is slot 11, and 42 is slot 12.
filename:     The kernel used for Standby APMs.
option host-name:     The name of the APM. Each time an APM goes to Standby, the system gives it a name of the form "Spare_<slot number>".
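
The addressing scheme described above can be expressed as two small shell functions, which may help when checking a dhcpd.conf entry or a log line against the slot that produced it. This is only a sketch: the function names apm_ip and apm_mac are hypothetical, and it assumes that the system ID and slot number appear in the MAC address as two-digit hexadecimal values (slot 12 appears as 0c in the log excerpt in this article; the encoding of the system ID is an assumption).

```shell
#!/bin/sh
# Hypothetical helpers: derive the expected dhcpd.conf values for a standby APM
# from the system ID and slot number, following the scheme described above.

# IP address: 1.1.<sysid>.<30 + slot>, for slots 3 through 12.
apm_ip() {
    sysid=$1
    slot=$2
    if [ "$slot" -lt 3 ] || [ "$slot" -gt 12 ]; then
        echo "slot must be between 3 and 12" >&2
        return 1
    fi
    echo "1.1.$sysid.$((30 + slot))"
}

# MAC address: 00:03:d2 (Crossbeam prefix) followed by 00:<sysid>:<slot>.
# Assumption: both values are written as two-digit hexadecimal.
apm_mac() {
    sysid=$1
    slot=$2
    printf '00:03:d2:00:%02x:%02x\n' "$sysid" "$slot"
}
```

For example, apm_ip 1 5 prints 1.1.1.35 and apm_mac 1 12 prints 00:03:d2:00:01:0c, which match the Spare_5 entry and the Spare_12 log lines shown in this article.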


Resolution

To Replace a Corrupt Standby Boot Image

1.   Compare the checksum of the rdImage.nb2uni file with the checksum of the same file on a system running the same version of XOS.

Example

# md5sum /tftpboot/rdImage.nb2uni (corrupt image)
9d2eb3fd1c2f26913d3548ab529f8b41 /tftpboot/rdImage.nb2uni


On a different chassis running the same version of XOS:

# md5sum /tftpboot/rdImage.nb2uni (good image)
db1186a63c92b1c476555f4f5b1a573a /tftpboot/rdImage.nb2uni

 
2.   Make a backup copy of the corrupted file, then copy the good file into the /tftpboot directory. In this example, the good file has been placed in /tftpboot/.private/home/admin.
 
# cp /tftpboot/rdImage.nb2uni /tftpboot/rdImage.nb2uni.cpt
# cp /tftpboot/.private/home/admin/rdImage.nb2uni /tftpboot/rdImage.nb2uni
 
3.   Disable and then enable the module to boot it into the Standby state.
 
CBS# configure module 12 disable
CBS# configure module 12 enable
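
Steps 1 and 2 can be combined so that the image is replaced only when the candidate file matches a known-good checksum, and the result is verified afterward. The following is a minimal sketch under stated assumptions: the function name replace_image is hypothetical, md5sum and cut are assumed to be available in the shell, and the known-good checksum must be obtained from a chassis running the same version of XOS.

```shell
#!/bin/sh
# Hypothetical helper: replace an image only if the candidate file matches
# a known-good MD5 checksum.
# Usage: replace_image TARGET CANDIDATE GOOD_MD5
replace_image() {
    target=$1
    candidate=$2
    good_md5=$3

    # md5sum prints "<hash>  <file>"; keep only the hash.
    candidate_md5=$(md5sum "$candidate" | cut -d' ' -f1)
    if [ "$candidate_md5" != "$good_md5" ]; then
        echo "candidate checksum $candidate_md5 does not match $good_md5" >&2
        return 1
    fi

    # Keep the existing (possibly corrupt) file with a .cpt suffix, as in step 2.
    cp "$target" "$target.cpt" || return 1
    cp "$candidate" "$target" || return 1

    # Succeed only if the installed copy now matches the known-good checksum.
    [ "$(md5sum "$target" | cut -d' ' -f1)" = "$good_md5" ]
}
```

With the values from the example above, the call would be replace_image /tftpboot/rdImage.nb2uni /tftpboot/.private/home/admin/rdImage.nb2uni db1186a63c92b1c476555f4f5b1a573a, followed by the module disable and enable commands in step 3.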

Workaround

N/A