On restart of the appliance, a kernel panic preceded by an XFS code call stack similar to the following is displayed:
RIP [<ffffffff883cf607>] :xfs:xfs_error_report+0xf/0x58
RSP <ffff81028c817c28>
CR2: 0000000000000118
<0> Kernel panic - not syncing: Fatal exception
The CA Multi-Port Monitor appliance uses the high-performance Linux XFS file system on two partitions:
/dev/sda4 mounted on /nqxfs
Hosts the Vertica metrics database.
/dev/sdb1 mounted on /data
Hosts the CA Multi-Port Monitor packet capture storage.
XFS file system corruption typically occurs when the appliance experiences a power outage or hardware hang.
The Linux kernel panic is most likely to occur on the /nqxfs partition, shortly after the appliance restarts and the Vertica metrics database starts.
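Before repairing anything, it can help to confirm which block device backs each XFS mount point. The following is a minimal sketch, assuming a standard Linux /proc/mounts table; xfs_mounts is a hypothetical helper name and is not a tool shipped with the appliance.

```shell
#!/bin/sh
# Hypothetical helper (not an appliance tool): list XFS file systems
# as "<device> <mount point>" pairs.
# Reads a mounts-format table; defaults to /proc/mounts on Linux.
xfs_mounts() {
    # Fields in the table: device, mount point, file system type, options, ...
    awk '$3 == "xfs" { print $1, $2 }' "${1:-/proc/mounts}"
}
```

Calling xfs_mounts with no argument reads the live mount table. Passing a file name reads a saved copy instead, which is useful when comparing against the partition layout described above.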
Repair XFS File System Corruption
Repair a damaged or corrupt XFS file system by running the xfs_repair command on the affected partition.
Estimated time to complete XFS repair: 30-60 minutes
Follow these steps:
1. If the Multi-Port Monitor terminal displays a kernel panic and system halt message, and is unresponsive, shut down the appliance by holding down the Power button for several seconds. Otherwise, shut down the appliance (see page 12) normally.
2. Press the Power button to start the appliance.
3. After the BIOS scans complete, the initial CentOS boot screen appears. Press any key before the countdown reaches zero to enter the boot menu.
4. The default boot kernel will already be selected. Press a to modify kernel boot parameters.
5. The cursor will be at the end of the line of kernel parameters. Add the parameter single to the end of the line, as shown in the example below, and press Enter:
<Please see attached file for image: Single.png>
6. When the kernel finishes booting, a command prompt will be displayed. There is no login prompt as the system is running in single user mode.
Note: In single user mode, the appliance can only be accessed from the terminal display.
7. Unmount the affected partition and run xfs_repair on its block device.
To repair the /nqxfs partition:
umount /nqxfs
xfs_repair /dev/sda4
To repair the /data partition:
umount /data
xfs_repair /dev/sdb1
8. In either case, a successful repair produces text output similar to the following:
Phase 1 - find and verify superblock...
Phase 2 - zero log...
- scan file system freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
...
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- clear lost+found (if it exists) ...
- clearing existing "lost+found" inode
- deleting existing "lost+found" entry
- check for inodes claiming duplicate blocks...
- agno = 0
imap claims in-use inode 242000 is free, correcting imap
- agno = 1
- agno = 2
...
Phase 5 - rebuild AG headers and trees...
- reset superblock counters...
Phase 6 - check inode connectivity...
- ensuring existence of lost+found directory
- traversing file system starting at / ...
- traversal finished ...
- traversing all unattached subtrees ...
- traversals finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 242000, moving to lost+found
Phase 7 - verify and correct link counts...
Done
9. Enter reboot to leave single user mode and restart the appliance.
10. Assess whether the XFS repair has returned the partition to normal operations.
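When assessing the repair, it may be useful to know which inodes xfs_repair moved to lost+found, because those files are no longer at their original paths. The sketch below assumes the repair output was saved to a file, for example by redirecting both output streams of xfs_repair through tee in step 7. The log file location and the helper name lost_found_inodes are assumptions and are not part of the documented procedure.

```shell
#!/bin/sh
# Hypothetical helper: print the inode numbers that xfs_repair reported
# moving to lost+found, one per line.
# $1 is a saved copy of the xfs_repair output (file name chosen by the caller).
lost_found_inodes() {
    # Matches lines such as:
    #   disconnected inode 242000, moving to lost+found
    sed -n 's/^.*disconnected inode \([0-9][0-9]*\), moving to lost+found.*$/\1/p' "$1"
}
```

If the function prints nothing, the saved output contains no such messages. Any inode numbers it does print can be looked up in the lost+found directory at the root of the repaired file system once the partition is mounted again.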