Check Point R77.10 in larger VSX deployments may take too long to start

Article ID: 168143

Products

VIP Enterprise Gateway APM

Issue/Introduction

Customers with larger VSX deployments may experience a very slow transition of VAP members from the "UP" state to the "ACTIVE" state.
This transition can take as long as an hour. During this time, the following symptoms are observed:

1. "cphaprob state" correctly reports all cluster members as "ACTIVE"
2. Application monitor correctly reports all monitored Check Point components on the cluster member as "UP":
# /crossbeam/apps/app_status -v
cpd is RUNNING
fwd is RUNNING
fwk is RUNNING
VSX is READY
HA  is READY
Reporting application state: UP


3. VAP members are still reported in the "UP" state rather than the "ACTIVE" state in the output of "show chassis":
<snip>
5 Yes ap3 AP9600 Up 0 days, 00:20 
6 Yes ap4 AP9600 Up 0 days, 00:20
<snip>

Cause

The issue is related to configuration fetching over the NFS file system: when all VAP members boot at the same time, the concurrent configuration fetches place a high load on NFS, which slows the "UP" -> "ACTIVE" transition.

Resolution

To significantly improve the "UP" -> "ACTIVE" transition times, upgrade to R77.10 MR1 or to R77.30.

Here is the relevant entry from the R77.10 MR1 release notes:

ID 102652 Adds R77.10-specific optimizations to reduce overall NFS network load at APM boot time.

Workaround

1. Boot each blade individually instead of starting all VAP members at the same time. This should significantly shorten the UP -> ACTIVE transition time; a hedged sketch of this sequencing is shown below.
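For illustration only, here is a minimal sketch of that sequencing. "boot_blade" is a hypothetical placeholder (not a real XOS/CBS command) for whatever command boots a single blade in your environment, and shell access to each member via ssh is assumed; the readiness test reuses the zombie-process check from workaround 2 below.

#!/bin/bash
# Hypothetical sketch: bring up VAP members one at a time instead of all at once.
for blade in ap3 ap4; do
    boot_blade "$blade"    # placeholder - substitute your chassis boot command

    # Wait until the member's vs_start.bash zombies are gone, i.e. its
    # UP -> ACTIVE transition has finished, before booting the next blade.
    while ssh "$blade" ps aux | grep -q '[v]s_start.bash'; do
        sleep 30
    done
done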

2. You can monitor the state transition by periodically running the following command:
# ps aux | grep Z
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       315  0.0  0.0      0     0 ?        Z    07:32   0:00 [vs_start.bash] <defunct>
root      2805  0.0  0.0      0     0 ?        Z    07:33   0:00 [vs_start.bash] <defunct>
root      5071  0.0  0.0      0     0 ?        Z    07:33   0:00 [vs_start.bash] <defunct>
root      5732  0.0  0.0      0     0 ?        Z    07:33   0:00 [vs_start.bash] <defunct>
root      6094  0.0  0.0      0     0 ?        Z    07:33   0:00 [vs_start.bash] <defunct>
root      6970  0.0  0.0      0     0 ?        Z    07:34   0:00 [vs_start.bash] <defunct>
root     21189  0.0  0.0      0     0 ?        Z    07:28   0:00 [vs_start.bash] <defunct>
root     21445  0.0  0.0      0     0 ?        Z    07:28   0:00 [vs_start.bash] <defunct>
root     21653  0.0  0.0      0     0 ?        Z    07:28   0:00 [vs_start.bash] <defunct>
root     22043  0.0  0.0      0     0 ?        Z    07:29   0:00 [vs_start.bash] <defunct>
root     22173  0.0  0.0      0     0 ?        Z    07:29   0:00 [vs_start.bash] <defunct>
root     22450  0.0  0.0      0     0 ?        Z    07:29   0:00 [vs_start.bash] <defunct>
root     22814  0.0  0.0      0     0 ?        Z    07:29   0:00 [vs_start.bash] <defunct>
root     23263  0.0  0.0      0     0 ?        Z    07:29   0:00 [vs_start.bash] <defunct>

 
Once all zombie (<defunct>) vs_start.bash processes have disappeared, the module should be in the "ACTIVE" state.
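If you prefer not to re-run the command by hand, the same check can be wrapped in a small polling loop (a sketch using only standard tools, run on the VAP member itself):

#!/bin/bash
# Poll for leftover vs_start.bash zombie processes; return once they are gone,
# i.e. once the member should have reached the "ACTIVE" state.
while ps aux | grep '[v]s_start.bash' | grep -q 'Z.*<defunct>'; do
    echo "$(date): vs_start.bash zombies still present, waiting..."
    sleep 30
done
echo "$(date): no zombie processes left - member should now be ACTIVE"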

3. Another way of monitoring is to watch "/var/log/audit_trail.log" for the appearance of the following line, which indicates that the whole process has finished:
Sep 25 10:05:16 <hostname> cli: COMMAND: CBS# copy running-config > copy running-config startup-config
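Rather than watching the log by eye, you can follow it with standard tools; this one-liner (assuming the log path and message shown above) prints the completion line as soon as it is logged, after which you can stop it with Ctrl-C:

# tail -F /var/log/audit_trail.log | grep 'copy running-config startup-config'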