Security Intelligence status shows Down and NAPP shows Degraded in the UI

Products

VMware NSX
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

Symptom

After logging in to NSX Manager, navigate to System > NSX Application Platform.

The NSX Application Platform (NAPP) status shows Degraded, and Security Intelligence shows Down.

1. SSH to the NSX Manager as root and run the following command:

napp-k get pods | grep -vi running | grep -vi completed

The output may show some pods in the "Pending" state:

nsxi-platform contextcorrelator 0/2 Pending
nsxi-platform infraclassifier 0/2 Pending
nsxi-platform overflowcorrelator 0/2 Pending
nsxi-platform rawflowcorrelator 0/2 Pending

2. Describe one of the pending pods to check the details (use the full pod name from the previous output):

napp-k describe pod contextcorrelator

The Events section of the output may show the following warning:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m13s (x535 over 44h) default-scheduler 0/5 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.

3. Check the PVC status:

napp-k get pvc -n nsxi-platform

You may observe some PVCs stuck in a Pending state as shown below:

contextcorrelator-xxxxxxxxxx-exec-1-pvc-0 Pending
infraclassifier-xxxxxxxxxx-exec-1-pvc-0 Pending
infraclassifier-xxxxxxxxxx-exec-1-pvc-1 Pending
overflowcorrelator-xxxxxxxxxx-exec-1-pvc-0 Pending
overflowcorrelator-xxxxxxxxxx-exec-1-pvc-1 Pending
rawflowcorrelator-xxxxxxxxxx-exec-1-pvc-0 Pending
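When the PVC list is long, a small filter can isolate the Pending claims. The following is a minimal sketch, not part of the product tooling: the function name pending_pvcs is hypothetical, and it assumes the PVC name is in the first column and the status in the second, as in the output above.

```shell
# Print the names of PVCs whose STATUS column is "Pending".
# Reads `get pvc` output for a single namespace from stdin
# (assumption: column 1 is NAME, column 2 is STATUS).
pending_pvcs() {
  awk '$2 == "Pending" { print $1 }'
}

# Usage on the NSX Manager (requires the live cluster):
#   napp-k get pvc -n nsxi-platform | pending_pvcs
```

Any name printed by the filter can be passed to the describe command in the next step.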

4. Describe one of the pending PVCs:

napp-k describe pvc contextcorrelator-xxxxxxxxxx-exec-1-pvc-0 -n nsxi-platform

The Events section of the output may show the following message:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 2m persistentvolume-controller Waiting for a volume to be created either by the external provisioner 'csi.vsphere.vmware.com'

5. Check the vSphere CSI controller logs. First, locate the controller pod:

napp-k get pods -A | grep vsphere-csi-controller

Review the logs using:

napp-k logs vsphere-csi-controller-xxxx -n vmware-system-csi -c vsphere-csi-controller

Sample log excerpt indicating a datastore issue:

"CreateVolume failed with error: ServerFaultCode: A specified parameter was not correct: InputSpec.datastore"
"Volume creation failed for PVC due to invalid datastore reference"
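The controller log can be large, so it may help to filter it for the datastore-related messages shown above. The following is a rough sketch: the function name csi_datastore_errors is hypothetical, and the search patterns are taken only from the sample excerpt in this article, so other relevant messages may not match.

```shell
# Keep only log lines that resemble the datastore errors in the sample excerpt.
# Reads log text from stdin; matching is case-insensitive.
csi_datastore_errors() {
  grep -iE 'CreateVolume failed|InputSpec\.datastore|invalid datastore'
}

# Usage on the NSX Manager (requires the live cluster):
#   napp-k logs vsphere-csi-controller-xxxx -n vmware-system-csi \
#     -c vsphere-csi-controller | csi_datastore_errors
```

An empty result does not rule out a storage problem; in that case, review the full log.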

If any of the above symptoms does not match, this KB does not apply to your issue.

Environment

NAPP 4.2.0.0

Cause

Security Intelligence pods remained in a Pending state because their Persistent Volume Claims (PVCs) could not be fulfilled. This was due to the StorageClass referencing an invalid or stale datastore, which prevented the vSphere CSI driver from provisioning the required volumes.

Resolution

Validate whether the Tanzu Kubernetes Cluster is healthy.

- Log in to the vCenter UI.

- Navigate to "Workload Management" in the left menu.

- Select "Supervisors" from the top menu.

- Make sure that the "Config Status" is "Running".

- Select the namespace in which NAPP is deployed. The name will be similar to napp-ns-default.

- Click "Compute" in the top menu.

- Under VMware Resources, select "Tanzu Kubernetes Clusters".

- Make sure that the "Phase" shows as "Running".

If either the Supervisor or the workload cluster is not in the Running state (for example, it is in the Configuring or Error state), engage Tanzu Technical Support by opening a support request (SR).

If both are healthy, proceed with the following checks:

Review StorageClass Configuration

Check if the StorageClass being used points to a valid datastore.
If it references an old or deleted datastore signature, update it to a correct and accessible datastore.
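To find which StorageClass the stuck PVCs are requesting, the PVC list can be reduced to the StorageClass names used by Pending claims. The following is a sketch under assumptions: the function name pending_storage_classes is hypothetical, and it expects three whitespace-separated columns (name, status, StorageClass), which the custom-columns output shown in the usage comment should produce.

```shell
# Print the unique StorageClass names requested by Pending PVCs.
# Expects three columns on stdin: NAME STATUS STORAGECLASS.
pending_storage_classes() {
  awk '$2 == "Pending" { print $3 }' | sort -u
}

# Usage on the NSX Manager (requires the live cluster):
#   napp-k get pvc -n nsxi-platform \
#     -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,STORAGECLASS:.spec.storageClassName \
#     | pending_storage_classes
#
# Then inspect the parameters of each StorageClass that was printed:
#   napp-k get storageclass <storageclass-name> -o yaml
```

The parameters in the YAML output can then be compared against the storage policy and datastores that currently exist in vCenter.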

Refer to the following link to configure a VM storage policy for the NSX Application Platform:

https://techdocs.broadcom.com/us/en/vmware-security-load-balancing/vdefend/vmware-nsx-application-platform/4-2/deploying-and-managing-the-nsx-application-platform/auto-deploy-the-nsx-application-platform-automation-appliance/configuring-your-environment-for-nappaa-deployment/configure-napp-storage-policy.html

Once the StorageClass is corrected, the CSI driver should successfully provision volumes, PVCs will bind, and affected pods should move to Running state. Security Intelligence should then become fully operational.

Additional Information

If the issue persists after verifying the storage policy and datastore configuration, please contact Broadcom support.