NSX Intelligence activation fails due to failed pre-checks.

Products

VMware vDefend Firewall with Advanced Threat Prevention
VMware vDefend Firewall

Issue/Introduction

During the activation of NSX Intelligence, the process fails at the "Run pre-check" stage with the error:

Security Intelligence Activation: Pre-check failed at Cluster Resource Capacity Check.

Error Message:
"Failed to invoke API on NAPP. Please check logs on the pre-check job pod for more information."

Log in to the NSX Manager with root credentials and run the following command to view the pre-check job logs:

napp-k logs job/nsx-intelligence-precheck-jobs -c validate-capacity -n nsxi-platform

The following errors were observed in the logs:

/usr/lib/python3/dist-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'cluster-api'. Adding certificate verification is strongly advised.
calling POST https://cluster-api:443/report/precheck {'id': 'capacity', 'name': 'feature.precheck.capacityName', 'desc': 'feature.precheck.capacityDesc', 'feature': 'intelligence', 'status': 'INPROGRESS', 'reason': ''}
REST OK
calling POST https://cluster-api:443/features/intelligence/capacity/validate {'remaining_cpu_percent': 25, 'remaining_memory_percent': 25}
calling POST https://cluster-api:443/report/precheck {'id': 'capacity', 'name': 'feature.precheck.capacityName', 'desc': 'feature.precheck.capacityDesc', 'feature': 'intelligence', 'status': 'FAILED', 'reason': 'feature.precheck.nappApiInvocationFailedReason'}
REST FAILED: Internal Server Error

This indicates that the pre-check failed because the capacity validation API call to cluster-api (POST /features/intelligence/capacity/validate) did not succeed.

Check the cluster-api pod logs by running:

napp-k get pods | grep cluster-api
napp-k logs <cluster-api-pod-name-selected>

The logs show the following error:

2024-08-20T11:52:15.576268979Z stdout F {"time":"2024-08-20T11:52:15.576196673Z","level":"ERROR","prefix":"-","file":"service.go","line":"136","message":"failed to fetch licenses from configmap: invalid character 'A' looking for beginning of value"}
2024-08-20T11:52:15.576298914Z stdout F AUDIT: method=POST uri=/features/intelligence/capacity/validate remote_ip=192.168.2.48 host=cluster-api id= latency=456.998264ms status=500 error= user=
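
The log signature above can be searched for directly instead of reading the whole log. The following is a minimal sketch, assuming the cluster-api pod logs are piped in on stdin; the function name license_configmap_error is hypothetical and not part of NSX or NAPP tooling.

```shell
# Hypothetical helper: reads cluster-api pod logs on stdin and prints every
# line that carries the license configmap error. The exit status is 0 when at
# least one such line is found and non-zero otherwise.
license_configmap_error() {
  grep -F 'failed to fetch licenses from configmap'
}

# Example use on the NSX Manager (pod name placeholder as in the commands above):
#   napp-k logs <cluster-api-pod-name-selected> | license_configmap_error
```

A match confirms the symptom described in this article; no match suggests that the 500 response has a different cause.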

Environment

Any NAPP version.

Cause

The failure is caused by the absence of license information in the configmap. This occurs because the NSX Manager's common-agent-service has not yet streamed the required license data.

Resolution

To resolve this issue, you need to trigger the common-agent to stream the license information by re-applying a license on the NSX side.

Step 1. Identify the NSX Manager node that is the leader for COMMON_AGENT_SERVICE:

su admin -c "get cluster status verbose" | grep COMMON_AGENT_SERVICE

Step 2. Get the IP address of the manager using the UUID from the above output:

su admin -c "get cluster status" | grep <uuid-of-manager-from-step-1>

Step 3. SSH into the identified NSX Manager node:

ssh root@<nsx-ip-from-step-2>

Step 4. Verify Keystore:

Run the following command to check the keystore:

keytool -list -keystore /home/secureall/secureall/.store/.napp_kafka_keystore -storepass $(cat /home/secureall/secureall/.store/.napp_kafka_keystore_pw)

Sample output of successful execution:

root@ansnsx1:~# keytool -list -keystore /home/secureall/secureall/.store/.napp_kafka_keystore -storepass $(cat /home/secureall/secureall/.store/.napp_kafka_keystore_pw)
Keystore type: ABC
Keystore provider: DEF

Your keystore contains 1 entry

k8s_msg_client, Feb 20, 2025, PrivateKeyEntry,
Certificate fingerprint (SHA-256): 00:00:AB:00:00:00:00:00:AB:00:AB:00:00:AB

Warning:
The DEF keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /home/secureall/secureall/.store/.napp_kafka_keystore -destkeystore /home/secureall/secureall/.store/.napp_kafka_keystore -deststoretype pkcs12".

Step 5. If the keystore check in Step 4 succeeds, restart the "proton" service using the command:

systemctl restart proton

Step 6. If the keytool command in Step 4 fails, follow "Regenerate Kafka Client Certificate (If Keystore Check Failed)" below to regenerate the Kafka client certificate.
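
The branch in Steps 4 to 6 can be sketched as a small script. This is an illustration under assumptions, not a supported tool: the keystore paths are those given in Step 4, while the function names keystore_ok and next_step are hypothetical. The sketch only reports which step applies; it does not restart any service.

```shell
# Keystore and password file paths as given in Step 4.
KEYSTORE=/home/secureall/secureall/.store/.napp_kafka_keystore
KEYSTORE_PW_FILE=/home/secureall/secureall/.store/.napp_kafka_keystore_pw

# Hypothetical helper: succeeds when keytool can list the keystore.
keystore_ok() {
  keytool -list -keystore "$KEYSTORE" \
    -storepass "$(cat "$KEYSTORE_PW_FILE")" >/dev/null 2>&1
}

# Hypothetical helper: runs the check command passed as arguments
# (keystore_ok when none is given) and prints the step to follow.
next_step() {
  [ "$#" -gt 0 ] || set -- keystore_ok
  if "$@"; then
    echo "Keystore check passed: restart proton (Step 5)"
  else
    echo "Keystore check failed: regenerate the Kafka client certificate (Step 6)"
  fi
}
```

Calling next_step with no arguments on the node from Step 3 runs the Step 4 check and prints the step that applies.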

Regenerate Kafka Client Certificate (If Keystore Check Failed):

Navigate to System > Certificates in the NSX Manager UI.

Filter by 'Issued To: k8s-msg-client'.

Click on the 3 dots next to 'Message bus client for NSX Application platform' and select 'Replace Certificate'.

Choose 'Generate Self-Signed Certificate' and save the certificate.

After completing the above steps, the license information should be successfully streamed, resolving the pre-check failure.

NOTE: The "Replace Certificate" option may not be available in some older releases of NSX Manager.

Additional Information

In some scenarios, /var/log/proton/nsxapi.log on the NSX Manager node that is the COMMON_AGENT_SERVICE leader (see Steps 1 and 2 in the Resolution section to identify the leader node) shows the following warnings.

This symptom is observed only when communication between the NSX Manager and NAPP fails.

2025-02-13T12:01:14.020Z WARN CommonAgentClusterCertProcessor NetworkClient 2464895 [Consumer clientId=abcd-nsx-manager, groupId=nsx-manager] Bootstrap broker <napp-messaging-ip>:9092 (id: -1 rack: null) disconnected

2025-02-13T12:01:14.020Z WARN kafka-producer-network-thread | producer-79 NetworkClient 2464895 [Producer clientId=producer-79] Connection to node -1 (/<napp-messaging-ip>:9092) could not be established. Broker may not be available.

This occurs when TCP port 9092 is blocked between the NSX Manager network and the NSX Application Platform's Frontend Network. The NSX Manager uses this port to synchronize license information with the NAPP cluster, so blocking it causes the communication failure.

To resolve this, modify the firewall rules so that TCP port 9092 is open between the NSX Manager network and the NSX Application Platform's Frontend Network.
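
Reachability of the port can be checked from the NSX Manager before and after changing the firewall rules. The following is a minimal sketch, assuming bash (built with /dev/tcp support) and the coreutils timeout command are available on the node; the function name port_open is hypothetical, and <napp-messaging-ip> stands for the broker address shown in the log lines above.

```shell
# Hypothetical helper: succeeds (exit 0) when a TCP connection to host:port
# can be opened within the given number of seconds (default 5).
# Arguments: $1 = host, $2 = port, $3 = optional timeout in seconds.
# Relies on bash's /dev/tcp redirection, so no extra tools are required.
port_open() {
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example use:
#   port_open <napp-messaging-ip> 9092 && echo "9092 reachable" || echo "9092 not reachable"
```

A non-zero result only shows that the connection could not be opened from this node; it does not by itself identify which firewall is dropping the traffic.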