NSX Malware Prevention - Communication Error (Error Code: 502)

Products

VMware vDefend Firewall

Issue/Introduction

Malware Prevention notification with error code: 502
Error message: "Error in communication between MPS components (Security Hub and RAPID) on the transport node. Please check the status of these components on the transport node."

Malware Prevention sending files for proper classification.

Cause

This document outlines the troubleshooting steps for the reported error code 502 in NSX Malware Prevention, indicating communication issues between Security Hub and RAPID components.

Resolution

Verify Service Status on SVM:

Login to the Security Virtual Machine (SVM).
Check the status of Security Hub and RAPID services:
/etc/init.d/nsx-sh status
/etc/init.d/nsx-lastline-rapid status

Security Hub should start RAPID automatically. If either service is down, restart it and capture any errors or warnings.

Additional Information

Troubleshooting Steps (run all commands as the root user):

1. Verify Service Status on SVM:

Login to the Security Virtual Machine (SVM).
Check the status of Security Hub and RAPID services:
/etc/init.d/nsx-sh status
/etc/init.d/nsx-lastline-rapid status

Security Hub should start RAPID automatically. If either service is down, restart it and capture any errors or warnings.
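The two status checks above can be wrapped in a small helper so that both services are checked in one pass. This is a minimal sketch, not a VMware-provided tool: the function names check_service and check_all are hypothetical, and it assumes each init script exits non-zero when its service is not running.

```shell
# Sketch: report the status of the Security Hub and RAPID init scripts.
# Assumption: each script exits non-zero when its service is not running.

check_service() {
    script="$1"
    if [ ! -x "$script" ]; then
        echo "MISSING $script"
        return 2
    fi
    if "$script" status >/dev/null 2>&1; then
        echo "RUNNING $script"
    else
        echo "NOT-RUNNING $script"
        return 1
    fi
}

check_all() {
    rc=0
    for s in /etc/init.d/nsx-sh /etc/init.d/nsx-lastline-rapid; do
        check_service "$s" || rc=1
    done
    return $rc
}
```

Run check_all as root on the SVM. A non-zero return value means at least one service needs attention.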

2. Verify Docker Container Status on SVMs.

Run 'docker ps' to view the status of Docker containers on the SVM.
Ensure all containers are running. Containers that restart repeatedly might indicate ingress connectivity issues.
If any container has failed, run:

docker inspect --format "{{json .State}}" <container name>

For example:

docker inspect --format "{{json .State}}" nsx-lastline-rapid_analyst-sdk-malscape-completion_1
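When several containers are affected, the inspection can be looped over every container that is not running. The following is a rough sketch using only standard Docker CLI options (ps -a, --filter, --format, inspect). The function names are hypothetical, and the list of statuses treated as a problem is an assumption that can be adjusted.

```shell
# Sketch: print the state of every container that is not currently running.

list_stopped_containers() {
    # Print one container name per line for each non-running status.
    for st in exited dead restarting created; do
        docker ps -a --filter "status=$st" --format '{{.Names}}'
    done
}

inspect_stopped_containers() {
    list_stopped_containers | while read -r name; do
        [ -n "$name" ] || continue
        echo "== $name"
        docker inspect --format '{{json .State}}' "$name"
    done
}
```

Calling inspect_stopped_containers prints a header line followed by the JSON state for each affected container, which can be saved for the service request.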

3. Check for Known Issue (NSX versions 3.2.3, 4.1.0, 4.1.1):

Review Broadcom Knowledge Base article 96444 (https://knowledge.broadcom.com/external/article?legacyId=96444) for a known issue where DNS settings might fail on SVMs deployed using an IP pool.

4. Enable Debug Logs:
For the Security Hub and Lastline RAPID services

Security Hub:
Step 1: Update the log level in /etc/vmware/nsx-sh/sh_config.json (replace 4 with 5).
Step 2: On the SVM, find the process ID of the nsx-sh binary (/opt/vmware/nsx-sh/nsx-sh): ps -eaf | grep nsx-sh
Step 3: Send a SIGHUP signal to that process: kill -s SIGHUP <process ID of nsx-sh>
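The process lookup and the SIGHUP can be combined into one helper. This is only a sketch: reload_nsx_sh is a hypothetical name, and it assumes pgrep is available on the SVM (if it is not, use the ps command from the steps above). The log-level edit in sh_config.json is still done by hand, because the key name is not shown in this article.

```shell
# Sketch: find the nsx-sh process and signal it to pick up the new log level.
# Assumption: pgrep is installed on the SVM.

reload_nsx_sh() {
    # -f matches against the full command line of each process.
    pid=$(pgrep -f /opt/vmware/nsx-sh/nsx-sh | head -n 1) || true
    if [ -z "$pid" ]; then
        echo "nsx-sh is not running" >&2
        return 1
    fi
    kill -s HUP "$pid" && echo "Sent SIGHUP to nsx-sh (pid $pid)"
}
```

Run reload_nsx_sh after saving sh_config.json. A non-zero return value means the process was not found or could not be signalled.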

RAPID:
Step 1: Set the log level in RAPID's override.conf:
echo LOG_VERBOSITY_PYTHON=debug >> /config/vmware/nsx-lastline-rapid/conf.d/override.conf

Step 2: Restart RAPID.
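Because the echo command appends to override.conf, running it more than once leaves duplicate lines in the file. A possible guard is sketched below. The function name enable_rapid_debug is hypothetical, and the optional argument exists only so the helper can be tried against a scratch file first.

```shell
# Sketch: add the debug override only once, so repeated runs do not duplicate it.

enable_rapid_debug() {
    conf="${1:-/config/vmware/nsx-lastline-rapid/conf.d/override.conf}"
    line="LOG_VERBOSITY_PYTHON=debug"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if grep -qxF "$line" "$conf" 2>/dev/null; then
        echo "already set in $conf"
    else
        echo "$line" >> "$conf"
        echo "added to $conf"
    fi
}
```

RAPID still has to be restarted afterwards for the new verbosity to take effect.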

5. Manual Intervention (Root Required):

Restart the resolvconf service on the SVM: /sbin/service resolvconf restart

Execute /sbin/resolvconf -u to update DNS settings.
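After refreshing the DNS settings, it can help to confirm that the resolver configuration actually lists a DNS server. The sketch below assumes the resulting configuration is written to /etc/resolv.conf, which is the usual location but is not stated in this article. Both function names are hypothetical.

```shell
# Sketch: confirm that the resolver configuration lists at least one DNS server.

count_nameservers() {
    file="${1:-/etc/resolv.conf}"
    # grep -c prints the number of matching lines (0 when none match).
    grep -c '^nameserver[[:space:]]' "$file" 2>/dev/null || true
}

check_dns_config() {
    n=$(count_nameservers "$1")
    if [ "${n:-0}" -gt 0 ]; then
        echo "OK: $n nameserver entries"
    else
        echo "FAIL: no nameserver entries"
        return 1
    fi
}
```

Running check_dns_config with no argument checks /etc/resolv.conf. A FAIL result after the resolvconf update is worth including in the service request.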

6. Collect Support Bundle:
For SVM admin access, see: https://docs.vmware.com/en/VMware-NSX/4.2/administration/GUID-E2DAD6E5-0984-41FB-BF6A-9BD8C288683B.html

On the SVM, generate a support bundle: SVM> get support-bundle file <filename.tgz>
Transfer the bundle to a remote machine: SVM> copy file <filename.tgz> url scp://admin@<remote_ip>/<destination_directory>

7. Next Steps:

If the issue persists after following these steps, collect all the information gathered during troubleshooting (service status, container logs, debug logs, and support bundle) and open a service request.
