Partner Channel Down Alarm (NSX 3.2.0.0)

Products

VMware NSX
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

Symptoms:

The following is observed in the NSX UI:

1. Alarms dashboard shows "Partner Channel Down" alarm for "Endpoint Protection" feature. Alarm state is open.

2. Go to "System"→"Service Deployments"→"Service Instances" and select either <Partner EPP Service> or "MPS Service" from the Partner Service dropdown. The page shows an active alarm against a listed service instance. Clicking this alarm shows the alarm details as

"Partner Channel Down" for feature "Endpoint Protection"

NOTE: The "Partner Channel Down" alarm can be raised both for agentless antivirus services provided by NSX Guest Introspection partners (such as Trend Micro, McAfee, and BitDefender) and for the "VMware NSX Malware Prevention Service".

If the alarm is generated by a partner service, it will be visible for relevant service instance(s) listed under <Partner EPP Service> selection from the Partner Service dropdown. If it is generated by the "VMware Malware Prevention Service" then it will be visible for relevant service instance(s) listed under "MPS Service" selection from the Partner Service dropdown.

Environment

VMware NSX-T Data Center 3.x
VMware NSX-T Data Center
VMware NSX-T

Cause

Third party agentless AV solutions and VMware NSX Malware Prevention Service use the NSX Guest Introspection platform for delivering security to workload VMs. Guest Introspection has multiple components that connect workload VMs to the security service deployed in the form of an SVM on each ESXi host. Workload VMs connect to the SVM using the Guest Introspection host module.

A Partner Channel Down alarm is generated when any of the following is true:

1. NSX Guest Introspection host module loses connectivity with a third party (partner) security solution (SVM) on an ESXi host

2. NSX Guest Introspection host module loses connectivity with the VMware NSX Malware Prevention (SVM) on an ESXi host.

3. NSX Guest Introspection host module is not operational

Resolution

Follow these steps to correctly identify the Service and the corresponding Service instance that lost connectivity with the host module:

1. On the alarms dashboard, for each Open "Endpoint Protection" "Partner Channel Down" alarm, note down the "Entity Name" corresponding to the alarm. This is the service instance id of the service that generated the alarm.

2. In the UI, go to "System"→"Service Deployments"→"Service Instances".

a. Select <Partner EPP Service> from the Partner service dropdown.

b. Search for the Entity Name noted in step 1. If one of the service instances generated this alarm, you should be able to locate it on this page using that service instance id.

c. The active alarm count shown against that service instance will be non-zero.

3. Step 2 above should be repeated by selecting "MPS Service" from Partner Service dropdown, for any service instance id that is not found under EPP Service.

4. If the [Entity Name] from the alarm does not match any entry under Service Instances, the alarm is stale and you are hitting a known bug.

This issue is fixed in 3.2.4 and above and in 4.1.2 and above. On earlier versions, you can resolve the stale alarm with this workaround:

(a) Log in to each affected host and stop the opsagent. (/etc/init.d/nsx-opsagent stop)
(b) On the NSX UI, resolve the alarm.
(c) Start the opsagent on the host. (/etc/init.d/nsx-opsagent start)

Loss of connectivity can be temporary and may recover on its own. In that case, the alarm moves to the "Resolved" state.

If the alarm does not resolve within a few minutes, it is advisable to migrate critical workload VMs to another host in the same cluster where the security service is running and healthy. This ensures continued protection for important assets.

On the host that generated the alarm, check the /var/log/syslog.log file for messages similar to:

ContextMux[XXXX]: [WARNING] (EPSEC) [XXXX] :CheckConnected():253: SolutionHandler[XXXX] failed to connect to solution[<SOLUTION_ID>] at [<SOLUTION_IP:SOLUTION_PORT>]: Connection refused (111) ....
ContextMux[XXXX]: [WARNING] (EPSEC) [XXXX] :Reconnect():359: SolutionHandler[XXXX] scheduling reconnect to solution[<SOLUTION_ID>] at <SOLUTION_IP> .....

These messages indicate a problem with the third party (partner) security solution (SVM) or the VMware NSX Malware Prevention Service (SVM) on the host. Depending on the service instance that generated the alarm, open a Service Request with VMware or with the third party security solution provider.
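The log check described above can be sketched as a pair of small shell functions. This is a minimal sketch, not a supported tool: the function names are hypothetical, the default path is the one named in this article, and the match pattern is derived only from the two sample messages shown, so it may need adjusting if your build logs them differently.

```shell
#!/bin/sh
# Sketch: find Guest Introspection connection-failure warnings in a log file.
# Assumption: the messages look like the two samples quoted in this article.

# Pattern covering both the failed connect and the scheduled reconnect warning.
EPSEC_PATTERN='ContextMux.*\(EPSEC\).*(failed to connect to solution|scheduling reconnect to solution)'

# Print the number of matching lines in the given log file.
count_epsec_failures() {
    log_file="${1:-/var/log/syslog.log}"
    # grep -c prints the count; "|| true" hides the non-zero exit on 0 matches.
    grep -c -E "$EPSEC_PATTERN" "$log_file" || true
}

# Print the most recent matching lines (default: last 20).
show_epsec_failures() {
    log_file="${1:-/var/log/syslog.log}"
    max_lines="${2:-20}"
    grep -E "$EPSEC_PATTERN" "$log_file" | tail -n "$max_lines"
}
```

On the affected host, running `count_epsec_failures` with no argument reads /var/log/syslog.log; a non-zero count that keeps growing suggests the host module is still unable to reach the SVM.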

Check the status of the NSX Guest Introspection host module on the ESXi host that generated the alarm:

/etc/init.d/nsx-context-mux status

If the service is not running, try to start the service:

/etc/init.d/nsx-context-mux start

If the service fails to start, contact VMware support.
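The status check and conditional start above can be combined into one step. The following is a sketch under a stated assumption: it assumes the `status` action prints a message containing "is running" when the service is up, which is not confirmed by this article, so verify the actual output on your host first. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: start the Guest Introspection host module only if it is not running.
# Assumption: "<init script> status" prints text containing "is running"
# when the service is up. Check the real output on your host before relying on it.

ensure_context_mux_running() {
    init_script="${1:-/etc/init.d/nsx-context-mux}"

    if "$init_script" status 2>&1 | grep -qi 'is running'; then
        echo "already running"
        return 0
    fi

    echo "not running, attempting start"
    "$init_script" start

    # Re-check so a failed start is reported rather than silently ignored.
    if "$init_script" status 2>&1 | grep -qi 'is running'; then
        echo "started"
        return 0
    fi

    echo "start failed" >&2
    return 1
}
```

Calling `ensure_context_mux_running` with no argument uses the init script path given in this article. A non-zero return means the service did not come up, at which point the guidance above applies: contact VMware support.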

For VMware Malware Prevention Service or NSX Guest Introspection host module issues, open a Service Request with VMware, and attach NSX and ESXi log bundles.

For partner services, contact the agentless AV security provider to gather appliance specific logs and open a Service Request with the partner.

Additional Information

Impact/Risks:
All file activity on workload VMs is intercepted and analysed for security over the NSX Guest Introspection pipeline, which runs from the host module to the security service (SVM). Loss of connectivity between the host module and the SVM breaks this pipeline. This can leave the workload VMs on the impacted ESXi host without malware protection for as long as the connectivity loss persists.