The following is observed in the NSX UI:
"Partner Channel Down" for feature "Endpoint Protection"
NOTE: The "Partner Channel Down" alarm can be raised either by an agentless antivirus service from an NSX Guest Introspection partner (such as Trend Micro, McAfee, or BitDefender) or by the "VMware NSX Malware Prevention Service".
If the alarm is generated by a partner service, it is visible for the relevant service instance(s) listed when <Partner EPP Service> is selected from the Partner Service dropdown. If it is generated by the "VMware Malware Prevention Service", it is visible for the relevant service instance(s) listed when "MPS Service" is selected from the Partner Service dropdown.
Third party agentless AV solutions and VMware NSX Malware Prevention Service use the NSX Guest Introspection platform for delivering security to workload VMs. Guest Introspection has multiple components that connect workload VMs to the security service deployed in the form of an SVM on each ESXi host. Workload VMs connect to the SVM using the Guest Introspection host module.
A Partner Channel Down alarm is generated when any of the following is true:
Follow these steps to correctly identify the Service and the corresponding Service instance that lost connectivity with the host module:
1. On the alarms dashboard, for each Open "Endpoint Protection" "Partner Channel Down" alarm, note down the "Entity Name" corresponding to the alarm. This is the service instance id of the service that generated the alarm.
2. In the UI, go to "System" → "Service Deployments" → "Service instances".
a. Select <Partner EPP Service> from the Partner service dropdown.
b. Search for the Entity Name noted from the alarm. If one of these service instances generated the alarm, you should be able to locate it on this page using the copied service instance id.
c. The Active alarms count will be non-zero for that service instance.
3. For any service instance id that is not found under the EPP Service, repeat step 2 with "MPS Service" selected from the Partner Service dropdown.
4. If you do not find the [Entity Name] from the alarm among the Service instances, you are hitting a bug.
This issue is fixed in 3.2.4 and above, and in 4.1.2 and above. On versions without the fix, you can resolve the stale alarm with this workaround:
(a) Log in to each problematic host and stop the opsagent (/etc/init.d/nsx-opsagent stop).
(b) On the NSX UI, resolve the alarm.
(c) Start the opsagent on the host (/etc/init.d/nsx-opsagent start).
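The lookup in steps 1 to 4 can be sketched as a small Python helper. This is only an illustration of the decision logic: it assumes the Entity Names and the service instance ids have been copied by hand from the UI into plain lists, it makes no NSX API calls, and the function name and all ids shown are hypothetical.

```python
def locate_alarm_entities(alarm_entities, epp_instances, mps_instances):
    """Report where each alarm Entity Name was found (hypothetical helper).

    alarm_entities: Entity Names noted from the open alarms (step 1)
    epp_instances:  ids listed under <Partner EPP Service> (step 2)
    mps_instances:  ids listed under "MPS Service" (step 3)
    """
    epp = set(epp_instances)
    mps = set(mps_instances)
    result = {}
    for entity in alarm_entities:
        if entity in epp:
            result[entity] = "partner EPP service"
        elif entity in mps:
            result[entity] = "MPS Service"
        else:
            # Step 4: not listed under either service, so the alarm may be stale.
            result[entity] = "not found"
    return result


# Made-up ids for illustration only.
alarms = ["si-aaa", "si-bbb", "si-ccc"]
locations = locate_alarm_entities(
    alarms,
    epp_instances=["si-aaa", "si-ddd"],
    mps_instances=["si-bbb"],
)
for entity, where in locations.items():
    print(entity, "->", where)
```

Any Entity Name reported as "not found" is a candidate for the stale-alarm workaround described in step 4.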
Loss of connectivity can be temporary and may recover on its own. In that case, the alarm moves to the "Resolved" state.
If this does not happen within a few minutes, it is advisable to migrate critical workload VMs to another host in the same cluster where the security service is running and healthy. This ensures continued protection for important assets.
On the host that generated the alarm, check the /var/log/syslog.log file for messages similar to:
ContextMux[XXXX]: [WARNING] (EPSEC) [XXXX] :CheckConnected():253: SolutionHandler[XXXX] failed to connect to solution[<SOLUTION_ID>] at [<SOLUTION_IP:SOLUTION_PORT>]: Connection refused (111) ....
ContextMux[XXXX]: [WARNING] (EPSEC) [XXXX] :Reconnect():359: SolutionHandler[XXXX] scheduling reconnect to solution[<SOLUTION_ID>] at <SOLUTION_IP> .....
This indicates that there is some problem in the third party (partner) security solution (SVM) or the VMware NSX Malware Prevention Service (SVM) on the host. Depending on the service instance which generated the alarm, open a Service Request with VMware or the third party security solution provider.
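To see how often the host module failed to reach a given SVM, the log check above can be sketched in Python. The regular expression is modeled only on the sample CheckConnected() message shown in this article, so it is an assumption that real log lines keep the same shape. The function name and the sample lines (including the ids and the 192.0.2.10 address) are made up for illustration.

```python
import re
from collections import Counter

# Modeled on the sample "failed to connect" message in this article.
# Assumption: real lines keep the "solution[ID] at [IP:PORT]" shape.
FAILED_CONNECT = re.compile(
    r"ContextMux\[\d+\]:.*failed to connect to "
    r"solution\[(?P<solution>[^\]]+)\] at \[(?P<address>[^\]]+)\]"
)


def failed_connections(lines):
    """Count failed connection attempts per (solution id, address)."""
    counts = Counter()
    for line in lines:
        match = FAILED_CONNECT.search(line)
        if match:
            counts[(match.group("solution"), match.group("address"))] += 1
    return counts


# Made-up sample lines; only the first and third match the pattern.
SAMPLE_LINES = [
    "ContextMux[2101]: [WARNING] (EPSEC) [2101] :CheckConnected():253: "
    "SolutionHandler[1] failed to connect to solution[100] at "
    "[192.0.2.10:48651]: Connection refused (111)",
    "ContextMux[2101]: [WARNING] (EPSEC) [2101] :Reconnect():359: "
    "SolutionHandler[1] scheduling reconnect to solution[100] at 192.0.2.10",
    "ContextMux[2101]: [WARNING] (EPSEC) [2101] :CheckConnected():253: "
    "SolutionHandler[1] failed to connect to solution[100] at "
    "[192.0.2.10:48651]: Connection refused (111)",
    "unrelated log message",
]

summary = failed_connections(SAMPLE_LINES)
for (solution, address), count in summary.items():
    print(f"solution {solution} at {address}: {count} failed attempt(s)")
```

To run this against a real log, pass an open file object for a copy of /var/log/syslog.log to failed_connections() instead of SAMPLE_LINES. A count that keeps growing suggests the SVM is still refusing connections.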
For VMware Malware Prevention Service or NSX Guest Introspection host module issues, open a Service Request with VMware, and attach NSX and ESXi log bundles.
For partner services, contact the agentless AV security provider to gather appliance-specific logs and open a Service Request with the partner.