NSX Malware Prevention Service (MPS) Alarms and Resolutions




Article ID: 319820


Products

VMware NSX, VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

This article describes five alarms that may be raised by the NSX Malware Prevention functionality, along with their symptoms, causes, impact, and resolutions.

Environment

Product Versions: NSX 4.0.1.1

Resolution

Service Status Down

Symptoms

The following is observed in the NSX UI:

1. The Alarms dashboard shows an open "Service Status Down" alarm for the "Malware Prevention" feature. The alarm description states:

"Service Malware Prevention is not running on <transport_node_name_or_ip>."

NOTE: The "Service Status Down" alarm covers all of the Malware Prevention Service services (NSX Security Hub, NSX RAPID) running on the Malware Prevention Service SVM and on the NSX Edge node.

Cause

On NSX Edge

The Service Status Down alarm is generated when any of the following is true:

1. On NSX Edge, the NSX Security Hub service is not running or is not responding to the health probe.
2. On NSX Edge, the NSX RAPID container services are not running or are not responding to the health probe.
3. On NSX Edge, the NSX Security Monitor service is not running.

To identify the affected service and the corresponding transport node, do the following:

1. On the Alarms dashboard, for each open "Malware Prevention" "Service Status Down" alarm, note the "Entity Name" of the alarm. This is the IP address or FQDN of the Edge that generated the alarm.
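When many alarms are open, the step above can be scripted over an exported list of alarms. The sketch below is a hypothetical helper, not an NSX tool: it assumes each alarm is a dictionary with the hypothetical keys "feature", "event", "state", and "entity_name", so adapt the field names to whatever export or API response you actually work with.

```python
from typing import Iterable, List, Mapping


def open_alarm_entities(alarms: Iterable[Mapping[str, str]],
                        feature: str,
                        event: str) -> List[str]:
    """Return the unique entity names of open alarms for one feature/event.

    The keys "feature", "event", "state" and "entity_name" are hypothetical
    and stand in for the columns shown on the Alarms dashboard.
    """
    entities: List[str] = []
    for alarm in alarms:
        if (alarm.get("feature") == feature
                and alarm.get("event") == event
                and alarm.get("state", "").lower() == "open"):
            name = alarm.get("entity_name")
            # Keep first-seen order and drop duplicates.
            if name and name not in entities:
                entities.append(name)
    return entities


# Illustrative records only; the host names are placeholders.
alarms = [
    {"feature": "Malware Prevention", "event": "Service Status Down",
     "state": "Open", "entity_name": "edge-01.example.com"},
    {"feature": "Malware Prevention", "event": "Service Status Down",
     "state": "Resolved", "entity_name": "edge-02.example.com"},
    {"feature": "Malware Prevention",
     "event": "File Extraction Service Unreachable",
     "state": "Open", "entity_name": "edge-03.example.com"},
]
print(open_alarm_entities(alarms, "Malware Prevention", "Service Status Down"))
# prints ['edge-01.example.com']
```

The same helper applies to the other alarms in this article by passing a different event name, for example "File Extraction Service Unreachable".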

On Malware Prevention Service SVM

[ Note: Also refer to the KB "NSX Malware Prevention Service VM Fails To Register With NSX" for possible resolutions. ]

The Service Status Down alarm is generated when any of the following is true:

1. On the Malware Prevention Service SVM, the NSX Security Hub service is not running or is not responding to the health probe.
2. On the Malware Prevention Service SVM, the NSX RAPID services are not running or are not responding to the health probe.
3. On the Malware Prevention Service SVM, the NSX Security Monitor service is not running.

To identify the affected service and the corresponding service instance that lost connectivity with the host module, do the following:

1. On the Alarms dashboard, for each open "Malware Prevention" "Service Status Down" alarm, note the "Entity Name" of the alarm. This is the IP address or FQDN of the host on which the alarm was generated.
2. In the UI, go to "System" → "Service Deployments" → "Service instances".
   1. Select <Malware Prevention Service> from the Partner service drop-down.
   2. In the Host column of the service instances, search for the host noted in step 1. The matching service instance is the Malware Prevention Service SVM on which this alarm was raised.
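The lookup above is a join between the hosts noted from the alarms and the service instance list. The sketch below assumes, hypothetically, that service instances have been exported as dictionaries with "service", "host", and "name" keys; the default service name and all host and SVM names are placeholders.

```python
from typing import Dict, Iterable, List, Mapping


def svms_for_alarm_hosts(alarm_hosts: Iterable[str],
                         service_instances: Iterable[Mapping[str, str]],
                         service_name: str = "Malware Prevention Service"
                         ) -> Dict[str, List[str]]:
    """Map each alarmed host to the SVM names deployed on it.

    Hosts with no matching instance map to an empty list, which is itself
    worth investigating. The record keys are hypothetical.
    """
    # dict.fromkeys keeps the input order and removes duplicate hosts.
    result: Dict[str, List[str]] = {h: [] for h in dict.fromkeys(alarm_hosts)}
    for inst in service_instances:
        host = inst.get("host")
        if inst.get("service") == service_name and host in result:
            result[host].append(inst.get("name", "<unnamed>"))
    return result


instances = [
    {"service": "Malware Prevention Service", "host": "esxi-01.example.com",
     "name": "mps-svm-01"},
    {"service": "Malware Prevention Service", "host": "esxi-02.example.com",
     "name": "mps-svm-02"},
    {"service": "Other Partner Service", "host": "esxi-01.example.com",
     "name": "other-svm"},
]
print(svms_for_alarm_hosts(["esxi-01.example.com"], instances))
# prints {'esxi-01.example.com': ['mps-svm-01']}
```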

Impact / Risks

All Malware Prevention Service functionality on the NSX Edge or on the service virtual machine (SVM) is provided by the NSX Security Hub and NSX RAPID services. If these services are down, malware prevention is lost for files extracted from network traffic on the NSX Edge, or for the workload VMs on the impacted ESXi host.

Resolution

Service failures can be temporary and may recover on their own, in which case the alarm moves to the "Resolved" state.

If the alarm does not resolve within a few minutes, migrate critical workload VMs to another host in the same cluster where the security service is running and healthy. This ensures continued protection for important assets.
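The "wait a few minutes, then act" guidance can be expressed as a bounded polling loop. This is a minimal sketch under stated assumptions: `get_state` is a hypothetical callable you supply that returns the alarm's current state string, and the 10-minute timeout and 30-second interval are arbitrary choices, since the article does not specify a time budget. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_resolution(get_state: Callable[[], str],
                        timeout_s: float = 600.0,
                        interval_s: float = 30.0,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the alarm state is "Resolved" or the timeout passes.

    Returns True if the alarm resolved on its own, and False if the caller
    should move on to mitigation (for example, migrating critical VMs).
    """
    deadline = clock() + timeout_s
    while True:
        if get_state().lower() == "resolved":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Demonstration with a fake clock: the alarm resolves on the third check.
states = iter(["Open", "Open", "Resolved"])
now = [0.0]


def fake_sleep(seconds: float) -> None:
    now[0] += seconds


print(wait_for_resolution(lambda: next(states),
                          clock=lambda: now[0], sleep=fake_sleep))
# prints True
```

Returning a boolean rather than performing the migration keeps the decision with the operator, which matches the article's advice to migrate only critical workloads.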

File Extraction Service Unreachable

Symptoms

The following is observed in the NSX UI:

1. The Alarms dashboard shows an open "File Extraction Service Unreachable" alarm for the "Malware Prevention" feature. The alarm description states:

"Service Malware Prevention is degraded on <transport_node_name_or_ip>. Unable to communicate with file extraction functionality. All file extraction abilities on the <transport_node_name_or_ip> are paused."

Cause

On NSX Edge

The File Extraction Service Unreachable alarm is generated when any of the following is true:

1. On NSX Edge, the NSX IDS service is not running or is not connected to the NSX Security Hub service.

To identify the affected service and the corresponding transport node, do the following:

1. On the Alarms dashboard, for each open "Malware Prevention" "File Extraction Service Unreachable" alarm, note the "Entity Name" of the alarm. This is the IP address or FQDN of the Edge that generated the alarm.

On Malware Prevention Service SVM

The File Extraction Service Unreachable alarm is generated when any of the following is true:

1. On the Malware Prevention Service SVM, the NSX Security Hub service is disconnected from the Guest Event Collector pipeline.

To identify the affected service and the corresponding service instance that lost connectivity with the host module, do the following:

1. On the Alarms dashboard, for each open "Malware Prevention" "File Extraction Service Unreachable" alarm, note the "Entity Name" of the alarm. This is the IP address or FQDN of the host on which the alarm was generated.
2. In the UI, go to "System" → "Service Deployments" → "Service instances".
   1. Select <Malware Prevention Service> from the Partner service drop-down.
   2. In the Host column of the service instances, search for the host noted in step 1. The matching service instance is the Malware Prevention Service SVM on which this alarm was raised.

Impact / Risks

When the file extraction service is unreachable, malware prevention is lost for files extracted from network traffic on the NSX Edge, or for the workload VMs on the impacted ESXi host.

Resolution

The File Extraction Service Unreachable condition can be temporary and may recover on its own, in which case the alarm moves to the "Resolved" state.

If the alarm does not resolve within a few minutes, migrate critical workload VMs to another host in the same cluster where the file extraction service is reachable. This ensures continued protection for important assets.

Database Unreachable

Symptoms

The following is observed in the NSX UI:

1. The Alarms dashboard shows an open "Database Unreachable" alarm for the "Malware Prevention" feature. The alarm description states:

"Service Malware Prevention functionality is degraded on NSX Application Platform. It is unable to communicate with Malware Prevention database."

Cause

On NSX Application Platform

The Database Unreachable alarm is generated when any of the following is true:

1. On the NSX Application Platform, the Malware Prevention Service database pod(s) <postgresql-ha-*> are not running or the corresponding service(s) are not responding to the health probe.

The issue is specific to the NSX Application Platform.
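One way to check the cause above is to inspect the pod list printed by `kubectl get pods` for the NSX Application Platform namespace (the namespace name depends on your deployment, so verify it first). The sketch below parses that text and flags `postgresql-ha-*` pods whose STATUS is not "Running" or whose READY count is incomplete. The sample output is invented for illustration; real pod names and counts will differ.

```python
from typing import List, Tuple


def unhealthy_pods(kubectl_output: str,
                   prefix: str = "postgresql-ha-") -> List[Tuple[str, str, str]]:
    """Return (name, ready, status) for matching pods that look unhealthy.

    A pod is flagged when STATUS is not "Running" or when fewer containers
    are ready than expected (for example READY "0/1").
    """
    problems: List[Tuple[str, str, str]] = []
    for line in kubectl_output.splitlines():
        parts = line.split()
        # The prefix check skips the "NAME READY STATUS ..." header row and
        # unrelated pods, so output piped through grep also works.
        if len(parts) < 3 or not parts[0].startswith(prefix):
            continue
        name, ready, status = parts[0], parts[1], parts[2]
        ready_now, _, ready_total = ready.partition("/")
        if status != "Running" or ready_now != ready_total:
            problems.append((name, ready, status))
    return problems


# Invented sample output for illustration only.
sample = """\
NAME                                  READY   STATUS             RESTARTS      AGE
postgresql-ha-pgpool-5f7c9d8b6-x2kqp  1/1     Running            0             12d
postgresql-ha-postgresql-0            0/1     CrashLoopBackOff   14 (2m ago)   12d
postgresql-ha-postgresql-1            1/1     Running            0             12d
"""
print(unhealthy_pods(sample))
# prints [('postgresql-ha-postgresql-0', '0/1', 'CrashLoopBackOff')]
```

If any pods are flagged, include that output when raising the support ticket together with the NSX Application Platform support bundle.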

Impact / Risks

An unreachable database leads to the loss of the following malware prevention functionality:

1. Malware Prevention events generated after the Malware Prevention Service database becomes unreachable are not visible in the NSX UI / API.
2. Any NSX Edge or ESXi host that restarts after the Malware Prevention Service database becomes unreachable loses malware prevention capabilities.

Resolution

The Database Unreachable condition can be temporary and may recover on its own, in which case the alarm moves to the "Resolved" state.

If the alarm does not resolve within a few minutes, collect the NSX Application Platform support bundle and raise a support ticket with the VMware support team.

Analyst API Service Unreachable

Symptoms

The following is observed in the NSX UI:

1. The Alarms dashboard shows an open "Analyst API Service Unreachable" alarm for the "Malware Prevention" feature. The alarm description states:

"Service Malware Prevention is degraded on NSX Application Platform. It is unable to communicate with analyst_api service. Inspected file verdicts may not be up to date."

Cause

On NSX Application Platform

The Analyst API Service Unreachable alarm is generated when any of the following is true:

1. On the NSX Application Platform, the Malware Prevention Service Scheduler service cannot access the Analyst API provided by the NSX Defender services.

The issue is specific to the NSX Application Platform.

Impact / Risks

An unreachable Analyst API service leads to the loss of the following malware prevention functionality:

1. There is an increased possibility of false positives (for example, files being marked as Malicious even when they are published by a trusted source).

Resolution

In the short term, customers who want to mitigate false positives can use the allow-listing capability of the Malware Prevention Service. The Analyst API Service Unreachable condition can be temporary and may recover on its own, in which case the alarm moves to the "Resolved" state.

If the alarm does not resolve within a few minutes, collect the NSX Application Platform support bundle and raise a support ticket with the VMware support team.

NTICS Reputation Service Unreachable

Symptoms

The following is observed in the NSX UI:

1. The Alarms dashboard shows an open "NTICS Reputation Service Unreachable" alarm for the "Malware Prevention" feature. The alarm description states:

"Service Malware Prevention is degraded on the NSX Application Platform. It is unable to communicate with external NTICS service. Inspected file verdicts may not be up to date."

Cause

On NSX Application Platform

The NTICS Reputation Service Unreachable alarm is generated when any of the following is true:

1. On the NSX Application Platform, the Malware Prevention Reputation Service cannot access the cloud-hosted NTICS services.

The issue is specific to the NSX Application Platform.

Impact / Risks

An unreachable NTICS service leads to the loss of the following malware prevention functionality:

1. There is an increased possibility of false positives (for example, files being marked as Malicious even when they are published by a trusted source).

Resolution

In the short term, customers who want to mitigate false positives can use the allow-listing capability of the Malware Prevention Service. Ensure that the NSX proxy settings are configured correctly and that outbound network access to the external NTICS service is allowed. The NTICS Reputation Service Unreachable condition can be temporary and may recover on its own, in which case the alarm moves to the "Resolved" state.
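A quick way to test the outbound-access part of this check is a plain TCP connection attempt from a machine on the same network path. This is a hedged sketch: the article does not name the NTICS endpoint, so the hostname below is a placeholder to replace with the endpoint documented for your release, and port 443 is an assumption. A direct connection does not go through the NSX proxy, so in a proxy-only network a failed connection is expected and the proxy configuration is what needs checking.

```python
import socket


def tcp_reachable(host: str, port: int = 443, timeout_s: float = 5.0) -> bool:
    """Return True if a direct TCP connection to host:port can be opened.

    DNS failures, refused connections and timeouts all raise OSError
    subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


# Placeholder hostname: substitute the NTICS endpoint for your deployment.
print(tcp_reachable("ntics-endpoint.example.invalid"))
```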
