vDefend SSP Alarm: Metrics service CPU usage is high or very high

Article ID: 384109

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

An alarm is raised with the following description:

"The CPU usage of Metrics service {service-name}/{pod} is currently {value}%, which is above the threshold value."

Example:

The CPU usage of Metrics service metrics-manager/metrics-manager-59959c48b7-zgjx7 is currently 95%, which is above the threshold value.

The CPU usage of Metrics service metrics-postgresql-ha-postgresql/metrics-postgresql-ha-postgresql-0 is currently 93%, which is above the threshold value.

  • {service-name} is one of the following:
    1. metrics-manager
    2. metrics-query-server
    3. metrics-app-server
    4. metrics-db-helper
    5. metrics-postgresql-ha-postgresql
    6. metrics-postgresql-ha-pgpool
  • {pod} is dynamic and is prefixed by the service name, e.g. metrics-manager-xxxxxxxxxx-xxxxx
  • {value} is dynamic and represents the current CPU usage of {pod}

    If the alarm stays open for more than 30 minutes, or if it occurs repeatedly, proceed to the Resolution section.
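
The alarm format described above can be split into its three fields with a small shell helper. This is only a sketch: the function name parse_metrics_alarm is hypothetical and not part of SSP, and it assumes the alarm text matches the template quoted in this article exactly.

```shell
# Hypothetical helper (not an SSP tool): extract service name, pod and CPU value
# from an alarm description that follows the template quoted in this article.
parse_metrics_alarm() {
  # $1: full alarm description text
  printf '%s\n' "$1" | sed -n \
    's|^The CPU usage of Metrics service \([^/ ]*\)/\([^ ]*\) is currently \([0-9][0-9]*\)%.*$|\1 \2 \3|p'
}

# Prints: metrics-manager metrics-manager-59959c48b7-zgjx7 95
parse_metrics_alarm "The CPU usage of Metrics service metrics-manager/metrics-manager-59959c48b7-zgjx7 is currently 95%, which is above the threshold value."
```

The first field is the {service-name} that decides which branch of the Resolution section applies. Text that does not match the template produces no output.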

Environment

vDefend SSP >= 5.0

Cause

The current CPU limit for {service-name} is not sufficient to process the current load.

Resolution

Maintenance window required for remediation?
No
First, restart the deployment or statefulset. This should clear any transient issue.
    • Log in to the SSPI root shell
    • If {service-name} is metrics-postgresql-ha-postgresql
      • Run 'k rollout restart statefulset metrics-postgresql-ha-postgresql -n nsxi-platform'
    • For any other {service-name}
      • Run 'k rollout restart deployment {service-name} -n nsxi-platform'
    • Wait for ~20 minutes and check whether the alarm is auto-resolved
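
The choice between statefulset and deployment in the restart step can be sketched as a shell function. The function name restart_cmd is hypothetical, and the sketch only prints the command so it can be reviewed before being run in the SSPI root shell. It assumes the service list given in this article is complete.

```shell
# Hypothetical helper: print (do not run) the restart command for a Metrics service.
restart_cmd() {
  case "$1" in
    metrics-postgresql-ha-postgresql)
      # The only service this article restarts as a statefulset.
      echo "k rollout restart statefulset $1 -n nsxi-platform" ;;
    metrics-manager|metrics-query-server|metrics-app-server|metrics-db-helper|metrics-postgresql-ha-pgpool)
      # All other listed services are restarted as deployments.
      echo "k rollout restart deployment $1 -n nsxi-platform" ;;
    *)
      echo "unknown Metrics service: $1" >&2
      return 1 ;;
  esac
}

restart_cmd metrics-query-server
```

Rejecting names outside the list is a design choice of this sketch, so that a typo in the service name does not produce a command at all.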

If the alarm persists, follow the steps below.
If {service-name} is metrics-manager, metrics-query-server or metrics-app-server, scale out the service:
    • On the Security Services Platform UI
      1. Navigate to System | Platform & Features | Core Services
      2. On the Metrics tile click Scale Out
    • Log in to the SSPI root shell
    • Run 'k rollout restart deployment {service-name} -n nsxi-platform'
    • Run 'k rollout restart deployment metrics-manager -n nsxi-platform'
For any other service, please open a ticket with Broadcom Support.
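
The decision for a persistent alarm can be summarised in a short shell sketch. The function name persistent_alarm_steps is hypothetical, and the steps are printed rather than executed because Scale Out is performed in the UI. Skipping the second restart when the service is metrics-manager itself is an assumption of this sketch, made because the two commands would be identical.

```shell
# Hypothetical helper: print the follow-up steps for an alarm that persists after a restart.
persistent_alarm_steps() {
  svc="$1"
  case "$svc" in
    metrics-manager|metrics-query-server|metrics-app-server)
      echo "UI: System | Platform & Features | Core Services, Metrics tile, Scale Out"
      echo "k rollout restart deployment $svc -n nsxi-platform"
      # Assumption: one restart is enough when the service is metrics-manager itself.
      if [ "$svc" != "metrics-manager" ]; then
        echo "k rollout restart deployment metrics-manager -n nsxi-platform"
      fi
      ;;
    *)
      # Scale-out is not described for the remaining services.
      echo "Open a ticket with Broadcom Support"
      ;;
  esac
}

persistent_alarm_steps metrics-query-server
```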