vDefend SSP Alarm: Security Services Platform cluster CPU usage is high or very high

Article ID: 384123

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

You are running SSP 5.0 or later and have encountered an alarm with the description:
"The CPU usage of Security Services Platform cluster {{ .ResourceID }} is currently {{ .Value }}%, which exceeds the threshold value."

This indicates that one or more nodes in the SSP cluster are experiencing high or very high CPU usage, potentially impacting cluster performance and workload stability.

Environment

vDefend SSP Version: 5.0 and later

Cause

High Load Applications:

  • Certain services or applications are running workloads that require more CPU than allocated.
  • Pods may be unevenly distributed across nodes, causing CPU pressure on certain nodes.

Resolution

Analyze Cluster Load

Cluster-level CPU usage is the aggregated value of the CPU usage of all nodes.

  • Log in to the SSPI root shell. The following commands help identify which pods are consuming the most resources.

    k get nodes
    k describe node <worker node name>    # shows all the pods on this node
    k top pods -n nsxi-platform --sort-by=cpu

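  • On the SSPI UI, navigate to Lifecycle Management → Instance management → Edit Deployment Size and increase the number of nodes by 1.
  • Alternatively, scale out the service category that contains the CPU-intensive applications, as described in the following steps.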
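
Because the cluster value is aggregated from the individual nodes, it can help to check per-node usage before deciding whether to add a node or scale out a category. The following is a minimal sketch, not part of the official procedure. It assumes that `k` on the SSPI shell is an alias for `kubectl`, that `kubectl top nodes` is available there, and that 80 is a reasonable cutoff; the function name `hot_nodes` and the default threshold are hypothetical and do not reflect the alarm's actual threshold.

```shell
#!/usr/bin/env bash
# hot_nodes: read `kubectl top nodes` output on stdin and print the nodes
# whose CPU% column is at or above a threshold.
# The default of 80 is an illustrative assumption, not the alarm threshold.
hot_nodes() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '
    $1 != "NAME" {            # skip the header row
      pct = $3                # third column is CPU%, for example "87%"
      sub(/%/, "", pct)       # strip the percent sign
      if (pct + 0 >= t) print $1, $3
    }'
}
```

A possible invocation is `kubectl top nodes | hot_nodes 80`. If only one or two nodes are listed, uneven pod distribution is the more likely cause; if all nodes are listed, the cluster as a whole is probably undersized.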

On the Security Services Platform UI:

Navigate to System > Platform & Features > Core Services

If the CPU-intensive applications are any of the following, scale out the corresponding category. (The output of 'k top pods -n nsxi-platform --sort-by=cpu' shows whether the CPU-intensive applications are in this list.)

  • rawflowcorrelator, overflowcorrelator, druid-middle-manager, druid-broker, latestflow - Analytics
  • kafka-controller, kafka-broker - Messaging
  • minio - Data Storage
  • metrics-manager, metrics-app-server, metrics-query-server - Metrics (Refer to KB 384109 for metrics-specific CPU spike issues)
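
The mapping above can be applied to the `top pods` output with a small helper. This is a sketch under stated assumptions: it matches on substrings of the pod name, which may not hold for every deployment, and the function names `category_for_pod` and `categorize_top_pods` are hypothetical. It also assumes `k` is an alias for `kubectl`.

```shell
#!/usr/bin/env bash
# category_for_pod: map a pod name to the Core Services category from the
# list above. Substring matching on pod names is an assumption.
category_for_pod() {
  case "$1" in
    *rawflowcorrelator*|*overflowcorrelator*|*druid-middle-manager*|*druid-broker*|*latestflow*)
      echo "Analytics" ;;
    *kafka-controller*|*kafka-broker*)
      echo "Messaging" ;;
    *minio*)
      echo "Data Storage" ;;
    *metrics-manager*|*metrics-app-server*|*metrics-query-server*)
      echo "Metrics" ;;
    *)
      echo "Other" ;;          # not in the scale-out list
  esac
}

# categorize_top_pods: read `kubectl top pods` lines (NAME CPU MEMORY) on
# stdin and print "category<TAB>pod<TAB>cpu", skipping the header row.
categorize_top_pods() {
  while read -r name cpu _; do
    [ "$name" = "NAME" ] && continue
    [ -z "$name" ] && continue
    printf '%s\t%s\t%s\n' "$(category_for_pod "$name")" "$name" "$cpu"
  done
}
```

A possible invocation is `kubectl top pods -n nsxi-platform --sort-by=cpu | categorize_top_pods | head`. Pods reported as "Other" are outside the categories listed here, so scaling out a category is unlikely to relieve them.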