You are running SSP 5.0 or later and see an alarm with the following description:
"The CPU usage of Security Services Platform node {{ .ResourceID }} is currently {{ .Value }}%, which is above the threshold value."
This alarm indicates that one or more worker nodes in your Security Services Platform (SSP) cluster are experiencing high CPU usage, which may impact platform performance or service availability.
vDefend SSP >= 5.0
High CPU usage on worker nodes typically occurs when the current CPU limits and resources are insufficient to handle the workload.
Log in to the SSPI root shell. The following commands help identify which pods are consuming the most resources.
k get nodes
k top nodes
k describe node <worker node name>  -> shows all the pods on this node
k top pods -n nsxi-platform --sort-by=cpu
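On a cluster with many workers, it can help to reduce the k top nodes output to only the nodes above a chosen CPU percentage. The following is a minimal sketch, not part of the product tooling. The function name hot_nodes and the default value of 80 are illustrative assumptions, and 80 is not necessarily the threshold the SSP alarm uses. It reads the standard column layout (NAME, CPU(cores), CPU%, ...) from standard input.

```shell
# Print the name and CPU% of every node whose CPU% exceeds a threshold.
# Input: `kubectl top nodes` style output on stdin.
# The default of 80 is an illustrative assumption, not the SSP alarm threshold.
hot_nodes() {
  threshold="${1:-80}"
  awk -v t="$threshold" '
    NR == 1 { next }                   # skip the header row
    {
      cpu = $3
      sub(/%$/, "", cpu)               # "92%" -> "92"
      # Ignore rows without a numeric value (for example "<unknown>").
      if (cpu ~ /^[0-9]+$/ && cpu + 0 > t + 0) print $1, cpu "%"
    }
  '
}

# Usage from the SSPI root shell:
#   k top nodes | hot_nodes 80
```

Any node it prints can then be inspected with k describe node to list the pods scheduled on it.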
k -n nsxi-platform get pod <pod-name> -o jsonpath='{.metadata.ownerReferences[0].kind}'
If the output is StatefulSet, follow the StatefulSet restart steps.
If the output is ReplicaSet, the pod belongs to a Deployment.
If {.ResourceID } is a StatefulSet, run:
k -n nsxi-platform rollout restart statefulset {.ResourceID }
Otherwise, run:
k -n nsxi-platform rollout restart deployment {.ResourceID }
Wait about 20 minutes and check whether the alarm auto-resolves. Run k -n nsxi-platform get pods to confirm that the restarted pods are up.
The restart should reschedule the pods onto a newer node and thereby bring down CPU usage on the affected node.
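The owner check and the two restart commands above can be combined into one small helper. This is a sketch under stated assumptions, not an official procedure. The function name restart_cmd is hypothetical. It only prints the command rather than running it. For a ReplicaSet owner, it assumes the ReplicaSet was created by a Deployment and is therefore named <deployment>-<pod-template-hash>.

```shell
# Print (do not run) the rollout restart command for a pod's owner.
# Arguments: owner kind and owner name, as reported in the pod's ownerReferences.
restart_cmd() {
  kind="$1"
  name="$2"
  case "$kind" in
    StatefulSet)
      echo "k -n nsxi-platform rollout restart statefulset $name"
      ;;
    ReplicaSet)
      # Assumption: the ReplicaSet is owned by a Deployment, so its name is
      # <deployment>-<pod-template-hash>; drop the last dash-separated field.
      echo "k -n nsxi-platform rollout restart deployment ${name%-*}"
      ;;
    *)
      echo "unhandled owner kind: $kind" >&2
      return 1
      ;;
  esac
}

# Usage from the SSPI root shell:
#   kind=$(k -n nsxi-platform get pod <pod-name> -o jsonpath='{.metadata.ownerReferences[0].kind}')
#   name=$(k -n nsxi-platform get pod <pod-name> -o jsonpath='{.metadata.ownerReferences[0].name}')
#   restart_cmd "$kind" "$name"
```

Printing the command first lets you confirm the target workload before you run the restart.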
On the Security Services Platform UI:
Navigate to System > Platform & Features > Core Services
If the CPU-intensive applications are any of the following, scale out the corresponding category:
rawflowcorrelator, overflowcorrelator / druid-middle-manager, druid-broker / latestflow - Analytics
kafka-controller, kafka-broker - Messaging
minio - Data Storage
metrics-manager, metrics-app-server, metrics-query-server - Metrics (refer to KB 384109 for metrics-specific CPU spike issues)
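The mapping in the list above can also be written as a lookup, which is useful when checking many pod names from k top pods. This is a minimal sketch. The function name category_for is hypothetical, and matching on a name prefix is an assumption about how pod names are formed in the nsxi-platform namespace.

```shell
# Map a pod name to the Core Services category to scale out.
# Application names and categories are taken from the list above; matching by
# name prefix is an assumption about the pod naming scheme.
category_for() {
  case "$1" in
    rawflowcorrelator*|overflowcorrelator*|druid-middle-manager*|druid-broker*|latestflow*)
      echo "Analytics" ;;
    kafka-controller*|kafka-broker*)
      echo "Messaging" ;;
    minio*)
      echo "Data Storage" ;;
    metrics-manager*|metrics-app-server*|metrics-query-server*)
      echo "Metrics" ;;
    *)
      echo "unknown" ;;
  esac
}

# Usage: category_for <pod-name>
```

A result of "unknown" means the pod is not in the list above, so this scale-out guidance does not cover it.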
For further troubleshooting when a node is degraded or down, see:
https://techdocs.broadcom.com/us/en/vmware-security-load-balancing/vdefend/security-services-platform/5-0/security-services-platform-installer/troubleshooting-sspi/troubleshooting-workload-cluster.html