vDefend SSP Alarm: Platform Services Disk usage is high or very high

Article ID: 384119

Products

VMware vDefend Firewall, VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

You are running SSP 5.0 or later and have encountered an alarm with the description:
"The disk usage of Security Services Platform service {{ .ResourceID }}/{{ .ObjID }} is currently {{ .Value }}%, which exceeds the threshold value."

This indicates that one or more services under Platform Services are consuming excessive disk space, potentially leading to degraded performance or system instability.

Environment

vDefend SSP Version: 5.0 and later

Cause

  • Data growth: storage requirements increase as the workload grows or spikes.
  • Insufficient disk resources: the current disk size is inadequate for the workload and its storage requirements.

Resolution

  • From the SSPI, restart the workload (Deployment or StatefulSet) that owns the pod identified by {{ .ResourceID }} in the alarm description. First, check which kind of object owns the pod:

    k -n nsxi-platform get pod <pod-name> -o jsonpath='{.metadata.ownerReferences[0].kind}'

    If the output is StatefulSet, the pod belongs to a StatefulSet. Restart it with:

    k -n nsxi-platform rollout restart statefulset <service-name>

    If the output is ReplicaSet, the pod belongs to a Deployment. Restart it with:

    k -n nsxi-platform rollout restart deployment <service-name>

    In both commands, <service-name> is the pod name without its generated suffix (the trailing ordinal for a StatefulSet pod, or the two trailing hash segments for a Deployment pod).

    Wait about 10 minutes, then confirm that the restarted pods are Running and Ready:

    k -n nsxi-platform get pods

    Note: When a pod terminates, the services it provides are temporarily unavailable until a replacement pod is scheduled and becomes ready.

  • If disk usage is still high, repeat the above step for any other affected pods.
  • If the above steps do not help, in the SSPI UI navigate to Lifecycle Management → Instance management, select Edit Deployment Size, and increase the worker node count by 1.

    • Once scaling completes, validate that the new node is operational:

    k get nodes

    Expected output:
    NAME           STATUS   ROLES    AGE     VERSION
    node-1         Ready    <role>   xxm     v1.xx.x
    node-2         Ready    <role>   xxm     v1.xx.x
    new-node       Ready    <role>   xxm     v1.xx.x  # Ensure the new node is Ready

  • Verify pod distribution: check whether pods are distributed across the nodes using 'k -n nsxi-platform get pods -o wide', and repeat the first two steps so that services are rescheduled onto the new node.
  • If the issue persists, open a support ticket with Broadcom.
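The restart and validation steps above can be scripted roughly as follows. This is a minimal sketch, assuming kubectl is configured on the SSPI host (the 'k' used in this article) and that workloads follow default Kubernetes naming. The functions 'workload_ref', 'restart_pod_owner', 'not_ready_nodes', 'tally_by_node', 'pods_per_node' and the 'NS' variable are hypothetical helpers, not part of SSP. Rather than trimming the hash from the pod name by hand, the sketch reads the owner name from the pod's ownerReferences and, for a ReplicaSet, drops the pod-template-hash suffix to get the Deployment name.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the restart and scale-out steps (a sketch, not SSP tooling).
# Assumptions: kubectl is configured on the SSPI host and workloads use
# default Kubernetes naming. Override NS to target another namespace.
NS="${NS:-nsxi-platform}"

# Map a pod's owner (kind + name) to the workload to restart.
#   StatefulSet -> the owner name is the StatefulSet itself.
#   ReplicaSet  -> the owner is named <deployment>-<pod-template-hash>,
#                  so dropping the last "-" segment yields the Deployment.
workload_ref() {
  local kind="$1" owner="$2"
  [ -n "$owner" ] || { echo "missing owner name" >&2; return 1; }
  case "$kind" in
    StatefulSet) echo "statefulset/${owner}" ;;
    ReplicaSet)  echo "deployment/${owner%-*}" ;;
    *) echo "unsupported owner kind: ${kind:-<none>}" >&2; return 1 ;;
  esac
}

# Restart the Deployment/StatefulSet that owns POD, then wait up to
# 10 minutes for the rollout to complete.
restart_pod_owner() {
  local pod="$1" out kind owner ref
  out=$(kubectl -n "$NS" get pod "$pod" \
    -o jsonpath='{.metadata.ownerReferences[0].kind}{" "}{.metadata.ownerReferences[0].name}') || return 1
  read -r kind owner <<< "$out"
  ref=$(workload_ref "$kind" "$owner") || return 1
  kubectl -n "$NS" rollout restart "$ref" || return 1
  kubectl -n "$NS" rollout status "$ref" --timeout=10m
}

# Print the names of nodes whose STATUS column is not exactly "Ready".
# Input on stdin: output of `kubectl get nodes --no-headers`.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Print "<node> <pod-count>" per node, sorted by node name.
# Input on stdin: one node name per line (blank lines = unscheduled pods, skipped).
tally_by_node() {
  grep -v '^$' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Count the pods in NS on each node.
pods_per_node() {
  kubectl -n "$NS" get pods \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | tally_by_node
}
```

For example, source the file and run 'restart_pod_owner <pod-name>'. After scaling out, 'kubectl get nodes --no-headers | not_ready_nodes' should print nothing once every node is Ready, and 'pods_per_node' shows how many nsxi-platform pods each node is running.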

Additional Information

  • If MinIO disk usage is high, you can scale up storage in the SSP UI: navigate to System > Platform and Features > Core Services > Data Storage, click Actions → Manage DataStorage, and increase the storage by 10% of the current size. This resizes the volume and remediates the issue for now, but because data may continue to grow in the MinIO PVC, open a ticket with Broadcom to investigate further.
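Increasing storage by 10% is simple arithmetic, but rounding matters when sizes are entered in whole units. This is a minimal sketch, assuming the current size is a whole number in a single unit (for example GiB) and that a fractional increase should round up. 'grow_by_ten_percent' and 'minio_pvcs' are hypothetical helpers, and the namespace and "minio" PVC naming used by 'minio_pvcs' are assumptions to adjust for your deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sizing a ~10% Data Storage increase (a sketch).

# Print CURRENT + 10%, rounded up to a whole unit, using integer arithmetic.
# (current + 9) / 10 is ceil(current / 10) for non-negative integers.
grow_by_ten_percent() {
  local current="$1"
  case "$current" in
    ''|*[!0-9]*) echo "expected a whole number, got: '$current'" >&2; return 1 ;;
  esac
  echo $(( current + (current + 9) / 10 ))
}

# List PVCs whose name mentions minio, including their CAPACITY column.
# Assumption: the MinIO PVCs live in nsxi-platform and contain "minio" in their name.
minio_pvcs() {
  kubectl -n "${NS:-nsxi-platform}" get pvc --no-headers | grep -i minio
}
```

For example, 'grow_by_ten_percent 100' prints 110, and 'grow_by_ten_percent 25' prints 28 (the 2.5 increase rounds up to 3).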