vCenter /storage/seat partition fills up due to no space on Supervisor Control Plane VM

Article ID: 420418


Products

VMware vCenter Server
VMware vSphere Kubernetes Service
VMware Avi Load Balancer
VMware Tanzu Application Platform

Issue/Introduction

  • The vCenter /storage/seat partition fills up at an abnormally fast rate

  • Increasing the size of the vCenter SEAT partition does not help, as it fills up again

  • The vpxd service goes down

  • Avi Load Balancer logins are invoked excessively within a short span of time

  • The Supervisor Control Plane VM root partition is full

  • On the Avi Controller, searching the cloud connector logs with zgrep "Processing Key: VirtualMachine" /var/lib/avi/log/cc* shows a large number of VM updates per minute

  • On the Avi Controller, counting per-second VM updates in the cloud agent logs shows bursts of more than 200 updates per second, for example:

    ls /var/lib/avi/log/cc_agent_go_Default-Cloud* | xargs -I {} sh -c 'echo {}; zgrep "Processing Key: VirtualMachine" "{}" | awk "{print substr(\$0, index(\$0, \"YYYY-\"), 19)}" | sort | uniq -c | awk "\$1 > 200"'

    (replace YYYY- with the year prefix of the log timestamps, for example 2025-, so that index() locates the start of each timestamp)

  • In VC, /var/log/vmware/vpxd/vpxd-profiler-*.log shows a high overflow count for ERProviderMixin:
    --> /MoRegistryStats/Class='11DatastoreMo'/ERProviderMixin/Overflows/total #####

  • In VC, /var/log/vmware/vpxd/vpxd-profiler-*.log shows a high sample count for setEntityPermissions calls:
    --> /ActivationStats/Task/Actv='vim.AuthorizationManager.setEntityPermissions'/TotalTime/numSamples ######

  • In VC, /var/log/vmware/vpxd/vpxd.log shows:
    YYYY-MM-DDTHH:MM:SS.608Z info vpxd[2487065] [Originator@6876 sub=vpxLro opID=wcp-699e98e7-2cdc05d1-801a-47de-8b4f-af387dd42b83-65] [VpxLRO] -- BEGIN lro-489843407 -- AuthorizationManager -- vim.AuthorizationManager.setEntityPermissions -- 52105ecb-cbfd-0bae-0676-3991b0c6630f(52259e96-7206-4596-4510-4002fff8d71b)

  • In VC, wcpsvc.log located at /var/log/vmware/wcp/ shows:
    wcpsvc-YYYY-MM-DDTHH:MM:SS.570.log:YYYY-MM-DDTHH:MM:SS debug wcp [vclib/authz.go:54] [opID=699e9708] Successfully set permissions [{{} <nil> wcp-storage-user-############@vsphere.local false 1090 true}] on entity Datastore:datastore-######
    YYYY-MM-DDTHH:MM:SS.967Z debug wcp [nodechecker/node_check.go:70] [opID=699eb397-2cdc05d1-801a-47de-8b4f-af387dd42b83] wcp-sv-state-checker log output when running checks '[ConnectToLoadBalancer]' on VM VirtualMachine:vm-####. stdout: {"ConnectToLoadBalancer":{"id":"ConnectToLoadBalancer","status":"SetupFailure","time_completed":"YYYY-MM-DDTHH:MM:SS.563961415Z","conditions":[{"type":"SetupFailure","message":{"severity":"ERROR","details":{"id":"vcenter.wcp.node_state_check.setup_failure","default_message":"An internal error on the control plane VM (420137a0706d0fda0b17bc4e903996ac) prevented the check ConnectToLoadBalancer from completing successfully. Error: Unable to fetch valid load balancer configs. Err: Unauthorized.","args":["420137a0706d0fda0b17bc4e903996ac","ConnectToLoadBalancer","Unable to fetch valid load balancer configs. Err: Unauthorized"]}}}],"description":{"id":"wcp.healthcheck.connect_to_loadbalancer.description","default_message":"Checks to see if the Control Plane VM is able to connect to any configured load balancers. This check can only be run in a VDS environment, after the Kubernetes API Server is up.","args":null}}}, stderr: time="YYYY-MM-DDTHH:MM:SS" level=info msg="Running checks: [ConnectToLoadBalancer]"
    time="YYYY-MM-DDTHH:MM:SS" level=debug msg="Parsed node configuration: &{HostName:######### VCenterPNID:<fqdn>:443 ManagementNetwork:{DNSFromDHCP:false DNSServers:[###.##.##.#3] DNSSearchDomains:[<name>]} WorkloadNetwork:{IPAddress:##.##.##.## DNSServers:[###.##.##.##]} KubernetesConfig:{CertificateAuthority:0xc######### InitialAPIServer:https://##.##.##.##:6443} NSXManagerConfig:[] NetworkProvider:1 LoadBalancerProvider:HA_PROXY}"
    time="YYYY-MM-DDTHH:MM:SS" level=info msg="Check [ConnectToLoadBalancer] running"
    time="YYYY-MM-DDTHH:MM:SS" level=info msg="Attempting to connect to the Kubernetes Server, using configuration file path: '/etc/kubernetes/admin.conf'" check=ConnectToLoadBalancer
    time="YYYY-MM-DDTHH:MM:SS" level=info msg="Fetching all loadBalancerConfigs..." check=ConnectToLoadBalancer
    time="YYYY-MM-DDTHH:MM:SS" level=error msg="Failed to fetch haProxy list. Err: Unauthorized" check=ConnectToLoadBalancer
    time="YYYY-MM-DDTHH:MM:SS" level=error msg="Failed to fetch load balancer config info. Err: Unable to fetch valid load balancer configs. Err: Unauthorized" check=ConnectToLoadBalancer
    time="YYYY-MM-DDTHH:MM:SS" level=info msg="Check [ConnectToLoadBalancer] completed. Result: SetupFailure"
    YYYY-MM-DDTHH:MM:SS debug wcp [nodechecker/node_check.go:95] [opID=699eb397-2cdc05d1-801a-47de-8b4f-af387dd42b83] Check 'ConnectToLoadBalancer' was unsuccessful on node VirtualMachine:vm-####. Status: SetupFailure
    YYYY-MM-DDTHH:MM:SS error wcp [kubelifecycle/load_balancer.go:68] [opID=699eb397-2cdc05d1-801a-47de-8b4f-af387dd42b83] Unable to verify load balancer connection from nodes. node checks on control plane VM VirtualMachine:vm-#### failed for indeterminate reasons
    YYYY-MM-DDTHH:MM:SS info wcp [kubelifecycle/load_balancer.go:69] [opID=699eb397-2cdc05d1-801a-47de-8b4f-af387dd42b83] Reconcile load balancer exited
    YYYY-MM-DDTHH:MM:SS debug wcp [kubelifecycle/controller.go:506] [opID=699eb397-2cdc05d1-801a-47de-8b4f-af387dd42b83] Supervisor configuration retry.

  • In VC, wcpsvc.log located at /var/log/vmware/wcp/ shows "No space left on device":
    YYYY-MM-DDTHH:MM:SS error wcp [vclib/guestop.go:338] [opID=699eb397-2cdc05d1-801a-47de-8b4f-af387dd42b83] Kubenode guest command failed. RC: 1, Out: , Err: INFO:__main__:Loaded 1 key from /dev/shm/secret
    Traceback (most recent call last):
    ........
    encrypted.write(encrypt(plain_bytes, key))
    OSError: [Errno 28] No space left on device
    YYYY-MM-DDTHH:MM:SS error wcp [kubelifecycle/master_node.go:704] [opID=699eb397-2cdc05d1-801a-47de-8b4f-af387dd42b83] Failed to encrypt desired node configuration. Err Guest operation failed for the Master node VM with identifier vm-####., stdout: , stderr: INFO:__main__:Loaded 1 key from /dev/shm/secret
    Traceback (most recent call last):
    File "/usr/lib/vmware-wcp/hypercrypt.py", line 292, in <module>
    .........
    encrypted.write(encrypt(plain_bytes, key))
    OSError: [Errno 28] No space left on device
    YYYY-MM-DDTHH:MM:SS error wcp [kubelifecycle/controller.go:2062] [opID=699eb397-2cdc05d1-801a-47de-8b4f-af387dd42b83] Failed to update desired config of MasterNode VirtualMachine:vm-####. Err: Guest operation failed for the Master node VM with identifier vm-####.
    YYYY-MM-DDTHH:MM:SS error wcp [kubelifecycle/controller.go:2231] [opID=699eb397-2cdc05d1-801a-47de-8b4f-af387dd42b83] Error configuring API server on cluster 2cdc05d1-801a-47de-8b4f-af387dd42b83 Guest operation failed for the Master node VM with identifier vm-####.
    YYYY-MM-DDTHH:MM:SS warning wcp [kubelifecycle/controller.go:1014] [opID=699eb397-2cdc05d1-801a-47de-8b4f-af387dd42b83] Unable to configure agent in cluster domain-c8. Err Guest operation failed for the Master node VM with identifier vm-####.

  • vm-#### is the VM ID of a Supervisor Control Plane VM; checking that VM confirms its root partition is full (see also the verification sketch after this list):

    <user>@NodeID [ ~ ]# df -h /dev/root
    Filesystem      Size  Used  Avail  Use%  Mounted on
    /dev/root        32G   32G      0  100%  /
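
A minimal verification sketch for the vCenter-side symptoms, assuming SSH access to the vCenter appliance (the paths and search strings below are taken from the log excerpts above):

    # Confirm how full the SEAT partition is on the vCenter appliance
    df -h /storage/seat

    # Count setEntityPermissions operations in the current vpxd log;
    # a count that grows quickly between runs matches the pattern above
    grep -c "vim.AuthorizationManager.setEntityPermissions" /var/log/vmware/vpxd/vpxd.log

    # Count "No space left on device" errors reported by the WCP service
    grep -c "No space left on device" /var/log/vmware/wcp/wcpsvc.log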

Environment

  • vCenter 8.x
  • vSphere Kubernetes Service
  • NSX Application Platform (NAPP)
  • VMware Avi Load Balancer

Cause

The Supervisor Control Plane VM has no space left on its root partition. The resulting failed health checks and configuration retries drive excessive Avi Load Balancer logins and repeated setEntityPermissions tasks, and the events and tasks these generate rapidly fill the vCenter /storage/seat partition.

Resolution

To resolve this issue, follow the steps in the KB article: vSphere Supervisor Disk Space Clean Up Scripts

To work around the vCenter SEAT partition filling up, configure event retention in vCenter to 7 days using the steps below.

1. Open the vSphere Client and log in to vCenter Server.

2. Navigate to Administration → vCenter Server Settings and go to the Database Retention Policy section. Two options are available: Task retention and Event retention.

3. Set Event retention to the desired number of days, in this case 7 days.

4. Click OK or Save to apply the changes.
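
The same retention value can also be set from the command line. Below is a minimal sketch using the govc CLI, assuming the event.maxAge and event.maxAgeEnabled advanced settings apply to your vCenter build and that GOVC_URL, GOVC_USERNAME, and GOVC_PASSWORD are already exported:

    # Inspect the current retention settings (setting names assumed; confirm on your build)
    govc option.ls event.maxAge

    # Keep events for 7 days (the workaround above); task.maxAge can be set the same way
    govc option.set event.maxAgeEnabled true
    govc option.set event.maxAge 7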

Additional Information

NOTE: Run du -sh /*, then re-run the command on the directory consuming the most disk space, to drill down to the actual consumer, as shown in the sketch below.
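
For example, a typical drill-down looks like this (the directory names will vary; /var is only an illustration):

    df -h /                              # confirm the root filesystem is full
    du -sh /* 2>/dev/null | sort -h      # find the largest top-level directory
    du -sh /var/* 2>/dev/null | sort -h  # repeat on the largest directory, here /var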

Increasing the disk space for the vCenter Server Appliance in vSphere 6.5, 6.7, 7.0 and 8.0