VMware Secure Token Service (STS) fails with "No healthy upstream" error

Article ID: 410218

Products

VMware vCenter Server

Issue/Introduction

When attempting to access the vCenter Server Management Interface (VAMI) or the vCenter Server Appliance (vCSA) user interface, you may observe the following:

  • The vCenter UI displays the error: "no healthy upstream".
  • The VMware Secure Token Service (STS) is in a Stopped state in the VAMI.
  • Manual attempts to restart the STS service fail, or the service stops shortly after starting.
    • In the vCenter log file /var/log/vmware/vmon/vmon.log, the following error is recorded:
      Wa(03) host-2765 Failed to write pid 1937942 to file. Error: No space left on device
      Wa(03) host-2765 Service exited. Exit code 120
      Er(02) host-2765 Service reached max quick failure count. Give up!!!
    • The STS pre-start script fails as recorded in /var/log/vmware/sso/sts-prestart.log:
      In(05) host-2765 Constructed command: /usr/bin/python /usr/lib/vmidentity/install/sts-prestart-script.py
      Wa(03) host-2765 Service pre-start command completed successfully.
      Wa(03) host-2765 Service exited. Exit code 120
  • Checking disk usage on the vCSA with df shows output similar to the following (sizes here are in 1K blocks; df -h reports the same data in human-readable units):

Filesystem                                   1K-blocks      Used  Available Use% Mounted on
devtmpfs                                          4096         0       4096   0% /dev
tmpfs                                         29819644      2292   29817352   1% /dev/shm
tmpfs                                         11927860      1348   11926512   1% /run
tmpfs                                             4096         0       4096   0% /sys/fs/cgroup
/dev/mapper/vg_root_0-lv_root_0               49222292  19977688   26711844  43% /
/dev/sda3                                       498900     37428     424776   9% /boot
tmpfs                                         29819648      4900   29814748   1% /tmp
/dev/sda2                                        10202      1978       8224  20% /boot/efi
/dev/mapper/vg_lvm_snapshot-lv_lvm_snapshot 1030987928        28  978542924   1% /storage/lvm_snapshot
/dev/mapper/db_vg-db                          51282400   3352864   45292124   7% /storage/db
/dev/mapper/lifecycle_vg-lifecycle           102618040   3961596   93397592   5% /storage/lifecycle
/dev/mapper/dblog_vg-dblog                    25618660   2965552   21326416  13% /storage/dblog
/dev/mapper/vtsdblog_vg-vtsdblog              25618660     32804   24259164   1% /storage/vtsdblog
/dev/mapper/log_vg-log                        25618660  22443040    1848928  93% /storage/log    
/dev/mapper/netdump_vg-netdump                10210580        24    9670296   1% /storage/netdump
/dev/mapper/autodeploy_vg-autodeploy          25618660        40   24291928   1% /storage/autodeploy
/dev/mapper/archive_vg-archive               205305832 184772116   10031984  95% /storage/archive
/dev/mapper/core_vg-core                     102618040  68137136   29222052  70% /storage/core
/dev/mapper/imagebuilder_vg-imagebuilder      25618660        36   24291932   1% /storage/imagebuilder
/dev/mapper/updatemgr_vg-updatemgr           102618040   7156960   90202228   8% /storage/updatemgr
/dev/mapper/seat_vg-seat                    1474794768  12839344 1386966268   1% /storage/seat
/dev/mapper/vtsdb_vg-vtsdb                  1474794768     45328 1399760284   1% /storage/vtsdb
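
In a long listing like the one above, the nearly full filesystems are easy to miss. The following is a minimal sketch of a filter that prints only the mount points at or above a given usage percentage. The function name high_usage and the default threshold of 90 are illustrative assumptions, not part of any VMware tooling.

```shell
#!/bin/sh
# Print "<mount point> <Use%>" for every filesystem at or above a threshold.
# Reads df-style output on stdin (header line first, Use% in column 5,
# mount point in column 6). The name and default threshold are assumptions.
high_usage() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)              # strip the trailing percent sign
        if (pct + 0 >= t) print $6, $5
    }'
}

# Typical use on the appliance (-P keeps each filesystem on one line):
#   df -P | high_usage 90
```

Applied to the sample output above with a threshold of 90, this filter would report /storage/log (93%) and /storage/archive (95%).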

 

 



Environment

VMware vCenter Server 8.x

Cause

This issue occurs when the /storage/log partition on the vCenter Server Appliance reaches or approaches 100% capacity. When the partition is full, the vCenter Service Manager (vmon) cannot write the Process ID (PID) files it needs to track and manage service states, so the affected services exit with code 120.
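
The failure signature described here can be confirmed by counting the matching lines in the vmon log. This is a minimal sketch, assuming the log path quoted in the Issue section; the function name count_pid_write_failures is hypothetical.

```shell
#!/bin/sh
# Count log lines that report a failed write caused by a full filesystem.
# The default path is the vmon log named in this article; the function
# name is an illustrative assumption.
count_pid_write_failures() {
    log="${1:-/var/log/vmware/vmon/vmon.log}"
    grep -c 'No space left on device' "$log"
}

# Typical use on the appliance:
#   count_pid_write_failures
```

A non-zero count is consistent with the cause described above, although it does not by itself identify which partition is full.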

Resolution

To resolve this issue, you must identify and clear space on the affected partitions.
Note: Always take an offline snapshot of the vCenter Server before performing disk cleanup or configuration changes.

  1. Identify the Full Partition:

    • Log in to the vCenter Server Appliance via SSH as the root user.
    • Run the following command to check disk space:
      df -h
    • Locate the entry for /storage/log. If the Use% is 95% or higher, proceed to the next step.
  2. Clear Disk Space:

  3. Check for Known Log Growth Issues (vCenter 8.0 Update 3):

  4. Restart vCenter Services:

    • Once space has been freed, restart all services to ensure they can write their PID files correctly:
      service-control --stop --all && service-control --start --all
  5. Verify Service Status:

    • Confirm all critical services are running:
      service-control --status --all
    • Verify that the vCenter UI is now accessible and the "No healthy upstream" error is resolved.
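
Before freeing space in step 2, it can help to see which directories on the full partition consume the most space. The following is a generic, read-only sketch built on the standard du, sort, and head utilities. The function name largest_entries and its defaults are illustrative assumptions, and it does not represent a VMware-documented cleanup procedure. It deletes nothing.

```shell
#!/bin/sh
# List the largest entries directly under a directory, largest first.
# Sizes are in kilobytes. The function name and defaults are assumptions.
largest_entries() {
    dir="${1:-/storage/log}"
    count="${2:-10}"
    # du -sk: one summarized size in KB per entry; errors are suppressed
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Typical use on the appliance:
#   largest_entries /storage/log 10
```

Running it again on the largest subdirectory narrows the search one level at a time.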