vCenter Server 8.0 crashes intermittently resulting in outages of service

Article ID: 380921

Products

VMware vCenter Server 8.0

Issue/Introduction

Symptoms:

  • The vmware-vpxd service on vCenter crashes intermittently and generates core.vpxd-worker.##### files in the /storage/core/ partition
  • Storage utilization of the /storage/core/ partition increases and may trigger alarms in vCenter once utilization exceeds 70%
  • In /var/log/vmware/vmon/vmon.log, you may find entries similar to:

YYYY-MM-DDTHH:MM:SS.821Z Wa(03) host-XXXX <vpxd> Service exited. Exit code 1
YYYY-MM-DDTHH:MM:SS.821Z Wa(03) host-XXXX <vpxd> Service exited unexpectedly. Crash count 0. Taking configured recovery action.
YYYY-MM-DDTHH:MM:SS.821Z In(05) host-XXXX <vpxd> Restarting service.

  • In /var/log/vMonCoredumper.log, you may find entries similar to:

YYYY-MM-DDTHH:MM:SS.317Z In(05) host-XXXX Notify vMon about vpxd-worker dumping core. Pid : XXXX
YYYY-MM-DDTHH:MM:SS.329Z In(05) host-XXXX Successfully notified vMon.
YYYY-MM-DDTHH:MM:SS.792Z In(05) host-XXXX Successfully generated core file /var/core/core.vpxd-worker.XXXX.

  • In most cases, the vmware-vpxd service restarts automatically and the outage is minimal
  • If the failure rate is high, the vmware-vpxd service may stay in a stopped state, resulting in a permanent vCenter outage
  • A reboot of vCenter temporarily resolves the issue
  • At the time of the service crash, /var/log/vmware/vpxd/vpxd-XXX.log contains entries related to a login attempt:

YYYY-MM-DDTHH:MM:SS info vpxd[2858939] [Originator@6876 sub=vpxLro opID=xxxxxxxx Authz-e2] [VpxLRO] -- BEGIN lro-909100 -- AuthorizationManager -- vim.AuthorizationManager.hasUserPrivilegeOnEntities -- xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx(xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx)
YYYY-MM-DDTHH:MM:SS info vpxd[2858939] [Originator@6876 sub=UserDirectorySso opID=xxxxxxxx Authz-e2] GetUserInfoInternal(Domain\Username, false) res: Domain\Username
YYYY-MM-DDTHH:MM:SS info vpxd[2858939] [Originator@6876 sub=vpxLro opID=xxxxxxxx Authz-e2] [VpxLRO] -- FINISH lro-909100

YYYY-MM-DDTHH:MM:SS info vpxd[2858710] [Originator@6876 sub=UserDirectorySso opID=Run-Http2ServerSession-41] GetUserInfoInternal(Domain\Username, false) res: Domain\Username
YYYY-MM-DDTHH:MM:SS info vpxd[2858710] [Originator@6876 sub=AuthorizeManager opID=Run-Http2ServerSession-41] [Auth]: User Domain\Username

  • Prior to the above login attempt, multiple failed login attempts for the same user appear in the journal logs:

journalctl -b 0 | grep AlreadyAuthenticatedSessionEvent

Event [43805188] [1-1] [YYYY-MM-DDTHH:MM:SS.416929Z] [vim.event.AlreadyAuthenticatedSessionEvent] [info] [Domain\Username] [] [43805188] [User cannot logon since the user is already logged on]
Event [43805189] [1-1] [YYYY-MM-DDTHH:MM:SS.450867Z] [vim.event.AlreadyAuthenticatedSessionEvent] [info] [Domain\Username] [] [43805189] [User cannot logon since the user is already logged on]
Event [43805190] [1-1] [YYYY-MM-DDTHH:MM:SS.486669Z] [vim.event.AlreadyAuthenticatedSessionEvent] [info] [Domain\Username] [] [43805190] [User cannot logon since the user is already logged on]

  • The backtrace of core.vpxd-worker.XXXX shows that the service crashed during a login attempt from the same user:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007f65ffe6b536 in __GI_abort () at abort.c:79
#2  0x00007f6605e653c0 in Vmacore::System::SignalTerminateHandler (info=0x7f65fca33530, ctx=0x7f65fca33400) at bora/vim/lib/vmacore/posix/defSigHandlers.cpp:62
#3  <signal handler called>
#4  0x0000000000000000 in ?? ()
#5  0x00007f6605d564a3 in Vmacore::Authorize::UserCache::UpdateSessionAuthData (this=0x, token=token@entry=0x,
    changedData=changedData@entry=0x) at bora/vim/lib/vmacore/authorize/roles.cpp:1408
#6  0x00007f6605d48d07 in Vmacore::Authorize::AuthorizeManager::UpdateTokenInUserCaches (this=<optimized out>, token=token@entry=0x,
    changedData=changedData@entry=0x) at bora/vim/lib/vmacore/authorize/authorizemgr.cpp:3199
#7  0x00007f6605d57bcd in Vmacore::Authorize::UserData::UpdateAuthTokenHelper (this=this@entry=0x, token=token@entry=0x,
    isTenant=isTenant@entry=true) at bora/vim/lib/public/vmacore/Ref.h:220
#8  0x00007f6605d5c8bf in Vmacore::Authorize::UserCache::Register(Vmacore::Session*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Vmacore::System::AuthTokenHelper*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, bool, std::function<Vmacore::System::DateTime (Vmacore::Authorize::UserData*, Vmacore::System::DateTime const&)>) (this=this@entry=0x, session=<optimized out>, session@entry=0x7f659408a0d0, user="Domain\Username", token=0x,
    delegationChain=std::vector of length 0, capacity 0, fullAccess=fullAccess@entry=false, tenantUser=<optimized out>, refresh=...)
    at bora/vim/lib/vmacore/authorize/roles.cpp:1569
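
The log symptoms above can be checked in bulk rather than by eye. The following is a minimal Python sketch, not a supported tool: the function names are hypothetical, and the patterns are derived only from the sample log lines in this article, so they may need adjusting for real logs.

```python
import re
from collections import Counter

# Assumption: both patterns mirror the sample entries shown in this article.
VPXD_CRASH = re.compile(r"<vpxd> Service exited unexpectedly")
ALREADY_AUTH = re.compile(
    r"\[vim\.event\.AlreadyAuthenticatedSessionEvent\] \[\w+\] \[([^\]]*)\]"
)


def count_vpxd_crashes(vmon_log_text: str) -> int:
    """Count 'Service exited unexpectedly' entries for vpxd in vmon.log text."""
    return sum(1 for line in vmon_log_text.splitlines() if VPXD_CRASH.search(line))


def rejected_logins_by_user(journal_text: str) -> Counter:
    """Tally AlreadyAuthenticatedSessionEvent entries per user name."""
    users = Counter()
    for line in journal_text.splitlines():
        match = ALREADY_AUTH.search(line)
        if match:
            # Group 1 is the bracketed user field, e.g. Domain\Username.
            users[match.group(1)] += 1
    return users
```

To use it, pass the contents of vmon.log to count_vpxd_crashes and the output of `journalctl -b 0` to rejected_logins_by_user. A user with many rejected logins shortly before each crash is a likely match for the pattern described in this article.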

 

Environment

VMware vCenter Server 8.0 U3b
VMware vCenter Server 8.0 U3d
VMware vCenter Server 8.0 U2c

Cause

This issue occurs when a user who already has an authenticated session attempts to log in to vCenter again with incorrect credentials. In this scenario, the vmware-vpxd service crashes due to a dangling Session pointer in session cache management.

  • On a failed login attempt, an incomplete cleanup leaves a dangling pointer in the session data structure.
  • On subsequent logins, an attempt to update the expiration time for all sessions accesses this dangling pointer, which leads to a crash.

Resolution

  • This issue will be fixed in the upcoming vCenter 8.x release 

Workaround:

  • To prevent this issue from occurring, identify the solution that is trying to log in to vCenter while it already has an authenticated session.
  • From the journal logs, identify the IP address of the solution that is attempting logins with the username mentioned in the vpxd logs.
  • In the example journal log below, you can see the IP address of the client from which the login attempts originate:

Event [43805072] [1-1] [YYYY-MM-DDTHH:MM:SS.98434Z] [vim.event.UserLoginSessionEvent] [info] [Domain\Username] [] [43805072] [User Domain\[email protected] logged in as JAX-WS RI 2.3.1 svn-revision#]

  • Update the solution with the correct vCenter credentials to prevent the invalid login attempts
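
The sample UserLoginSessionEvent above also carries a client identification string after "logged in as" (JAX-WS RI 2.3.1 in that example). Grouping a user's login events by that string can help narrow down the solution alongside the IP address. The following Python sketch assumes the journal line layout shown in this article; the function name and pattern are hypothetical.

```python
import re
from collections import Counter

# Assumption: layout follows the sample UserLoginSessionEvent in this article.
LOGIN_EVENT = re.compile(
    r"\[vim\.event\.UserLoginSessionEvent\] \[\w+\] \[([^\]]*)\]"
    r".*logged in as (.*)\]\s*$"
)


def login_clients(journal_text: str, username: str) -> Counter:
    """Tally the 'logged in as' client strings for one user's login events."""
    clients = Counter()
    for line in journal_text.splitlines():
        match = LOGIN_EVENT.search(line)
        if match and match.group(1) == username:
            clients[match.group(2).strip()] += 1
    return clients
```

Pass the output of `journalctl -b 0` and the username from the vpxd logs. The result only summarizes client strings, so the matching journal lines still need to be read to obtain the client IP address.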

Note: If invalid login attempts continue after the correct vCenter credentials have been updated, there may be a compatibility issue between the solution and the vCenter version. In one instance, an incompatible adapter in Aria Operations was identified as the cause.

To temporarily mitigate the crashing issue, you can apply the below workaround until the solution which is causing this issue is identified.

Modify the vpxd configuration file to change session management settings:

  1. SSH to the vCenter Server as root.

  2. Edit the following file:

    /etc/vmware-vpx/vpxd.cfg

  3. Locate the <vpxd> section and ensure the following setting is present. If it is not present, add it manually.

    <authorize><sessionCanOutliveToken>true</sessionCanOutliveToken></authorize>

  4. Save the changes to the configuration file.

  5. Restart the vCenter service to apply the changes.
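
To illustrate where the setting from step 3 ends up, here is a minimal Python sketch that adds it to an XML string using the standard library. It is not a supported procedure: the function name is hypothetical, and the sketch assumes that <vpxd> is the root element or a descendant of it.

```python
import xml.etree.ElementTree as ET


def ensure_session_can_outlive_token(cfg_xml: str) -> str:
    """Return cfg_xml with sessionCanOutliveToken set to true under <vpxd>."""
    root = ET.fromstring(cfg_xml)
    # Assumption: <vpxd> is the root element or nested somewhere below it.
    vpxd = root if root.tag == "vpxd" else root.find(".//vpxd")
    if vpxd is None:
        raise ValueError("no <vpxd> section found")
    authorize = vpxd.find("authorize")
    if authorize is None:
        authorize = ET.SubElement(vpxd, "authorize")
    setting = authorize.find("sessionCanOutliveToken")
    if setting is None:
        setting = ET.SubElement(authorize, "sessionCanOutliveToken")
    setting.text = "true"
    return ET.tostring(root, encoding="unicode")
```

The function is idempotent: running it on its own output leaves a single setting in place. Note that ET.fromstring discards XML comments, so the sketch is better suited to checking the resulting structure on a copy than to rewriting the live file. Take a backup of vpxd.cfg before making any change.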

This workaround prevents vCenter from crashing when handling authentication errors.

Note: The above workaround keeps sessions alive even after the token has expired. This could pose a security risk, so use it with caution and only as a temporary workaround.