[vCenter] vmware-sps service does not start
search cancel

[vCenter] vmware-sps service does not start

book

Article ID: 323292

calendar_today

Updated On:

Products

VMware vCenter Server

Issue/Introduction

Assist with starting SPS service


Symptoms:
  • Customer can't create VM's on vSAN Datastores
  • vmware-sps service is stopped and does not start manually

 

  • In the /var/log/vmware/vmware-sps/sps.log:

[main] WARN opId=sps-Main-xxxxxx-xx com.vmware.vim.storage.common.serviceclient.vpxd.impl.VpxdClientImpl - loginByToken request timedout, cancelling the task scheduled XX.XX.XX.XX [main] ERROR opId=sps-Main-xxxxxx-xx com.vmware.vim.storage.common.task.retry.CallableRetryDecorator - Caught exception - com.vmware.vim.storage.common.serviceclient.vpxd.VpxdException: Error while doing login to VPXD service

 

  • Command "service-control --start vmware-sps" fails after some time.

 

  • sps.log shows:

Caused by: java.util.concurrent.TimeoutException  at com.vmware.vim.vmomi.core.impl.BlockingFuture.get(BlockingFuture.java:102) ~[vlsi-core.jar:?]   at com.vmware.vim.storage.common.serviceclient.vpxd.impl.VpxdClientImpl.loginByToken(VpxdClientImpl.java:169) ~[storage-commons-1.0.jar:?]  ... 208 more

[main] INFO opId=sps-Main-xxxxxx-xxx com.vmware.vim.storage.common.util.OperationIdUtil - OperationID present in invoker thread, adding suffix and re-using it - sps-Main-xxxxxx-xxx-xxxxxx-xxx.

 [main] INFO opId=sps-Main-xxxxxx-xxx com.vmware.vim.storage.common.util.OperationIdUtil - OperationID present in invoker thread, adding suffix and re-using it - sps-Main-xxxxxx-xxx-xxxxxx-xxx.

 [main] INFO opId=sps-Main-xxxxxx-xxx com.vmware.vim.storage.common.util.OperationIdUtil - OperationID present in invoker thread, adding suffix and re-using it - sps-Main-xxxxxx-xxx-xxxxxx-xxx.

[pool-3-thread-11] INFO opId=sps-Main-xxxxxx-xxx com.vmware.vim.storage.common.task.CustomThreadPoolExecutor - [VLSI-client] Active thread count is: 11, Core Pool size is: 20, Queue size: 0, Time spent waiting in queue: 1 millis

[jaeger.RemoteReporter-QueueProcessor] WARN opId=sps-Main-xxxxxx-xxx io.jaegertracing.internal.reporters

.RemoteReporter - FlushCommand execution failed! Repeated errors of this command will not be logged.

io.jaegertracing.internal.exceptions.SenderException: Failed to flush spans.

       at io.jaegertracing.thrift.internal.senders.ThriftSender.flush(ThriftSender.java:116) ~[jaeger-thrift-1.8.0.jar:1.8.0]   at io.jaegertracing.internal.reporters.RemoteReporter$FlushCommand.execute(RemoteReporter.java:158) ~[jaeger-core-1.8.0.jar:1.8.0]  at io.jaegertracing.internal.reporters.RemoteReporter$QueueProcessor.run(RemoteReporter.java:179) [jaeger-core-1.8.0.jar:1.8.0] at java.lang.Thread.run(Thread.java:750) [?:1.8.0_362]

Caused by: io.jaegertracing.internal.exceptions.SenderException: Could not send 2 spans  at io.jaegertracing.thrift.internal.senders.UdpSender.send(UdpSender.java:86) ~[jaeger-thrift-1.8.0.jar:1.8.0]  at io.jaegertracing.thrift.internal.senders.ThriftSender.flush(ThriftSender.java:114) ~[jaeger-thrift-1.8.0.jar:1.8.0]  ... 3 more
Caused by
...org.apache.thrift.transport.TTransportException: Cannot flush closed transport at io.jaegertracing.thrift.internal.reporters.protocols.ThriftUdpTransport.flush(ThriftUdpTransport.java:151) ~[jaeger-thrift-1.8.0.jar:1.8.0]  at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:73) ~[libthrift-0.14.1.jar:0.14.1] at org.apache.thrift.TServiceClient.sendBaseOneway(TServiceClient.java:66) ~[libthrift-0.14.1.jar:0.14.1]  at io.jaegertracing.agent.thrift.Agent$Client.send_emitBatch(Agent.java:70) ~[jaeger-thrift-1.8.0.jar:1.8.0]   at io.jaegertracing.thrift.internal.senders.ThriftSender.flush(ThriftSender.java:114) ~[jaeger-thrift-.8.0.jar:1.8.0] ... 3 more
 

  • In [ /var/log/vmware/vpxd ]# you see:
    
    grep -i "sps-Main-240307-157-240309-910" vpxd-126.log 
    info vpxd[05694] [Originator@6876 sub=vpxLro opID=sps-Main-xxxxxx-xxx-xxxxxx-xxx-xx] [VpxLRO] -- BEGIN lro-5678 -- ServiceInstance -- vim.ServiceInstance.retrieveContent -- ########-####-####-####-########8814
    info vpxd[05694] [Originator@6876 sub=vpxLro opID=sps-Main-xxxxxx-xxx-xxxxxx-xxx-xx] [VpxLRO] -- FINISH lro-5678
    info vpxd[05686] [Originator@6876 sub=vpxLro opID=sps-Main-xxxxxx-xxx-xxxxxx-xxx-xx] [VpxLRO] -- BEGIN lro-5679 -- ServiceInstance -- vim.ServiceInstance.retrieveInternalContent -- xxxxxx-xxx-xxxxxx-xxx-xx
    info vpxd[05686] [Originator@6876 sub=vpxLro opID=sps-Main-xxxxxx-xxx-xxxxxx-xxx-xx] [VpxLRO] -- FINISH lro-5679
    info vpxd[05623] [Originator@6876 sub=vpxLro opID=sps-Main-xxxxxx-xxx-xxxxxx-xxx-xx] [VpxLRO] -- BEGIN lro-5680 -- SessionManager -- vim.SessionManager.loginByToken -- xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    info vpxd[05623] [Originator@6876 sub=UserDirectorySso opID=sps-Main-xxxxxx-xxx-xxxxxx-xxx-xx] GetUserInfoInternal(VSPHERE.LOCAL\sps-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, false) res: VSPHERE.LOCAL\sps-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    info vpxd[05623] [Originator@6876 sub=AuthorizeManager opID=sps-Main-xxxxxx-xxx-xxxxxx-xxx-xx] [Auth]: User VSPHERE.LOCAL\sps-xxxxxx-xxx-xxxxxx-xxx-xxxxxxxxxxxx
    warning vpxd[05623] [Originator@6876 sub=AuthorizeManager opID=sps-Main-xxxxxx-xxx-xxxxxx-xxx-xx] Refresh function is not configured.User data can't be added to scheduler.User name: VSPHERE.LOCAL\sps-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    warning vpxd[05623] [Originator@6876 sub=Vmomi opID=sps-Main-xxxxxx-xxx-xxxxxx-xxx-xx] VMOMI activation LRO failed; <<xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, <TCP 'xxx.x.x.x : 8085'>, <TCP 'xxx.x.x.x : 47688'>>, SessionManager, vim.SessionManager.loginByToken, <vim.version.v8_0_1_0, internal, 8.0.1.0>, {stm: {<io_obj p:0x00007f3b8400c1b0, h:91, <TCP '127.0.0.1 : 8085'>, <TCP 'xxx.x.x.x : 47688'>>, id: 105, state(in/out): 3/1}, session: <xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, <TCP 'xxx.x.x.x : 8085'>, <TCP 'xxx.x.x.x : 47688'>>, req: {POST, /sdk}}>, N3Vim5Fault12NoPermission9ExceptionE(Fault cause: vim.fault.NoPermission
    info vpxd[05623] [Originator@6876 sub=vpxLro opID=sps-Main-240307-157-240309-910-9f] [VpxLRO] -- FINISH lro-5680
    error vpxd[05623] [Originator@6876 sub=Default opID=sps-Main-240307-157-240309-910-9f] [VpxLRO] -- ERROR lro-5680 -- xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -- SessionManager -- vim.SessionManager.loginByToken: :vim.fault.NoPermission
    ----
    warning vpxd[05623] [Originator@6876 sub=Vmomi opID=sps-Main-xxxxxx-xxx-xxxxxx-xxx-xx] VMOMI activation L
    RO failed; <<xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, <TCP 'xxx.x.x.x : 8085'>, <TCP 'xxx.x.x.x : 47688'>>, SessionManager, vim.Sessi
    onManager.loginByToken, <vim.version.v8_0_1_0, internal, 8.0.1.0>, {stm: {<io_obj p:0x00007f3b8400c1b0, h:91, <TCP 'xxx.x.x.x : 80
    85'>, <TCP 'xxx.x.x.x : 47688'>>, id: 105, state(in/out): 3/1}, session: <xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, <TCP 'xxx.x.x.x : 
    8085'>, <TCP 'xxx.x.x.x : 47688'>>, req: {POST, /sdk}}>, N3Vim5Fault12NoPermission9ExceptionE(Fault cause: vim.fault.NoPermission
    --> )
    --> [context]zKq7AVECAQAAAKD8SAEYdnB4ZAAA0cdUbGlidm1hY29yZS5zbwAAYQpGAFEGRwB1zUyBhWkPAXZweGQAgcyHGgGBW464AYERp7gBgeuyuAGBTxu4AYGE7
    rcBgps0RQFsaWJ2aW0tdHlwZXMuc28AgQjVSwIDrRMdbGlidm1vbWkuc28AgZSZLQKBGqVKAoFitUoCgdvdSQKBn4tKAgA5mjsAEuo7AOK2UgSHfwBsaWJwdGhyZWFkLnN
    vLjAABS82D2xpYmMuc28uNgA=[/context]
    info vpxd[05623] [Originator@6876 sub=vpxLro opID=sps-Main-240307-157-240309-910-9f] [VpxLRO] -- FINISH l
    ro-5680
    error vpxd[05623] [Originator@6876 sub=Default opID=sps-Main-240307-157-240309-910-9f] [VpxLRO] -- ERROR 
    lro-5680 -- xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -- SessionManager -- vim.SessionManager.loginByToken: :vim.fault.NoPermission
    --> Result:
    --> (vim.fault.NoPermission) {
    -->    faultCause = (vmodl.MethodFault) null, 
    -->    faultMessage = <unset>, 
    -->    object = 'vim.Folder:########-####-####-####-########4e95:group-d1', 
    -->    privilegeId = "System.View", 
    -->    missingPrivileges = (vim.fault.NoPermission.EntityPrivileges) [
    -->       (vim.fault.NoPermission.EntityPrivileges) {
    -->          entity = 'vim.Folder:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:group-d1', 
    -->          privilegeIds = (string) [
    -->             "System.View"
    -->          ]
    -->       }
    -->    ]
    -->    msg = ""
    --> }
    --> Args:
    --> 
    --> Arg locale:
    --> "en_US"

 

Environment

VMware vCenter Server 8.0
VMware vCenter Server 8.0.1
VMware vCenter Server 8.0.2

Cause

Missing permission for the vpxd-svcs solution user in VCSA.
From above  vpxd.log we see It is looking for VSPHERE.LOCAL\sps-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Resolution

  1. Locate the User that has a missing permission in the vpxd.log
example :VSPHERE.LOCAL\sps-<########-####-####-####-########2065>
  1. Go to the vCenter web UI client and login as [email protected] 
  2. Click on the Root object of the VCSA   
  3. Go to permissions tab
  4. Click Add.
  5. search for the string of the user that the VC is looking for example sps->xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx>
  6. Check  the  "Propagate to children" option and select Administrator role.
  7. Save the change.
  8. Restart the sps service with command vmon-cli -r sps
Note: If you get and error from the GUI please follow the workaround of this KB

Workaround:
The workaround understood is to re-add the Service account to the Administrators group. You should be able to find out the SPS service account as highlighted above in vpxd.log. (you may filter by the keyword GetUserInfoInternal) Then Run the below command as mentioned:
 
  1. Take offline snapshots of all the vCenters in ELM mode
  2. Take SSH for VC in question with root privileges and run below command : /usr/lib/vmware-vmafd/bin/dir-cli group modify --name Administrators --add sps-xx-xx-xx-xx-xx
sample: /usr/lib/vmware-vmafd/bin/dir-cli group modify --name Administrators --add  sps-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  1. Restart the sps service with command vmon-cli -r sps


Additional Information


Impact/Risks:

High since the customer's environment will be impacted since sps controls the vMotion and the creation of VM's.