vMotion operations failed due to Storage Policy Service (SPS) instability for VVOL VMs.
search cancel

vMotion operations failed due to Storage Policy Service (SPS) instability for VVOL VMs.

book

Article ID: 410629

calendar_today

Updated On:

Products

VMware vSphere ESXi VMware vCenter Server

Issue/Introduction

Vol VMs are failing to migrate with error: "A general System Error Occurred" due to Storage Policy Service (SPS) instability. 

Log evidence:

In Envoy-access logs, we observed occurrences of http2.remote_reset and stream_idle_timeout errors.

----------------
Envoy-access logs
-----------------
envoy-access.log: YYYY-MM-DDT16:56:30.237Z info envoy[2754] [Originator@6876 sub=Default] YYYY-MM-DDT16:26:28.081Z >POST /sms/sdk 0 http2.remote_reset - 450 0 - 1800004 - - 127.0.0.1:51516 HTTP/2 - 127.0.0.1:1080 127.0.0.1:43902 HTTP/1.1 - 127.0.0.1:22000 - "QuerySessionManager"
envoy-access.log: YYYY-MM-DDT16:56:44.261Z info envoy[2754] [Originator@6876 sub=Default] YYYY-MM-DDT16:56:38.121Z POST /sms/sdk 500 via_upstream - 1050 625 - 2 1 0 127.0.0.1:51516 HTTP/2 - 127.0.0.1:1080 127.0.0.1:33856 HTTP/1.1 - 127.0.0.1:22000 - "SelectCapabilities"
envoy-access.log: YYYY-MM-DDT16:57:13.790Z info envoy[2754] [Originator@6876 sub=Default] YYYY-MM-DDT16:57:08.159Z GET /sms/sdk/service 404 via_upstream - 0 745 - 1 0 0 127.0.0.1:51516 HTTP/2 - 127.0.0.1:1080 127.0.0.1:59986 HTTP/1.1 - 127.0.0.1:22000 - -
envoy-access.log: YYYY-MM-DDT17:26:44.402Z info envoy[2754] [Originator@6876 sub=Default] YYYY-MM-DDT16:56:38.159Z POST /sms/sdk 0 http2.remote_reset - 450 0 - 1800004 - - 127.0.0.1:51516 HTTP/2 - 127.0.0.1:1080 127.0.0.1:33872 HTTP/1.1 - 127.0.0.1:22000 - "QuerySessionManager"
envoy-access.log: YYYY-MM-DDT17:26:55.295Z info envoy[2754] [Originator@6876 sub=Default] YYYY-MM-DDT17:26:48.190Z POST /sms/sdk 500 via_upstream - 1050 625 - 1 1 0 127.0.0.1:51516 HTTP/2 - 127.0.0.1:1080 127.0.0.1:41374 HTTP/1.1 - 127.0.0.1:22000 - "SelectCapabilities"
envoy-access.log: YYYY-MM-DDT17:27:25.308Z info envoy[2754] [Originator@6876 sub=Default] YYYY-MM-DDT17:27:18.232Z GET /sms/sdk/service 404 via_upstream - 0 745 - 2 1 0 127.0.0.1:51516 HTTP/2 - 127.0.0.1:1080 127.0.0.1:57986 HTTP/1.1 - 127.0.0.1:22000 - -
envoy-access.log: YYYY-MM-DDT20:57:50.857Z info envoy[2754] [Originator@6876 sub=Default] YYYY-MM-DDT20:27:48.763Z POST /sms/sdk 0 http2.remote_reset - 450 0 - 1800004 - - 127.0.0.1:51516 HTTP/2 - 127.0.0.1:1080 127.0.0.1:56340 HTTP/1.1 - 127.0.0.1:22000 - "QuerySessionManager"
envoy-access.log: YYYY-MM-DDT20:57:59.775Z info envoy[2754] [Originator@6876 sub=Default] YYYY-MM-DDT20:57:58.807Z POST /sms/sdk 500 via_upstream - 1050 625 - 1 1 0 127.0.0.1:51516 HTTP/2 - 127.0.0.1:1080 127.0.0.1:56052 HTTP/1.1 - 127.0.0.1:22000 - "SelectCapabilities"
envoy-access.log: YYYY-MM-DDT20:58:29.743Z info envoy[2754] [Originator@6876 sub=Default] YYYY-MM-DDT20:58:28.846Z GET /sms/sdk/service 404 via_upstream - 0 745 - 1 1 0 127.0.0.1:51516 HTTP/2 - 127.0.0.1:1080 127.0.0.1:36460 HTTP/1.1 - 127.0.0.1:22000 - -
envoy-access.log: YYYY-MM-DDT21:27:59.716Z info envoy[2754] [Originator@6876 sub=Default] YYYY-MM-DDT20:57:58.845Z POST /sms/sdk 0 http2.remote_reset - 450 0 - 1800005 - - 127.0.0.1:51516 HTTP/2 - 127.0.0.1:1080 127.0.0.1:56056 HTTP/1.1 - 127.0.0.1:22000 - "QuerySessionManager"
envoy-access.log: YYYY-MM-DDT21:28:11.069Z info envoy[2754] [Originator@6876 sub=Default] YYYY-MM-DDT21:28:08.887Z POST /sms/sdk 500 via_upstream - 1050 625 - 1 1 0 127.0.0.1:51516 HTTP/2 - 127.0.0.1:1080 127.0.0.1:34576 HTTP/1.1 - 127.0.0.1:22000 - "SelectCapabilities"
envoy-access.log: YYYY-MM-DDT21:28:40.204Z info envoy[2754] [Originator@6876 sub=Default] YYYY-MM-DDT21:28:38.928Z GET /sms/sdk/service 404 via_upstream - 0 745 - 2 1 0 127.0.0.1:51516 HTTP/2 - 127.0.0.1:1080 127.0.0.1:57946 HTTP/1.1 - 127.0.0.1:22000 - -
envoy-access.log: YYYY-MM-DDT21:52:22.028Z info envoy[2762] [Originator@6876 sub=Default] YYYY-MM-DDT13:52:19.825Z POST /sms/sdk 504 stream_idle_timeout SI 450 14 - 28799999 - - #.#.#.#:47280 HTTP/1.1 TLSv1.2 #.#.#.#:443 127.0.0.1:47238 HTTP/1.1 - 127.0.0.1:22000 - "QueryStorageManager"

The SMS service received the last QueryProviderVersionForUpgrade call. However, this call did not completed, as we do not see the corresponding ending timer set.

--------
SPS logs
---------

sps.log: YYYY-MM-DDT02:53:46.360-04:00 [pool-36-thread-5] INFO  opId=q-858238:h5ui-getProperties:urn:vmomi:Folder:g######:#####-46c5-4674-86c4-afb4166264b5:1871466536:SmsPropertyProvider:3457197-7duj9-h5:70770818 com.vmware.vim.sms.provider.vasa.VersionHandler - Upgrade of https://#.#.#.#:8084/vasa is not possible

opId.log: [YYYY-MM-DD 03:13:45.327] [ERROR] data-service-pool-773        70770818 131087 202159 Invocation of 'void com.vmware.vim.binding.sms.StorageManager.queryProviderVersionForUpgrade(java.lang.String[],com.vmware.vim.vmomi.core.Future)' to http://localhost:1080/external-vecs/http1/###########/443/sms/sdk for opId 'q-858238:h5ui-getProperties:urn:vmomi:Folder:######:#####-46c5-4674-86c4-afb4166264b5:1871466536:SmsPropertyProvider:3457197-7duj9-h5:70770818' failed in ? ms, error: com.vmware.vim.vmomi.client.exception.ConnectionException: http://localhost:1080/external-vecs/http1/###########/443/sms/sdk invocation failed with "java.net.SocketTimeoutException: Read timed out"




Since, SMS endpoint is not working properly, Vpxd gets below error during vMotion of VMs:
 
--------
VPXD logs
---------
vpxd.log: YYYY-MM-DDT23:30:00.042-04:00 warning vpxd[10823] [Originator@6876 sub=VmProv opID=WorkQueue-4ddcf94b-214ab08c-01-01] Received exception from SMS: N7Vmacore21InvalidStateExceptionE(Invalid state)

Environment

vCenter 8.x

Cause

The SPS service became unresponsive due to blocked or incomplete callback operations with VASA providers.

These failures cascaded through Envoy as HTTP/2 resets and idle timeouts, preventing vCenter from completing SMS storage policy queries during vMotion. 

Resolution

Workaround: 

  • Restart sps service. 
    • service-control --restart vmware-sps
  • However, before the workaround is attempted, please ensure that you collect hprof and the log bundle for vCenter and ESXi host (source and destination). 
    • Login in vcenter mob
    • https://<vcenter>/pbm/mob/?moid=PbmDebugManager&method=SpsGenerateHprofDump> 
    • Click on Invoke method
  • This will generate hprof with sps_hprof_* in directory /var/log/vmware/vmware-sps

Post this, collect the log bundles and reach out to Broadcom Technical Support for further investigation.