The /storage/log volume is filling up in vCenter 7.0 U2 due to growing sps-runtime.log.stderr

Article ID: 318194

Updated On:

Products

VMware vCenter Server

Issue/Introduction

This article applies when:
  • Storage utilization on the /storage/log volume is full or very high
  • When running du /storage/log/vmware | sort -rn | head , one of the highest consumers of disk space is the /storage/log/vmware/vmware-sps directory
  • When running ls -lSr /storage/log/vmware/vmware-sps | tail , the sps-runtime.log*.stderr files are taking up several gigabytes of space
  • The sps-runtime.log.stderr file contains the following message, repeated many times:
 org.bouncycastle.jsse.provider.ProvTlsClient notifyAlertRaised
INFO: Client raised fatal(2) certificate_unknown(46) alert: Failed to read record
org.bouncycastle.tls.TlsFatalAlert: certificate_unknown(46)
        at org.bouncycastle.jsse.provider.ProvSSLSocketDirect.checkServerTrusted(Unknown Source)
        at org.bouncycastle.jsse.provider.ProvTlsClient$1.notifyServerCertificate(Unknown Source)
        at org.bouncycastle.tls.TlsUtils.processServerCertificate(Unknown Source)
        at org.bouncycastle.tls.TlsClientProtocol.handleServerCertificate(Unknown Source)
        at org.bouncycastle.tls.TlsClientProtocol.handleHandshakeMessage(Unknown Source)
        at org.bouncycastle.tls.TlsProtocol.processHandshakeQueue(Unknown Source)
        at org.bouncycastle.tls.TlsProtocol.processRecord(Unknown Source)
        at org.bouncycastle.tls.RecordStream.readRecord(Unknown Source)
        at org.bouncycastle.tls.TlsProtocol.safeReadRecord(Unknown Source)
        at org.bouncycastle.tls.TlsProtocol.blockForHandshake(Unknown Source)
        at org.bouncycastle.tls.TlsClientProtocol.connect(Unknown Source)
        at org.bouncycastle.jsse.provider.ProvSSLSocketDirect.startHandshake(Unknown Source)
        at org.bouncycastle.jsse.provider.ProvSSLSocketDirect.startHandshake(Unknown Source)
        at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:167)
        at com.vmware.vim.sms.provider.ProviderFactory.validateUrl(ProviderFactory.java:594)
        at com.vmware.vim.sms.provider.ProviderFactory.validateVasaSpec(ProviderFactory.java:554)
        at com.vmware.vim.sms.provider.ProviderFactory.createVasaProvider(ProviderFactory.java:187)
        at com.vmware.vim.sms.provider.ProviderFactory.createProvider(ProviderFactory.java:166)
        at com.vmware.vim.sms.StorageManagerImpl.registerProviderInt(StorageManagerImpl.java:461)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.vmware.vim.sms.task.JobHandler.run(JobHandler.java:70)
        at com.vmware.vim.storage.common.task.opctx.RunnableOpCtxDecorator.run(RunnableOpCtxDecorator.java:38)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.security.cert.CertificateException: Unable to construct a valid chain
        at org.bouncycastle.jsse.provider.ProvX509TrustManager.validateChain(Unknown Source)
        at org.bouncycastle.jsse.provider.ProvX509TrustManager.checkTrusted(Unknown Source)
        at org.bouncycastle.jsse.provider.ProvX509TrustManager.checkServerTrusted(Unknown Source)
        ... 32 more
Caused by: java.security.cert.CertPathBuilderException: Unable to find certificate chain.
        at org.bouncycastle.jcajce.provider.PKIXCertPathBuilderSpi.engineBuild(Unknown Source)
        at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
        at org.bouncycastle.jsse.provider.ProvX509TrustManager.buildCertPath(Unknown Source)
        ... 35 more
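
The symptom checks above can be combined into a small read-only helper. This is a minimal sketch, assuming GNU find, awk and grep are available on the appliance; the function names total_stderr_bytes and count_cert_alerts are hypothetical and are not part of any VMware tooling.

```shell
#!/bin/bash
# Hypothetical read-only helpers; the names are illustrative only.

# Print the combined size in bytes of the sps-runtime.log*.stderr files
# found directly inside the given directory.
total_stderr_bytes() {
    local dir="$1"
    find "$dir" -maxdepth 1 -type f -name 'sps-runtime.log*.stderr' -printf '%s\n' \
        | awk '{ sum += $1 } END { print sum + 0 }'
}

# Print how many times the certificate_unknown(46) alert line appears in a file.
# -F matches the pattern literally; -c prints the count of matching lines.
count_cert_alerts() {
    local logfile="$1"
    grep -cF 'certificate_unknown(46) alert' "$logfile"
}
```

For example, total_stderr_bytes /storage/log/vmware/vmware-sps reports the space used by the stderr files, and count_cert_alerts /storage/log/vmware/vmware-sps/sps-runtime.log.stderr reports how often the alert has been logged in the current file.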


Environment

VMware vCenter Server 7.0.x

Cause

When vpxd.certmgmt.mode is set to 'thumbprint', the SPS service cannot trust a host connection based on the certificate issuer, because some hosts may still be using their self-signed certificates. Instead, it needs an entry in the SMS store in VECS for the certificate presented by the IOFilter provider on the host. If this entry does not exist, or if the host's current SSL certificate does not match the certificate in the entry, the errors flood the sps-runtime.log.stderr file.

When vpxd.certmgmt.mode is set to 'vmca' or 'custom', this issue can occur if the certificate of the host, or of the IOFilter provider on the host, cannot be verified because the signing CA is not known to vCenter.
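
To illustrate what "cannot be verified" means in the 'vmca' or 'custom' case, the following minimal sketch checks whether a certificate chains to a CA in a given bundle. It assumes the certificate and the CA bundle have already been saved as PEM files; the function name issuer_is_trusted and both file arguments are hypothetical, and this is not how vCenter itself performs the check.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if cert_pem can be verified against
# the CAs in ca_bundle_pem, fail otherwise.
# Note: depending on the OpenSSL build, its default CA directory may also be consulted.
issuer_is_trusted() {
    local cert_pem="$1" ca_bundle_pem="$2"
    # openssl verify prints "<file>: OK" when a valid chain can be built.
    openssl verify -CAfile "$ca_bundle_pem" "$cert_pem" 2>/dev/null | grep -q ': OK$'
}
```

A call such as issuer_is_trusted host.pem trusted-roots.pem returns a non-zero exit status when no valid chain can be built, which corresponds to the "Unable to construct a valid chain" exception in the log.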

Resolution

This issue is resolved in VMware vCenter Server 7.0 U3. 

Workaround:
VMware does not recommend using the 'thumbprint' value for the vpxd.certmgmt.mode Advanced Setting for extended periods, and recommends changing the value to the default 'vmca', or to 'custom', depending on the customer's security requirements. However, changing to either of these values requires certificates to be re-issued to the hosts, which can be time-consuming unless scripted.

If the value of vpxd.certmgmt.mode cannot be changed immediately, run the fix-sps-certs.sh script attached to this article. The script enumerates the hosts in the vCenter's inventory and, for each host, checks the SMS store in VECS for an entry with the alias 'https://<hostname>:9080/version.xml', obtains the current certificate from the host on port 9080, and creates the entry if it does not exist or updates it if the certificates do not match. The SPS service is then restarted, after which the sps-runtime.log.stderr file should no longer be flooded with the errors.
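
The comparison described above can be sketched as a read-only check for a single host. This is not the attached fix-sps-certs.sh script and it does not create or update any entries; the function names are hypothetical, and the sketch assumes vecs-cli is installed at /usr/lib/vmware-vmafd/bin/vecs-cli and that the host is reachable on port 9080.

```shell
#!/bin/bash
# Hypothetical sketch; this is NOT the attached fix-sps-certs.sh script.

# Read a PEM certificate on stdin and print its SHA-256 fingerprint.
pem_fingerprint() {
    openssl x509 -noout -fingerprint -sha256 2>/dev/null | cut -d= -f2
}

# Print the fingerprint of the certificate the host presents on port 9080.
host_fingerprint() {
    local host="$1"
    openssl s_client -connect "${host}:9080" </dev/null 2>/dev/null | pem_fingerprint
}

# Print the fingerprint of the host's entry in the SMS store in VECS.
vecs_fingerprint() {
    local host="$1"
    /usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store SMS \
        --alias "https://${host}:9080/version.xml" 2>/dev/null | pem_fingerprint
}

# Report whether the VECS entry matches the certificate the host presents.
check_host() {
    local host="$1" live stored
    live=$(host_fingerprint "$host")
    stored=$(vecs_fingerprint "$host")
    if [ -z "$live" ]; then
        echo "$host: no certificate retrieved from port 9080"
    elif [ -z "$stored" ]; then
        echo "$host: entry missing"
    elif [ "$live" = "$stored" ]; then
        echo "$host: match"
    else
        echo "$host: mismatch"
    fi
}
```

Running check_host with an ESXi hostname prints one status line for that host; a result of "entry missing" or "mismatch" corresponds to the two conditions the script is described as repairing.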

1. Upload the fix-sps-certs.sh script to the VCSA (in this example, to the /tmp directory)
2. Make the file executable:
chmod +x /tmp/fix-sps-certs.sh
3. Ensure that no Windows carriage returns are in the file:
sed -i 's/\r//g' /tmp/fix-sps-certs.sh
4. Run the script:
/tmp/fix-sps-certs.sh
5. If the vpxd.certmgmt.mode is set to 'thumbprint', the IOFilter provider entries in the SMS store in VECS will be updated, or created (if missing).
6. If the vpxd.certmgmt.mode is set to 'vmca' or 'custom', the script asks whether the root password is the same for all the ESXi hosts. If it is, the script checks, for each host in inventory, whether the issuer of the reverse proxy SSL certificate and of the IOFilter provider certificate is trusted by vCenter, and also checks the contents of the /etc/vmware/ssl/castore.pem file to see whether the ESXi host trusts the issuer of the vCenter's Machine SSL certificate or of the SPS service (SMS certificate). If the ESXi root passwords are not uniform throughout the vCenter's inventory, only the trust of the reverse proxy SSL certificate and IOFilter provider certificate is checked.
7. If the vpxd.certmgmt.mode is set to 'thumbprint', the vmware-sps service will be restarted. Monitor the size of the sps-runtime.log.stderr file to see if it has stopped growing:
watch -n 10 ls -l /storage/log/vmware/vmware-sps/sps-runtime.log.stderr
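
As a non-interactive alternative to watch, the following minimal sketch samples the file size twice and reports whether it grew. The function name file_is_growing is hypothetical, and the sketch assumes GNU stat; a log rotation between the two samples would be reported as "not growing".

```shell
#!/bin/bash
# Hypothetical helper: exit 0 if the file grew during the interval, 1 otherwise.
# Usage: file_is_growing <file> [interval_seconds]   (interval defaults to 10)
file_is_growing() {
    local file="$1" interval="${2:-10}" before after
    before=$(stat -c %s "$file")   # size in bytes at the first sample
    sleep "$interval"
    after=$(stat -c %s "$file")    # size in bytes at the second sample
    [ "$after" -gt "$before" ]
}
```

For example, file_is_growing /storage/log/vmware/vmware-sps/sps-runtime.log.stderr 60 && echo "still growing" prints a message only if the file grew within one minute.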

Attachments

fix-sps-certs