VCD Cell is "Inactive"; Unable to Fully Initialize the vmware-vcd Service

Article ID: 342535

Products

VMware Cloud Director

Issue/Introduction

  • This article describes how to recover a VMware Cloud Director (VCD) cell that suddenly becomes "inactive" and cannot fully initialize the vmware-vcd service


Symptoms:
  • The logs show that the VCD application cannot fully initialize because reading the truststore from local storage fails with a java.io.EOFException, as shown below:

2022-07-25 05:52:56,809 | ERROR | Bootstrap Application | HttpEngineSslCertificateVerifier | Certificate migration failed. |
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at java.io.DataInputStream.readLong(DataInputStream.java:416)
        at com.sun.crypto.provider.JceKeyStore.engineLoad(JceKeyStore.java:799)
        at java.security.KeyStore.load(KeyStore.java:1445)
        at com.vmware.vcloud.common.ssl.CertificateStoreConfigurationUtils.loadOrGenerateKeyStoreAndPassword(CertificateStoreConfigurationUtils.java:105)
        at com.vmware.vcloud.common.ssl.CertificateStoreConfigurationUtils.configureTrustedCertificates(CertificateStoreConfigurationUtils.java:68)
        at com.vmware.vcloud.common.main.bootstrap.utilities.TrustedCertificateMigrator.prepareForMigration(TrustedCertificateMigrator.java:123)
        at com.vmware.vcloud.common.main.bootstrap.utilities.TrustedCertificateMigrator.migrateTrustedCertificates(TrustedCertificateMigrator.java:51)
        at com.vmware.vcloud.common.main.bootstrap.HttpEngineSslCertificateVerifier.migrateTrustedCertificates(HttpEngineSslCertificateVerifier.java:67)
        at com.vmware.vcloud.common.main.bootstrap.HttpEngineSslCertificateVerifier.getCertificateStorePath(HttpEngineSslCertificateVerifier.java:47)
        at com.vmware.vcloud.common.main.bootstrap.AbstractSslCertificateVerifier.canProceed(AbstractSslCertificateVerifier.java:37)
        at com.vmware.vcloud.common.main.bootstrap.HttpEngineSslCertificateVerifier.canProceed(HttpEngineSslCertificateVerifier.java:24)
        at com.vmware.vcloud.common.main.StartupVerifierRunnerStartupAction.call(StartupVerifierRunnerStartupAction.java:46)
        at com.vmware.vcloud.common.main.DelegatingStartupAction.call(DelegatingStartupAction.java:33)
        at com.vmware.vcloud.common.main.bootstrap.BootstrapApplication.start(BootstrapApplication.java:160)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.vmware.vcloud.common.main.CloudLauncher.launchCloud(CloudLauncher.java:403)
        at com.vmware.vcloud.common.main.CloudLauncher.run(CloudLauncher.java:157)
        at com.vmware.vcloud.common.main.CloudLauncher.main(CloudLauncher.java:119)
  • As a result, the VCD application does not work correctly and the cell status becomes "inactive"


Environment

VMware Cloud Director for Service Provider 10.x
VMware Cloud Director 10.x

Cause

  • The cell goes down because the /opt/vmware/vcloud-director/etc/truststore file has unexpectedly become corrupted
  • The corruption shows up as java.io.EOFException entries in several different logs
  • In the cell-management-tool.log, you may see something similar to the following:
2022-07-24 13:12:55,656 | ERROR | main | TrustStoreImporter | Keystore is malformed or the keystore password is incorrect. | java.io.EOFException

2022-07-24 13:12:55,657 | ERROR | main | ImportTrustedCertificatesCommand | Failed to extract or import certificates, consult cell-management-tool.log. | java.io.EOFException
  • In the vcloud-container-debug.log, you may see something similar to the following:
2022-07-25 05:52:59,780 | WARN | CloudProxy Application | FileSystemTrustManager | Reloading local truststores failed |
java.io.EOFException
  • In the cell-runtime.log, you may see something similar to the following:
2022-07-25 05:52:56,809 | ERROR | Bootstrap Application | HttpEngineSslCertificateVerifier | Certificate migration failed. |
java.io.EOFException
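
The log checks listed above can be sketched as a small shell helper. This is only an illustration, not an official tool: the function names count_eof and scan_logs are hypothetical, and LOG_DIR assumes the logs are under /opt/vmware/vcloud-director/logs, so override it if your cell stores them elsewhere.

```shell
#!/bin/sh
# Sketch: count java.io.EOFException lines in the three logs named in this article.
# LOG_DIR is an assumed location; override it if your logs live elsewhere.
LOG_DIR="${LOG_DIR:-/opt/vmware/vcloud-director/logs}"

# count_eof FILE: print how many lines of FILE mention java.io.EOFException,
# or "unreadable" if the file is missing or cannot be read.
count_eof() {
    if [ -r "$1" ]; then
        # grep -c prints 0 and exits non-zero when nothing matches; ignore that status.
        grep -c 'java\.io\.EOFException' "$1" || true
    else
        echo "unreadable"
    fi
}

# scan_logs: print one "name: count" line per log file.
scan_logs() {
    for name in cell-management-tool.log vcloud-container-debug.log cell-runtime.log; do
        printf '%s: %s\n' "$name" "$(count_eof "$LOG_DIR/$name")"
    done
}
```

Calling scan_logs on the affected cell prints one line per log. A non-zero count is only a hint: confirm that the matching entries refer to the truststore (for example TrustStoreImporter, FileSystemTrustManager, or HttpEngineSslCertificateVerifier) before treating the file as corrupted.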

Resolution

  • The resolution is as follows:

1. Put the vmware-vcd service into maintenance mode.

/opt/vmware/vcloud-director/bin/cell-management-tool -u administrator cell --maintenance true

2. Stop the vmware-vcd service after it successfully enters maintenance mode.

/opt/vmware/vcloud-director/bin/cell-management-tool -u administrator cell --shutdown

3. Stop the appliance-sync.timer.

systemctl stop appliance-sync.timer

4. Move the corrupted truststore to another location.

mv /opt/vmware/vcloud-director/etc/truststore /tmp

5. Restart the appliance-sync.timer.

systemctl start appliance-sync.timer

6. Check cell-management-tool.log to confirm that ImportTrustedCertificatesCommand runs without errors.

7. Start the vmware-vcd service.

service vmware-vcd start
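
For repeated use, the command steps above could be wrapped in a small script. The following is a minimal sketch under stated assumptions, not a supported tool: the names run, recover_truststore, VCD_HOME, and DRY_RUN are hypothetical, and the script defaults to a dry run that only prints each command. The log check and the final service start are deliberately left as manual steps.

```shell
#!/bin/sh
# Sketch of the resolution steps. DRY_RUN=1 (the default here) only prints each command.
# VCD_HOME and DRY_RUN are illustrative variables, not part of the product.
VCD_HOME="${VCD_HOME:-/opt/vmware/vcloud-director}"
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# recover_truststore: run the command steps in order, stopping at the first failure.
recover_truststore() {
    cmt="$VCD_HOME/bin/cell-management-tool"
    run "$cmt" -u administrator cell --maintenance true &&
    run "$cmt" -u administrator cell --shutdown &&
    run systemctl stop appliance-sync.timer &&
    run mv "$VCD_HOME/etc/truststore" /tmp &&
    run systemctl start appliance-sync.timer &&
    echo "Next: check cell-management-tool.log for ImportTrustedCertificatesCommand errors, then run: service vmware-vcd start"
}
```

Call recover_truststore once with the default settings to review the commands it would issue, then set DRY_RUN=0 and call it again to execute them. Because the steps are chained with &&, a failure (for example, the cell not entering maintenance mode) stops the sequence before the truststore is moved.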


Workaround:
  • The only alternative to the resolution above is to redeploy the cell


Additional Information

https://bugzilla.eng.vmware.com/show_bug.cgi?id=3011748#c16

Impact/Risks:
  • When the truststore becomes corrupted, the cell becomes inactive and can no longer serve VCD requests; it is effectively down
  • If the primary node is affected, a failover to a standby node is required