Disk group failed to mount on vSAN encryption enabled cluster.

Article ID: 326487

Products

VMware vSAN

Issue/Introduction

  • This article assists in troubleshooting and resolving vSAN disk group mount failures on clusters with encryption enabled.

Symptoms:

  • All disks on the ESXi vSAN hosts show In CMMDS: false.

  • Encryption shows as enabled on all disk groups.

  • Attempting to mount a disk group manually fails with the message "Unable to mount: Host is not in crypto safe state".

  • Log entries similar to the following may appear in vsansystem.log:

2018-05-15T22:02:10.373Z info vsansystem[332605D700] [Originator@6876 sub=Libs opID=Main] VsanUtil: Get kms client key and cert, old:1
2018-05-15T22:02:10.373Z info vsansystem[332605D700] [Originator@6876 sub=Libs opID=Main] VsanUtil: GetKmsServerCerts Old KMS certs not found
2018-05-15T22:02:10.379Z info vsansystem[332605D700] [Originator@6876 sub=Libs opID=Main] VsanUtil: Get kms client key and cert, old:1
2018-05-15T22:02:10.379Z info vsansystem[332605D700] [Originator@6876 sub=Libs opID=Main] VsanUtil: GetKmsServerCerts Old KMS certs not found
2018-05-15T22:02:10.379Z info vsansystem[332605D700] [Originator@6876 sub=Libs opID=Main] VsanUtil: Create client context for server 10.x.x.47:5696
{67839} configure_backend_platform() - Configuring dynamic Linux backends
{67839} open_shared_lib() - Loaded crypto from /lib64/libcrypto.so.1.0.2
{67839} open_shared_lib() - Loaded ssl from /lib64/libssl.so.1.0.2
{67839} open_shared_lib() - Loaded qlopenssl from /usr/lib/vmware/vsan/lib64/libqlopenssl.so
{67839} try_openssl_backend() - Configured OpenSSL backend
{67839} peer_verify_cb() - Pending 1558
{67839} peer_verify_cb() - Pending 1424
2018-05-15T22:02:10.423Z error vsansystem[332605D700] [Originator@6876 sub=Libs opID=Main] VsanUtil: Failed to connect to key server, QLC_ERR_NEED_AUTH
2018-05-15T22:02:10.423Z error vsansystem[332605D700] [Originator@6876 sub=Libs opID=Main] VsanInfoImpl: Failed to retrieve key f1feec7b-a9e7-477d-b4e4-2xxxxxxxa from KMS Cloudlink: QLC_ERR_NEED_AUTH
2018-05-15T22:02:10.423Z warning vsansystem[332605D700] [Originator@6876 sub=VsanPluginMgr opID=Main] Failed to retrieve host key from KMS: Failed to retrieve key from key management server cluster Cloudlink
2018-05-15T22:02:10.423Z error vsansystem[332605D700] [Originator@6876 sub=VsanSystemProvider opID=Main] Failed to check and load keys: Failed to retrieve key from key management server cluster Cloudlink
2018-05-15T22:02:10.423Z info vsansystem[332605D700] [Originator@6876 sub=VsanSystemProvider opID=Main] KEK unavailable, schedule again.
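
To check whether a host shows the same symptoms, a minimal sketch such as the following may help. It assumes the In CMMDS value is read from the output of esxcli vsan storage list and that the log is at /var/log/vsansystem.log; the function names count_not_in_cmmds and kms_errors are hypothetical, and the search strings are taken from the sample log above.

```shell
# Count disks reported as "In CMMDS: false".
# Reads disk listing text on stdin (assumed: output of `esxcli vsan storage list`).
count_not_in_cmmds() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c 'In CMMDS: false' || true
}

# Print log lines that match the KMS failure messages shown in the sample log.
# $1: path to a vsansystem.log file (assumed default: /var/log/vsansystem.log).
kms_errors() {
    log_file="${1:-/var/log/vsansystem.log}"
    grep -E 'Failed to connect to key server|Failed to retrieve key|KEK unavailable' "$log_file"
}
```

On a host, these could be used as: esxcli vsan storage list | count_not_in_cmmds, followed by kms_errors with no argument. A non-zero disk count together with matching log lines is consistent with the symptoms described in this article.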

Environment

VMware vSAN 6.x

VMware vSAN 7.x

VMware vSAN 8.x

Cause

The ESXi hosts in the vSAN cluster are unable to contact the KMS server. Without the key from the KMS, the hosts cannot unlock the encrypted disk groups, so none of them can be mounted.

Resolution

  • Verify that the KMS server is up and can be reached by the ESXi hosts and vCenter. The KMS server should NOT be located on the vSAN datastore.

  • Confirm that the KMS server is listed on the VMware HCL.

  • Validate the KMS configuration on the ESXi hosts.

  • All keys and certificates for the KMS should be located in /etc/vmware/ssl.

  • Test connection to KMS server with the following command:

# openssl s_client -connect <IP or FQDN of KMS server>:5696 -key /etc/vmware/ssl/vsan_kms_client.key -cert /etc/vmware/ssl/vsan_kms_client.crt -debug
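
The connection test above depends on the client key and certificate being present under /etc/vmware/ssl. As a rough sketch, a pre-check like the one below can confirm that both files exist and are not empty before openssl is run. The function name check_kms_files is hypothetical; the directory and file names are the ones used in the command above.

```shell
# Report any KMS client key or certificate file that is missing or empty.
# $1: directory holding the files (defaults to /etc/vmware/ssl, as in the article).
# Returns 0 when both files are present and non-empty, 1 otherwise.
check_kms_files() {
    ssl_dir="${1:-/etc/vmware/ssl}"
    missing=0
    for f in vsan_kms_client.key vsan_kms_client.crt; do
        if [ ! -s "$ssl_dir/$f" ]; then
            echo "missing or empty: $ssl_dir/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

If check_kms_files returns 0, the openssl s_client test can proceed. If it reports a missing file, the KMS configuration on that host is worth revisiting before testing the network path.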

NOTE:

There is no way to mount an encrypted disk group without the KMS server that supplied its encryption keys. It is crucial to always back up all KMS servers and not to host them on the encrypted vSAN datastore.

It is recommended to run the KMS server on a secured, non-encrypted, redundant datastore that is highly available.