[VMC on AWS] Unable to complete SRM Site Pairing -- Operation timed out: 300 seconds.

Article ID: 329463


Products

VMware Cloud on AWS

Issue/Introduction

Symptoms:
Connectivity tests between VMC and on-prem components are successful.

The on-premises vCenter is not using federated SSO, i.e., Enhanced Linked Mode (ELM) or Hybrid Linked Mode (HLM).

In the Cloud vCenter GUI, you see messages similar to:
Operation Failed SRM server 'srm.sddc-xx-xx-xx-xx.vmware.com' cannot complete a pair operation. The reason is: Operation timed out: 300 seconds.

You see messages in the Cloud SRM Log Intelligence similar to:

  • Federated SSO server detected at LS++
  • Unable to retrieve token from STS
  • N9SsoClient27InvalidCredentialsExceptionE Authentication failed: Invalid credentials

The on-prem vCenter logs contain messages similar to:
Unable to connect to Lookup Service at https://vcenter.sddc-xx-xx-xx-xx.vmware.com:443/lookupservice/sdk. Reason https://vcenter.sddc-xx-xx-xx-xx.vmware.com:443/lookupservice/sdk invocation failed with "org.apache.http.conn.ConnectTimeoutException: Connect to vcenter.sddc-xx-xx-xx-xx.vmware.com:443 failed: connect timed out"
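
As a quick way to check an exported Cloud SRM log against the three messages listed above, a minimal Python sketch follows. The function names and the idea of scanning an exported text file are assumptions for illustration; the patterns are taken from the messages quoted in this article, and the sketch does not use any Log Intelligence API.

```python
import re

# Signature messages quoted in the Symptoms section (Cloud SRM log).
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "federated_sso_detected": re.compile(r"Federated SSO server detected at LS"),
    "sts_token_failure": re.compile(r"Unable to retrieve token from STS"),
    "invalid_credentials": re.compile(r"InvalidCredentialsException"),
}


def match_symptoms(lines):
    """Return a dict showing which signature messages appear in the lines."""
    found = {name: False for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                found[name] = True
    return found


def all_symptoms_present(lines):
    """True only if every signature message was seen at least once."""
    return all(match_symptoms(lines).values())
```

For example, `all_symptoms_present(open("srm-export.log"))` would scan a locally saved export (the file name is a placeholder). A match on all three messages supports the diagnosis but does not replace the certificate check described in the Resolution section.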

Cause

The VMC vCenter STS certificates are listed under TrustedCertificateChains in the on-prem vmdir. This leads SRM to believe that it is in a federated environment, so it does not create the remote solution users properly.

Resolution

This can be resolved by removing the VMC STS certificate from TrustedCertificateChains in the on-prem vmdir.

Verify that the VMC STS certificate is in fact located under TrustedCertificateChains within the on-prem SSO configuration:
- Download the JXplorer tool (http://www.jxplorer.org).
- Open a connection with the following parameters:
  - Host: <IP of SSO machine>
  - Port: 389
  - Protocol: LDAP v3
  - Level: User + Password
  - User DN: cn=Administrator,cn=Users,dc=vsphere,dc=local
  - Password: SSO administrator password


Using JXplorer, connect to the on-prem PSC and navigate to Local > vsphere > Services > TrustedCertificateChains.
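
If a command-line LDAP client is preferred over JXplorer, the same lookup can be sketched with the OpenLDAP `ldapsearch` tool. The Python helper below only assembles the command using the connection parameters listed above. The helper name, the subtree search from `dc=vsphere,dc=local`, and the `(cn=TrustedCertChain-*)` filter are assumptions based on the entry names mentioned in this article, not a documented procedure. Confirm them against your environment before relying on the output.

```python
import shlex


def build_ldapsearch_command(
    host,
    bind_dn="cn=Administrator,cn=Users,dc=vsphere,dc=local",
    base_dn="dc=vsphere,dc=local",  # assumption: search the whole SSO domain
    port=389,
):
    """Assemble an ldapsearch argument list for the trusted certificate chains."""
    return [
        "ldapsearch",
        "-H", f"ldap://{host}:{port}",  # LDAP URI of the SSO machine
        "-x",                           # simple authentication (user + password)
        "-D", bind_dn,                  # bind as the SSO administrator
        "-W",                           # prompt for the password interactively
        "-b", base_dn,                  # search base
        "-s", "sub",                    # search the full subtree
        "(cn=TrustedCertChain-*)",      # assumption: entry naming from this article
        "userCertificate",              # return only the certificate attribute
    ]


def format_command(args):
    """Render the argument list as a shell-safe string for copy and paste."""
    return shlex.join(args)
```

Calling `format_command(build_ldapsearch_command("<IP of SSO machine>"))` yields a single command line. The search is read-only, so it can be run before any backup or snapshot is taken.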


View properties for each of the TrustedCertChain-# listed in this directory by clicking on the Value for userCertificate.

Navigate to the Details tab.


Note which of the TrustedCertChain-# entries contain the VMC STS certificate.
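
When the userCertificate value is exported as base64 text (for example from an LDIF dump) rather than viewed in JXplorer, it can be converted to PEM so that any certificate viewer can display the subject and issuer. The sketch below uses only the Python standard library. The function name is hypothetical, and it assumes the exported value is a single base64-encoded DER certificate.

```python
import base64
import ssl


def ldif_value_to_pem(b64_value):
    """Convert a base64-encoded DER certificate value to PEM text."""
    # LDIF folds long values across lines; drop all whitespace first.
    compact = "".join(b64_value.split())
    der_bytes = base64.b64decode(compact)
    # Wraps the DER bytes in BEGIN/END CERTIFICATE armor; no validation is done.
    return ssl.DER_cert_to_PEM_cert(der_bytes)
```

The resulting PEM text can be saved to a file and opened with a certificate viewer to confirm which entries belong to the VMC vCenter before anything is removed.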



Once it has been verified that the STS certificates for VMC are present, back up the PSCs and vCenters in the environment before proceeding with removal of the STS certificates.

Steps to perform:

  • Power down all associated on-premises PSCs and VCs at the same time
  • Take offline snapshots of the PSCs and VCs
  • Power on the PSCs and VCs
  • Remove ONLY the certs containing VMC STS certificates from TrustedCertificateChains


Verify the functionality of the environment.

If you find any problems after this change, see the "Impact/Risks" section.

With the VMC STS certificates removed, re-attempt the site pairing. The process should complete successfully.


Additional Information

For general DRaaS Troubleshooting, please see the main KB: 78941

SRM does not support HLM when configured from the Cloud Gateway Appliance on-prem:
https://docs.vmware.com/en/VMware-Site-Recovery/services/rn/site-recovery-release-notes.html

SRM supports HLM when configured manually through the VMC SDDC: 
https://docs.vmware.com/en/VMware-Cloud-on-AWS/services/com.vmware.vsphere.vmc-aws-manage-data-center-vms.doc/GUID-9803F0FA-42A2-4E0A-8597-2362B2BC46CC.html


Impact/Risks:
Use this resolution only if all of the symptoms, tests, and configurations match those described in the Symptoms section of this KB.

The resolution steps call for snapshots of the associated VMs in case the change causes a problem and needs to be rolled back. If the change does cause a problem, shut down the same VMs and revert to (do not delete) the snapshots. Once the VMs are powered on again, they will be in the state captured by the snapshots, before the change was made.