This article provides troubleshooting steps to isolate common infrastructure issues that can affect HCX Site Pairing communication between the HCX Connector and the Cloud Manager.
The same steps apply to HCX Cloud-to-Cloud Site Pairings, although the Cloud Provider may restrict access.
HCX Site Pairing is not established after initial configuration, or it goes down unexpectedly after being in service.
SocketTimeoutException Read timed out
The following error is returned while configuring Site Pairing:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <html><head> <title>502 Proxy Error</title> </head><body> <h1>Proxy Error</h1> <p>The proxy server received an invalid response from an upstream server.<br /> The proxy server could not handle the request<p>Reason: <strong>Error reading from remote server</strong></p></p> </body></html
HCX
Site Pairing connectivity between HCX Managers depends entirely on the underlying network infrastructure, so problems with basic routing, firewall configuration, or proxy settings can disrupt that communication.
When Site Pairing is down or cannot be established for the first time, check the following:
curl -k -v https://<HCX_Manager_FQDN>
* Trying #.#.#.#...
* TCP_NODELAY set
* Connected to <HCX_Manager_FQDN> (#.#.#.#) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: C=US; ST=California; L=Palo Alto; O=VMware, Inc; OU=Hybridity; CN=<HCX_Manager_FQDN>
*  start date: Jun 18 05:42:30 2021 GMT
*  expire date: Jun 18 05:42:27 2022 GMT
*  issuer: C=US; O=Entrust, Inc.; OU=See <Website link>/legal-terms; OU=(c) 2012 Entrust, Inc. - for authorized use only; CN=Entrust Certification Authority - L1K
*  SSL certificate verify ok.
> GET / HTTP/1.1
> Host: <HCX_Manager_FQDN>
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 302
< Date: Wed, 20 Apr 2022 18:13:26 GMT
< Server: Apache
< Location: https://<HCX_Manager_FQDN>/hybridity/ui/hcx-client/index.html
< Content-Length: 0
< Content-Security-Policy: style-src 'self' 'unsafe-inline'; font-src 'self' data:; img-src 'self' data:
<
* Connection #0 to host <HCX_Manager_FQDN> left intact
* Closing connection 0
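The curl check above can be condensed into a small script when several HCX Managers need to be tested. The following is only a sketch: the function names `classify_http_code` and `check_hcx_https` are hypothetical and not part of HCX, and the mapping of status codes to diagnoses is an assumption based on the healthy 302 response and the 502 Proxy Error shown in this article.

```shell
#!/bin/sh
# Hypothetical helper: map an HTTP status code, as printed by
# curl -w '%{http_code}', to a rough diagnosis.
classify_http_code() {
    case "$1" in
        000)     echo "no-response" ;;  # curl received no HTTP reply (timeout, refused, DNS or TLS failure)
        2??|3??) echo "reachable" ;;    # e.g. the 302 redirect to the HCX UI
        502)     echo "proxy-error" ;;  # matches the "502 Proxy Error" symptom
        *)       echo "unexpected" ;;   # any other status needs a closer look
    esac
}

# Hypothetical wrapper: probe an HCX Manager over HTTPS and print the diagnosis.
# -k skips certificate validation, as in the manual check.
check_hcx_https() {
    code=$(curl -k -s -o /dev/null --max-time 10 -w '%{http_code}' "https://$1")
    classify_http_code "$code"
}
```

For example, `check_hcx_https <HCX_Manager_FQDN>` prints a single word that can be logged or compared across sites. A result of `no-response` points toward routing or firewall problems, while `proxy-error` points toward the proxy path.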
IMPORTANT: By default, when a Proxy server is configured, the Connector or Cloud Manager uses it for all HTTPS connections, including communication with the local vCenter Server, ESXi, NSX, and the HCX IX and NE appliances over the Management Network. The required entries must therefore be added to the Exclusion list to allow direct access to local network resources. Also, the app and web engines must be restarted for any change in the Proxy configuration to take effect.
# su - root
# traceroute <HCX_Cloud_Manager_IP>
Take packet captures on both the HCX Connector and the HCX Cloud Manager while configuring the site pairing, then analyze the captures with a tool such as Wireshark to identify issues such as MTU mismatches.
From HCX connector:
tcpdump -n -i eth0 host <HCX cloud IP> and host <HCX Connector IP> -w /tmp/HCX_connector_pkt_capture.pcap
From HCX Cloud:
tcpdump -n -i eth0 host <HCX cloud IP> and host <HCX Connector IP> -w /tmp/HCX_cloud_pkt_capture.pcap
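Because MTU is one of the issues the captures are meant to reveal, the sketch below shows one possible way to follow up on them. All three function names are hypothetical. `probe_mtu` assumes the Linux iputils `ping` (the `-M do` option sets the Don't Fragment bit) and assumes ICMP is permitted between the sites, which may not be the case. `show_frag_needed` reads a saved capture and lists ICMP "fragmentation needed" messages.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header without options) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Hypothetical helper: ping with the Don't Fragment bit set (Linux iputils syntax).
# usage: probe_mtu <remote_ip> <mtu>
probe_mtu() {
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$2")" "$1"
}

# Hypothetical helper: list ICMP "fragmentation needed" messages
# (destination unreachable, code 4) found in a saved capture file.
# usage: show_frag_needed /tmp/HCX_connector_pkt_capture.pcap
show_frag_needed() {
    tcpdump -n -r "$1" 'icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4'
}
```

If `probe_mtu <HCX cloud IP> 1500` fails but a smaller value succeeds, a device on the path is probably enforcing a lower MTU than the one configured on the HCX Manager interface.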
Note:
If there are no underlying infrastructure issues, repeating the site pairing via the API should re-establish the connection. However, while the site pairing is down, stale entries can accumulate in the 'RemotingOutbox' collection, and in that case site pairing via the API may not help. It is then necessary to check the status of 'RemotingOutbox' in the HCX database and clear any outdated entries. Contact Broadcom Support for assistance with this procedure.
Workaround:
There is no workaround to have full HCX services available without site pairing connectivity between data center sites.
HCX site pairing fails with error "NumberFormatException" (80210)
HCX - Site pairing disconnected with "Error queuing Job: Workflow ReplicationTransferJob" (81978)
HCX - Resync service mesh: "Error in communicating with remote side to find NSX types" (328952)
Impact/Risks:
If the Site Pairing is down, configuration workflows will fail and no migrations can be scheduled from HCX Connector or source Cloud Manager.
Existing Network Extension services will remain active indefinitely, but no configuration changes can be made to them except for "unstretch", which can be forced from the target HCX Cloud Manager side.