NSX upgrade prior to version 4.2.0 hangs on manager phase, never moves past 0% on Data Migration Dry Run step

Article ID: 428898

Updated On:

Products

VMware NSX

Issue/Introduction

  • An NSX upgrade is being performed. The version being upgraded to is earlier than NSX 4.2.0.
  • The Upgrade Coordinator, Edge and Host upgrade phases have completed.
  • The Manager upgrade phase is stuck at 0% on the Data Migration Dry Run step. A rotating circle is displayed.
  • Restarting the install-upgrade service on the NSX manager does not resolve the issue.
  • Messages similar to the following are present in the /var/log/upgrade-coordinator/upgrade-coordinator.log file on the NSX manager currently running the install-upgrade service:

    2026-01-31T18:25:35.147Z  INFO Thread-15 VMwareDownloadSiteSchedularServiceImpl 7429 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] Started Populating the compatible versions from VMware Download Site.
    ...
    2026-01-31T18:25:35.204Z  INFO Thread-15 LoggingRestTemplate 7429 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] Request::URI:https://apigw.vmware.com/v1/m4/service/oauthservice/oauth/token?scope=read&grant_type=client_credentials method:POST
    2026-01-31T18:25:35.204Z  INFO Thread-15 LoggingRestTemplate 7429 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] Request body :<Request Body is masked on the hint.>
    ...
    2026-01-31T18:19:50.293Z  INFO http-nio-127.0.0.1-7442-exec-7 NsxUpgradePlugin 27394 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] Fetching persisted internal data for component MP
    2026-01-31T18:19:50.305Z ERROR http-nio-127.0.0.1-7442-exec-4 NsxBaseRestController 27394 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP100" level="ERROR" subcomp="upgrade-coordinator"] ServletOutputStream failed to write: java.io.IOException: Broken pipe
    org.springframework.web.context.request.async.AsyncRequestNotUsableException: ServletOutputStream failed to write: java.io.IOException: Broken pipe

  • After the previous messages, messages similar to the following appear intermittently in the /var/log/upgrade-coordinator/upgrade-coordinator.log file on the NSX manager currently running the install-upgrade service:

    2026-02-03T18:41:26.923Z  INFO http-nio-127.0.0.1-7442-exec-3 UpgradeQueryServiceImpl 7429 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="upgrade-coordinator"] For component type MP componentUpgradeStatus is IN_PROGRESS 0%
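
The two log signatures above can be counted with a short shell sketch. The function name check_uc_log is hypothetical, and the search strings are copied from the sample messages in this article; actual log lines may differ slightly, so treat the counts as a hint rather than a diagnosis.

```shell
# Hypothetical helper: count the two log signatures described in this article.
# The default path is the upgrade-coordinator log named above; pass another
# path as the first argument to inspect a copy of the log.
check_uc_log() {
    log_file="${1:-/var/log/upgrade-coordinator/upgrade-coordinator.log}"
    # -F treats the pattern as a fixed string, -c prints the number of matching lines.
    token_requests=$(grep -cF 'apigw.vmware.com/v1/m4/service/oauthservice/oauth/token' "$log_file")
    stuck_polls=$(grep -cF 'For component type MP componentUpgradeStatus is IN_PROGRESS 0%' "$log_file")
    echo "token_requests=$token_requests stuck_polls=$stuck_polls"
}
```

Run as the root user, check_uc_log with no argument reads the default log path. A non-zero count for both values is consistent with the symptoms listed above.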

Environment

VMware NSX

Cause

Prior to NSX 4.2.0, the manager upgrade phase of an NSX upgrade makes a check against apigw.vmware.com. This endpoint no longer exists, but the hostname still resolves to several Cloudflare IP addresses. Connection attempts to it should fail almost immediately. However, if a local network device intercepts and modifies traffic between on-premises addresses and Cloudflare addresses, the connection attempt can hang indefinitely.

The connection hang can be tested from the NSX CLI via the following command (run as the root user):

curl -v -X POST https://apigw.vmware.com/v1/m4/service/oauthservice/oauth/token?scope=read\&grant_type=client_credentials

When the issue is not present, you should see results similar to the following, ending with "error code: 522":

*   Trying ###:###:###:###:443...
* Connected to apigw.vmware.com (###:###:###:###) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=Palo Alto; O=Broadcom Inc.; CN=*.vmware.com
*  start date: Aug 13 00:00:00 2025 GMT
*  expire date: Apr 14 23:59:59 2026 GMT
*  subjectAltName: host "apigw.vmware.com" matched cert's "*.vmware.com"
*  issuer: C=US; O=DigiCert Inc; CN=DigiCert TLS RSA SHA256 2020 CA1
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* Using Stream ID: 1 (easy handle 0x18f1ae519750)
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
> POST /v1/m4/service/oauthservice/oauth/token?scope=read&grant_type=client_credentials HTTP/2
> Host: apigw.vmware.com
> user-agent: curl/7.81.0
> accept: */*
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
< HTTP/2 522
< date: Tue, 10 Feb 2026 18:22:41 GMT
< content-type: text/plain; charset=UTF-8
< content-length: 15
< strict-transport-security: max-age=31536000; includeSubDomains
< x-frame-options: SAMEORIGIN
< referrer-policy: same-origin
< cache-control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< expires: Thu, 01 Jan 1970 00:00:01 GMT
< access-control-allow-credentials: true
< access-control-allow-headers: *
< access-control-allow-methods: OPTIONS,GET
< access-control-allow-origin: *
< server: cloudflare
< cf-ray: 9cbda222db1c1876-LAX
<
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* Connection #0 to host apigw.vmware.com left intact
error code: 522

When the issue is present, this command hangs early in the TLS handshake. The "error code: 522" line does not appear and you are not returned to a prompt:

*   Trying ###:###:###:###:443...
* Connected to apigw.vmware.com (###:###:###:###) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
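
Because the affected case never returns to a prompt, it can help to put a time limit on the test. The following is a minimal sketch, not part of the official procedure. The function names classify_probe and probe_apigw are hypothetical, and the 60-second default is an assumed value. It relies on curl's --max-time option and on curl exit code 28, which indicates that the operation timed out.

```shell
# Hypothetical helper: turn a curl exit code and HTTP status into a verdict.
# curl exit code 28 means the operation timed out.
classify_probe() {
    exit_code="$1"
    http_code="$2"
    if [ "$exit_code" -eq 28 ]; then
        echo "hang: request timed out, issue likely present"
    elif [ "$exit_code" -eq 0 ] && [ "$http_code" = "522" ]; then
        echo "ok: received HTTP 522, issue not present"
    else
        echo "inconclusive: curl exit=$exit_code http=$http_code"
    fi
}

# Send the same POST as the manual test, but give up after a time limit
# (first argument, in seconds; 60 is an assumed default, not a documented value).
probe_apigw() {
    url='https://apigw.vmware.com/v1/m4/service/oauthservice/oauth/token?scope=read&grant_type=client_credentials'
    # -s silences progress output, -o /dev/null discards the body,
    # -w prints only the HTTP status code.
    http_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time "${1:-60}" -X POST "$url")
    classify_probe "$?" "$http_code"
}
```

Running probe_apigw as the root user prints one of the three verdicts. An "inconclusive" result (for example, a DNS failure) is best followed up with the verbose curl command shown earlier.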

Resolution

This issue is resolved in VMware NSX 4.2.0.

To work around this issue, create a "dummy" record for apigw.vmware.com in the /etc/hosts file of the NSX manager node running the install-upgrade service.
As the root user, run the following command:

echo "127.0.0.1       apigw.vmware.com" >> /etc/hosts

Restart the install-upgrade service by running the restart service install-upgrade command as the admin user.
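
The echo command above appends a new line every time it is run. If the workaround may be applied more than once, a guarded append avoids duplicate entries. This is a sketch under stated assumptions: add_hosts_entry is a hypothetical helper name, and the file path is a parameter so the function can be tried on a scratch copy before it is used on /etc/hosts.

```shell
# Hypothetical helper: append "IP hostname" to a hosts file only if the
# hostname is not already mapped on a non-comment line.
add_hosts_entry() {
    hosts_file="$1"
    ip="$2"
    name="$3"
    # Compare the hostname against every field after the address, skipping
    # lines whose first field starts with '#'. awk exits 0 when a match exists.
    if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$hosts_file"; then
        echo "already present: $name"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
        echo "added: $name"
    fi
}
```

As the root user, add_hosts_entry /etc/hosts 127.0.0.1 apigw.vmware.com has the same effect as the echo command on the first run and makes no change on later runs. The install-upgrade service still needs to be restarted afterwards, as described above.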

Additional Information

You can determine which NSX manager node is running the install-upgrade service via the following command (run as the admin user):

get service install-upgrade

Note: You should see results similar to the following:

Mon Feb 09 2026 UTC 14:10:26.125
Service name:      install-upgrade
Service state:     running
Enabled on:        192.168.100.10