NSX Manager Upgrade Fails with "Unexpected error while upgrading upgrade unit" while upgrading from version 4.2.x.

Article ID: 434774

Products

VMware NSX

Issue/Introduction

During an upgrade of VMware NSX from version 4.2.x to 4.2.3.2, the following symptoms are observed:

After successful Edge and Host Transport Node upgrades, the third NSX Manager (orchestrator node) fails to upgrade.

The UI displays the error: "Unexpected error while upgrading upgrade unit"

A 500 Internal Server Error is reported during the manager upgrade phase.

The environment experiences extreme sluggishness and slow response times across the NSX Manager UI and SSH access.

API responses from remote manager nodes exceed the default 60-second timeout limit.

Log lines similar to the following appear in var/log/nvpapi/api_server.log:

2026-03-12T23:58:47.033Z napi.root.node.central_api ERROR Invoking GET /api/v1/node/version with timeout 60 on XXXXbbXX-0XX6-4XXd-8XXe-XXXXb470XXXX failed

Reviewing var/log/nvpapi/api_access.log shows the 500 error and timeout:

2026-03-13T01:34:53.219Z INFO UC 'GET /api/v1/cluster/XXXX09XX-2XX5-aXX9-fXX6-5bXXXXXXXX1e/node/upgrade/progress-status' 500 354 "" "Java/11.0.23" "" 60.00408
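
When an access log is large, it can help to filter for entries that match this symptom: an HTTP 500 whose duration sits at or above the 60-second timeout. The following is a minimal sketch only. The field layout in the regular expression is an assumption inferred from the single sample line above, and the names `ACCESS_LINE` and `slow_failures` are hypothetical helpers, not part of NSX.

```python
import re

# Assumed layout, inferred from the one sample line in this article:
# <timestamp> <level> <tag> '<method> <path>' <status> <bytes> ... <seconds>
ACCESS_LINE = re.compile(
    r"^(?P<ts>\S+) \S+ \S+ '(?P<method>[A-Z]+) (?P<path>\S+)' "
    r"(?P<status>\d{3}) \d+ .* (?P<seconds>\d+\.\d+)$"
)


def slow_failures(lines, min_seconds=60.0):
    """Yield (timestamp, path, seconds) for HTTP 500 entries at or above min_seconds."""
    for line in lines:
        match = ACCESS_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        seconds = float(match.group("seconds"))
        if match.group("status") == "500" and seconds >= min_seconds:
            yield match.group("ts"), match.group("path"), seconds


sample = (
    "2026-03-13T01:34:53.219Z INFO UC 'GET /api/v1/cluster/"
    "XXXX09XX-2XX5-aXX9-fXX6-5bXXXXXXXX1e/node/upgrade/progress-status' "
    '500 354 "" "Java/11.0.23" "" 60.00408'
)
hits = list(slow_failures([sample]))
for ts, path, seconds in hits:
    print(f"{ts} {path} took {seconds:.1f}s")
```

To scan a real file, pass an open file handle in place of the one-element list, for example `slow_failures(open(path_to_log))`, where `path_to_log` points at the extracted api_access.log.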

Reviewing the api_server.log for virtual IP processing time:

2026-03-13T17:12:45.530Z napi.root.node.services.http.__self__ INFO get_virtual_ip_address: 0.003s; get_virtual_interface: 0.000s; parse_interfaces_file: 0.000s; _down_vip_if_needed_greenlet: 0.000s; _up_virtual_interface: 0.496s; _send_garp_for_ip_address: 453.762s; _delete_ip_addr_from_arp_cache: 0.492s; _add_or_remove_iptables_rules: 0.418s; _ping_ip_addr: 0.008s; _arp_cache_has_ip_address: 0.007s; _verify_ip_config: 0.315s; total: 455.502s
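
In this example, `_send_garp_for_ip_address` accounts for 453.762s of the 455.502s total, so nearly all of the delay is in the gratuitous ARP step. A small sketch like the one below can pull the per-step timings out of such a line and report the dominant step. It assumes the `name: N.NNNs;` layout shown in the sample, and `timing_breakdown` is a hypothetical helper name.

```python
import re

# Matches "step_name: 1.234s" pairs; assumes the layout shown in the sample line.
STEP = re.compile(r"(\w+): (\d+\.\d+)s")


def timing_breakdown(line):
    """Return ({step: seconds}, total_seconds) parsed from a VIP timing log line."""
    steps = {name: float(value) for name, value in STEP.findall(line)}
    total = steps.pop("total", None)  # the line reports its own total last
    return steps, total


sample = (
    "2026-03-13T17:12:45.530Z napi.root.node.services.http.__self__ INFO "
    "get_virtual_ip_address: 0.003s; get_virtual_interface: 0.000s; "
    "parse_interfaces_file: 0.000s; _down_vip_if_needed_greenlet: 0.000s; "
    "_up_virtual_interface: 0.496s; _send_garp_for_ip_address: 453.762s; "
    "_delete_ip_addr_from_arp_cache: 0.492s; _add_or_remove_iptables_rules: 0.418s; "
    "_ping_ip_addr: 0.008s; _arp_cache_has_ip_address: 0.007s; "
    "_verify_ip_config: 0.315s; total: 455.502s"
)

steps, total = timing_breakdown(sample)
slowest = max(steps, key=steps.get)
share = steps[slowest] / total
print(f"{slowest}: {steps[slowest]}s ({share:.1%} of {total}s)")
```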

Checking for hung processes in var/log/stats/sys_threads.stats:

853048 root 20 0 3256 1200 1100 S 0.0 0.0 0:00.00 853048 /usr/bin/arping -c 1 -A XX.XX.XX.XXX -I eth0
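
To check a stats file for lingering arping entries without reading it by eye, a short filter can help. This is a rough sketch: it assumes each entry is a whitespace-separated line whose first column is the process ID, as in the sample above, and it finds the command by executable name rather than by column position. `find_arping_threads` is a hypothetical helper name.

```python
def find_arping_threads(lines):
    """Return (pid, command) pairs for entries whose command runs arping."""
    found = []
    for line in lines:
        fields = line.split()
        # Locate the executable by name rather than by column index, because
        # the exact column layout of the stats file is an assumption here.
        for index, field in enumerate(fields):
            if field.rsplit("/", 1)[-1] == "arping":
                # Assumption: the first column is the process ID.
                found.append((fields[0], " ".join(fields[index:])))
                break
    return found


sample = (
    "853048 root 20 0 3256 1200 1100 S 0.0 0.0 0:00.00 853048 "
    "/usr/bin/arping -c 1 -A XX.XX.XX.XXX -I eth0"
)
matches = find_arping_threads([sample])
for pid, command in matches:
    print(pid, command)
```

The same PID appearing across many consecutive snapshots in the file would be consistent with the hung arping instance described in this article.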

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.

Environment

VMware NSX 4.2.x

Cause

The upgrade failure and UI/SSH sluggishness are caused by a significant delay in API responses between manager nodes.

The internal method _send_garp_for_ip_address() uses the arping utility to periodically confirm Virtual IP (VIP) ownership within the manager cluster.

In this scenario, an arping instance hangs in a kernel wait state. The hung call blocks a subprocess that serves Northbound API (NAPI) requests, so inter-node calls to the progress-status API exceed the 60-second timeout and the upgrade fails.

Resolution

This issue is resolved in VMware NSX 9.1, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

Workaround

If the upgrade is currently in progress and has failed or paused on the third node:

Ensure the upgrade is in the 'Paused' state.
Reboot the affected manager node. Once the node is back up, resume the upgrade from the NSX UI.
Note: Rebooting a manager node during an active (non-paused) upgrade can cause unexpected cluster synchronization issues.
Once the upgrade is complete, if sluggishness persists, restart the proton service on the manager nodes:
```
root@nsx-mngr-01:~# service proton restart 
```

Additional Information

If you are contacting Broadcom support about this issue, please provide the following:

NSX Manager support bundles.
Text of any error messages seen in NSX GUI or command lines pertinent to the investigation.

Handling Log Bundles for offline review with Broadcom support:
