NSX Upgrade Pre-Check Fails: "Unable to retrieve disk space information on directory /tmp" Due to Unresponsive Cluster API

Products

VMware NSX

Issue/Introduction

The NSX upgrade pre-check fails with an error stating that disk space information for the /tmp directory cannot be retrieved. The error occurs even when disk space on all manager nodes is confirmed to be sufficient, and it prevents the upgrade from proceeding.

Environment

VMware NSX

Cause

The upgrade coordinator relies on the API /nsxapi/api/v1/cluster/nodes/{node-id}/status?source=realtime to retrieve system status, which includes disk space information. For the affected node, this API returns an HTTP 500 error, as the upgrade coordinator log entries show:

Request::URI:http://localhost:7440/nsxapi/api/v1/cluster/nodes/20d7####-####-####-####-####17b7040f/status?source=realtime method:GET
2025-10-15T11:24:18.417Z ERROR http-nio-127.0.0.1-7442-exec-2 UcRestClient 3268797 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP30014" level="ERROR" subcomp="upgrade-coordinator"] Error during GET rest request /nsxapi/api/v1/cluster/nodes/20d7####-####-####-####-####17b7040f/status?source=realtime , trial 2 , err com.vmware.nsx.management.upgrade.rpcframework.UcRestRpcException: [UC] Error in rest call. url= //nsxapi/api/v1/cluster/nodes/20d7####-####-####-####-####17b7040f/status?source=realtime , method= GET , response= {
    "module_name" : "common-services",
    "error_message" : "General error has occurred.",
    "details" : "com.vmware.nsx.messaging.exceptions.InvalidArgumentException",
    "error_code" : 100
}
, error= 500 : "{<EOL> "module_name" : "common-services",<EOL> "error_message" : "General error has occurred.",<EOL> "details" : "com.vmware.nsx.messaging.exceptions.InvalidArgumentException",<EOL> "error_code" : 100<EOL>}<EOL>" .
com.vmware.nsx.management.upgrade.rpcframework.UcRestRpcException: org.springframework.web.client.HttpServerErrorException$InternalServerError: 500 : "{<EOL> "module_name" : "common-services",<EOL> "error_message" : "General error has occurred.",<EOL> "details" : "com.vmware.nsx.messaging.exceptions.InvalidArgumentException",<EOL> "error_code" : 100<EOL>}<EOL>"
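
To confirm the failure outside the upgrade coordinator, the same status call can be issued manually. The following is a minimal sketch, not a verified procedure: it assumes that the public API exposes the call at /api/v1/cluster/nodes/<node-id>/status (the internal /nsxapi prefix from the log removed), and the helper name node_status_url and the host nsx-mgr.example.com are placeholders for illustration.

```shell
# Hypothetical helper: build the URL for a manager node's realtime status.
# Assumption: the public path is the logged path without the /nsxapi prefix.
node_status_url() {
    manager="$1"   # NSX Manager FQDN or IP (placeholder)
    node_id="$2"   # node UUID taken from the error message
    printf 'https://%s/api/v1/cluster/nodes/%s/status?source=realtime\n' "$manager" "$node_id"
}

# Example call (not executed here); curl prompts for the admin password:
# curl -k -u admin "$(node_status_url nsx-mgr.example.com <node-id>)"
```

If the node is affected, the call would be expected to return the same "General error has occurred." response with error_code 100 that appears in the log above.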

The nsxapi.log shows that the MPA entity ID is missing for the node:

nsxapi.log:2025-10-14T05:58:17.594Z INFO tcp://127.0.0.1:6191:worker-0 AbstractConnection 3435329 - [nsx@6876 comp="global-manager" level="INFO" subcomp="manager"] NettyConnection(NettyChannel(local=127.0.0.1:57320, remote=127.0.0.1:6191), active=true) dropping received message on unknown stream 0ebc####-####-####-####-####fb1dbff7 without open_stream=true. Probably this stream failed recently. Message: [NsxRpcMessage streamId=0ebc####-####-####-####-####fb1dbff7 payloadSize=0 streamControl= frame=rpc_msg { call_id: 0 has_payload: false status { code: UNKNOWN error_msg: "leaked exception: java.lang.IllegalArgumentException: no msgClient for manager node ClusterNodeConfigModel/20d7####-####-####-####-####17b7040f
\tat com.vmware.nsx.management.agg.rpc.AggSvcClientIdServiceImpl.toClientId(AggSvcClientIdServiceImpl.java:69)
\tat com.vmware.nsx.management.agg.rpc.AggSvcClientIdNsxRpcService.toClientId(AggSvcClientIdNsxRpcService.java:42)
\tat vmware.nsx.aggservice.framework.AggSvcClientIdServiceNsxRpc$MethodHandlers.invoke(AggSvcClientIdServiceNsxRpc.java:171)
\tat com.vmware.nsx.rpc.call.ServerCalls$AsyncUnaryCallObserver.next(ServerCalls.java:140)
\tat com.vmware.nsx.rpc.call.ServerCalls$AsyncUnaryCallObserver.next(ServerCalls.java:121)
\tat com.vmware.nsx.rpc.call.NsxRpcCall$ActiveCallStateBase.invokeNext(NsxRpcCall.java:266)
\tat com.vmware.nsx.rpc.call.NsxRpcCall$ActiveCallState.doReceiveNonStreamingRemote(NsxRpcCall.java:384)
\tat com.vmware.nsx.rpc.call.NsxRpcCall$ActiveCallState.doReceive(NsxRpcCall.java:482)
\tat com.vmware.nsx.rpc.call.NsxRpcCall.doReceive(NsxRpcCall.java:999)
\tat com.vmware.nsx.rpc.channel.NsxRpcChannel.doReceiveNewCall(NsxRpcChannel.java:683)
\tat com.vmware.nsx.rpc.channel.NsxRpcChannel.doReceive(NsxRpcChannel.java:634)
\tat com.vmware.nsx.rpc.channel.task.ChannelReceiveTask.doRun(ChannelReceiveTask.java:21)
\tat com.vmware.nsx.rpc.channel.task.ChannelTask.run(ChannelTask.java:45)
\tat com.vmware.nsx.rpc.channel.NsxRpcChannel.processOperations(NsxRpcChannel.java:848)
\tat com.vmware.nsx.rpc.core.Scheduler.process(Scheduler.java:112)
\tat com.vmware.nsx.rpc.core.Scheduler.run(Scheduler.java:90)
\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
\tat com.vmware.nsx.util.concurrent.Executors$MeteredRunnable.run(Executors.java:353)
\tat java.base/java.lang.Thread.run(Unknown Source)
" } }]

2025-10-14T06:17:12.692Z WARN nsx-rpc:tcp://127.0.0.1:6191:user-executor-0 AggSvcClientIdServiceImpl 3435329 - [nsx@6876 comp="global-manager" level="WARNING" subcomp="manager"] MP cluster node ClusterNodeConfigModel/20d7####-####-####-####-####17b7040f has a null MPA entity ID.
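
To identify which manager node is affected, the log can be searched for the "null MPA entity ID" warning. The following is a minimal sketch: the function name find_null_mpa_nodes is hypothetical, and the default log path /var/log/proton/nsxapi.log is an assumption that may differ in your environment.

```shell
# Hypothetical helper: print the unique node IDs that nsxapi.log reports
# with a null MPA entity ID.
# Assumption: the log is at /var/log/proton/nsxapi.log unless a path is given.
find_null_mpa_nodes() {
    log_file="${1:-/var/log/proton/nsxapi.log}"
    # Keep only the ID that follows "ClusterNodeConfigModel/" on warning lines.
    sed -n 's|.*ClusterNodeConfigModel/\([^ ]*\) has a null MPA entity ID.*|\1|p' "$log_file" \
        | sort -u
}
```

Running find_null_mpa_nodes with no argument reads the default path; each line of output is a node ID that can be matched against the node ID in the upgrade coordinator error.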

Resolution

This issue is resolved in NSX 4.2.

Workaround

Perform the following steps on the manager node that reports the disk space error:

1. Run the following script to restore the file /etc/vmware/nsx-mpa/mpaconfig.json:
   # cd /opt/vmware/nsx-mpa/
   # ./mpaconfigrestore.sh

2. Restart the cluster boot manager (CBM) so that the MPA re-registers:
   # /etc/init.d/nsx-cluster-boot-manager restart
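
After the restore script has run, it can be useful to confirm that the configuration file is back in place. The following is a minimal sketch: the function name check_mpaconfig is hypothetical, and it checks only that the file exists and is not empty, not that its contents are valid.

```shell
# Hypothetical helper: report whether the MPA configuration file exists
# and is non-empty. Defaults to the path named in the workaround.
check_mpaconfig() {
    cfg="${1:-/etc/vmware/nsx-mpa/mpaconfig.json}"
    if [ -s "$cfg" ]; then
        echo "present: $cfg"
    else
        echo "missing or empty: $cfg"
        return 1
    fi
}
```

Run check_mpaconfig with no argument on the affected node. Once the file is present and the cluster boot manager has been restarted, re-run the upgrade pre-check.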