The node agent throws error NCP01012 with the message "Agent is exiting as connection is unavailable", which causes pods to restart.

/var/log/nsx-ujo/nsx_node_agent.log:
2024-07-16T06:35:39.731Z #######-####-####-####-############ NSX 872887 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="WARNING"] nsx_ujo.agent.agent Agent is unavailable for 30 seconds: connection inactive.hyperbus service inactive., retrying
2024-07-16T06:35:44.738Z #######-####-####-####-############ NSX 872887 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="ERROR" errorCode="NCP01012"] nsx_ujo.agent.agent Agent is exiting as connection is unavailable
2024-07-16T06:35:44.830Z #######-####-####-####-############ NSX 872858 - [nsx@6876 comp="nsx-container-node" subcomp="nsx_node_agent" level="WARNING"] oslo_privsep.comm Unexpected error: <class 'OSError'>
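To check whether a node is hitting this symptom, the node agent log can be searched for the NCP01012 error code. The following is a minimal sketch, assuming the log entries follow the layout shown in the excerpt above; the function name `find_agent_exits` and the regular expression are illustrative only and are not part of NCP or NSX tooling.

```python
import re

# Assumption: each entry starts with an ISO-8601 UTC timestamp and carries
# errorCode="NCPxxxxx" inside the bracketed metadata, as in the excerpt above.
ERROR_RE = re.compile(
    r'(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z).*?'
    r'errorCode="(?P<code>NCP\d+)"\]\s*(?P<msg>.*)'
)


def find_agent_exits(lines, code="NCP01012"):
    """Return (timestamp, message) pairs for log lines carrying the given error code."""
    hits = []
    for line in lines:
        match = ERROR_RE.search(line)
        if match and match.group("code") == code:
            hits.append((match.group("ts"), match.group("msg").strip()))
    return hits
```

To use it, open /var/log/nsx-ujo/nsx_node_agent.log on the affected node and pass the file object to `find_agent_exits`; each returned pair gives the time at which the agent exited.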
NCP Version: 4.1.X
NSX Version: 4.1.X
Impact: Pod restarts are observed for clients spread across clusters.
On vMotion, the node agent closes its existing RPC connection with CfgAgent. The connection may be in a sleep state for up to 40 seconds, which can block or delay the connection close for that long. Because the NCP health check restarts the pod when the node agent remains disconnected for more than 30 seconds, a close that is blocked beyond that threshold results in a pod restart.
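The timing relationship described above can be illustrated with a short sketch. It assumes only the two figures given in the text (a close that may block for up to 40 seconds and a 30-second health check threshold); the constant and function names are hypothetical and do not correspond to NCP code.

```python
# Values taken from the description above; the names are illustrative only.
HEALTH_CHECK_THRESHOLD_S = 30  # the health check restarts the pod beyond this
MAX_CLOSE_BLOCK_S = 40         # the connection close may block for up to this long


def pod_restart_expected(disconnected_s, threshold_s=HEALTH_CHECK_THRESHOLD_S):
    """Return True when the disconnect outlasts the health check threshold."""
    return disconnected_s > threshold_s
```

Under these assumptions, any connection close that stays blocked for more than 30 seconds of the possible 40 would be enough to trigger the restart.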
Workaround: None available.
Fix version: NSX 4.2.1