Race condition in NSX Federation setup after restarting the nsx-appl-proxy service

Article ID: 369789

Updated On:

Products

VMware NSX

Issue/Introduction

  • Restarting the nsx-appl-proxy service on a Manager node may cause a Local Manager (LM) to Global Manager (GM) disconnection because of a race condition.
  • aph.sock errors appear in "/var/log/vmware/appl-proxy-rpc.log":

2024-06-08T19:07:04.538Z <manager> NSX 108625 - [nsx@6876 comp="global-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="108628" level="WARNING"] StreamConnection[146 Connecting to unix:///var/run/vmware/appl-proxy/aph.sock(pid:71915 uid:1008 gid:1008) sid:146] Couldn't connect to 'unix:///var/run/vmware/appl-proxy/aph.sock(pid:71915 uid:1008 gid:1008)'
(error: 2-No such file or directory)
2024-06-08T19:07:04.538Z <manager> NSX 108625 - [nsx@6876 comp="global-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="108628" level="WARNING"] StreamConnection[146 Error to unix:///var/run/vmware/appl-proxy/aph.sock(pid:71915 uid:1008 gid:1008) sid:-1] Error 2-No such file or directory

  • Reconnect attempts appear in "/var/log/async-replicator/ar.log":

2024-06-09T06:43:37.224Z INFO ForkJoinPool.commonPool-worker-6 ConnectionKeeperListener 112146 - [nsx@6876 comp="global-manager" level="INFO" subcomp="async-replicator"] Scheduling reconnect attempt for unix:///var/run/vmware/appl-proxy/aph.sock after delay = 1000
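
To check a Manager node for both symptoms, the log signatures above can be counted with grep. The following is a minimal sketch: the function names are hypothetical helpers and not part of NSX, and the search strings are copied from the log samples above, so they assume the message text is the same in your release.

```shell
#!/bin/sh
# Sketch only: these helper names are illustrative, not NSX tooling.

# Print how many lines of file $2 contain the literal string $1.
count_fixed_matches() {
    # -F treats the pattern as a fixed string; -c prints the match count.
    # grep exits 1 when the count is 0, so "|| true" keeps the status clean.
    grep -cF -- "$1" "$2" || true
}

# Connection failures to the APH socket, as shown for appl-proxy-rpc.log.
count_aph_connect_errors() {
    count_fixed_matches \
        "Couldn't connect to 'unix:///var/run/vmware/appl-proxy/aph.sock" \
        "${1:-/var/log/vmware/appl-proxy-rpc.log}"
}

# Reconnect attempts scheduled by Async Replicator, as shown for ar.log.
count_reconnect_attempts() {
    count_fixed_matches \
        "Scheduling reconnect attempt for unix:///var/run/vmware/appl-proxy/aph.sock" \
        "${1:-/var/log/async-replicator/ar.log}"
}
```

Calling count_aph_connect_errors or count_reconnect_attempts with no argument reads the default log path from this article; a different file can be passed as the first argument. A non-zero count from both suggests the node is showing the symptoms described here, but it does not by itself confirm the race condition.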

Environment

VMware NSX-T Data Center 3.x
VMware NSX 4.x

 

Cause

  • A race condition occurs when Async Replicator tries to create multiple connections to the Appliance Proxy Hub (APH) from the same service.
  • RPC messages from the Async Replicator service to APH may fail to be delivered when connection retries between Async Replicator and APH occur.

Resolution

  • This issue is fixed in VMware NSX 4.2.0 and later.
  • As a workaround, restart the "nsx-appl-proxy" and "async-replicator-service" services by running the following commands as root on the Manager nodes:

systemctl stop nsx-appl-proxy
systemctl start nsx-appl-proxy


systemctl stop async-replicator-service
systemctl start async-replicator-service

  • After both services have restarted, check "/var/log/vmware/appl-proxy-rpc.log" and confirm that no new aph.sock errors are logged.
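
The restart and the follow-up log check can be combined in a small script. This is a sketch under stated assumptions, not a supported tool: the function names and the SYSTEMCTL override (useful for a dry run) are hypothetical, and the error string is taken from the log sample in this article. Because errors logged before the restart remain in the file, the check counts only lines written after a recorded line number, which assumes the log is not rotated in between.

```shell
#!/bin/sh
# Sketch only: function names and the SYSTEMCTL override are illustrative,
# not part of NSX. Run as root on the affected Manager node.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
APH_ERROR="Couldn't connect to 'unix:///var/run/vmware/appl-proxy/aph.sock"

# Stop and start one unit, then confirm systemd reports it active.
restart_service() {
    unit="$1"
    "$SYSTEMCTL" stop "$unit" || return 1
    "$SYSTEMCTL" start "$unit" || return 1
    "$SYSTEMCTL" is-active "$unit" >/dev/null
}

# Restart both services in the order given in the workaround.
apply_workaround() {
    for svc in nsx-appl-proxy async-replicator-service; do
        restart_service "$svc" || { echo "failed to restart $svc" >&2; return 1; }
        echo "restarted $svc"
    done
}

# Count aph.sock errors that appear after line $2 of log file $1,
# so errors logged before the restart are not counted again.
new_aph_errors() {
    tail -n "+$(($2 + 1))" "$1" | grep -cF "$APH_ERROR" || true
}
```

One way to use it: record the current length of the log with "wc -l < /var/log/vmware/appl-proxy-rpc.log", run apply_workaround, wait for the services to settle, then call new_aph_errors with the log path and the recorded line count. A result of 0 means no new aph.sock connection errors were logged after the restart.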