NSX prepared hosts show status "Install Failed"

Article ID: 383712

Products

VMware NSX

Issue/Introduction

  • The issue affects multiple hosts.
  • Clicking "Resolve" in the UI may change the host status to "disconnected".
  • On a problematic host, some managers show the status "Standby".

nsxcli -c get managers
<Manager IP> Standby (NSX-RPC) <<<<
<Manager IP> Standby (NSX-RPC) <<<<
<Manager IP> Connected (NSX-RPC) *

  • Connectivity to the controller is OK.

nsxcli -c get controllers
 Controller IP    Port     SSL         Status       Is Physical Master   Session State  Controller FQDN           Failure Reason
<Manager IP>    1235   enabled      not used            false              null              NA                       NA
<Manager IP>    1235   enabled      not used            false              null              NA                       NA
<Manager IP>    1235   enabled     connected             true               up               NA                       NA

  • Connectivity from the host to some managers on port 1234 may fail.

    nc -v <Manager IP> 1234
    nc: connect to <Manager IP> port 1234 (tcp) failed: Connection timed out
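The same check can be repeated against every manager in one pass. The following is a minimal sketch, not a supported tool: the function names `probe_port` and `check_managers` are hypothetical, the addresses in the usage line are placeholders, and it assumes the `nc` build on the host accepts `-z` (connect without sending data) and `-w` (timeout in seconds).

```shell
# Probe one TCP port; returns 0 when the connection succeeds.
# Assumption: this nc build supports -z and -w.
probe_port() {
    nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# Report reachability of one port on each manager passed as an argument.
check_managers() {
    port="$1"; shift
    for mgr in "$@"; do
        if probe_port "$mgr" "$port"; then
            echo "$mgr:$port reachable"
        else
            echo "$mgr:$port FAILED"
        fi
    done
}

# Usage (placeholder addresses):
#   check_managers 1234 192.0.2.11 192.0.2.12 192.0.2.13
```

A manager reported as FAILED here is a candidate for the log and connection checks described in this article.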

  • The NSX Manager syslog shows entries like the following.

/var/log/syslog
[TIMESTAMP] kernel - - - [6156964.576589] Dropped per conn limit: IN=eth0 OUT= MAC=[MAC] SRC=[IP] DST=[IP] LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=[ID] DF PROTO=TCP SPT=1 DPT=1234 WINDOW=65535 RES=0x00 CWR ECE SYN URGP=0
[TIMESTAMP] kernel - - - [6156965.168706] Dropped per conn limit: IN=eth0 OUT=  MAC=[MAC] SRC=[IP] DST=[IP] LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23639 DF PROTO=TCP SPT=63076 DPT=1234 WINDOW=65535 RES=0x00 CWR ECE SYN URGP=0
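To see which sources are being dropped most often, the syslog entries can be summarized per source address. This is a rough sketch that assumes the log lines keep the `SRC=` and `DPT=` fields in the layout shown above; the function name `conn_limit_drops_by_src` is hypothetical.

```shell
# Summarize "Dropped per conn limit" entries read on stdin by source address.
# $1 = destination port to filter on (defaults to 1234).
conn_limit_drops_by_src() {
    port="${1:-1234}"
    grep "Dropped per conn limit" \
      | grep "DPT=${port} " \
      | sed -n 's/.* SRC=\([^ ]*\) .*/\1/p' \
      | sort | uniq -c | sort -rn \
      | awk '{print $1, $2}'      # normalize to "<count> <source>"
}

# Usage on the NSX Manager:
#   conn_limit_drops_by_src 1234 < /var/log/syslog
```

Each output line is a drop count followed by the source address, highest count first.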

  • The App Proxy logs on the Manager show the following.

/var/log/vmware/appl-proxy-rpc.log
[TIMESTAMP] - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="1849" level="ERROR" errorCode="NET1"] NetTransport[1] Couldn't create socket 'ssl://[Manager IP]:1234' with {timeout:10000ms acceptor:{reuse_address} linger:{off,0s} uds:{} sctx-id:[ID]}: open: Too many open files

  • Netstat output on the manager shows that connections are maxing out on the App Proxy service.

    netstat -anlp | grep -i "appl-proxy" | wc -l
    4096
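The count can also be compared against the limit automatically. A minimal sketch, assuming the 4096 limit described in this article; the function name `appl_proxy_conn_check` is hypothetical and the function only reads netstat output from stdin.

```shell
# Count appl-proxy sockets in netstat output read on stdin and compare the
# total with a limit ($1, defaulting to the 4096 described in this article).
appl_proxy_conn_check() {
    limit="${1:-4096}"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    count=$(grep -ci "appl-proxy" || true)
    if [ "$count" -ge "$limit" ]; then
        echo "appl-proxy connections: $count (at or above limit $limit)"
    else
        echo "appl-proxy connections: $count (below limit $limit)"
    fi
}

# Usage on the NSX Manager (as root):
#   netstat -anlp | appl_proxy_conn_check
```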

  • The connections causing the issue may be on port 1234, or possibly on port 1236 in a federated environment, as in the example below.

    netstat -anlp | grep -i "appl-proxy"
    ....
    tcp        0      0 <IP>:1236    <IP>:<PORT>   ESTABLISHED 1007       9218099    1836/appl-proxy
    tcp        0      0 <IP>:1236    <IP>:<PORT>   ESTABLISHED 1007       5721157    1836/appl-proxy
    tcp        0      0 <IP>:1236    <IP>:<PORT>   ESTABLISHED 1007       14311779   1836/appl-proxy
    tcp        0      0 <IP>:1236    <IP>:<PORT>   ESTABLISHED 1007       9252676    1836/appl-proxy
    tcp        0      0 <IP>:1236    <IP>:<PORT>   ESTABLISHED 1007       9170511    1836/appl-proxy
    tcp        0      0 <IP>:1236    <IP>:<PORT>   ESTABLISHED 1007       5780664    1836/appl-proxy
    tcp        0      0 <IP>:1236    <IP>:<PORT>   ESTABLISHED 1007       8805193    1836/appl-proxy
    tcp        0      0 <IP>:1236    <IP>:<PORT>   ESTABLISHED 1007       647886     1836/appl-proxy
    tcp        0      0 <IP>:1236    <IP>:<PORT>   ESTABLISHED 1007       14053977   1836/appl-proxy
    tcp        0      0 <IP>:1236    <IP>:<PORT>   ESTABLISHED 1007       8647752    1836/appl-proxy
    tcp        0      0 <IP>:1236    <IP>:<PORT>   ESTABLISHED 1007       8713716    1836/appl-proxy
    tcp        0      0 <IP>:1236    <IP>:<PORT>   ESTABLISHED 1007       14346538   1836/appl-proxy
    tcp        0      0 <IP>:1236    <IP>:<PORT>   ESTABLISHED 1007       14285305   1836/appl-proxy
    ...
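To tell quickly whether port 1234 or port 1236 holds most of the connections, the netstat output can be grouped by local port and state. This is an illustrative sketch: the function name `appl_proxy_by_port` is hypothetical, and it assumes the column layout shown above (local address in column 4, state in column 6).

```shell
# Group appl-proxy TCP sockets from netstat output (stdin) by local port and state.
appl_proxy_by_port() {
    awk '$1 ~ /^tcp/ && /appl-proxy/ {
             n = split($4, a, ":")   # local port is the last colon-separated piece
             key = a[n] " " $6       # e.g. "1236 ESTABLISHED"
             c[key]++
         }
         END { for (k in c) print c[k], k }' | sort -rn
}

# Usage on the NSX Manager (as root):
#   netstat -anlp | appl_proxy_by_port
```

Each output line is a connection count followed by the local port and the TCP state, highest count first.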

 

Environment

VMware NSX-T Data Center 3.x
VMware NSX 4.x

Cause

App Proxy is not processing the FIN packet for closed connections. As a result, stale connections that remain in the ESTABLISHED state accumulate until the App Proxy hits its limit of 4096.
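The "Too many open files" error in the App Proxy log suggests the process is running out of file descriptors. On a Linux system, descriptor usage for a process can be compared with its limit through `/proc`. The sketch below is a generic Linux check rather than an NSX-specific procedure; the function name `fd_usage` is hypothetical, and whether the 4096 figure corresponds to this per-process limit is an assumption to verify on the affected manager.

```shell
# Print the number of open file descriptors and the soft "Max open files"
# limit for a given PID, using the Linux /proc filesystem.
fd_usage() {
    pid="$1"
    open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "pid $pid: $open open descriptors, soft limit $soft"
}

# Usage (as root), with the appl-proxy PID taken from the last netstat column,
# for example 1836 in the output shown in this article:
#   fd_usage 1836
```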

Resolution

This is a known issue affecting VMware NSX.

Workaround:

Restart the App Proxy service on each affected manager:

systemctl stop nsx-appl-proxy
systemctl start nsx-appl-proxy