Control Channel To Manager Node Down Too Long

Article ID: 407326

Products

VMware NSX

Issue/Introduction

  • Hosts and/or Edge nodes raise alarms indicating that the control channel is down.
  • Running nsxcli -c get managers on the affected ESXi host, or get managers from the admin shell of the affected Edge node, shows that one of the managers is in Standby state:

    # nsxcli -c get managers
    - <IP address of NSX manager3>    Connected (NSX-RPC)
    - <IP address of NSX manager2>    Connected (NSX-RPC)
    - <IP address of NSX manager1>    Standby (NSX-RPC)*

  • Running esxcli network ip connection list | grep 1234 on the ESXi host (on Edge nodes, run netstat -anlp | grep 1234 from the root shell) shows the connections to the managers in TIME_WAIT rather than ESTABLISHED:

    # esxcli network ip connection list | grep 1234
    tcp        0      0  <NSX manager 03 IP address>:1234       <Ip Address>:53356     TIME_WAIT   -
    tcp        0      0  <NSX manager 02 IP address>:1234       <Ip Address>:53356     TIME_WAIT   -
    tcp        0      0  <NSX manager 01 IP address>:1234       <Ip Address>:42133     TIME_WAIT   -

  • Running nsxcli -c get controllers shows the failure reason CONTROLLER_REJECTED_HOST_CERT:

    Controller IP     Port    SSL         Status          Is Physical  Master   Session State  Controller FQDN       Failure Reason
    <Controller-IP>   1235   enabled     disconnected       true                  down           NA                    CONTROLLER_REJECTED_HOST_CERT
    <Controller-IP>   1235   enabled     not used           false                 null           NA                    NA
    <Controller-IP>   1235   enabled     not used           false                 null           NA                    NA
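
The port 1234 check above can be summarized with a small shell helper. This is only a sketch: the function name summarize_port_1234 is hypothetical, and it assumes the connection state is the sixth whitespace-separated column, as in the sample output in this article.

```shell
# Hypothetical helper: count TCP connection states for port 1234.
# Reads connection-list output on stdin and assumes the columns are
# proto, recv-q, send-q, local address, foreign address, state, ...
summarize_port_1234() {
    awk '$1 ~ /^tcp/ && ($4 ~ /:1234$/ || $5 ~ /:1234$/) { n[$6]++ }
         END { for (s in n) printf "%s %d\n", s, n[s] }' | sort
}
```

On an ESXi host it could be fed with esxcli network ip connection list | summarize_port_1234, and on an Edge node with netstat -anlp | summarize_port_1234. A result that lists only TIME_WAIT and no ESTABLISHED entries matches the symptom described in this article.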

Environment

VMware NSX

Cause

In NSX 4.1.x and 4.2.0, Edge and Host Transport Nodes are created with a certificate that is valid for 825 days.
NSX-T 3.x and NSX 4.2.1 and later create Transport Nodes with a certificate that is valid for 10 years.
An upgrade does not replace the certificate that was issued when the Transport Node was created.
As a result, any Edge deployed on NSX 4.1.x or 4.2.0, and any Host prepared or re-prepared on those versions, keeps the 825-day certificate even after an upgrade. Once that certificate expires, the controller rejects it (CONTROLLER_REJECTED_HOST_CERT) and the control channel goes down.
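
To estimate when an 825-day certificate runs out, the validity period can be added to the date on which the Transport Node was created. The following is a minimal sketch, not an NSX tool: the function name cert_expiry_date is hypothetical, it assumes GNU date (other date implementations parse -d differently), and the creation date has to be supplied by you.

```shell
# Hypothetical helper: print the date on which a certificate issued on
# $1 (YYYY-MM-DD, treated as UTC) reaches the end of its validity period.
# $2 is the validity in days and defaults to 825.
# Assumes GNU date for the -d option.
cert_expiry_date() {
    issued=$1
    days=${2:-825}
    start=$(date -u -d "$issued" +%s) || return 1
    date -u -d "@$((start + days * 86400))" +%Y-%m-%d
}
```

For example, cert_expiry_date 2023-01-01 prints 2025-04-05, so a node prepared at the start of 2023 on an affected version would reach expiry a little over two years and three months later.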

Resolution

Follow the resolution steps in the following KB article:
Alarm For Transport Node Certificate Has Expired