Storage issue causes Controller connection to be stopped.

Article ID: 422666

Updated On:

Products

VMware NSX

Issue/Introduction

  • The storage hardware on the ESXi host reports errors.
    Some NSX processes write no logs until the storage is recovered.
    In the example below, two consecutive log lines are 10 days apart, with nothing logged in between.
    2025-01-05T<TIME> In(182) cfgAgent[<PID>]: NSX 2101349 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" s2comp="nsx-sha" tid="<TID>" level="info"] ShaPmClientSocket::StartOrConnectProcedure: connected 1
    2025-01-15T<TIME> In(14) nestdb-server[<PID>]: Use default syslog
  • The transport node status returned by the following API shows the Controller connection as DOWN.
    GET /api/v1/transport-zones/transport-node-status
            "control_connection_status": {
              "degraded_count": 0,
              "down_count": 1,
              "last_status_changed_time": <time>,
              "status": "DOWN",
              "status_description": "UNKNOWN_FAILURE_STATUS",
              "up_count": 0
            },

  • The "Control Channel To Transport Node Down Long" alarm is open.
    /var/log/cloudnet/nsx-ccp.log
    <TIMESTAMP> FATAL EventReportProcessor-1-1 EventReportSyslogSender <PID> MONITORING [nsx@6876 comp="nsx-manager" entId="<UUID>" eventFeatureName="communication" eventSev="critical" eventState="On" eventType="control_channel_to_transport_node_down_long" level="FATAL" subcomp="ccp"] Controller service on Manager node <Manager_IP> (<Manager_UUID>) to Transport node <TN_FQDN> (<TN_UUID>) down for at least 15 minutes from Controller service's point of view.
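
The first symptom above, a long silence between consecutive log lines, can be looked for with a short script instead of by eye. The following is a minimal sketch, not part of the original article. It assumes every log entry starts with an ISO 8601 timestamp such as 2025-01-05T10:00:00.000Z, and the function name find_log_gaps and the one-day threshold are illustrative choices. The log file to scan is passed on the command line because the article does not name it.

```python
import sys
from datetime import datetime, timedelta


def find_log_gaps(lines, min_gap=timedelta(days=1)):
    """Return (previous, current, gap) for consecutive timestamped lines
    that are at least min_gap apart.

    Assumption: each log entry begins with an ISO 8601 timestamp and all
    timestamps use the same format. Lines without a leading timestamp
    (for example wrapped continuation lines) are skipped.
    """
    gaps = []
    previous = None
    for line in lines:
        token = line.strip().split(" ", 1)[0]
        try:
            # Older Python versions do not accept a trailing "Z",
            # so it is rewritten as an explicit UTC offset.
            current = datetime.fromisoformat(token.replace("Z", "+00:00"))
        except ValueError:
            continue
        if previous is not None and current - previous >= min_gap:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_gaps.py <path-to-log-file>
    with open(sys.argv[1], "r", errors="replace") as handle:
        for start, end, gap in find_log_gaps(handle):
            print(f"No log entries between {start} and {end} ({gap})")
```

A gap on its own does not prove a storage problem; it only narrows down the time window to compare against the storage errors reported by the host.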

Environment

VMware NSX 4.x

Cause

Loss of storage availability on the ESXi host can affect NSX functionality on that host, including its connection to the Controller.

Resolution

After the storage is recovered, verify that the host can read and write log files on its storage.
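
One way to run a basic read/write check is sketched below. This is an illustration under assumptions, not a documented NSX or ESXi procedure: the function name can_read_write is hypothetical, and the directory to test must be supplied by the operator because the article does not name the log location. The script creates a small temporary file in the given directory, forces it to disk, reads it back, and removes it.

```python
import os
import sys
import tempfile


def can_read_write(directory):
    """Return True if a small file can be created, synced to disk,
    read back unchanged and removed in the given directory."""
    payload = b"storage-read-write-check\n"
    try:
        fd, path = tempfile.mkstemp(prefix="rw-check-", dir=directory)
    except OSError:
        # The directory is missing, read-only, or the storage is failing.
        return False
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to the storage device
        with open(path, "rb") as handle:
            return handle.read() == payload
    except OSError:
        return False
    finally:
        try:
            os.remove(path)
        except OSError:
            pass


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rw_check.py <directory-holding-the-log-files>
    ok = can_read_write(sys.argv[1])
    print("read/write OK" if ok else "read/write FAILED")
```

If the storage is still unhealthy, the call may hang rather than fail, which is itself a sign that recovery is incomplete. A passing check should be followed by confirming that new NSX log entries are being written again.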