NSX alarms report "Service Status Unknown" intermittently for different ESXi hosts

Article ID: 422361

Products

VMware NSX

Issue/Introduction

 

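  •  Intermittently, the nsx-cfgagent service on different ESXi hosts becomes unresponsive, and NSX Manager raises a "Service Status Unknown" alarm.
  •  "Keepalive expired" messages are logged for several NSX daemons on the host (nsx-sha, vdpi, nsx-exporter, nsx-opsagent, nsx-proxy, cfgAgent).
  •  The host's vmsyslogd-dropped logs show a large number of dropped log messages.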

    Manager - var/log/phonehome-coordinator.log

    NSX-MGR NSX 3726 MONITORING [nsx@6876 alarmId="####d060-####-475d-####-c80920d3####" alarmState="OPEN" comp="nsx-manager" entId="00000000-0000-0000-0000-0000000004c5" errorCode="MP701099" eventFeatureName="infrastructure_service" eventSev="CRITICAL" eventState="On" eventType="service_status_unknown" level="FATAL" nodeId="####7d8-386a-####-####-044bd6e7####" subcomp="monitoring"] The service nsx-cfgagent has been unresponsive for 10 seconds. 

    Host - var/log/nsx-syslog.log

    nsx-syslog.X:2025-12-07T14:03:25Z Wa(180) nsx-sha: NSX 2104735 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="WARNING"] NsxRpcConnection (<vmware.nsx.rpc.client.transport.NsxRpcConnection object at 0xd7a19db910>) closing connection because Keepalive expired: timeout 60sec, last ping 338, last pong 335

    nsx-syslog.X:2025-12-07T14:03:49.017Z In(182) vdpi[2103924]: NSX 2103924 - [nsx@6876 comp="nsx-esx" subcomp="nsx-vdpi" s2comp="nsx-rpc" tid="2103979" level="INFO"] RpcConnection[34 Connected to tcp://127.0.0.1:2480 0] Closing (keepalive expired)

    nsx-syslog.X:2025-12-07T14:03:49.018Z In(182) vdpi[2103924]: NSX 2103924 - [nsx@6876 comp="nsx-esx" subcomp="nsx-vdpi" s2comp="nsx-rpc" tid="2103979" level="INFO"] RpcConnection[34 Closed to tcp://127.0.0.1:2480 0] Notifying channels on connection down (keepalive expired)

    nsx-syslog.X:2025-12-07T14:03:49.430Z In(182) nsx-exporter[2103504]: NSX 2103504 - [nsx@6876 comp="nsx-esx" subcomp="nsx-fabric-exporter" s2comp="nsx-rpc" tid="2103764" level="INFO"] RpcConnection[41 Connected to tcp://127.0.0.1:2480 0] Closing (keepalive expired)

    nsx-syslog.X:2025-12-07T14:03:49.432Z In(182) nsx-exporter[2103504]: NSX 2103504 - [nsx@6876 comp="nsx-esx" subcomp="nsx-fabric-exporter" s2comp="nsx-rpc" tid="2103764" level="INFO"] RpcConnection[41 Closed to tcp://127.0.0.1:2480 0] Notifying channels on connection down (keepalive expired)

    nsx-syslog.X:2025-12-07T14:03:50.342Z In(182) nsx-opsagent[2103147]: NSX 2103147 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" tid="2103276" level="INFO"] RpcConnection[98 Connected on tcp://127.0.0.1:4554 0] Closing (keepalive expired)

    nsx-syslog.X:2025-12-07T14:03:50.356Z In(182) nsx-opsagent[2103147]: NSX 2103147 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" tid="2103276" level="INFO"] RpcConnection[98 Closed on tcp://127.0.0.1:4554 0] Notifying channels on connection down (keepalive expired)

    nsx-syslog.X:2025-12-07T14:04:13.562Z In(182) nsx-proxy[2102674]: NSX 2102674 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="2102708" level="INFO"] RpcConnection[84 Connected to tcp://127.0.0.1:2480 0] Closing (keepalive expired)

    nsx-syslog.X:2025-12-07T14:04:13.565Z In(182) nsx-proxy[2102674]: NSX 2102674 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="2102708" level="INFO"] RpcConnection[84 Closed to tcp://127.0.0.1:2480 0] Notifying channels on connection down (keepalive expired)

    nsx-syslog.X:2025-12-07T14:04:41.153Z In(182) cfgAgent[2102599]: NSX 2102599 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" s2comp="nsx-rpc" tid="4443D700" level="info"] RpcConnection[67 Connected to tcp://127.0.0.1:2480 0] Closing (keepalive expired)

    Host - var/log/vmsyslogd-dropped.log

    cat vmsyslogd-dropped.* | wc -l

    9491

    <<<Sample log>>>

    Host - var/log/dfwpktlogs.log

    head -n 1 dfwpktlogs.9

    2025-12-08T08:58:54.331Z No(13) FIREWALL-PKTLOG[43311946]: 43df#### INET TERM PASS 10330 OUT TCP FIN ##.##.##.##/42897->##.##.##.##/8247 24/15 1799/1441 Logging_Name

    tail -n 1 dfwpktlogs.log

    2025-12-08T08:59:59.239Z No(13) FIREWALL-PKTLOG[43311946]: 43df#### INET TERM PASS 10337 IN UDP ##.##.##.##/19644->##.##.##.##/53 1/1 84/228 Logging_Name

    cat dfwpktlogs.* | wc -l

    639428

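  • Between the first entry in dfwpktlogs.9 (08:58:54) and the last entry in dfwpktlogs.log (08:59:59), about 65 seconds elapsed. In that window the host wrote 639428 firewall packet-log entries, which is roughly 9837 entries per second.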
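
The rate calculation above can be reproduced with a short script. This is a minimal sketch, assuming each dfwpktlogs line starts with a UTC timestamp in the format shown in the samples; the function names `parse_ts`, `elapsed_seconds`, and `entries_per_second` are hypothetical helpers, not part of any NSX tooling.

```python
from datetime import datetime


def parse_ts(line: str) -> datetime:
    """Parse the leading timestamp (e.g. 2025-12-08T08:58:54.331Z) of a log line."""
    stamp = line.split(maxsplit=1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def elapsed_seconds(first_line: str, last_line: str) -> float:
    """Seconds between the oldest and the newest log entry."""
    return (parse_ts(last_line) - parse_ts(first_line)).total_seconds()


def entries_per_second(first_line: str, last_line: str, total_lines: int) -> float:
    """Average logging rate over the window covered by the log files."""
    elapsed = elapsed_seconds(first_line, last_line)
    if elapsed <= 0:
        raise ValueError("last entry must be later than first entry")
    return total_lines / elapsed


# Values taken from the head/tail/wc output shown above.
first = "2025-12-08T08:58:54.331Z No(13) FIREWALL-PKTLOG[43311946]: ..."
last = "2025-12-08T08:59:59.239Z No(13) FIREWALL-PKTLOG[43311946]: ..."
total = 639428

elapsed = elapsed_seconds(first, last)
rate = entries_per_second(first, last, total)
print(f"{elapsed:.1f} s elapsed, about {rate:.0f} entries/sec")
```

Using the exact timestamps gives a window of just under 65 seconds, so the computed rate is slightly higher than the rounded figure quoted above. Either way, the host is writing close to ten thousand packet-log entries per second.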

Environment

VMware NSX

Cause

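Excessive firewall packet logging on the host can cause keepalives between the NSX daemons to expire, which leaves services such as nsx-cfgagent reported as unresponsive.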

Resolution

Disable logging on the firewall rules that match the high-volume traffic, so that this traffic no longer generates packet-log entries.
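
To decide which rules to stop logging, it may help to count packet-log entries per rule. The sketch below is illustrative only. It assumes the rule ID is the number that sits between the action and the direction in each dfwpktlogs line (for example, 10330 in "TERM PASS 10330 OUT"), which matches the sample entries in this article but should be verified against your own logs. `count_by_rule` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: the rule ID is the number between the action (PASS/DROP/REJECT)
# and the direction (IN/OUT), as in "... INET TERM PASS 10330 OUT TCP ...".
RULE_RE = re.compile(r"\b(?:PASS|DROP|REJECT)\s+(\d+)\s+(?:IN|OUT)\b")


def count_by_rule(lines: Iterable[str]) -> Counter:
    """Count log entries per rule ID; lines that do not match are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = RULE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The two sample entries from this article; in practice, pass the lines of
# the dfwpktlogs files instead.
sample = [
    "2025-12-08T08:58:54.331Z No(13) FIREWALL-PKTLOG[43311946]: 43df#### INET TERM PASS 10330 OUT TCP FIN ##.##.##.##/42897->##.##.##.##/8247 24/15 1799/1441 Logging_Name",
    "2025-12-08T08:59:59.239Z No(13) FIREWALL-PKTLOG[43311946]: 43df#### INET TERM PASS 10337 IN UDP ##.##.##.##/19644->##.##.##.##/53 1/1 84/228 Logging_Name",
]

counts = count_by_rule(sample)
for rule_id, n in counts.most_common(10):
    print(f"rule {rule_id}: {n} entries")
```

The rules at the top of the resulting list are the likeliest candidates for having logging disabled.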