Service Status Unknown" alarms about nsx-opsagent and nsx-cfgagent such as: "The service nsx-opsagent has been unresponsive for 10 seconds"YYYY-MM-DDTHH:MM:SS FATAL pool-118-thread-1 MonitoringServiceImpl 4817 MONITORING [nsx@6876 alarmId="########-####-####-####-##########" alarmState="OPEN" comp="nsx-manager" entId="########-####-####-####-############" errorCode="MP701099" eventFeatureName="infrastructure_service" eventSev="CRITICAL" eventState="On" eventType="service_status_unknown" level="FATAL" nodeId="########-####-####-####-############" subcomp="monitoring"] The service nsx-opsagent has been unresponsive for 10 seconds.
YYYY-MM-DDTHH:MM:SS FATAL pool-118-thread-1 MonitoringServiceImpl 4817 MONITORING [nsx@6876 alarmId="########-####-####-####-########ccca" alarmState="OPEN" comp="nsx-manager" entId="########-####-####-####-############" errorCode="MP701099" eventFeatureName="infrastructure_service" eventSev="CRITICAL" eventState="On" eventType="service_status_unknown" level="FATAL" nodeId="########-####-####-####-############" subcomp="monitoring"] The service nsx-cfgagent has been unresponsive for 10 seconds
YYYY-MM-DDTHH:MM:SS nsx-sha: NSX 9543684 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="INFO" s2comp="metric-nsx.daemon.daemon-health"] Error in echo Daemon:nsx-opsAgent, error: RPC Call Failed with CallStatus(code=TIMEOUT), trace:Traceback (most recent call last): File "/usr/lib/vmware/netopa/lib/python/sha/contrib/metric/daemon_health.py", line 226, in echo rsp = daemon_rpc_client.Echo(req, DEFAULT_TIMEOUT) File "/opt/vmware/nsx-nestdb/python/vmware/nsx/rpc/client/client.py", line 244, in call callback=SyncCallContext(timeout, client_context)) File "/usr/lib/vmware/nsx-com
YYYY-MM-DDTHH:MM:SS nsx-sha: NSX 9543684 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="INFO" s2comp="fork-monitor"] keep-alive check received resp {'stats': [{'peak_pending_req': 4, 'resp_error': 0, 'req_error': 0, 'req_dropped_no_resource': 0, 'pending_req': 0, 'req': 1700, 'resp': 1699}, {0: {'req': 523, 'resp': 523}, 1: {'req': 509, 'resp': 509}, 2: {'req': 504, 'resp': 504}, 3: {'req': 85, 'resp': 85}, 4: {'req': 79, 'resp': 79}}], 'timestamp': 1261980.771895807, 'type': 1, 'executor': 3, 'seq': 1699} for req{'timestamp': 1261974.2128648979, 'timeout': 4, 'check': True, 'type': 1, 'seq': 1699}^@
YYYY-MM-DDTHH:MM:SS nsx-sha: NSX 9543684 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="INFO"] PSM:nsx-opsagent ts is 1261979 interval is 42^@
YYYY-MM-DDTHH:MM:SS nsx-sha: NSX 9543684 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="CRITICAL" eventState="On" eventFeatureName="infrastructure_service" eventType="service_status_unknown" eventSev="critical"] The service nsx-opsagent has been unresponsive for 10 seconds.^@
YYYY-MM-DDTHH:MM:SS nsx-sha: NSX 9543684 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="INFO" s2comp="metric-nsx.daemon.daemon-health"] Succeed in destroying rpc client:nsx-opsAgent^@
YYYY-MM-DDTHH:MM:SS nsx-exporter: NSX 2104636 - [nsx@6876 comp="nsx-esx" subcomp="agg-service" tid="2104662" level="INFO"] Flow count diffs in actual (21966) vs in FLOW_GET_NUMRECORDS (27900)
YYYY-MM-DDTHH:MM:SS nsx-opsagent[2104462]: NSX 2104462 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" tid="2104473" level="INFO"] RpcConnection[1971 Connected on tcp://127.0.0.1:4098 0] Closing (network error)
YYYY-MM-DDTHH:MM:SS nsx-opsagent[2104462]: NSX 2104462 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" tid="2104462" level="ERROR" errorCode="RPC103"] RpcCall:Server:Unary[vmware.nsx.daemon.DaemonHealth/Echo, 0x0000, LOCAL_ERROR] Is in error state (COMMUNICATION_ERROR, status is not reported to the Client)
.....
YYYY-MM-DDTHH:MM:SS nsx-opsagent[2104462]: NSX 2104462 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-rpc" tid="2104462" level="ERROR" errorCode="RPC104"] RpcCall:Server:Unary[vmware.nsx.daemon.DaemonHealth/Echo, 0x0000, ERROR] Can't send a message and complete the call (with status SUCCESS) in error state
YYYY-MM-DDTHH:MM:SS nsx-sha: NSX 9543684 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="INFO"] PSM:nsx-cfgagent ts is 1262030 interval is 18^@
YYYY-MM-DDTHH:MM:SS nsx-sha: NSX 9543684 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="INFO" s2comp="metric-nsx.daemon.daemon-health"] Error in echo Daemon:nsx-cfgAgent, error: RPC Call Failed with CallStatus(code=TIMEOUT), trace:Traceback (most recent call last): File "/usr/lib/vmware/netopa/lib/python/sha/contrib/metric/daemon_health.py", line 226, in echo rsp = daemon_rpc_client.Echo(req, DEFAULT_TIMEOUT) File "/opt/vmware/nsx-nestdb/python/vmware/nsx/rpc/client/client.py", line 244, in call callback=SyncCallContext(timeout, client_context)) File "/usr/lib/vmware/nsx-common/lib/python/google/protobuf/service_reflection.py", line 267, in <lambda> self._StubMethod(
YYYY-MM-DDTHH:MM:SS nsx-sha: NSX 9543684 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="INFO" s2comp="fork-monitor"] keep_alive req sent: {'timestamp': 1262048.165605505, 'timeout': 4, 'check': True, 'type': 1, 'seq': 1701}^@
VMware NSX
VMware NSX-T Data Center
This is a cosmetic issue and this issue is resolved in VMware NSX 4.0.0.1, available at Broadcom downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB
Workaround:
To confirm the alarms triggered match this known cosmetic issue the following steps can be taken.
Check the ESXi host logs in /var/run/log/syslog.log:
/var/run/log/syslog.log for the presence of logging similar to:YYYY-MM-DDTHH:MM:SS NSX: opsAgent stopYYYY-MM-DDTHH:MM:SS NSX: Shutting down opsAgent watchdogYYYY-MM-DDTHH:MM:SS watchdog-opsAgent: Watchdog for opsAgent is now 1055850YYYY-MM-DDTHH:MM:SS watchdog-opsAgent: Terminating watchdog process with PID 1055850YYYY-MM-DDTHH:MM:SS watchdog-opsAgent: [1055850] Signal received: exiting the watchdogYYYY-MM-DDTHH:MM:SS NSX: Shutting down opsAgent serviceYYYY-MM-DDTHH:MM:SS NSX: Unable to shutdown opsAgent serviceYYYY-MM-DDTHH:MM:SS NSX: Terminating opsAgent serviceYYYY-MM-DDTHH:MM:SS NSX: opsAgent startYYYY-MM-DDTHH:MM:SS NSX: /var/log/nsx existsYYYY-MM-DDTHH:MM:SS NSX: opsAgent start watchdogYYYY-MM-DDTHH:MM:SS watchdog-opsAgent: [6349499] Begin '/usr/lib64/vmware/nsx-opsagent/bin/opsAgent', mn-uptime = 60, max-quick-failures = 100, max-total-failures = 1000, bg_pid_file = '', reboot-flag = '0'YYYY-MM-DDTHH:MM:SS watchdog-opsAgent: Executing '/usr/lib64/vmware/nsx-opsagent/bin/opsAgent'YYYY-MM-DDTHH:MM:SS NSX: opsAgent startedYYYY-MM-DDTHH:MM:SS opsAgent: Use default syslogYYYY-MM-DDTHH:MM:SS opsAgent: NSX 6349509 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="framewok" tid="6349509" level="INFO"] NSX OpsAgent start...For more information regarding the NSX version 4.0.0.1, please go through the release notes: NSX 4.0.0.1 release note