Application on NSX node <node> has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team. Recommended Action: Collect Support Bundle for NSX node <nsx manager> using NSX Manager UI or API.
2023-05-19T02:50:34.898Z local-manager NSX 85581 MONITORING [nsx@6876 alarmId="#######-8c4c-47aa-85a9-#########" alarmState="OPEN" comp="nsx-manager" entId="340cd33e-####-####-####-ff3b6fc90faf" errorCode="MP701099" eventFeatureName="infrastructure_service" eventSev="CRITICAL" eventState="On" eventType="application_crashed" level="FATAL" nodeId="d1be0142-####-####-####-d5ae7b37180b" subcomp="monitoring"] Application on NSX node local-manager has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team.
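When searching a large syslog export for this alarm, it can help to pull the key="value" fields out of the bracketed [nsx@6876 ...] block. The following is a minimal sketch, not an NSX tool: the function names are hypothetical, and it assumes that field values never contain a double quote or a closing bracket, as is the case in the sample above.

```python
import re

# Matches key="value" pairs such as eventType="application_crashed".
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_alarm_fields(line: str) -> dict:
    """Return the key/value pairs of the first [nsx@6876 ...] block, or {}."""
    start = line.find("[nsx@6876 ")
    if start == -1:
        return {}
    end = line.find("]", start)
    block = line[start:end] if end != -1 else line[start:]
    return dict(PAIR_RE.findall(block))


def is_open_crash_alarm(line: str) -> bool:
    """True if the line is an open application_crashed alarm."""
    fields = parse_alarm_fields(line)
    return (
        fields.get("eventType") == "application_crashed"
        and fields.get("alarmState") == "OPEN"
    )


if __name__ == "__main__":
    # Shortened copy of the alarm line shown above.
    sample = (
        '2023-05-19T02:50:34.898Z local-manager NSX 85581 MONITORING '
        '[nsx@6876 alarmState="OPEN" comp="nsx-manager" errorCode="MP701099" '
        'eventSev="CRITICAL" eventType="application_crashed" level="FATAL"] '
        'Application on NSX node local-manager has crashed.'
    )
    print(parse_alarm_fields(sample)["errorCode"])
    print(is_open_crash_alarm(sample))
```

Filtering on eventType and alarmState rather than on the free-text message keeps the match stable if the message wording differs between releases.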
-rw-r--r-- 1 root root 21M XXX 3 13:50 core.resolver-execut.XXXXXX22639.1253.991.6.gz
-rw-r--r-- 1 root root 21M XXX 3 15:14 core.resolver-execut.XXXXXX7679.3203641.991.6.gz
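To confirm that the core files on a node come from the resolver process rather than from another application, the listing can be grouped by process name. This is a rough sketch with hypothetical helper names. It assumes that the second dot-separated field of the file name is the (possibly truncated) process name, as in core.resolver-execut.… above; the meaning of the remaining fields is not interpreted.

```python
from collections import Counter
from typing import Optional


def core_process_name(filename: str) -> Optional[str]:
    """Return the process-name field of a core file name, or None."""
    parts = filename.split(".")
    if len(parts) < 3 or parts[0] != "core":
        return None
    return parts[1]


def count_cores_by_process(listing: str) -> Counter:
    """Count core files per process name in `ls -l` style output."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The file name is the last whitespace-separated field.
        name = core_process_name(fields[-1])
        if name is not None:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    listing = (
        "-rw-r--r-- 1 root root 21M XXX 3 13:50 "
        "core.resolver-execut.XXXXXX22639.1253.991.6.gz\n"
        "-rw-r--r-- 1 root root 21M XXX 3 15:14 "
        "core.resolver-execut.XXXXXX7679.3203641.991.6.gz\n"
    )
    print(count_cores_by_process(listing))
```

A process name that itself contains a dot would be split incorrectly by this approach, so treat the result as a triage aid only.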
2024-06-03T13:50:29.234Z ###-###-edge-a.######.####.com NSX 1253 - [nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="WARNING" s2comp="tsdb-sender-napp"] Failed to send one msg timestamp: 1717421738#012entity: TIER0#012entity_id: "#######-4601-419b-a687-############"#012node_id: "#######-d25c-4a3c-9c65-##########"#012nsx_site_id: "#######-434e-4c51-b3af-##########"#012gfw {#012 obj_id: "#######-bb10-48f8-97d4-##########"#012 number_of_sessions: 0#012 number_of_bytes: 48986264#012}#012 from plugin #######-7846-417e-bf8d-##########:#012 <_InactiveRpcError of RPC that terminated with:#012#011status = StatusCode.UNAVAILABLE#012#011details = "failed to connect to all addresses"#012#011debug_error_string = "{"created":"@1717422629.233920228","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1717422629.233918134","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"#012>#012 Traceback (most recent call last):#012 File "/opt/vmware/nsx-netopa/lib/python/sha/core/channel/provider/tsdb_provider.py", line 671, in send_metrics#012 response = self._metric_stub.MetricsUpdate(msg, timeout=transmit_timeout,#012 File "/opt/vmware/nsx-netopa/lib/python/grpc/_channel.py", line 946, in __call__#012 return _end_unary_response_blocking(state, call, False, None)#012 File "/opt/vmware/nsx-netopa/lib/python/grpc/_channel.py", line 849, in _end_unary_response_blocking#012 raise _InactiveRpcError(state)#012grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:#012#011status = StatusCode.UNAVAILABLE#012#011details = "failed to connect to all addresses"#012#011debug_error_string = "{"created":"@1717422629.233920228","description":"Failed to pick 
subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1717422629.233918134","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"#012>
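The nsx-sha message above is hard to read because control characters were escaped as # followed by three octal digits (#012 for newline, #011 for tab). The sketch below restores them so that the Python traceback and the gRPC status become legible. It assumes this octal escaping convention applies to the whole message; the function names are hypothetical. Only codes below 040 (control characters) are decoded, so masked identifiers such as #######-4601 are left untouched.

```python
import re

# '#' followed by an octal code in the range 000-037 (control characters only).
_CONTROL_ESCAPE = re.compile(r"#(0[0-3][0-7])")


def decode_syslog_escapes(message: str) -> str:
    """Replace #ooo control-character escapes with the real characters."""
    return _CONTROL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), message)


def mentions_grpc_unavailable(message: str) -> bool:
    """True if the decoded message reports gRPC StatusCode.UNAVAILABLE."""
    return "StatusCode.UNAVAILABLE" in decode_syslog_escapes(message)


if __name__ == "__main__":
    # Shortened copy of the message shown above.
    raw = (
        "Failed to send one msg timestamp: 1717421738#012entity: TIER0"
        "#012 <_InactiveRpcError of RPC that terminated with:"
        "#012#011status = StatusCode.UNAVAILABLE"
        '#012#011details = "failed to connect to all addresses"'
    )
    print(decode_syslog_escapes(raw))
    print(mentions_grpc_unavailable(raw))
```

The UNAVAILABLE status with "failed to connect to all addresses" is the connectivity symptom logged at the same time as the first core file (13:50) in the listing above.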
VMware NSX 4.x
The resolver-execute process is an Edge DNS process that uses gRPC. A bug in gRPC can cause this process to crash during network interruptions, such as network-related maintenance or outages.
This issue is fixed in VMware NSX 4.2.1 and later releases.
Note: As a workaround, Watchdog automatically restarts the resolver process after a crash.
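When auditing several environments, a quick comparison against the fixed release (4.2.1) shows which nodes are still exposed. This is a simple sketch with hypothetical function names. It assumes the version is a dot-separated numeric string and compares only the first three components. It does not account for any fixes backported to earlier release lines, which this article does not mention.

```python
FIXED_IN = (4, 2, 1)  # first release containing the fix, per this article


def parse_version(version: str) -> tuple:
    """Return the first three numeric components, padded with zeros."""
    parts = [int(p) for p in version.strip().split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_resolver_fix(version: str) -> bool:
    """True if the version is 4.2.1 or later."""
    return parse_version(version) >= FIXED_IN


if __name__ == "__main__":
    for v in ("4.1.2.3", "4.2.0.1", "4.2.1", "4.2.1.1"):
        print(v, has_resolver_fix(v))
```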
Refer to KB 345792 for steps to clear the core dump files and the related VMware NSX alarm once the files are no longer needed.