While the issue is occurring, running any get command on the Edge CLI returns the following error:
% An unexpected error occurred: The dataplane service is in error state, has failed or is disabled
aggrtr4>
The syslog on the Edge VM shows the following errors:
2023-08-14T18:07:14.532Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="dp-si-purge5" level="WARN" eventId="vmwNSXRCUBlockStatus"] {"event_state":0,"event_external_reason":"dp-ipc18 thread blocked to enter RCU quiesce state","event_src_comp_id":"6b7ae106-3aed-45d6-9735-d4be90b7e815","event_sources":{"process_name":"dp-fp:0#012","thread_id":"dp-ipc18","quiesce_blocked_time":"128000"}}
2023-08-14T18:07:22.432Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu1" level="WARN"] blocked 256000 ms waiting for dp-ipc18 to quiesce
2023-08-14T18:09:22.532Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="dp-si-purge5" level="WARN"] blocked 256000 ms waiting for dp-ipc18 to quiesce
2023-08-14T18:10:30.053Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu1" level="INFO" eventId="vmwNSXRCUBlockStatus"] {"event_state":1,"event_external_reason":"all threads exited RCU quiesce blocked state","event_src_comp_id":"6b7ae106-3aed-45d6-9735-d4be90b7e815","event_sources":{"process_name":"dp-fp:0#012","thread_id":"all-threads","quiesce_blocked_time":"0"}}
2023-08-14T18:10:30.058Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="dp-si-purge5" level="INFO" eventId="vmwNSXRCUBlockStatus"] {"event_state":1,"event_external_reason":"all threads exited RCU quiesce blocked state","event_src_comp_id":"6b7ae106-3aed-45d6-9735-d4be90b7e815","event_sources":{"process_name":"dp-fp:0#012","thread_id":"all-threads","quiesce_blocked_time":"0"}}
2023-08-14T18:10:31.052Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu1" level="WARN"] blocked 1000 ms waiting for dp-ipc18 to quiesce
2023-08-14T18:10:32.052Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu1" level="WARN"] blocked 2000 ms waiting for dp-ipc18 to quiesce
2023-08-14T18:10:34.052Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu1" level="WARN"] blocked 4000 ms waiting for dp-ipc18 to quiesce
2023-08-14T18:10:38.052Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu1" level="WARN"] blocked 8000 ms waiting for dp-ipc18 to quiesce
2023-08-14T18:10:46.052Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu1" level="WARN"] blocked 16000 ms waiting for dp-ipc18 to quiesce
2023-08-14T18:11:02.052Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu1" level="WARN"] blocked 32000 ms waiting for dp-ipc18 to quiesce
2023-08-14T18:11:34.052Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu1" level="WARN"] blocked 64000 ms waiting for dp-ipc18 to quiesce
2023-08-14T17:10:35.198Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="timeval" tname="dp-ipc18" level="WARN"] Unreasonably long 437740ms poll interval (436096ms user, 76ms system)
2023-08-14T17:18:04.917Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="timeval" tname="dp-ipc18" level="WARN"] Unreasonably long 449719ms poll interval (437555ms user, 100ms system)
2023-08-14T17:25:23.293Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="timeval" tname="dp-ipc18" level="WARN"] Unreasonably long 438375ms poll interval (438252ms user, 48ms system)
2023-08-14T17:33:00.841Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="timeval" tname="dp-ipc18" level="WARN"] Unreasonably long 453069ms poll interval (437248ms user, 160ms system)
2023-08-14T17:40:17.176Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="timeval" tname="dp-ipc18" level="WARN"] Unreasonably long 436336ms poll interval (436176ms user, 52ms system)
2023-08-14T17:47:57.270Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="timeval" tname="dp-ipc18" level="WARN"] Unreasonably long 449515ms poll interval (438190ms user, 104ms system)
2023-08-14T17:55:13.882Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="timeval" tname="dp-ipc18" level="WARN"] Unreasonably long 436612ms poll interval (436405ms user, 48ms system)
2023-08-14T18:03:06.432Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="timeval" tname="dp-ipc18" level="WARN"] Unreasonably long 469795ms poll interval (442090ms user, 268ms system)
2023-08-14T18:10:30.052Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="timeval" tname="dp-ipc18" level="WARN"] Unreasonably long 443620ms poll interval (440017ms user, 140ms system)
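To check an exported syslog for the same signature, the two message types above can be matched with a short script. This is a minimal sketch and not an NSX tool: the function name `summarize` and the `SAMPLE` list are hypothetical, and the patterns are derived only from the log lines shown in this article, so they may need adjusting for other releases.

```python
import re

# Patterns derived from the sample messages in this article (assumption:
# other NSX versions may word these messages differently).
RCU_BLOCKED = re.compile(r"blocked (\d+) ms waiting for (\S+) to quiesce")
LONG_POLL = re.compile(
    r'tname="([^"]+)".*?Unreasonably long (\d+)ms poll interval '
    r"\((\d+)ms user, (\d+)ms system\)"
)


def summarize(lines):
    """Return two dicts keyed by thread name: the longest RCU block (ms)
    and the longest poll interval (ms) seen in the given log lines."""
    worst_block = {}
    worst_poll = {}
    for line in lines:
        match = RCU_BLOCKED.search(line)
        if match:
            blocked_ms, thread = int(match.group(1)), match.group(2)
            worst_block[thread] = max(worst_block.get(thread, 0), blocked_ms)
            continue
        match = LONG_POLL.search(line)
        if match:
            thread, total_ms = match.group(1), int(match.group(2))
            worst_poll[thread] = max(worst_poll.get(thread, 0), total_ms)
    return worst_block, worst_poll


# Three lines copied from the syslog excerpt above.
SAMPLE = [
    '2023-08-14T18:07:22.432Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu1" level="WARN"] blocked 256000 ms waiting for dp-ipc18 to quiesce',
    '2023-08-14T18:10:31.052Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu1" level="WARN"] blocked 1000 ms waiting for dp-ipc18 to quiesce',
    '2023-08-14T17:10:35.198Z edgenode.fqdn NSX 5038 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="timeval" tname="dp-ipc18" level="WARN"] Unreasonably long 437740ms poll interval (436096ms user, 76ms system)',
]

print(summarize(SAMPLE))
```

A thread that appears in both dictionaries with values in the hundreds of seconds, as dp-ipc18 does here, matches the symptom described in this article.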
Impact/Risks:
Customers experience intermittent Edge failovers and datapath issues.
LACP PDUs are dropped.