Log Line Analysis:
Edge /var/log/syslog*
- 142585:2024-##-##T##:##:##.###Z <Edge-VM-Name01> NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-cluster" level="INFO"] HA tunnel 192.###.###.35:192.###.###.39 state changed from Up to Unreachable
- These tunnel endpoints have experienced a BFD connectivity timeout. The tunnel has gone down because (non-exhaustive list of examples):
- The remote endpoint is not sending BFD information to the local endpoint due to:
  - Incorrect VLAN tagging in the physical environment
  - Double VLAN tagging in the NSX Uplink Profile
  - A firewall in the physical environment blocking communication between TEPs
  - A router in the physical environment being unable to send packets to TEP endpoints
- The environment is busy and BFD packets are getting delayed or dropped between TEP endpoints
- 142622:2024-##-##T##:##:##.###Z <Edge-VM-Name01> NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-cluster" level="INFO"] HA tunnel 192.###.###.35:192.###.###.39 state changed from Unreachable to Up
- These tunnel endpoints have begun receiving BFD information again. The tunnel is returning to functional status.
- NOTE: Seeing these two log lines repeatedly and close together for the same two endpoints indicates network flapping or high latency at one endpoint or at a point in between.
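The flapping check described in the note above can be sketched in a few lines of Python. This is a minimal illustration, not an official tool: the regular expression is modeled only on the two Edge syslog lines shown above and may need adjusting for other NSX versions, and the function names, the 10-minute window, and the threshold of 3 drops are hypothetical values chosen for the example.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Pattern modeled on the Edge syslog samples above (assumption: the format
# may differ between NSX versions).
HA_TUNNEL_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z .*?"
    r"HA tunnel (?P<local>[\d.]+):(?P<remote>[\d.]+) "
    r"state changed from (?P<old>\w+) to (?P<new>\w+)"
)


def parse_ha_events(lines):
    """Extract (timestamp, local, remote, old_state, new_state) tuples."""
    events = []
    for line in lines:
        m = HA_TUNNEL_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            events.append(
                (ts, m.group("local"), m.group("remote"),
                 m.group("old"), m.group("new"))
            )
    return events


def find_flapping(events, window=timedelta(minutes=10), threshold=3):
    """Return tunnels with at least `threshold` Up -> Unreachable drops
    inside any span of length `window` (both values are illustrative)."""
    drops = defaultdict(list)
    for ts, local, remote, old, new in events:
        if old == "Up" and new == "Unreachable":
            drops[(local, remote)].append(ts)

    flapping = {}
    for tunnel, times in drops.items():
        times.sort()
        # Slide over the sorted drop times looking for a dense cluster.
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window:
                flapping[tunnel] = len(times)  # total drops seen for this tunnel
                break
    return flapping
```

To use it, read the lines of an extracted Edge syslog file into a list, pass them to `parse_ha_events`, and pass the result to `find_flapping`. Any tunnel pair returned is a candidate for the flapping or latency investigation described above.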
ESXi host /var/log/vmkernel*
- 2024-##-##T##:##:##.###Z cpu59:2098707)BFD_HandleStatusChange:709:[nsx@6876 comp="nsx-esx" subcomp="bfd"]local: 192.###.###.34, remote: 192.###.###.23, oldState: up, newState: down, diag: Control Detection Time Expired, type: overlay
- This log line indicates that the tunnel between the two listed IP addresses has gone down.
- 2024-##-##T##:##:##.###Z cpu36:2098706)BFD_HandleStatusChange:709:[nsx@6876 comp="nsx-esx" subcomp="bfd"]local: 192.###.###.34, remote: 192.###.###.23, oldState: down, newState: init, diag: Control Detection Time Expired, type: overlay
- This log line indicates that the tunnel is coming up: connectivity between the two listed IP addresses has been restored.
- 2024-##-##T##:##:##.###Z cpu36:2098706)BFD_HandleStatusChange:709:[nsx@6876 comp="nsx-esx" subcomp="bfd"]local: 192.###.###.34, remote: 192.###.###.23, oldState: down, newState: up, diag: No Diagnostic, type: overlay
- This log line indicates that the tunnel is fully up and capable of processing TEP traffic.
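The ESXi state transitions above (up to down, down to init, down to up) can be tracked with a short parser. The following is a hedged sketch: the regular expression is derived only from the three vmkernel samples shown above, and the function names `parse_bfd_line` and `latest_states` are hypothetical helpers, not part of any NSX or ESXi tooling.

```python
import re

# Pattern derived from the vmkernel BFD_HandleStatusChange samples above
# (assumption: field order and names match those samples).
BFD_RE = re.compile(
    r"BFD_HandleStatusChange:\d+:.*?"
    r"local: (?P<local>[\d.]+), remote: (?P<remote>[\d.]+), "
    r"oldState: (?P<old>\w+), newState: (?P<new>\w+), "
    r"diag: (?P<diag>[^,]+), type: (?P<type>\w+)"
)


def parse_bfd_line(line):
    """Return the BFD fields of one vmkernel line as a dict, or None."""
    m = BFD_RE.search(line)
    return m.groupdict() if m else None


def latest_states(lines):
    """Map each (local, remote) TEP pair to its most recent (state, diag)."""
    state = {}
    for line in lines:
        event = parse_bfd_line(line)
        if event:
            key = (event["local"], event["remote"])
            state[key] = (event["new"], event["diag"])
    return state
```

Feeding the lines of a vmkernel log to `latest_states` in file order gives the last reported state per TEP pair. Pairs whose final state is `down` or `init` rather than `up` are the tunnels still worth investigating.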
If you are contacting Broadcom Support about this issue, please provide the following:
- Log bundles from all NSX Edges and all NSX-prepared ESXi hosts with TEP/BFD tunnels reporting down
- Log bundles from all NSX Managers
Handling Log Bundles for offline review with Broadcom support