It is common for an Edge to appear offline, with events showing loss of communication to the VCO (the Orchestrator), while production data is not impacted.
Environment
SD-WAN
Cause
The Edge uses separate planes for traffic. Traffic destined for the VCO travels on the Management Plane, while production traffic travels on the Data Plane. When Management Plane traffic is interrupted, you will generally not see any impact on Data Plane traffic.
Resolution
This behavior can have several causes. We recommend a thorough review of the affected Edge that includes the following checks:
1. Verify Physical and Network Connectivity
Check the Edge Device's Physical Connection: Ensure that the Edge device is powered on and connected to the network (verify that all physical network cables, wireless connections, or virtual NICs are functional).
Ping the Orchestrator: From the Edge device or a device in the same network, attempt to ping the Orchestrator's IP address (or DNS-resolved address) to confirm network reachability.
Command: ping <orchestrator-ip-or-domain>
If the ping fails, check for issues such as a misconfigured firewall or network segmentation that could be blocking communication.
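If the check needs to be repeated from a device in the same network as the Edge, it can be scripted. The following is a minimal sketch in Python that wraps the system ping utility. The function names are hypothetical, and the count flag is chosen on the assumption that Windows uses -n while Linux and macOS use -c.

```python
import platform
import subprocess
from typing import List


def build_ping_command(host: str, count: int = 4) -> List[str]:
    """Return the argument list for the system ping utility."""
    # Windows ping uses -n for the packet count; Linux and macOS use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def ping(host: str, count: int = 4, timeout_s: float = 15.0) -> bool:
    """Return True if the ping utility exits with status 0."""
    try:
        result = subprocess.run(
            build_ping_command(host, count),
            capture_output=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is not installed, or it did not finish in time.
        return False
    return result.returncode == 0
```

Call ping() with the same value you would use for <orchestrator-ip-or-domain>. A False result only shows that ICMP echo did not succeed from the device running the script. ICMP can be filtered on a path where other traffic still passes.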
2. Check DNS Resolution
The Edge device needs to be able to resolve the Orchestrator’s domain name via DNS. If DNS is misconfigured or not functioning correctly, the Edge won’t be able to reach the Orchestrator.
Verify DNS Configuration:
On the Edge device, check if the DNS server settings are correct (the Edge may use internal or public DNS servers to resolve the Orchestrator address).
Ensure that DNS servers are reachable and properly configured.
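To confirm that the Orchestrator's name resolves from the same network segment, a quick lookup can be scripted. This is a minimal sketch using Python's standard socket module. The function name resolve is hypothetical, and the lookup uses the DNS settings of the machine running the script, which may differ from the DNS servers configured on the Edge.

```python
import socket
from typing import List


def resolve(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated IP addresses for hostname, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name could not be resolved.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

An empty list for the Orchestrator's domain name points toward a DNS problem rather than a routing or firewall problem.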
3. Check for Firewall or Port Blocking
The TLS connection used for heartbeat traffic can be blocked by network firewalls, which might prevent the Edge from communicating with the Orchestrator.
Required Ports:
TCP 443 (default port for SSL/TLS communication with the Orchestrator).
If any additional ports are used by VeloCloud (such as UDP or custom ports for specific monitoring protocols), ensure those are open.
Inspect Firewall Logs: Check for any denied or dropped packets related to the Orchestrator's IP address or the Edge device.
Ensure that any local firewalls (on the Edge device or network appliances) are configured to allow outbound traffic to VeloCloud's cloud-based Orchestrator.
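A simple way to test whether TCP 443 is open along the path is to attempt a TCP connection from a device in the same network as the Edge. The sketch below uses Python's standard socket module. The function name tcp_port_open is hypothetical, and the result reflects the path from the machine running the script, not from the Edge itself.

```python
import socket


def tcp_port_open(host: str, port: int = 443, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A True result shows only that the TCP connection was established. It does not confirm that the TLS handshake with the Orchestrator succeeds, so firewall and Edge logs should still be reviewed.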
4. Verify Network Path and Route Integrity
Route Verification: Ensure the Edge has a valid route to reach the Orchestrator. If there is a misconfigured static route or missing default gateway, the Edge may not be able to connect to the Orchestrator.
Traceroute: Use traceroute to diagnose where packets might be getting dropped or delayed between the Edge and the Orchestrator.
Command: traceroute <orchestrator-ip-or-domain>
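When a traceroute is long, it can help to list the hops where every probe timed out. The sketch below is a small Python parser that assumes the common Linux traceroute text layout, in which each hop line starts with the hop number and unanswered probes are printed as "*". The function name silent_hops and the sample output are made up for illustration and use documentation-only addresses.

```python
import re
from typing import List

# Illustrative input only; not captured from a real Edge or Orchestrator.
SAMPLE = """traceroute to vco.example.com (203.0.113.10), 30 hops max, 60 byte packets
 1  192.0.2.1 (192.0.2.1)  0.412 ms  0.388 ms  0.371 ms
 2  198.51.100.1 (198.51.100.1)  4.102 ms  4.087 ms  4.120 ms
 3  * * *
 4  * * *
"""

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def silent_hops(traceroute_output: str) -> List[int]:
    """Return the hop numbers where every probe went unanswered."""
    hops = []
    for line in traceroute_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header or blank line
        hop = int(match.group(1))
        tokens = match.group(2).split()
        if tokens and all(token == "*" for token in tokens):
            hops.append(hop)
    return hops
```

A single silent hop in the middle of a path is not necessarily a fault, because some routers do not answer traceroute probes. A run of silent hops that continues to the end of the output is a better indication of where traffic stops.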
5. Check for Proxy or NAT Issues
Proxy Servers: If the network is behind a proxy server, ensure that the Edge device is properly configured to use the proxy for communication with the Orchestrator.
NAT (Network Address Translation): If the Edge device is behind a NAT device, ensure that the appropriate ports and IP addresses are mapped correctly, and that the Edge can maintain a stable connection.
6. Verify Edge Device's System Health
Edge Device Resource Utilization: High CPU or memory utilization on the Edge device can cause issues with its ability to process heartbeats or maintain connections. Check the system resources on the Edge device (if available via CLI or UI).
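Reboot the Edge Device: A reboot of the Edge can sometimes resolve issues where network connections or services are stuck. Because a reboot can also interrupt production (Data Plane) traffic that is currently unaffected, plan it accordingly.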
7. Examine Edge Logs for Errors
Edge Device Logs: Check the logs on the Edge device for any error messages or warnings related to TLS/SSL handshake failures, network connectivity problems, or other connection-related issues. The log might indicate why the heartbeat messages are not being sent or received.
Logs often include messages related to certificate validation issues, authentication failures, or connectivity timeouts.
Log Files to Check: Look for any relevant logs related to Orchestrator communication, such as:
Heartbeat failure logs
SSL/TLS handshake logs
Connection timeout logs
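If the logs have been exported as text, a keyword filter can narrow them down to the categories listed above. This is a minimal Python sketch. The function name matching_lines is hypothetical, and the search terms are assumptions based on the failure types named in this step; the actual wording in Edge logs may differ.

```python
from typing import Iterable, List, Sequence

# Assumed search terms drawn from the failure types listed above; the actual
# wording in Edge logs may differ.
KEYWORDS = ("heartbeat", "handshake", "certificate", "timeout", "timed out")


def matching_lines(lines: Iterable[str], keywords: Sequence[str] = KEYWORDS) -> List[str]:
    """Return the lines that contain any keyword, compared case-insensitively."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line.lower() for keyword in lowered)
    ]
```

The function accepts any iterable of lines, so an open text file can be passed directly. Adjust the keyword list once the real message wording is known.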
8. Verify Edge’s Time and Date Configuration
Time Synchronization: Ensure that the Edge device's time settings are correctly synchronized. The SSL/TLS certificates used to establish communication with the Orchestrator are valid only between a start date and an expiration date. If the clock on the Edge device is wrong, a valid certificate can appear expired or not yet valid, which causes authentication failures.
NTP Settings: Verify that the Edge device has proper NTP (Network Time Protocol) configured to ensure time synchronization with an authoritative source.
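The effect of clock drift on certificate validation can be illustrated with a short Python sketch. It compares a clock reading against a certificate's validity window. The function name cert_time_status is hypothetical, and any dates passed to it are example values, not taken from a real Orchestrator certificate.

```python
from datetime import datetime


def cert_time_status(now: datetime, not_before: datetime, not_after: datetime) -> str:
    """Classify a clock reading against a certificate's validity window."""
    if now < not_before:
        # The clock is behind the start of the validity period.
        return "not yet valid"
    if now > not_after:
        # The clock is past the end of the validity period.
        return "expired"
    return "valid"
```

For a certificate valid from the start of 2024 to the start of 2025, a correct clock in mid-2024 yields "valid", a clock that has reset to 2020 yields "not yet valid", and a clock running in 2026 yields "expired". In the last two cases the certificate itself is fine and only the time on the device is wrong.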
If the Edge still appears offline after these items have been verified, we suggest opening a case with our support team.