For a site where a pair of Edges is deployed in a High Availability (HA) topology, the diagnostic bundle should contain information for both the active and standby Edges. However, a customer may find that it contains only the active Edge's bundle, even though HA is working properly and Core Limit is set to 0.
HA is functional:
All VMware VeloCloud SD-WAN Edges deployed in a High Availability topology.
The mgdha log on the standby Edge shows that it receives the HA event from the remote mgd and starts generating the Diagnostic Bundle:
2025-01-08T07:39:47.376 DEBUG [ha (6949:HaWorker:7359)] Got task {'action': 'ha_gen_diag', 'HA_VERSION': '2.0', 'data': {'action': {'action': 'generateDiagnosticBundle', 'data': {'maxBundleSize': 180000000, 'options': {'maxCores': 0, 'type': 'diagnosticDump'}, 'requestId': '276ed91c-ca04-4938-9de1-7f581e305f2d'}, 'id': 188199}}}
2025-01-08T07:39:47.376 INFO [diag (6949:HaWorker:7359)] In StandbygenerateDiagnosticBundle
The issue occurs when the Diagnostic Bundle is copied to 169.254.2.1:
2025-01-08T07:42:46.921 DEBUG [hautils (6949:HaWorker:7359)] diag copy to 169.254.2.1169.254.2.1 failed Warning: Permanently added '169.254.2.1' (RSA) to the list of known hosts.
Authorized Users Only
Permission denied, please try again.
Received disconnect from 169.254.2.1 port 22:2: Too many authentication failures
Disconnected from 169.254.2.1 port 22
lost connection
Because the SD-WAN Edge uses SCP to transfer the Diagnostic Bundle, this failure means SSH is not working properly. From the standby Edge's CLI, ssh 169.254.2.1 prompts for a password instead of using public key authentication. This is why SCP fails.
The expected behavior is that ssh logs in using public key authentication without prompting for a password.
If a password prompt appears instead, public key authentication is not working properly.
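The check described above can also be run non-interactively. The following is a minimal sketch, assuming the Edge ships an OpenSSH client (the log messages above are consistent with OpenSSH, but this is an assumption). The names check_pubkey_auth and SSH_CMD are hypothetical and introduced here for illustration only. BatchMode=yes disables password prompts, so the command fails immediately instead of waiting for input when key authentication is rejected.

```shell
# Hypothetical helper, not part of the product.
# Returns 0 only if the peer accepts public key authentication.
# SSH_CMD can be overridden for testing; it defaults to the real ssh client.
check_pubkey_auth() {
    peer="$1"
    if ${SSH_CMD:-ssh} -o BatchMode=yes \
                       -o PreferredAuthentications=publickey \
                       -o ConnectTimeout=5 \
                       "$peer" true 2>/dev/null; then
        echo "public key authentication to ${peer} works"
    else
        echo "public key authentication to ${peer} failed"
        return 1
    fi
}

# Example (run from the standby Edge's CLI):
#   check_pubkey_auth 169.254.2.1
```

A failure here is consistent with the "Permission denied" and "Too many authentication failures" messages in the mgdha log, although it can also be caused by a plain connectivity problem.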
1. Enter /root/.ssh.
2. Check that the content of id_rsa.pub on each SD-WAN Edge is identical to the content of authorized_keys_ha on its peer.
3. If they differ, copy the content of id_rsa.pub and overwrite the content of the peer's authorized_keys_ha.
4. Restarting the edged service or sshd is not necessary. Retry the SSH connection and verify that public key authentication works as expected.
5. Trigger a new Diagnostic Bundle from the VCO web portal and verify its size.
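The comparison in step 2 can be done without reading long key strings by eye. The following is a minimal sketch, assuming sha256sum and awk are available on the Edge (an assumption; another checksum tool such as md5sum would serve the same purpose). key_digest is a hypothetical helper name. Run it on each Edge and compare the printed digests: identical files produce identical digests.

```shell
# Hypothetical helper, not part of the product.
# Prints only the SHA-256 digest of the given file.
key_digest() {
    sha256sum "$1" | awk '{print $1}'
}

# Example:
#   On one Edge:   key_digest /root/.ssh/id_rsa.pub
#   On its peer:   key_digest /root/.ssh/authorized_keys_ha
# The two digests should match. Repeat in the other direction.
```

Because the digest covers the whole file, any difference, including a trailing comment or extra whitespace, produces a mismatch. This matches the requirement in step 2 that the two files have the same content.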