When upgrading an SD-WAN edge High Availability (HA) pair, the software upgrade may fail with multiple "Software download failed", "High Availability Going Active" and "HA Failover Identified" events. Afterward, both edges remain on the old software version instead of the new version. The mgd log file nonetheless shows that the active SD-WAN edge downloaded the new software image successfully:
2024-12-28T14:10:36.780 DEBUG [mgd (6758:Thread-322298:22924)] Downloaded 100% of Software Update 5.2.4.0 build R5240-20240828-GA-87fd4dc4cd
2024-12-28T14:10:36.878 INFO [update (6758:Thread-322298:22924)] Downloaded software update https://vco01.tchybrid.com/upload//fileDownload into /velocloud/images/image.R5240-20240828-GA-87fd4dc4cd.zip in 33.641970 seconds, size=184148331, sha1=436c35d06455315827b1b4e3cdda72d4ddd453b0
VMware by Broadcom SD-WAN edge supported versions
As per the upgrade process, after the active SD-WAN edge downloads the new software image, it first transmits the image to the standby edge and starts a 5-minute timer. If the standby edge upgrades successfully, the active edge detects the standby edge's version change in a subsequent HA heartbeat message and starts upgrading itself. If the standby edge fails to upgrade and the timer expires, the active edge starts upgrading itself anyway.
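The decision the active edge makes in this process can be sketched as follows. This is only an illustrative model of the behavior described above, not the edge's actual implementation: the function name, its parameters and the returned strings are hypothetical, and the only values taken from this article are the 5-minute timer and the version-change check.

```python
# Illustrative model only: names and return values are hypothetical.
STANDBY_UPGRADE_TIMEOUT_S = 5 * 60  # the 5-minute timer described above


def active_edge_next_step(standby_version, target_version, elapsed_s,
                          timeout_s=STANDBY_UPGRADE_TIMEOUT_S):
    """Return what the active edge does after sending the image to the standby.

    standby_version: version reported by the standby in the HA heartbeat
    target_version:  version of the newly downloaded image
    elapsed_s:       seconds since the timer was started
    """
    if standby_version == target_version:
        # The heartbeat shows the standby is already on the new version.
        return "upgrade_self (standby upgraded)"
    if elapsed_s >= timeout_s:
        # The standby never reported the new version before the timer expired.
        return "upgrade_self (timer expired)"
    return "wait"


if __name__ == "__main__":
    # "OLD" is a placeholder for whatever version the pair was running.
    print(active_edge_next_step("5.2.4.0", "5.2.4.0", 40))
    print(active_edge_next_step("OLD", "5.2.4.0", 120))
    print(active_edge_next_step("OLD", "5.2.4.0", 300))
```

The second branch is the one that matters for this issue: the active edge proceeds with its own upgrade even though the standby is still on the old version.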
In this case, the standby edge's disk entered read-only mode, which caused the software image transfer to fail:
2024-12-28T14:10:45.549 DEBUG [ha (6758:HaWorker:6990)] image R5240-20240828-GA-87fd4dc4cd copy to 169.254.2.2 failed Warning: Permanently added '169.254.2.2' (RSA) to the list of known hosts.
Authorized Users Only
scp: /velocloud/images//image.R5240-20240828-GA-87fd4dc4cd.zip: Read-only file system
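To check a collected log for this signature, a small script along the following lines may help. It is a minimal sketch that assumes the log lines look like the excerpt above (an "image ... copy to ... failed" line followed by an scp error containing "Read-only file system"); the function name and the regular expression are illustrative and are not part of any Broadcom tooling.

```python
import re

# Patterns taken from the log excerpt in this article.
COPY_FAILED = re.compile(r"image (\S+) copy to (\S+) failed")
READ_ONLY_MARKER = "Read-only file system"


def find_read_only_copy_failures(lines):
    """Return (image, standby_ip) pairs for copy failures caused by a read-only disk."""
    findings = []
    pending = None  # most recent copy failure not yet matched to a cause
    for line in lines:
        match = COPY_FAILED.search(line)
        if match:
            pending = (match.group(1), match.group(2))
        if pending and READ_ONLY_MARKER in line:
            findings.append(pending)
            pending = None
    return findings


SAMPLE = [
    "2024-12-28T14:10:45.549 DEBUG [ha (6758:HaWorker:6990)] image "
    "R5240-20240828-GA-87fd4dc4cd copy to 169.254.2.2 failed Warning: "
    "Permanently added '169.254.2.2' (RSA) to the list of known hosts.",
    "Authorized Users Only",
    "scp: /velocloud/images//image.R5240-20240828-GA-87fd4dc4cd.zip: "
    "Read-only file system",
]

if __name__ == "__main__":
    for image, standby_ip in find_read_only_copy_failures(SAMPLE):
        print(f"Image {image} could not be copied to {standby_ip}: read-only disk")
```

To scan a real file, pass the opened file object instead of `SAMPLE`; the function only needs an iterable of lines.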
After multiple attempts, the active edge reports the event 'Failed HA Standby update with new software version' and starts upgrading itself. Because the active edge has no disk issue, it upgrades to the new version successfully. However, while it reboots, the previous standby edge becomes the active edge, still running the old software version. Because its disk is read-only, it still cannot download the new image from the SD-WAN orchestrator. Once HA is ready, it resynchronizes the old software image to the new standby unit, which is running the new software image. As a result, both edges end up rolled back to the old software version.
A power cycle of the defective SD-WAN edge very likely resolves the issue. If the issue persists after the power cycle, open a standard Broadcom technical support case to initiate an RMA and replace the problematic SD-WAN edge.