/var/log/corfu/corfu.9000.log shows the error:
<Time-stamp> | ERROR | WrapperSimpleAppMain | o.c.infrastructure.CorfuServer | CorfuServer: Server exiting due to unrecoverable error: org.corfudb.runtime.exceptions.DataCorruptionException: Checksum mismatch detected while trying to read file
OR:
<Time-stamp> | ERROR | WrapperSimpleAppMain | o.c.infrastructure.CorfuServer | Failed starting server
<Time-stamp> | ERROR | WrapperSimpleAppMain | o.c.infrastructure.CorfuServer | Failed starting server org.corfudb.runtime.exceptions.DataCorruptionException: Can't parse metadata. Segment File: /config/corfu/log/297779.log. File size: 3872130. File position: 3871666
...
...
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
/var/log/corfu/corfu-compactor-audit.log may show the message "Tried to get layout from <node with corruption IP>:9000 but failed by timeout"
/var/log/syslog reports the below error:
2026-05-19T05:01:23.622Z nsx-mgr01 NSX 3489 - - getClusterStatus: Error while fetching layout from 10.##.##.##:9000. Exception:
2026-05-19T05:01:23.994Z nsx-mgr01 NSX 2342 - - Connect Async 10.##.##.##:9000
When the Corfu server starts up, it loads binary data files and verifies checksums. If a data file is corrupted, Corfu cannot recover on its own. The cluster of Corfu nodes protects against this scenario. If Corfu data files are corrupted on a node, then the node needs to be removed and replaced with a new node.
The issue can also occur in greenfield VCF deployments if there are underlying storage issues on the host running the NSX Managers, and if remnant data exists on the LUN.
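The log signatures listed above can be searched for with a short script. The following is a minimal sketch in Python, not a supported tool: the function name `find_corruption_lines` and the choice of substrings are assumptions of this example, taken only from the excerpts shown in this article, and they are not an exhaustive list of Corfu corruption messages.

```python
import re

# Substrings taken from the log excerpts in this article. Matching on them is
# an assumption of this sketch, not a complete list of corruption messages.
CORRUPTION_PATTERNS = [
    re.compile(r"DataCorruptionException"),
    re.compile(r"InvalidProtocolBufferException"),
]


def find_corruption_lines(log_text):
    """Return (line_number, line) pairs that match a corruption signature."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in CORRUPTION_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Sample text modeled on the corfu.9000.log excerpt above.
sample = """\
<Time-stamp> | ERROR | WrapperSimpleAppMain | o.c.infrastructure.CorfuServer | Failed starting server
<Time-stamp> | ERROR | WrapperSimpleAppMain | o.c.infrastructure.CorfuServer | Failed starting server org.corfudb.runtime.exceptions.DataCorruptionException: Can't parse metadata.
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
"""

for number, line in find_corruption_lines(sample):
    print(number, line)
```

The function works on text rather than a file path so that it can be run against a copy of /var/log/corfu/corfu.9000.log extracted from a support bundle.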
If more than one NSX Manager node in a three-node cluster shows signs of Corfu corruption, gather full NSX support bundles from all three NSX Managers by running the following command from the admin shell. Copy the resulting files from the default location they are written to (/image/vmware/nsx/file-store/), then open a case with Broadcom Support for further assistance.
admin> get support-bundle file <nsx-manager-name>.tgz
If only one NSX Manager node shows signs of Corfu corruption, continue below.
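The choice between the paths in this article depends on how many Manager nodes show corruption. A minimal Python sketch of that decision follows. The function name `next_action` and the returned strings are hypothetical, and treating the all-nodes case as "restore from backup" is this sketch's reading of the Workaround note at the end of the article.

```python
# Hypothetical labels for the paths described in this article.
NO_ACTION = "no Corfu corruption indicated"
REPLACE_NODE = "detach and replace the corrupted Manager node"
OPEN_CASE = "collect support bundles from all Managers and open a support case"
RESTORE_BACKUP = "restore from a valid NSX backup"


def next_action(corrupted_nodes, cluster_size=3):
    """Map the number of corrupted Manager nodes to the documented path."""
    if not 0 <= corrupted_nodes <= cluster_size:
        raise ValueError("corrupted_nodes must be between 0 and cluster_size")
    if corrupted_nodes == 0:
        return NO_ACTION
    if corrupted_nodes == cluster_size:
        # Every node is corrupted, so no healthy replica remains.
        return RESTORE_BACKUP
    if corrupted_nodes == 1:
        return REPLACE_NODE
    # More than one node, but not all of them.
    return OPEN_CASE


for count in range(4):
    print(count, "->", next_action(count))
```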
Process to remove and replace corrupted Manager node:
Confirm if the NSX Manager VM to be redeployed is the orchestrator node:
nsxmanager1> get service install-upgrade
nsx-mngr> get service install-upgrade
Service name: install-upgrade
Service state: running
Enabled on: #.#.#.# <<< orchestrator node
If the node to be replaced is the orchestrator, change the orchestrator to a manager appliance that is not being replaced (NOTE: This command needs to be run from one of the NSX Manager nodes that you are not going to replace/detach.)
nsxmanager2> set repository-ip
Record config settings for node to be replaced
Find the node UUID of the NSX Manager to be replaced
nsxmanager2> get nodes
Detach the node from the cluster via admin CLI of a node not being replaced
nsxmanager2> detach node <UUID>
Check cluster status to ensure the node has been removed from all cluster services
nsxmanager2> get cluster status
Power off detached node and delete from disk via vSphere UI
Deploy new NSX Manager Appliance via NSX UI System > Appliances > Add NSX Appliance
Wait for repo_sync to complete after cluster stabilizes in the NSX UI (this process can take some time)
Alternatively:
If the failed NSX Manager was auto-deployed through the NSX UI, the corrupt NSX Manager can instead be deleted in the NSX Manager UI under System > Appliances using the Delete option for the corrupt node. A new NSX Manager node can then be deployed to return the cluster to 3 nodes.
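The orchestrator check at the start of the removal procedure can be illustrated with a small Python sketch that reads saved output of `get service install-upgrade`. It assumes the output contains an "Enabled on:" line as in the sample shown in the procedure. The function names and the 192.0.2.x addresses are hypothetical and used only for illustration.

```python
import re


def orchestrator_ip(cli_output):
    """Return the address on the 'Enabled on:' line, or None if it is absent."""
    match = re.search(r"^Enabled on:\s*(\S+)", cli_output, flags=re.MULTILINE)
    return match.group(1) if match else None


def must_move_orchestrator(cli_output, node_to_replace_ip):
    """True when the node being replaced is the current orchestrator."""
    return orchestrator_ip(cli_output) == node_to_replace_ip


# Sample output modeled on the procedure above; the address is a placeholder.
sample_output = """\
Service name: install-upgrade
Service state: running
Enabled on: 192.0.2.11
"""

print(orchestrator_ip(sample_output))
print(must_move_orchestrator(sample_output, "192.0.2.11"))
```

If the second call returns True, the orchestrator should be moved with `set repository-ip` from a node that is staying in the cluster before the corrupted node is detached.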
Workaround:
If the Corfu data files on all 3 NSX Manager nodes are corrupted, the only recourse is to restore from a valid NSX backup. See the reference documentation at Restore a Backup.