Failed to detach replica disks from SDR (Invalid configuration for device '0')

Article ID: 410017

Updated On:

Products

VMware HCX

Issue/Introduction

When performing an OSAM migration of Linux machines on HCX 4.11 or 4.11.1, the migration fails with the following error:

Failed to detach replica disks from SDR (Invalid configuration for device '0')

 

The app.log contains similar entries:

app.1.log
Final sync failed. Failed to detach replica disks from SDR (Invalid configuration for device '0'.)","result":{"error":"Final sync failed. Failed to detach replica disks from SDR (Invalid configuration for device '0'.)"}}

 

####-##-##T##:##:##+##:## Service-Mesh-#######-to-VMWare-SRG-I1 syslog-ng 902 - - Error suspend timeout has elapsed, attempting to write again; fd='15'
####-##-##T##:##:##+##:## Service-Mesh-#######-to-VMWare-SRG-I1 syslog-ng 902 - - I/O error occurred while writing; fd='15', error='No space left on device (28)'
####-##-##T##:##:##+##:## Service-Mesh-#######-to-VMWare-SRG-I1 syslog-ng 902 - - Suspending write operation because of an I/O error; fd='15', time_reopen='10'
####-##-##T##:##:##+##:## Service-Mesh-#######-to-VMWare-SRG-I1 auditd 875 - - Error receiving audit netlink packet (No buffer space available)
####-##-##T##:##:##+##:## Service-Mesh-#######-to-VMWare-SRG-I1 cgw 1629 - - [Info-restServer] : sgw rp: GET /sgw/registry HTTP/1.1
####-##-##T##:##:##+##:## [OsAssistedMigrationService_SvcThread-73, Ent: HybridityAdmin, , TxId: .....................................] ERROR c.v.v.h.s.m.j.OsAssistedMigrationJob- Failed in final sync with jobData



####-##-##T##:##:##+##:## ERROR [Thread-1204129] c.v.h.S.RemoteSyncer - Error sending packet
java.io.IOException: Not connected to SDR
        at com.vmware.hcx.Fabric.SDRMTP.SDRConnectionManager.writePacket(SDRConnectionManager.java:113)
        at com.vmware.hcx.Fabric.SDRMTP.SDRConnectionManager.writePacket(SDRConnectionManager.java:87)
        at com.vmware.hcx.Sync.RemoteSyncer$1.run(RemoteSyncer.java:806)
        at java.base/java.lang.Thread.run(Unknown Source)
####-##-##T##:##:##+##:## ERROR [Thread-1204126] c.v.h.S.RemoteSyncer - Error sending packet
java.io.IOException: Not connected to SDR
        at com.vmware.hcx.Fabric.SDRMTP.SDRConnectionManager.writePacket(SDRConnectionManager.java:113)
        at com.vmware.hcx.Fabric.SDRMTP.SDRConnectionManager.writePacket(SDRConnectionManager.java:87)
        at com.vmware.hcx.Sync.RemoteSyncer$1.run(RemoteSyncer.java:806)
        at java.base/java.lang.Thread.run(Unknown Source)

 

 

As a result, the /var/log partition fills up:

df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        4.0M     0  4.0M   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           1.6G   28M  1.6G   2% /run
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
/dev/sdb4       2.9G  1.8G  1.1G  63% /
/dev/sdb5       488M  1.5M  461M   1% /var/lib
/dev/sdb3        89M   57M   27M  69% /boot
/dev/sdb6       2.8G  2.8G     0 100% /var/log
/dev/sdb2        10M  2.1M  7.9M  22% /boot/efi
/dev/loop0      3.7M   19K  3.4M   1% /home/root/syncfs
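
To spot this condition quickly, the df output can be filtered for file systems at or above a usage threshold. The following is a minimal sketch and not part of the official procedure. The function name full_mounts and the default threshold of 95 are illustrative assumptions.

```shell
#!/bin/sh
# full_mounts: read df-style output on stdin and print every mount point
# whose Use% is at or above the given threshold (default 95).
# The function name and the default threshold are illustrative assumptions.
full_mounts() {
    threshold="${1:-95}"
    awk -v limit="$threshold" '
        NR > 1 {
            use = $5
            sub(/%$/, "", use)          # strip the trailing percent sign
            if (use + 0 >= limit + 0)   # force a numeric comparison
                print $6, use "%"
        }'
}

# Example on a live system:
# df -P | full_mounts 95
```

Run against the output above, this would report only the /var/log line. Mount points that contain spaces are not handled by this sketch.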

Environment

VMware HCX 

Cause

During the OSAM migration, data is written to audit.log. The log cannot be rotated, so it fills up the /var/log partition.
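
To confirm which files are consuming the space, the largest files under /var/log can be listed. This is a hedged sketch using standard find, du, sort, and head. The function name largest_files is an illustrative assumption, and the exact location of audit.log on the appliance is not stated in this article.

```shell
#!/bin/sh
# largest_files: list the N largest regular files under a directory,
# sizes in KiB, biggest first. The function name is an illustrative
# assumption; /var/log is the partition shown as full in the df output.
largest_files() {
    dir="${1:-/var/log}"
    count="${2:-10}"
    # -xdev keeps the search on the same file system as the directory.
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null \
        | sort -rn \
        | head -n "$count"
}

# Example on the appliance:
# largest_files /var/log 5
```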

Resolution

Steps:
1. Make sure that there are no ongoing migrations for the host

2. Reboot SRG appliance

3. Once SRG boots up, run the following commands:
    - pvdisplay
    - vgdisplay
    - lvdisplay
  Note: If there are stale entries, vgdisplay lists the VG name.

4. If there are stale entries, investigate why the reboot did not remove them. Triggering a migration at this point reproduces the issue seen earlier, and entries like the following appear in /var/log/messages on the SRG appliance:

<131>1 2025-08-15T05:26:49+00:00 Netcracker-ServiceMesh-SRG-I1 sdrd 1904 - - [Err-sdr-replStor] : VG [######-####-####-####-####-####-######]: [/sbin/vgcreate -s 4096K /dev/sdf2] failed
<131>1 2025-08-15T05:26:49+00:00 Netcracker-ServiceMesh-SRG-I1 sdrd 1904 - - [Err-sdr-replStor] : VG [######-####-####-####-####-####-######]: stderr: A volume group called already exists.

5. If there are no stale entries for the VG above, a new migration can be triggered.
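
The check in steps 3 to 5 can be sketched as a small script. It assumes, following the note in step 3, that any volume group still listed after a reboot with no ongoing migrations is a stale entry. The function names stale_vgs and report_vgs are hypothetical, and the script uses the standard LVM vgs command rather than vgdisplay because its output is easier to parse.

```shell
#!/bin/sh
# stale_vgs: print one volume group name per line. It reads the output of
# `vgs --noheadings -o vg_name` from stdin so the parsing can be exercised
# without LVM installed. The function name is an illustrative assumption.
stale_vgs() {
    awk 'NF { print $1 }'
}

# report_vgs: hypothetical wrapper for the decision in steps 3 to 5.
# Returns 1 if any volume group is listed, 0 otherwise.
report_vgs() {
    names=$(stale_vgs)
    if [ -n "$names" ]; then
        echo "Volume groups still present (possible stale entries):"
        echo "$names"
        return 1
    fi
    echo "No volume groups found; a new migration can be triggered."
}

# On the SRG appliance (requires root and the LVM tools):
# vgs --noheadings -o vg_name | report_vgs
```

A non-zero return value only signals that a volume group is present. Whether it is actually stale still has to be judged against the VG named in the sdrd error.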