vSphere Replication recovery point objective violations

Article ID: 334259

Products

VMware Live Recovery VMware vSphere ESXi

Issue/Introduction

Symptoms:

Multiple VMs encounter RPO violations

Sync operations remain in progress while the replication is in an RPO violation state

If a few VM replications are paused, the remaining active VM replications complete quickly.

Validation Steps:

Run the following commands on the source ESXi host to check the sync status of the VM. Replace <vmid> with the Vmid reported by the first command.

vim-cmd vmsvc/getallvms    # lists all VMs; note the Vmid
vim-cmd hbrsvc/vmreplica.getState <vmid>
vim-cmd hbrsvc/vmreplica.getConfig <vmid>
vim-cmd hbrsvc/vmreplica.queryReplicationState <vmid>
vim-cmd hbrsvc/vmreplica.sync <vmid>
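
When many VMs are affected, the per-VM commands above can be looped over every Vmid. The following is a minimal sketch, assuming that each VM row in the getallvms output starts with a numeric Vmid. The function names extract_vmids and report_replication_state are hypothetical helpers introduced for illustration and are not part of vSphere.

```shell
#!/bin/sh
# Hypothetical helper: print the Vmid column from getallvms output on stdin.
# Assumes each VM row starts with a numeric Vmid; the header row and any
# wrapped annotation text that does not start with a number are skipped.
extract_vmids() {
    awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Hypothetical helper: print the replication state of every VM on this host.
# Must run on the source ESXi host. A VM that is not configured for
# replication may return an error, which is printed as-is.
report_replication_state() {
    vim-cmd vmsvc/getallvms | extract_vmids | while read -r vmid; do
        echo "=== Vmid ${vmid} ==="
        vim-cmd hbrsvc/vmreplica.getState "${vmid}" 2>&1
    done
}
```

After sourcing the file on the ESXi host, calling report_replication_state prints one section per VM. A wrapped annotation line that happens to begin with a number would be misread as a Vmid, so treat the output as a starting point rather than a definitive inventory.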

Environment

vSphere Replication 9.x

Cause

RPO violations might occur for one of the following reasons:

  • Network connectivity problems between source ESXi hosts and vSphere Replication servers at the target site.
  • The IP address of the vSphere Replication server has changed, so the source ESXi hosts no longer use the correct address.
  • The vSphere Replication server cannot access the target virtual machine file system (VMFS) datastore.
  • Low bandwidth between the source ESXi hosts and the vSphere Replication servers.

Cause Validation

  • Search /var/log/vmkernel.log on the source ESXi host for the vSphere Replication server IP address to identify network connectivity problems.
  • To validate the bandwidth, run the iperf command as described in KB 312678 and check the sender and receiver bandwidth.
  • The iperf output looks similar to the following. Collect it from the source ESXi host against the target vSphere Replication appliance:

    [root@s###-w###2:/vmfs/volumes/6###9-0d654912-##-6####0/log] /usr/lib/vmware/vsan/bin/iperf3 --client (Target VR appliance IP) --port 5201
    Connecting to host 1##.3#.8#, port 5201
    iperf3: getsockopt - Function not implemented
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec   768 KBytes  6.28 Mbits/sec    0   0.00 Bytes
    iperf3: getsockopt - Function not implemented
    [  5]   1.00-2.00   sec  1.00 MBytes  8.40 Mbits/sec    0   0.00 Bytes
    iperf3: getsockopt - Function not implemented
    [  5]   2.00-3.00   sec   640 KBytes  5.24 Mbits/sec    0   0.00 Bytes
    iperf3: getsockopt - Function not implemented
    [  5]   3.00-4.00   sec   512 KBytes  4.20 Mbits/sec    0   0.00 Bytes
    iperf3: getsockopt - Function not implemented
    [  5]   4.00-5.00   sec   640 KBytes  5.24 Mbits/sec    0   0.00 Bytes
    iperf3: getsockopt - Function not implemented
    [  5]   5.00-6.00   sec   640 KBytes  5.24 Mbits/sec    0   0.00 Bytes
    iperf3: getsockopt - Function not implemented
    [  5]   6.00-7.00   sec  1.12 MBytes  9.43 Mbits/sec    0   0.00 Bytes
    iperf3: getsockopt - Function not implemented
    [  5]   7.00-8.00   sec   512 KBytes  4.20 Mbits/sec    0   0.00 Bytes
    iperf3: getsockopt - Function not implemented
    [  5]   8.00-9.00   sec   768 KBytes  6.28 Mbits/sec    0   0.00 Bytes
    iperf3: getsockopt - Function not implemented
    [  5]   9.00-10.00  sec   768 KBytes  6.29 Mbits/sec    0   0.00 Bytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec  7.25 MBytes  6.08 Mbits/sec    0             sender
    [  5]   0.00-10.03  sec  7.25 MBytes  6.07 Mbits/sec                  receiver

  Note: If the sender bandwidth is too low, as in this example, replication is slowed.

  

  • Check /var/log/vmware/hbrsrv.log on the vSphere Replication appliance at the target site for problems with the server accessing a target VMFS datastore. Look for entries similar to the following whose timestamps match the RPO violation times on the affected virtual machines:

    963Z [EAF7DB90 error 'Main'] HbrError stack:
    963Z [EAF7DB90 error 'Main'] [0] Class: NFC Code: 3
    963Z [EAF7DB90 error 'Main'] [1] NFC error: The operation experienced a network error
    963Z [EAF7DB90 error 'Main'] [2] Can't write remote file /vmfs/volumes/###-####-###-############/VM-replica/hbrcfg.#######-####-###-###-############.###.###.vmx
    963Z [EAF7DB90 error 'Main'] [3] Failed to write to file (instanceKey=226137) (type=vmx) (identifier=VM.vmx)
    963Z [EAF7DB90 error 'Main'] [4] Converting error to wire failure
    743Z [F3DFCB90 warning 'Libs'] [NFC ERROR] NfcNetTcpWrite: bWritten: -1
    743Z [F3DFCB90 warning 'Libs'] [NFC ERROR] NfcSendMessage: send failed: NFC_NETWORK_ERROR
    743Z [F3DFCB90 warning 'Libs'] [NFC ERROR] NfcFssrvr_IO: failed to send io message
    743Z [F3DFCB90 verbose 'PropertyProvider'] RecordOp ASSIGN: lastGroupError, Hbr.Replica.Group.GID-####-5###-###-###-############
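
The two log checks above can be sketched as small shell helpers. This is a minimal illustration, assuming grep and tail are available on both the ESXi host and the appliance. The function names vr_lines_in_vmkernel and count_nfc_errors are hypothetical, and the search string [NFC ERROR] is taken from the sample entries above.

```shell
#!/bin/sh
# Hypothetical helper: show the most recent vmkernel.log lines that mention
# the vSphere Replication server IP. Run on the source ESXi host.
#   $1 = VR server IP, $2 = log path (defaults to /var/log/vmkernel.log)
# Note: this is a plain substring match, so 10.0.0.5 also matches 10.0.0.50.
vr_lines_in_vmkernel() {
    grep -F -- "$1" "${2:-/var/log/vmkernel.log}" | tail -n 20
}

# Hypothetical helper: count NFC error lines in hbrsrv.log.
# Run on the vSphere Replication appliance at the target site.
#   $1 = log path (defaults to /var/log/vmware/hbrsrv.log)
count_nfc_errors() {
    grep -c -F -- '[NFC ERROR]' "${1:-/var/log/vmware/hbrsrv.log}"
}
```

A non-zero count from count_nfc_errors only indicates that NFC errors were logged. Whether they explain the RPO violation still depends on comparing their timestamps with the violation times.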

 

Resolution

To resolve this issue:

  • Engage the network team to identify the bottleneck that causes the low bandwidth.
  • Verify that the vSphere Replication server IP address has not changed. If it has changed, reconfigure all the replications so that the source ESXi hosts use the new IP address.
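
To recheck the bandwidth after the network team makes changes, the sender line of the iperf summary shown under Cause Validation can be compared against a minimum value. The following is a minimal sketch, assuming a single-stream iperf3 run and a minimum chosen by the operator. The function names sender_mbits and check_bandwidth are hypothetical, and the sketch does not derive what bandwidth a given RPO requires.

```shell
#!/bin/sh
# Hypothetical helper: read iperf3 client output on stdin and print the
# sender bitrate from the summary line, converted to Mbits/sec.
sender_mbits() {
    awk '/sender/ {
        for (i = 2; i <= NF; i++) {
            if ($i == "Kbits/sec") { print $(i-1) / 1000; exit }
            if ($i == "Mbits/sec") { print $(i-1) + 0;    exit }
            if ($i == "Gbits/sec") { print $(i-1) * 1000; exit }
        }
    }'
}

# Hypothetical helper: compare the sender bitrate with a minimum (Mbits/sec)
# supplied by the operator. Returns 0 if OK, 1 if low, 2 if no summary found.
#   $1 = minimum acceptable Mbits/sec
check_bandwidth() {
    rate=$(sender_mbits)
    [ -n "$rate" ] || { echo "no sender summary line found"; return 2; }
    if awk -v r="$rate" -v m="$1" 'BEGIN { exit !(r + 0 < m + 0) }'; then
        echo "LOW: ${rate} Mbits/sec (minimum ${1})"
        return 1
    fi
    echo "OK: ${rate} Mbits/sec (minimum ${1})"
}
```

For example, on the source ESXi host the iperf3 command from the Cause Validation section can be piped into check_bandwidth 100, where 100 is only an illustrative minimum and should be replaced with a value appropriate for the environment.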

 

Additional Information