vSphere Replication (Live Recovery) 9.0
vSphere Replication 8.0
The Security Profile is associated with network ports 31031 and 32032. Earlier versions of vSphere Replication used port 44046, which is obsolete in vSphere Replication 8.x. Port 902 on the ESXi host is used by the NFC service.
Two components need to be distinguished in vSphere Replication: the vSphere Replication Appliance and the add-on vSphere Replication Servers (the configuration limit is 9 add-on servers).
The vSphere Replication Appliance is the primary server, and the vSphere Replication Servers are secondary servers that communicate with the primary server over port 8123.
The hbr-agent service on each host is enabled when vSphere Replication is configured with vCenter. The hbr service communicates with the vCenter inventory and populates the vSphere Replication hbrsrv database with all ESXi hosts managed by vCenter.
Get the Vmid of the virtual machine
[root@esxi:~] vim-cmd vmsvc/getallvms
Vmid   Name     File                            Guest OS     Version
11     vmname   [datastore] vmname/vmname.vmx   OS_64Guest   vmx-21
These are the tools related to vSphere Replication
[root@esxi:~] vim-cmd hbrsvc/
Commands available under hbrsvc/:
vmreplica.abort vmreplica.pause
vmreplica.create vmreplica.queryReplicationState
vmreplica.disable vmreplica.reconfig
vmreplica.diskDisable vmreplica.resume
vmreplica.diskEnable vmreplica.startOfflineInstance
vmreplica.enable vmreplica.stopOfflineInstance
vmreplica.getConfig vmreplica.sync
vmreplica.getState
Get the destination vSphere Replication Appliance or vSphere Replication Server (add-on) that the virtual machine is replicating to
[root@esxi:~] vim-cmd hbrsvc/vmreplica.getConfig 11
Retrieve VM replication configuration:
The VM is configured for replication with the following options:
VM Replication ID = GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Destination IP Address = x.x.x.x
Destination Port = 31031
Recovery Point Objective = 1440
Quiesce Guest OS = false
Enable Opportunistic Updates = false
Network Compression = false
Network Encryption = false
Paused for Replication = false
Disk scsi0:0 is configured for replication:
Device key = 2000
Replication ID = RDID-d27a4619-157e-414a-9803-427f822a4de5
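When the same check has to be repeated for several virtual machines, the destination can be pulled out of the getConfig output with a small filter. The following is a minimal sketch, assuming the output keeps the "Key = Value" layout shown above. The function name vr_destination is a hypothetical helper, not an ESXi command.

```shell
# Hypothetical helper (not part of ESXi): reads the text printed by
# "vim-cmd hbrsvc/vmreplica.getConfig <vmid>" on stdin and prints
# "<destination ip>:<destination port>".
# Assumption: each option is printed as "Key = Value" on its own line.
vr_destination() {
    awk -F' = ' '
        /Destination IP Address/ { ip = $2;   sub(/[ \t\r]+$/, "", ip) }
        /Destination Port/       { port = $2; sub(/[ \t\r]+$/, "", port) }
        END {
            # Fail if either value was not found in the input.
            if (ip != "" && port != "") print ip ":" port
            else exit 1
        }
    '
}
```

On the host it could be used as: vim-cmd hbrsvc/vmreplica.getConfig 11 | vr_destination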
To see if the virtual machine is replicating run a sync
[root@esxi:~] vim-cmd hbrsvc/vmreplica.sync 11
Force a replica synchronzation for the VM:
Get the state of the virtual machine's replication, reported in bytes of data transferred. Run the command a few times to see progress.
[root@esxi:~] vim-cmd hbrsvc/vmreplica.getState 11
Retrieve VM running replication state:
The VM is configured for replication. Current replication state: Group: GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (generation=9627309314105)
Group State: full sync (0% done: checksummed 0 bytes of 16 GB, transferred 0 bytes of 0 bytes)
DiskID RDID-d27a4619-157e-414a-9803-427f822a4de5 State: full sync (checksummed 0 bytes of 16 GB, transferred 0 bytes of 0 bytes)
[root@esxi:~] vim-cmd hbrsvc/vmreplica.getState 11
Retrieve VM running replication state:
The VM is configured for replication. Current replication state: Group: GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (generation=9627309314105)
Group State: full sync (60% done: checksummed 9.6 GB of 16 GB, transferred 80 KB of 392 KB)
DiskID RDID-d27a4619-157e-414a-9803-427f822a4de5 State: full sync (checksummed 9.6 GB of 16 GB, transferred 80 KB of 392 KB)
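Rather than reading the full getState output on every run, the progress figure can be extracted from the "Group State" line. This is a sketch only, assuming the "(N% done: ...)" wording shown in the output above. The function name vr_sync_percent is hypothetical, and it prints nothing when the state line carries no percentage.

```shell
# Hypothetical helper (not part of ESXi): reads the output of
# "vim-cmd hbrsvc/vmreplica.getState <vmid>" on stdin and prints the
# percentage from the "Group State: ... (N% done: ...)" line.
# Assumption: the line format matches the sample output in this article.
vr_sync_percent() {
    sed -n 's/.*Group State:.*(\([0-9][0-9]*\)% done.*/\1/p'
}
```

On the host it could be used as: vim-cmd hbrsvc/vmreplica.getState 11 | vr_sync_percent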
[root@esxi:/var/run/log] cat hostd.log |less
yyyy-mm-ddThh:mm:ss.msZ info hostd[265382] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx-HMS-4293-61-1-f969 user=vpxuser:VSPHERE.LOCAL\Administrator] Event 575 : Sync started by System for virtual machine vmname on host esxi.domain.tld in cluster Cluster_name in ha-datacenter.
...
yyyy-mm-ddThh:mm:ss.msZ info hostd[264342] [Originator@6876 sub=Hbrsvc] Replication group (groupID=GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx): last delta duration 1136 ms, size 826908 (file transfers duration: 126 ms, prepare delta duration: 0 ms)
yyyy-mm-ddThh:mm:ss.msZ info hostd[264342] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 576 : Sync completed for virtual machine vmname on host esxi.domain.tld in cluster Cluster_name in ha-datacenter (826908 bytes transferred).
yyyy-mm-ddThh:mm:ss.msZ info hostd[265393] [Originator@6876 sub=Hbrsvc] ReplicationScheduler: stats updated for (groupID=GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx): last duration was 1s, bandwidth was 0.79 MB/s; estimated duration is now 1s, estimated bandwidth is 0.79 MB/s.
[root@esxi:/var/run/log] cat hostd.log |grep "last delta";date
yyyy-mm-ddThh:mm:ss.msZ info hostd[265381] [Originator@6876 sub=Hbrsvc] Replication group (groupID=GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx): last delta duration 775 ms, size 0 (file transfers duration: 360 ms, prepare delta duration: 17 ms)
yyyy-mm-ddThh:mm:ss.msZ info hostd[264342] [Originator@6876 sub=Hbrsvc] Replication group (groupID=GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx): last delta duration 1136 ms, size 826908 (file transfers duration: 126 ms, prepare delta duration: 0 ms)
yyyy-mm-ddThh:mm:ss.msZ info hostd[265393] [Originator@6876 sub=Hbrsvc] Replication group (groupID=GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx): last delta duration 1136 ms, size 826908 (file transfers duration: 126 ms, prepare delta duration: 0 ms)
yyyy-mm-ddThh:mm:ss.msZ info hostd[265381] [Originator@6876 sub=Hbrsvc] Replication group (groupID=GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx): last delta duration 618 ms, size 0 (file transfers duration: 118 ms, prepare delta duration: 14 ms)
yyyy-mm-ddThh:mm:ss.msZ info hostd[265393] [Originator@6876 sub=Hbrsvc] Replication group (groupID=GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx): last delta duration 618 ms, size 0 (file transfers duration: 118 ms, prepare delta duration: 14 ms)
[root@esxi:/vmfs/volumes/611a2ae5-aa26b7de-6322-00505601e86b/log] date
Day Month Day HH:MM:SS UTC YYYY
[root@esxi:/var/run/log] cat hostd.log |grep "last duration"
yyyy-mm-ddThh:mm:ss.msZ info hostd[265381] [Originator@6876 sub=Hbrsvc] ReplicationScheduler: stats updated for (groupID=GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx): last duration was 1s, bandwidth was 0.00 MB/s; estimated duration is now 1s, estimated bandwidth is 1.00 MB/s.
yyyy-mm-ddThh:mm:ss.msZ info hostd[265393] [Originator@6876 sub=Hbrsvc] ReplicationScheduler: stats updated for (groupID=GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx): last duration was 1s, bandwidth was 0.79 MB/s; estimated duration is now 1s, estimated bandwidth is 0.79 MB/s.
yyyy-mm-ddThh:mm:ss.msZ info hostd[265393] [Originator@6876 sub=Hbrsvc] ReplicationScheduler: stats updated for (groupID=GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx): last duration was 1s, bandwidth was 0.00 MB/s; estimated duration is now 1s, estimated bandwidth is 0.79 MB/s.
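The "last delta" lines are easier to compare when reduced to two columns. The sketch below prints the delta duration in milliseconds and the delta size for each matching line, assuming the hostd.log wording shown above. The function name vr_deltas is hypothetical. The same delta can be logged more than once, as in the sample above, so the values should not be summed blindly.

```shell
# Hypothetical helper (not part of ESXi): reads hostd.log lines on stdin
# and prints "<duration_ms> <size>" for every "last delta duration" line.
# Assumption: the wording "last delta duration N ms, size M" is unchanged.
vr_deltas() {
    awk '
        /last delta duration/ {
            dur = ""; size = ""
            for (i = 1; i < NF; i++) {
                # "delta duration N" (no colon) is the overall delta time;
                # "duration:" tokens belong to the sub-timings and are skipped.
                if ($i == "duration" && $(i - 1) == "delta") dur = $(i + 1)
                if ($i == "size") size = $(i + 1)
            }
            if (dur != "" && size != "") print dur, size
        }
    '
}
```

On the host it could be used as: vr_deltas < /var/run/log/hostd.log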
To trace the vSphere Replication network paths, replace x.x.x.x with the actual IP addresses in your environment. In these examples the last octet is shown so that the direction of the packets is clear when capturing network data.
From the vCenter command line
Name: vc_fqdn.domain.tld
Address: x.x.x.4
From vCenter (IP x.x.x.4) to local vSphere Replication on port 8043
root@vc [ ~ ]# curl -v telnet://x.x.x.5:8043
* Rebuilt URL to: telnet://x.x.x.5:8043/
* Trying x.x.x.5...
* TCP_NODELAY set
* Connected to x.x.x.5 (x.x.x.5) port 8043 (#0)
From vCenter to remote vCenter over port 443
root@vc [ ~ ]# curl -v telnet://x.x.x.2:443
* Rebuilt URL to: telnet://x.x.x.2:443/
* Trying x.x.x.2...
* TCP_NODELAY set
* Connected to x.x.x.2 (x.x.x.2) port 443 (#0)
From Host to local vSphere Replication on port 80
[root@Host:~] nc -z x.x.x.5 80
Connection to x.x.x.5 80 port [tcp/http] succeeded!
From Host to remote vSphere Replication on port 31031
[root@Host:~] nc -z x.x.x.3 31031
Connection to x.x.x.3 31031 port [tcp/*] succeeded!
From vSphere Replication to local Host on port 902
root@vr [ ~ ]# curl -v telnet://x.x.x.6:902
* Trying x.x.x.6:902...
* Connected to x.x.x.6 (x.x.x.6) port 902 (#0)
---
From vSphere Replication to local vCenter on ports 80 and 443
root@vr [ ~ ]# curl -v telnet://x.x.x.4:80
* Rebuilt URL to: telnet://x.x.x.4:80/
* Trying x.x.x.4...
* TCP_NODELAY set
* Connected to x.x.x.4 (x.x.x.4) port 80 (#0)
root@vr [ ~ ]# curl -v telnet://x.x.x.4:443
* Rebuilt URL to: telnet://x.x.x.4:443/
* Trying x.x.x.4...
* TCP_NODELAY set
* Connected to x.x.x.4 (x.x.x.4) port 443 (#0)
---
From vSphere Replication to remote vCenter on port 80 and 443
root@vr [ ~ ]# curl -v telnet://x.x.x.2:80
* Trying x.x.x.2:80...
* Connected to x.x.x.2 (x.x.x.2) port 80 (#0)
root@vr [ ~ ]# curl -v telnet://x.x.x.2:443
* Trying x.x.x.2:443...
* Connected to x.x.x.2 (x.x.x.2) port 443 (#0)
To check for an IP conflict from the vSphere Replication command line
root@vr [ ~ ]# ifconfig -a
root@vr [ ~ ]# nslookup <IP address>
For the purposes of this article, the vSphere Replication Appliances are named vr1 and vr2.
Isolating the Network Traffic of vSphere Replication KB 78613
Log in to the vSphere Replication Appliance command line (for example, with a PuTTY session).
If you see only 10-eth0.network, the ESXi host is using the default vmk0 to replicate data.
root@vr1 [ ~ ]# cd /etc/systemd/network
root@vr1 [ /etc/systemd/network ]# ls -l
-rw-r--r-- 1 root root 197 Jan 13 19:20 10-eth0.network
If you also see 10-eth1.network and 10-eth2.network, a dedicated replication network is configured on the ESXi hosts.
root@vr1 [ /etc/systemd/network ]# ls -l
-rw-r--r-- 1 root root 119 Jan 13 19:20 10-eth0.network <--- Management
-rw-r--r-- 1 root root 117 Jan 13 19:20 10-eth1.network <--- VR Traffic
-rw-r--r-- 1 root root 117 Jan 13 19:20 10-eth2.network <--- VR NFC Traffic
Check the current ARP/network tables. The connections on port 8043 are the vSphere Replication point-to-point connections of the paired configuration between vr1 and vr2.
root@vr1 [ /etc/systemd/network ]# netstat -r |egrep -i "State|8043"
Proto Recv-Q Send-Q Local Address          Foreign Address         State
tcp6       0      0 vr1.domain.tld:33040   vr2.domain.tld:8043     ESTABLISHED
tcp6       0      0 vr1.domain.tld:36796   vr1.domain.tld:8043     ESTABLISHED
tcp6       0      0 vr1.domain.tld:8043    srm1.domain.tld:50316   ESTABLISHED
tcp6       0      0 vr1.domain.tld:8043    vr1.domain.tld:36796    ESTABLISHED
tcp6       0      0 vr1.domain.tld:8043    vr2.domain.tld:46902    ESTABLISHED
On the ESXi host, identify and note these parameters: PortNum, ClientName, and the tmp directory. They relate to the replication information you discovered in the vCenter UI for the ESXi host.
Log in to the ESXi host where the replicated VM resides to get the destination vSphere Replication IP and port number.
Run the command: vim-cmd vmsvc/getallvms |grep vm_name
Vmid Name File Guest OS Version
11 vmname [datastore] vmname/vmname.vmx OS_64Guest vmx-21
Use the Vmid to run the command: vim-cmd hbrsvc/vmreplica.getConfig <vmid>
Example: vim-cmd hbrsvc/vmreplica.getConfig 11
Retrieve VM replication configuration:
The VM is configured for replication with the following options:
VM Replication ID = GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Destination IP Address = x.x.x.x <--- vSphere Replication target IP, or paired vSphere Replication
Destination Port = 31031 <--- vSphere Replication target IP port
Recovery Point Objective = 1440
Quiesce Guest OS = true
Enable Opportunistic Updates = false
Network Compression = false
Network Encryption = false
Paused for Replication = false
Disk scsi0:0 is configured for replication:
Device key = 2000
Replication ID = RDID-d27a4619-157e-414a-9803-427f822a4de5
Log in to the ESXi host where the vSphere Replication Appliance is running.
[root@esxi:/tmp] which net-stats
/bin/net-stats
[root@esxi:/tmp]# net-stats -l
PortNum      Type  SubType  SwitchName  MACAddress         ClientName
2214592523   4     0        vSwitch0    xx:xx:xx:xx:xx:xx  vmnic0 <--- for default vmk0 the replication is on this uplink
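On hosts with many ports, the PortNum for a given ClientName can be looked up with a one-line filter instead of scanning the table by eye. This is a rough sketch, assuming the six-column layout shown above and a ClientName without spaces. The function name vr_portnum is hypothetical.

```shell
# Hypothetical helper (not part of ESXi): reads "net-stats -l" output on
# stdin and prints the PortNum of the row whose ClientName equals $1.
# Assumptions: columns are PortNum Type SubType SwitchName MACAddress
# ClientName, and the ClientName contains no spaces.
vr_portnum() {
    awk -v client="$1" '$6 == client { print $1 }'
}
```

On the host it could be used as: net-stats -l | vr_portnum vmnic0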
How to find the uplink the replication VM is using on the ESXi host
[root@esxi:/tmp]# esxcli network vm list
World ID  Name        Num Ports  Networks
--------  ----------  ---------  --------
265948    vr1         1          VM Network - Management
[root@esxi:/tmp]# esxcli network vm port list -w 265948
Port ID: 67108899
vSwitch: vSwitch0
Portgroup: VM Network - Management
DVPort ID:
MAC Address: xx:xx:xx:xx:xx:xx
IP Address: x.x.x.x
Team Uplink: vmnic0 <---------- vSphere Replication VM vr1 is using vmnic0
Uplink Port ID: 2214592523
Active Filters:
Using the pktcap-uw tool in ESXi 5.5 and later KB 2051814
The vmnic is the uplink and the vmk is the kernel port. The PortNum is the virtual switch port id for the uplink.
To capture packets, run the pktcap-uw command at both sites simultaneously. Edit the switch port ID for the uplink and the vmnic (2214592523 and vmnic0 in this example) to match the replication configuration found in the customer's environment.
[root@esxi:/tmp]# pktcap-uw --switchport 2214592523 -o /tmp/2214592523.pcap & pktcap-uw --uplink vmnic0 -o /tmp/vmnic0.pcap &
or
[root@esxi:/tmp]# pktcap-uw --trace --ip destination_ip > ip.pcap &
or replace X with vmnic number
[root@esxi:/tmp]# pktcap-uw --dir 2 --uplink vmnicX -o - | tcpdump-uw icmp -enr -
You can stop pktcap-uw tracing with the kill command:
kill $(lsof |grep pktcap-uw |awk '{print $1}'| sort -u)
Run this command to check that all pktcap-uw traces are stopped:
lsof |grep pktcap-uw |awk '{print $1}'| sort -u
Read the packet capture on the host with tcpdump-uw, or download the pcap files and open them in Wireshark.
[root@esxi:/tmp]# tcpdump-uw -ttttnnr 2214592523.pcap |grep 31031