vSphere Replication takes 24 hours to start after removing a large number of hosts from the environment

Article ID: 301325

Updated On:

Products

VMware Live Recovery
VMware vSphere ESXi

Issue/Introduction

Symptoms:
The hbr service does not start on port 8123 until 24 hours after the vSphere Replication appliance is started.

Environment

VMware vSphere Replication 6.0.x
VMware vSphere Replication 6.1.x
VMware vSphere Replication 5.6.x
VMware vSphere Replication 5.1.x
VMware vSphere Replication 6.x
VMware vSphere Replication 5.8.x
VMware vSphere Replication 5.5.x
VMware vSphere Replication 6.0 Beta
VMware vSphere Replication 6.5.x
VMware vSphere Replication 5.x

Cause

vSphere Replication stores a list of host IP addresses and tries very persistently to connect to each one. In some cases, where more than 100 hosts have changed and are no longer reachable, the hbr service takes 24 hours to start.
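
To gauge whether an appliance is likely to be affected, it can help to count how many distinct host addresses are stored. The following is a minimal sketch, not part of the product: it assumes the addresses have already been exported to a text file (as the sqlite3 query in the Resolution section does) and that they are IPv4 addresses. The function name count_addresses is hypothetical.

```shell
# Hypothetical helper: count the distinct IPv4 addresses found in a text file.
# Assumption: the file holds comma- or newline-separated IPv4 addresses.
count_addresses() {
    grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' "$1" \
        | sort -u \
        | wc -l \
        | tr -d ' '
}
```

For example, count_addresses /tmp/addresses.txt prints a single number. A value well above 100 would be consistent with the situation described here, although the count alone does not show which hosts are unreachable.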

Resolution

To resolve this issue:
  1. Stop the hbr service by running this command:

    service hbrsrv stop
     
  2. Take a backup of the database by running this command:

    cp /etc/vmware/hbrsrv.54.db /etc/vmware/hbrsrv.54.db.bak
     
  3. Run this query to export the stored host IP addresses to a file:

    sqlite3 /etc/vmware/hbrsrv.54.db 'SELECT addresses FROM HostInfo' > /tmp/addresses.txt
     
  4. Open the /tmp/addresses.txt file in vi.
  5. Run this vi command to replace each comma with a line break, so that every IP address is on its own line, then save the file with :wq:

    :%s;,;\r;g
     
  6. Run this command to ping each address once and record which addresses respond and which do not:

    nohup cat /tmp/addresses.txt | xargs -n1 ping -c 1 > /tmp/pings.txt
     
  7. Run this command to list the bad (unreachable) IP addresses:

    cat /tmp/pings.txt | grep -B 3 "100%" | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' | uniq > /tmp/badips.txt
     
  8. Run this command to list the good (reachable) IP addresses:

    cat /tmp/pings.txt | grep -B 3 " 0%" | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' | uniq > /tmp/goodips.txt
     
  9. Run this command to confirm that the good IP addresses respond:

    cat /tmp/goodips.txt | xargs -n1 ping -c 1
     
  10. Run this command to confirm that the bad IP addresses do not respond:

    cat /tmp/badips.txt | xargs -n1 ping -c 1
     
  11. Run this command to count the bad IP addresses (the first number in the output is the line count):

    cat /tmp/badips.txt | wc
     
  12. Run this command to create a working copy of the list for building the SQL statements:

    cp /tmp/badips.txt /tmp/sqlstatement.txt
     
  13. Open the working copy in vi:

    vi /tmp/sqlstatement.txt
     
  14. Run this vi command to wrap each IP address in double quotes and percent signs (the trailing single quote closes the statement added in the next step):

    :%s/^\(.*\)$/"%\1%"'/
     
  15. Add the sqlite3 command and the start of a SELECT statement to the beginning of each line:

    For example:

    :%s!^!sqlite3 /etc/vmware/hbrsrv.54.db 'SELECT addresses FROM HostInfo WHERE addresses LIKE !
     
  16. Save the file:

    :wq
     
  17. Make the SQL statement file executable:

    chmod +x /tmp/sqlstatement.txt
     
  18. Run the queries in the file:

    /tmp/sqlstatement.txt

    The output should list every addresses field that contains a bad IP address, in the same format as the output of the query in step 3.
     
  19. Copy the file to a new file by running this command:

    cp /tmp/sqlstatement.txt /tmp/deletesql.txt
     
  20. Edit the new file using this command:

    vi /tmp/deletesql.txt
     
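  21. Replace SELECT addresses with DELETE, so that each statement reads DELETE FROM HostInfo WHERE addresses LIKE followed by the quoted IP address:

    :%s;SELECT addresses;DELETE;g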
     
  22. Save the file:

    :wq
     
  23. Make the deletesql.txt file executable:

    chmod +x /tmp/deletesql.txt
     
  24. Run the delete statements:

    /tmp/deletesql.txt
     
  25. Verify the result by running the SELECT statements again:

    /tmp/sqlstatement.txt

    Note: The command should return no results.
     
  26. Start the hbr service, for example by running service hbrsrv start.
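
The manual vi edits in the steps above can also be scripted. The following is a minimal sketch under stated assumptions, not a supported tool: the function names (split_addresses, ping_all, ips_by_loss, make_sql) are hypothetical, the addresses are assumed to be IPv4, and the ping output is assumed to follow the usual Linux layout with a "packet loss" summary line. It uses sort -u where the steps use uniq, so duplicates are removed even when they are not adjacent. Stop the service and back up the database first, as in steps 1 and 2.

```shell
#!/bin/sh
# Sketch only: helper functions that mirror the manual steps.
# All function names are hypothetical; file paths are passed as arguments.

IP_RE='[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'

# Steps 4-5: put every comma-separated address on its own line.
split_addresses() {
    tr ',' '\n' < "$1" | sed '/^[[:space:]]*$/d'
}

# Step 6: ping each address once and print the raw ping output.
ping_all() {
    while read -r ip; do
        [ -n "$ip" ] && ping -c 1 "$ip"
    done < "$1"
}

# Steps 7-8: extract addresses from saved ping output by packet-loss figure.
# $1 = ping output file, $2 = "100%" (unreachable) or " 0%" (reachable).
ips_by_loss() {
    grep -B 3 -- "$2" "$1" | grep -o "$IP_RE" | sort -u
}

# Steps 12-15 and 21: turn each address into one sqlite3 command line.
# $1 = file of addresses, $2 = database path,
# $3 = "SELECT addresses" (to review) or "DELETE" (to remove).
make_sql() {
    while read -r ip; do
        [ -n "$ip" ] || continue
        printf "sqlite3 %s '%s FROM HostInfo WHERE addresses LIKE \"%%%s%%\"'\n" \
            "$2" "$3" "$ip"
    done < "$1"
}

# Example run (commented out; review each generated file before executing it):
#   split_addresses /tmp/addresses.txt > /tmp/addresses.split
#   ping_all /tmp/addresses.split > /tmp/pings.txt
#   ips_by_loss /tmp/pings.txt "100%" > /tmp/badips.txt
#   make_sql /tmp/badips.txt /etc/vmware/hbrsrv.54.db "SELECT addresses" > /tmp/sqlstatement.txt
#   make_sql /tmp/badips.txt /etc/vmware/hbrsrv.54.db "DELETE" > /tmp/deletesql.txt
```

Generating the SELECT and DELETE files from the same address list keeps the two in step, so the SELECT file can still be used for the final verification after the DELETE file has been run.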