RPO violation due to Host connectivity issues in VMware Cloud Director Availability 4.x

Article ID: 388076

Updated On:

Products

VMware Cloud Director

Issue/Introduction

  • This article addresses an RPO violation for a virtual machine residing on a specific host (referred to here as Host-X). The violation occurs because connectivity between the host and the replicator fails, which prevents replication from completing.
  • The Replicator logs show that the replicator was unable to log in to the host, which can lead to an RPO violation:

2025-01-24 00:00:24.985 WARN - c.v.h.c.e.ExceptionConversionService: Unable to convert exception. Using fallback exception instead.

com.vmware.vim.binding.hbr.replica.fault.HbrRuntimeFault: Exception Vmacore::Exception: Can't login to the host

  • The HBRSRV logs show connection timeouts and login failures:

2025-01-24T02:20:29.744Z error hbrsrv: User agent failed to send request; Connection timed out 

2025-01-24T02:20:29.744Z info hbrsrv: Agent failed to log in. Connection type: /sdkTunnel 

2025-01-24T02:20:29.744Z error hbrsrv: Can't login to the host
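To gauge how often these entries occur, the log lines can be counted by type. The following is a minimal Python sketch, not a supported tool: the function name `summarize_hbrsrv_log` and the category labels are hypothetical, and the patterns are taken only from the excerpts quoted above, so other failure messages would go uncounted.

```python
import re
from collections import Counter

# Patterns copied from the HBRSRV excerpts quoted in this article.
# The category names ("timeout", "login_failure") are illustrative labels.
PATTERNS = {
    "timeout": re.compile(r"Connection timed out"),
    "login_failure": re.compile(r"Can't login to the host|Agent failed to log in"),
}

def summarize_hbrsrv_log(lines):
    """Count timeout and login-failure entries in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return dict(counts)

# The three HBRSRV lines shown above, used as sample input.
sample = [
    "2025-01-24T02:20:29.744Z error hbrsrv: User agent failed to send request; Connection timed out",
    "2025-01-24T02:20:29.744Z info hbrsrv: Agent failed to log in. Connection type: /sdkTunnel",
    "2025-01-24T02:20:29.744Z error hbrsrv: Can't login to the host",
]
print(summarize_hbrsrv_log(sample))
```

A steadily growing count for a single host, while other hosts stay clean, is consistent with the host-specific connectivity problem described in this article.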

Environment

VMware Cloud Director Availability 4.x

Cause

The RPO violation is caused by a network connectivity failure between Host '######' and the replicator. The logs show repeated connection timeouts and login failures, which indicates that the problem is isolated to this host.
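A basic TCP reachability test, run from a machine on the same network segment as the replicator, can help confirm whether the host is reachable at all. The sketch below is illustrative only: the function names are hypothetical, and ports 80 and 902 are assumed here as the ports typically used for replication traffic to ESXi. Confirm the required ports for your deployment against the product documentation.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

def check_ports(host, ports=(80, 902), timeout=5.0):
    """Return a {port: reachable} mapping. The default ports are an assumption."""
    return {port: can_connect(host, port, timeout) for port in ports}

# Example call (hypothetical host name):
#   check_ports("esxi-host.example.com")
```

A successful TCP connection only shows that the port is reachable. It does not prove that authentication will succeed, so a passing result does not rule out the login failures seen in the logs.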

 

Resolution

To resolve the RPO violation and restore replication, follow these recommended steps:

1. Restart the hostd Service:

  • Restart the hostd service on the affected host to re-establish the connection between the host and the replicator.

2. Verify vSphere Replication Settings:

  • Ensure that the vSphere Replication NFC (Network File Copy) traffic tag is configured and enabled on the VMkernel adapter that the ESXi host uses for replication.

3. Move the Affected VM:

  • Migrate the impacted VM to another host in the cluster that has working connectivity to the replicator, to prevent further replication failures.

4. Increase the RPO Window Temporarily:

  • Extend the RPO window to provide additional time for replication to complete and catch up.
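To illustrate why a temporarily larger RPO window in step 4 clears the violation while replication catches up, the sketch below models a violation as the latest replicated instance being older than the configured RPO. This is a simplified model with hypothetical names and sample timestamps. It is not how VMware Cloud Director Availability computes the state internally.

```python
from datetime import datetime, timedelta, timezone

def rpo_status(last_instance_time, rpo, now=None):
    """Compare the age of the latest replicated instance with the RPO.

    Simplified model: a violation exists when the age exceeds the RPO.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - last_instance_time
    overdue = max(age - rpo, timedelta(0))
    return {"age": age, "violated": age > rpo, "overdue_by": overdue}

# Hypothetical timestamps, loosely based on the log times in this article.
last_ok = datetime(2025, 1, 24, 0, 0, tzinfo=timezone.utc)
now = datetime(2025, 1, 24, 2, 20, tzinfo=timezone.utc)

print(rpo_status(last_ok, timedelta(hours=1), now))  # violated with a 1-hour RPO
print(rpo_status(last_ok, timedelta(hours=4), now))  # not violated with a 4-hour RPO
```

Extending the window only buys time. Once connectivity is restored and replication has caught up, the RPO can be returned to its original value.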

If the issue persists, contact Broadcom Support and note this Article ID (388076) in the problem description. For more information, see Creating and managing Broadcom support cases.