"java.io.IOException: Too many open files" error results in a large number of RPO violations in vCloud Availability 3.0

Article ID: 314972

Updated On:

Products

VMware Cloud Director

Issue/Introduction

Symptoms:
  • In the vCloud Availability Portal, you see RPO violations for active replications increase over time.
  • Manually synchronizing replications from the vCloud Availability Portal fails.
  • In /opt/vmware/h4/lwdproxy/log/lwdproxy.log on the Replicator, you see messages like:
2019-10-10 13:59:08,334 WARN [Boss-1-1] i.n.c.DefaultChannelPipeline [DefaultChannelPipeline.java:1163] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
        java.io.IOException: Too many open files

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
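To gauge how often the error is being logged, the matching lines in the log file can be counted. The following is a minimal sketch rather than a VMware-provided tool: the function name count_fd_errors is hypothetical, and the default path is the log location given above.

```shell
#!/bin/sh
# Hypothetical helper: count "Too many open files" entries in an LWD Proxy log.
# The default path is the log location named in this article.
count_fd_errors() {
    log_file="${1:-/opt/vmware/h4/lwdproxy/log/lwdproxy.log}"
    # -c prints the number of matching lines; -F treats the pattern as a fixed string.
    # grep exits non-zero when nothing matches, so "|| true" keeps the helper's status clean.
    grep -c -F "java.io.IOException: Too many open files" "$log_file" || true
}
```

Running count_fd_errors with no argument on the Replicator reads the default log; passing a path checks a different or rotated log file instead.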

Environment

VMware vCloud Availability 3.0.x

Cause

This issue occurs when the Lightweight Delta Proxy has opened the maximum number of socket connections allowed for the service and is unable to open any more to process new synchronizations.
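On a Linux appliance, one way to confirm that a process is close to its descriptor limit is to compare the entries under /proc/<pid>/fd with the "Max open files" row in /proc/<pid>/limits. The sketch below assumes a standard Linux /proc filesystem. The function name fd_usage is hypothetical, and how to locate the LWD Proxy process ID (for example with pgrep -f lwdproxy) is an assumption about how the service appears in the process list, not something this article states.

```shell
#!/bin/sh
# Hypothetical helper: print the open-descriptor count and soft limit for a PID.
# Relies on the Linux /proc filesystem; run as root or as the owner of the process.
fd_usage() {
    pid="$1"
    # Each entry in /proc/<pid>/fd is one open descriptor (files and sockets alike).
    open_count=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # The limits file has a row "Max open files <soft> <hard> files"; field 4 is the soft limit.
    soft_limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "open=$open_count limit=$soft_limit"
}
```

If the open count reported for the proxy process sits at or near the limit, that is consistent with the cause described above.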

Resolution

This is a known issue affecting vCloud Availability 3.0.3.
Currently, there is no resolution.

Workaround:
To work around this issue, you have to modify the Lightweight Delta Protocol Service (LWD Proxy) configuration file and restart the services on the Replicator:
  1. SSH to the affected vCloud Availability Replicator appliance.
  2. Log in as the root user.
  3. Take a backup of the LWD Proxy configuration file:
cp /opt/vmware/h4/lwdproxy/conf/lwdproxy.properties /opt/vmware/h4/lwdproxy/conf/lwdproxy.properties.YYYY-MM-DD.bak
  4. Set the following variable (use straight quotes; curly quotes would be written into the file literally):
echo "TRAFFIC_ACCOUNTING=false" >> /opt/vmware/h4/lwdproxy/conf/lwdproxy.properties
  5. Navigate to the Replicator Management Portal in a browser.
  6. In the left pane, click System Monitoring.
  7. Under System health, click Restart Service.
  8. After the Replicator is back online, verify that the RPO violations start to decrease.
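
The backup and configuration steps above can also be combined so that running them twice does not add the setting a second time. This is a minimal sketch under stated assumptions, not a VMware-provided script: the function name disable_traffic_accounting is hypothetical, and the default path is the configuration file named in the steps above. The service restart is still done from the Replicator Management Portal.

```shell
#!/bin/sh
# Hypothetical helper: back up the properties file and add TRAFFIC_ACCOUNTING=false once.
disable_traffic_accounting() {
    conf="${1:-/opt/vmware/h4/lwdproxy/conf/lwdproxy.properties}"
    # Nothing to do if the exact line is already present (-x matches whole lines only).
    if grep -q -x -F "TRAFFIC_ACCOUNTING=false" "$conf"; then
        return 0
    fi
    # Date-stamped backup, following the YYYY-MM-DD naming used in the backup step.
    cp "$conf" "$conf.$(date +%Y-%m-%d).bak" || return 1
    # Straight quotes only, so no quote characters end up in the file.
    echo "TRAFFIC_ACCOUNTING=false" >> "$conf"
}
```

Because the function returns early when the line already exists, a repeated run leaves both the configuration file and the earlier backup untouched.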