yyyy-mm-dd hh:mm:ss,sss DEBUG - [########-####-####-####-##########-##] [job-##] c.v.h.r.replication.SyncSourceJob : Requesting online instance for vm vm-##
In the /opt/vmware/h4/lwdproxy/log/lwdproxy.log file on the Cloud Director Availability On-Premises Appliance, you see entries similar to:
yyyy-mm-dd hh:mm:ss,sss INFO [Worker-3-3] c.v.h.p.h.InitSessionHandler [InitSessionHandler.java:74] PeerId: null
yyyy-mm-dd hh:mm:ss,sss WARN [Worker-3-3] c.v.h.p.h.InitSessionHandler [InitSessionHandler.java:224] Handshake relay to server /127.0.0.1:8049 failed for group H4-########-####-####-####-############
javax.net.ssl.SSLException: SSLEngine closed already
    at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:848)
    at io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:811)
yyyy-mm-dd hh:mm:ss,sss WARN [Worker-3-3] i.n.c.DefaultChannelPipeline [DefaultChannelPipeline.java:1152] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
In the /opt/vmware/h4/replicator/log/replicator.log file on the Cloud Director Availability On-Premises Appliance, you see entries similar to the following that contain "FOUND_ONGOING_INSTANCE":
yyyy-mm-dd hh:mm:ss,sss DEBUG - [UI-########-####-####-####-############-#####-##-##-##] [pc-task-monitor-2] com.vmware.h4.jobengine.JobExecution : Task 8e7331bd-bf8d-4ff2-bdb4-6481d50f38ff (WorkflowInfo{type='sync', resourceType='replication', resourceId='H4-########-####-####-####-############', isPrivate=false, resourceName='null'}) completed with result SyncRequestResult{instanceKey='replica-########-####-####-####-############', result=FOUND_ONGOING_INSTANCE}
Note: The preceding log excerpts are only examples. Dates, times, and identifiers vary depending on your environment.
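To check quickly whether an appliance shows these symptoms, you can search both log files for the signature strings quoted above. The following is a minimal sketch in Python, not a supported tool. It assumes a Python 3 interpreter is available where the logs are read (for example, on a workstation holding copies of the logs), and the names SIGNATURES, scan_lines, and scan_file are hypothetical helpers introduced only for this illustration.

```python
import re
from pathlib import Path

# Signature strings taken from the log excerpts in this article.
# The dictionary keys are illustrative labels, not product terminology.
SIGNATURES = {
    "handshake_relay_failed": re.compile(r"Handshake relay to server \S+ failed for group"),
    "certificate_unknown": re.compile(r"Received fatal alert: certificate_unknown"),
    "found_ongoing_instance": re.compile(r"result=FOUND_ONGOING_INSTANCE"),
}


def scan_lines(lines):
    """Return {label: [matching lines]} for an iterable of log lines."""
    hits = {label: [] for label in SIGNATURES}
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[label].append(line.rstrip("\n"))
    return hits


def scan_file(path):
    """Scan one log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)


if __name__ == "__main__":
    # Log paths as given in this article; adjust if you work on copies.
    for log in (
        "/opt/vmware/h4/lwdproxy/log/lwdproxy.log",
        "/opt/vmware/h4/replicator/log/replicator.log",
    ):
        if Path(log).is_file():
            for label, matched in scan_file(log).items():
                print(f"{log}: {label}: {len(matched)} match(es)")
```

Matches for certificate_unknown in lwdproxy.log together with FOUND_ONGOING_INSTANCE in replicator.log are consistent with the symptoms described here, but they do not prove the cause on their own.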
Environment
VMware Cloud Director Availability 4.x
Cause
This issue occurs when an upgrade of the Cloud Director Availability On-Premises Appliance regenerates its lightweight delta service certificate, but the new certificate is not propagated to the cloud site for existing replications.
Resolution
This is a known issue affecting Cloud Director Availability 4.x.
Currently there is no resolution.
Workaround: To work around this issue, reconfigure the affected replications. The least disruptive way to do this is to change a replication setting and then change it back, as follows.
Log in to the Cloud Director Availability Portal.
Select an affected replication.
Click All actions.
Under Settings, click Replication settings.
Toggle the Compress replication traffic option.
Click Apply.
Click All actions again.
Under Settings, click Replication settings.
Toggle the Compress replication traffic option back to its original setting.