VCHA was in degraded state with an alert "appliance configuration is out of sync" while reconfiguring VCHA.



Article ID: 418705



Products

VMware vCenter Server 8.0

Issue/Introduction

A VCHA failover was triggered, and the Passive node became the new Active (master) node in the VCHA cluster.

The cluster then entered a degraded state with the alert "appliance configuration is out of sync".

An attempt to fail back to the previously Active node, which had all services running, failed.

/var/log/vmware/vcha/vcha.log reports "Connection refused" errors on port 22, as shown in the snippet below:

YYYY-MM-DDTHH:MM:SS.358+08:00 error vcha[476348] [Originator@6876 sub=Notifier] Failed to establish a connection to VC.
--> N7Vmacore9ExceptionE(Failed to get solution user certificates from VECS.)
--> [context]zKq7AVECAQAAACqAeAEPdmNoYQAAQxxTbGlidm1hY29yZS5zbwAACBhCACk/QwDIm0oBFKUOdmNoYQABQqkOAdWrDgGGZw4BuGwOAeNtDgAE7DcAF0U4AMUPUQKwjgBsaWJwdGhyZWFkLnNvLjAAA9/6D2xpYmMuc28uNgA=[/context]
YYYY-MM-DDTHH:MM:SS.359+08:00 error vcha[476348] [Originator@6876 sub=Notifier] Dropped event because VC is not available.
--> Event ID: com.vmware.vcha.file.replication.state.changed
--> Arguments: (vmodl.KeyAnyValue) [
-->    (vmodl.KeyAnyValue) {
-->       key = "fileProviderType",
-->       value = "configuration"
-->    },
-->    (vmodl.KeyAnyValue) {
-->       key = "state",
-->       value = "out of sync"

YYYY-MM-DDTHH:MM:SS.514+08:00 verbose vcha[481190] [Originator@6876 sub=IO.Connection opID=WorkQueue-731dac28] Attempting connection; <resolver p:0x00007f540400ce80, 'localhost:1080', next:<TCP '127.0.0.1 : 1080'>>, last e: 0(Success)
YYYY-MM-DDTHH:MM:SS.534+08:00 verbose vcha[476350] [Originator@6876 sub=HttpConnectionPool-000003 opID=WorkQueue-731dac28] LayeredHttpConnectionPool created. maxPoolConnections = 1; idleTimeout = 900000000 us; maxOpenConnections = 1; detectRemoteClose = true
YYYY-MM-DDTHH:MM:SS.541+08:00 verbose vcha[481179] [Originator@6876 sub=IO.Connection opID=WorkQueue-731dac28] Attempting connection; <resolver p:0x00007f5404009700, 'localhost:1080', next:<TCP '127.0.0.1 : 1080'>>, last e: 0(Success)
YYYY-MM-DDTHH:MM:SS.541+08:00 info vcha[476351] [Originator@6876 sub=vpxUtil] System command failed; '/usr/bin/rsync', args: [--recursive,--checksum,--perms,--times,--group,--owner,--links,--protect-args,--temp-dir=/storage/vcha/.tmpfiles,--info=progress,--timeout=60,--rsh=ssh -i /home/vcha/.ssh/id_rsa -o UserKnownHostsFile=/home/vcha/.ssh/known_hosts,/etc/ssl/certs/3e5a4c62.0,vcha@<IP of the failed node>:/etc/ssl/certs/], exit code: 12
--> stdout:
--> stderr: ssh: connect to host <IP of the failed node> port 22: Connection refused
--> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
--> rsync error: error in rsync protocol data stream (code 12) at io.c(232) [sender=3.4.1]
-->
YYYY-MM-DDTHH:MM:SS.542+08:00 error vcha[476351] [Originator@6876 sub=RsyncRepl-smallFrp] Rsync failed, retcode: 12, error: ssh: connect to host <IP of the failed node> port 22: Connection refused
--> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
--> rsync error: error in rsync protocol data stream (code 12) at io.c(232) [sender=3.4.1]
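
To check quickly whether a log shows the same symptom, the matching lines can be counted with grep. This is a minimal sketch: the function name `count_ssh_refusals` is hypothetical, and the default path is the vcha.log location named above.

```shell
# Hypothetical helper (not part of vCenter): count log lines that report
# an SSH connection refused on port 22.
# The default path is the vcha.log location given in this article.
count_ssh_refusals() {
    local log_file="${1:-/var/log/vmware/vcha/vcha.log}"
    # grep -c prints the number of matching lines (0 if none match).
    grep -c 'port 22: Connection refused' "$log_file"
}
```

A non-zero count suggests that rsync replication is failing because the peer node is refusing SSH connections.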

sshconnect.log also confirms that the failed node is refusing connections on port 22:

YYYY-MM-DDTHH:MM:SS.23Z INFO sshConnect Starting ssh connect to <IP of the failed node>
YYYY-MM-DDTHH:MM:SS.23Z INFO sshConnect Retry attempt 0
YYYY-MM-DDTHH:MM:SS.236Z WARNING sshConnect retry attempt 1 failed [Errno None] Unable to connect to port 22 on <IP of the failed node>
YYYY-MM-DDTHH:MM:SS.238Z INFO sshConnect Retry attempt 1
YYYY-MM-DDTHH:MM:SS.493Z WARNING sshConnect retry attempt 2 failed [Errno None] Unable to connect to port 22 on <IP of the failed node>
YYYY-MM-DDTHH:MM:SS.498Z INFO sshConnect Retry attempt 2
YYYY-MM-DDTHH:MM:SS.710Z WARNING sshConnect retry attempt 3 failed [Errno None] Unable to connect to port 22 on <IP of the failed node>
YYYY-MM-DDTHH:MM:SS.714Z INFO sshConnect Retry attempt 3
YYYY-MM-DDTHH:MM:SS.927Z WARNING sshConnect retry attempt 4 failed [Errno None] Unable to connect to port 22 on <IP of the failed node>
YYYY-MM-DDTHH:MM:SS.927Z ERROR sshConnect Failed to connect to <IP of the failed node> after retry timeout of 15
YYYY-MM-DDTHH:MM:SS.927Z ERROR sshConnect Could not connect to Peer node
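
The refusal can also be confirmed by hand from the current Active node. The sketch below assumes that bash on the appliance supports `/dev/tcp` redirection and that the `timeout` utility is available. The function name `port_open` is hypothetical.

```shell
# Hypothetical helper: return 0 if a TCP connection to host:port succeeds,
# non-zero if it is refused or times out after 5 seconds.
# Assumes bash was built with /dev/tcp support.
port_open() {
    local host="$1" port="${2:-22}"
    timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example (replace the placeholder with the IP of the failed node):
# port_open "<IP of the failed node>" 22 && echo "port 22 open" || echo "port 22 closed"
```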


Environment

vCenter 8.x

Cause

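Multiple causes are possible, including a failed VCHA operation. In the case shown here, the previously Active node is not accepting SSH connections on port 22, so the rsync-based replication of configuration files fails and the cluster reports the configuration as out of sync.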

Resolution

Log in to the previously Active vCenter node and restart the sshd service with the following command:

# systemctl restart sshd

If the restart fails with an error that the service is masked, unmask the service with the following command:

# systemctl unmask sshd.service

Then restart the sshd service again and perform the failover from the vCenter UI.
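
The steps above can be combined into one small function. This is a sketch, not a supported tool: the function name `recover_sshd` and the `SYSTEMCTL` override variable are hypothetical, and it relies on `systemctl is-enabled` printing `masked` for a masked unit.

```shell
# Hypothetical helper: unmask sshd if needed, restart it, and report its state.
# SYSTEMCTL can be overridden (for example, for a dry run); defaults to systemctl.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

recover_sshd() {
    local state
    # "is-enabled" prints "masked" (with a non-zero exit code) for a masked unit.
    state="$("$SYSTEMCTL" is-enabled sshd.service 2>/dev/null || true)"
    if [ "$state" = "masked" ]; then
        echo "sshd.service is masked; unmasking"
        "$SYSTEMCTL" unmask sshd.service || return 1
    fi
    "$SYSTEMCTL" restart sshd.service || return 1
    # Prints "active" and returns 0 when the service is running.
    "$SYSTEMCTL" is-active sshd.service
}
```

Run `recover_sshd` as root on the previously Active node. If it prints `active`, proceed with the failover from the vCenter UI.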

Additional Information