Upgrade to VCF Operations for Networks 6.14.2 in a clustered deployment fails with CheckServiceStatusTaskException

Article ID: 436725

Products

VCF Operations for Networks

Issue/Introduction

  • Upgrading VMware VCF Operations for Networks (formerly vRealize Network Insight) from version 6.14 (6.14.0.17256887) to 6.14.2, initiated from LCM (Lifecycle Manager), fails with the following error:

com.vmware.vrealize.lcm.plugin.core.vrni.common.exception.CheckServiceStatusTaskException: Few Services are running with IP<REDACTED_IPS> Kindly login to node as support user and stop service

  • Running ./run_all.sh sudo /home/ubuntu/check-service-health.sh -p -d shows that all services are up except the JournalNode service, which reports a status of "running but not healthy" on one of the Platform Nodes.
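
When the health check prints many lines, it can help to show only the services that are not healthy. The following is a minimal sketch. The function name filter_unhealthy is hypothetical, and the sketch assumes that the health check prints one status per line and that the status text contains "not healthy", as in the message quoted above.

```shell
#!/usr/bin/env bash
# Sketch only: filter_unhealthy is a hypothetical helper, not part of the product.
# Assumption: each unhealthy service is reported on a line containing the text
# "not healthy" (for example "running but not healthy").

# Read health-check output on stdin and print only the unhealthy lines.
filter_unhealthy() {
  # "|| true" keeps the exit status 0 when nothing matches (all healthy).
  grep -i 'not healthy' || true
}
```

For example, the output of ./run_all.sh sudo /home/ubuntu/check-service-health.sh -p -d could be piped into filter_unhealthy. Empty output would then mean that no line matched.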

Environment

VCF Operations for Networks 6.14.x

Cause

Running the following command returns multiple errors of the form /etc/ssh/ssh_config: line xx: Bad configuration option: xxxxx, which indicates invalid entries in the SSH client configuration file:

sudo ssh -i /home/support/.ssh/id_rsa_vnera_cluster_keypair -o StrictHostKeyChecking=no support@<REDACTED_IPS> "sudo ls -la /var/lib/jn/data/1/dfs/jn/arkin-platform-cluster/current/ | grep edits_in"

Examining /etc/ssh/ssh_config on the Platform Node where the JournalNode service is reported as unhealthy reveals a recent modification timestamp consistent with the upgrade attempt. The same file on the other Platform Nodes shows no modifications since the previous deployment.
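
The two checks described in this section can be sketched as shell helpers. This is an illustration only: the function names count_bad_ssh_options and ssh_config_fingerprint are hypothetical, and the sketch assumes GNU coreutils (sha256sum, stat -c) are available on the node.

```shell
#!/usr/bin/env bash
# Sketch only: both helper names are hypothetical, not part of the product.

# Count "Bad configuration option" messages in text read from stdin,
# for example the combined output of the ssh command shown above.
count_bad_ssh_options() {
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps the function's exit status at 0 in that case.
  grep -c 'Bad configuration option' || true
}

# Print the SHA-256 checksum and last-modification time of an ssh_config
# file so the values can be compared between Platform Nodes.
# Assumption: GNU stat (stat -c '%y') is available.
ssh_config_fingerprint() {
  local file="${1:-/etc/ssh/ssh_config}"
  printf '%s  %s\n' \
    "$(sha256sum "$file" | cut -d' ' -f1)" \
    "$(stat -c '%y' "$file")"
}
```

Running ssh_config_fingerprint on each Platform Node and comparing the results would be expected to single out the node whose file has a different checksum and a recent modification time. To count the messages, redirect the ssh command's error output into the pipe (2>&1) before count_bad_ssh_options.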

Resolution

STEPS:

  1. Replace the contents of /etc/ssh/ssh_config on the Platform Node where the JournalNode service is reported as unhealthy with the contents of the same file from a different Platform Node, using a tool such as WinSCP.

  2. After the file is replaced, verify that the JournalNode service reports as running and healthy on all nodes by running the command ./run_all.sh sudo /home/ubuntu/check-service-health.sh -p -d

    • NOTE: No reboots or service restarts are required after the file replacement.
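
For administrators who prefer the command line to a graphical tool, step 1 could be sketched as below. This is not the documented procedure, only an illustration under stated assumptions: the function name replace_ssh_config is hypothetical, and the known-good copy is assumed to have already been transferred from an unaffected Platform Node to a staging path on the affected node. The sketch keeps a timestamped backup and overwrites only the file contents, so the target's ownership and permissions are left unchanged.

```shell
#!/usr/bin/env bash
# Sketch only: replace_ssh_config is a hypothetical helper, not part of the product.
# Usage: replace_ssh_config GOOD_COPY [TARGET]
#   GOOD_COPY  known-good ssh_config copied from an unaffected Platform Node
#   TARGET     file to repair (defaults to /etc/ssh/ssh_config)
replace_ssh_config() {
  local good="$1"
  local target="${2:-/etc/ssh/ssh_config}"
  local backup
  backup="${target}.bak.$(date +%Y%m%d%H%M%S)"

  # Refuse to continue if the known-good copy is missing or empty.
  [ -s "$good" ] || { echo "known-good copy missing or empty: $good" >&2; return 1; }

  # Keep a timestamped backup of the current (faulty) file.
  cp -p "$target" "$backup" || return 1

  # Overwrite the contents in place; ownership and mode of the target stay as they are.
  cat "$good" > "$target" || return 1

  # Print the backup path so the change can be reverted if needed.
  echo "$backup"
}
```

Writing to /etc/ssh/ssh_config requires root privileges, so the function would need to run in a root shell on the affected node. Afterwards, re-running the ssh command from the Cause section should no longer print Bad configuration option messages, and the health check in step 2 confirms the result.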

Additional Information

The JournalNode service relies on functional and consistent SSH communication across all Platform nodes to maintain cluster health and synchronize data.

Correcting the invalid SSH configuration options restores the required secure communication channel between nodes.

If the Resolution steps do not resolve the issue, open a Broadcom support case with the VCF Operations for Networks team using the instructions in KB 142884 - Creating and managing Broadcom cases.