Aria Operations for Networks fails to upgrade with error seen on GUI "Disk Utilization Check Fails"

Article ID: 370288

Products

VMware Aria Operations for Networks

Issue/Introduction

If an upgrade is started immediately after taking snapshots of an Aria Operations for Networks deployment, the error described below is expected.

The error comes from an upgrade pre-check that is built into Aria Operations for Networks.

This behavior is expected: when the Aria Operations for Networks appliances are powered on after the snapshots are taken, they see high I/O, and during that period the FoundationDB replication status shows Healthy (Repartitioning) with some moving data.

Symptom1:

Aria Operations for Networks fails to upgrade through Aria Suite Lifecycle manager (vRSLCM) because the Disk Utilization Check fails.

The Aria Suite Lifecycle manager (vRSLCM) GUI shows the following error:

 

com.vmware.vrealize.lcm.plugin.core.vrni.common.exception.VRNIUpgradeCheckStatusException: Error occurred while checking upgrade pre-check status with IP ##.##.###.## Pre-check message : {
"msg" : "High disk utilization",
"type" : "INFO",
"status" : "FAIL",
"id" : "DiskUtilizationCheckTask",
"title" : "Disk Utilization Check",
"consentMsg" : null
}
at com.vmware.vrealize.lcm.plugin.core.vrni.task.upgrade.UpgradePrecheckStatusTask.execute(UpgradePrecheckStatusTask.java:161)
at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:62)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
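
When triaging many failed requests, it can help to pull the pre-check result out of the error text programmatically. The following is a minimal Python sketch, assuming the error text has the same shape as the message above (free text, then one JSON object, then the stack trace). The function names `extract_precheck` and `is_disk_utilization_failure` are hypothetical helpers for illustration, not part of any VMware tooling.

```python
import json


def extract_precheck(error_text: str) -> dict:
    """Return the first JSON object embedded in the error text.

    Assumption: the pre-check message is the first '{' ... '}' object
    in the text, as in the error shown in this article.
    Raises ValueError if no '{' is present or the JSON is malformed.
    """
    start = error_text.index("{")
    # raw_decode parses one JSON value starting at `start` and ignores
    # whatever follows it (here, the Java stack trace).
    obj, _end = json.JSONDecoder().raw_decode(error_text, start)
    return obj


def is_disk_utilization_failure(precheck: dict) -> bool:
    """True when the failed pre-check is the disk utilization check."""
    return (
        precheck.get("id") == "DiskUtilizationCheckTask"
        and precheck.get("status") == "FAIL"
    )
```

If `is_disk_utilization_failure(extract_precheck(text))` returns True, the failure matches Symptom1 in this article.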

Symptom2:

On the Aria Operations for Networks GUI, the "High disk utilization" pre-check is shown as Failed, with a prompt to contact Support.

Symptom3:

The Aria Operations for Networks database (FoundationDB) shows a replication health of Healthy (Repartitioning) with some moving data, as in the output below:

ubuntu@platform1:~$ fdbcli
Using cluster file `/etc/foundationdb/fdb.cluster'.

The database is available.

Welcome to the fdbcli. For help, type `help'.
fdb> status details

Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - double
  Storage engine         - ssd-2
  Coordinators           - 3
  Desired Proxies        - 2
  Desired Logs           - 2
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 20
  Zones                  - 10
  Machines               - 10
  Memory availability    - 11.0 GB per process on machine with least available
  Retransmissions rate   - 93 Hz
  Fault Tolerance        - 1 machine
  Server time            - 09/17/24 18:32:05

Data:
  Replication health     - Healthy (Repartitioning)
  Moving data            - 0.129 GB
  Sum of key-value sizes - 2.048 TB
  Disk space used        - 5.230 TB

Operating space:
  Storage server         - 1603.0 GB free on most full server
  Log server             - 1650.0 GB free on most full server

Workload:
  Read rate              - 9295 Hz
  Write rate             - 3475 Hz
  Transactions started   - 27044 Hz
  Transactions committed - 648 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0
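
The two fields that matter for this symptom are "Replication health" and "Moving data" in the Data section. The following is a minimal Python sketch that reads them from saved `status details` text, assuming the output layout matches the sample above (including "Moving data" reported in GB). The function names are hypothetical and exist only for this illustration.

```python
import re
from typing import Optional, Tuple


def read_data_section(status_text: str) -> Tuple[Optional[str], Optional[float]]:
    """Return (replication health, moving data in GB) from `status details` text.

    Either value is None when its line is not found in the text.
    """
    health = re.search(r"Replication health\s+-\s+(.+)", status_text)
    moving = re.search(r"Moving data\s+-\s+([\d.]+)\s+GB", status_text)
    return (
        health.group(1).strip() if health else None,
        float(moving.group(1)) if moving else None,
    )


def looks_like_symptom3(status_text: str) -> bool:
    """True when the database is repartitioning or still moving data."""
    health, moving_gb = read_data_section(status_text)
    repartitioning = health is not None and "Repartitioning" in health
    data_in_flight = moving_gb is not None and moving_gb > 0
    return repartitioning or data_in_flight
```

For the sample output above, `read_data_section` yields the health string `Healthy (Repartitioning)` and a moving-data value of 0.129, so `looks_like_symptom3` returns True.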

Environment

VMware vRealize Network Insight 6.9
Aria Operations for Networks 6.10.0
Aria Operations for Networks 6.11.0
Aria Operations for Networks 6.12.0
Aria Operations for Networks 6.12.1
Aria Operations for Networks 6.13.0
Aria Operations for Networks 6.14.0

Cause

After snapshots are taken through vCenter or Aria Suite Lifecycle and the Aria Operations for Networks appliances are powered on, the appliances see high I/O. During that period, FoundationDB shows replication health as Healthy (Repartitioning) with some moving data.

Resolution

This is expected behavior. After taking snapshots, check the service status and the database health from the CLI.

The database should show Healthy with 0 GB of moving data.

Wait for the high I/O on the database to settle before triggering the upgrade.

FDB replication usually takes 30-60 minutes to return to Healthy with 0 GB of moving data on a 3-node platform cluster, and can take more than 90 minutes on 5- to 15-node platform cluster deployments. The time also depends on the size of the moving data.
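
The wait described above can be scripted instead of checked by hand. The following is a minimal Python sketch, not a supported tool. It assumes that `fdbcli` is on the PATH of the platform node where it runs, that `fdbcli --exec "status details"` prints the same layout as the sample output in this article, and that Python 3 is available there. The function names, the 90-minute default timeout, and the 60-second polling interval are illustrative choices; larger clusters may need a longer timeout.

```python
import re
import subprocess
import time


def fetch_status() -> str:
    """Run fdbcli non-interactively and return its text output."""
    result = subprocess.run(
        ["fdbcli", "--exec", "status details"],
        capture_output=True, text=True, timeout=60, check=True,
    )
    return result.stdout


def is_settled(status_text: str) -> bool:
    """True only for 'Healthy' (no qualifier) with 0 GB of moving data."""
    health = re.search(r"Replication health\s+-\s+(.+)", status_text)
    moving = re.search(r"Moving data\s+-\s+([\d.]+)\s+GB", status_text)
    return (
        health is not None
        and health.group(1).strip() == "Healthy"
        and moving is not None
        and float(moving.group(1)) == 0.0
    )


def wait_for_settled(fetch=fetch_status, timeout_s=90 * 60, interval_s=60,
                     sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll until the database settles; return False if the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        if is_settled(fetch()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_for_settled()` with no arguments polls the local `fdbcli` once a minute. The `fetch`, `sleep`, and `clock` parameters exist so the loop can be exercised without a running database. A return value of False means the database had not settled within the timeout, in which case the upgrade should not be started yet.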

A workaround is available that modifies a policy in the Aria Operations for Networks database.

The GS support team must review and evaluate the environment before making any changes to the Aria Operations for Networks database.

To request this workaround, open a Broadcom support case and reference this Knowledge Base article.

See Creating and managing Broadcom support cases.