High Indexer Lag showing in the VCF Operations for Networks GUI

Products

VCF Operations for Networks

Issue/Introduction

After logging in to the VCF Operations for Networks GUI and selecting Settings --> Infrastructure and Updates, the Indexer lag under System Health shows as high.

You may see an error message such as the one described below, or you may simply see a lag value that is considerably higher than is typical for your environment.

The error message text is "Indexer Service" followed by "Recent data is still being indexed. Search results might be inaccurate." followed by "Resolution: Wait for indexer to catch up and this error to clear. If this persists for more than 12 hours, contact support."

As a double-check, SSH into the Platform1 node as the support user, switch to the ubuntu user with the ub command, then launch the rdb tool and run the indexer_status command to check the Indexer status.

ub

rdb

arkin> indexer_status

The Indexer status shows a high lag in seconds, which corresponds to what the GUI shows (possibly in a different unit of measurement, such as minutes, hours, or even days).
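Because rdb reports the lag in seconds while the GUI may use larger units, a small conversion helper can make the two values easier to compare. The sketch below is only an illustration: format_lag is a hypothetical helper, not part of the product, and the input value is made up.

```shell
#!/bin/sh
# format_lag (hypothetical helper): print a lag given in seconds
# as days, hours, and minutes for comparison with the GUI value.
format_lag() {
    secs=$1
    days=$((secs / 86400))
    hours=$(( (secs % 86400) / 3600 ))
    mins=$(( (secs % 3600) / 60 ))
    echo "${days}d ${hours}h ${mins}m"
}

# Example with an invented lag of 93784 seconds
format_lag 93784   # prints "1d 2h 3m"
```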

As an additional check, when you run the ./run_all.sh sudo /home/ubuntu/check-service-health.sh -p -d command as the ubuntu user, you observe:

FlinkContainer is running and healthy.

When checking the hosts file, located at /etc/hosts, you see platform-infra on one of the lines.

When checking the file deployment-set.info located at /home/ubuntu/build-target/deployment/, there is an incorrect "platform-infra," entry ahead of the existing Platform entries (for a 3-node cluster, for example, platform-infra,platform1,platform2,platform3).
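Both file checks come down to searching for the string platform-infra. A minimal sketch follows, assuming plain grep is available on the node. has_stray_entry is a hypothetical helper name, and the demo runs against a throwaway file rather than the real files.

```shell
#!/bin/sh
# has_stray_entry (hypothetical helper): succeeds if FILE mentions platform-infra.
has_stray_entry() {
    grep -q 'platform-infra' "$1"
}

# Demo on a throwaway file so nothing on the system is touched
demo=$(mktemp)
printf '127.0.0.1 localhost platform-infra\n' > "$demo"
if has_stray_entry "$demo"; then
    echo "stray platform-infra entry found"
fi
rm -f "$demo"
```

On a Platform node, the same helper could be pointed at /etc/hosts and at /home/ubuntu/build-target/deployment/deployment-set.info.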

Environment

VCF Operations for Networks

Cause

The cause of this issue is still being investigated.

Resolution

Note: It is recommended to take snapshots before making any changes. See the article Best practices to shutdown Aria Operations for Networks Clustered deployments.

STEPS:

1. Edit the hosts file, located at /etc/hosts
- IMPORTANT NOTES:
  - First, create a backup of the existing /etc/hosts file, using the following command:
    - sudo cp /etc/hosts /etc/hosts.backup.before.kb.edit
  - Next, run the ls -al /etc/hosts command, and you should observe the following attributes of the file:
    - -rw-r--r-- # root root ### ### ## #### /etc/hosts
  - Because you are logged in as the ubuntu user (after running the ub command) and the files in the /etc directory are owned by the root user, you must temporarily grant write permission on the file to "others".
  - Change the permission with this command:
    - sudo chmod o+w /etc/hosts
  - Now, if you run the ls -al /etc/hosts command, you will observe the following attributes of the file:
    - -rw-r--rw- # root root ### ### ## ##:## /etc/hosts
- Now you can use a text editor such as "vi" to change the string platform-infra to the string aria-networks-platform in the file /etc/hosts on each Platform Node and save the changes using ":wq!".
- After you have made the change on each Platform Node, restore the original permissions with the command:
  - sudo chmod o-w /etc/hosts
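As an alternative to editing by hand in vi, the same substitution could be scripted with sed. This is a sketch only, under the assumption that the only change needed is replacing the string platform-infra with aria-networks-platform. rename_infra_alias is a hypothetical helper, and the demo operates on a scratch file, not on /etc/hosts.

```shell
#!/bin/sh
# rename_infra_alias (hypothetical helper): back up FILE, then rewrite
# every occurrence of platform-infra to aria-networks-platform.
rename_infra_alias() {
    file=$1
    cp "$file" "$file.backup.before.kb.edit" || return 1
    # Read from the backup and write into the original file in place,
    # which keeps the original file's owner and mode.
    sed 's/platform-infra/aria-networks-platform/g' \
        "$file.backup.before.kb.edit" > "$file"
}

# Demo on a scratch file
demo=$(mktemp)
printf '127.0.0.1 localhost platform-infra\n' > "$demo"
rename_infra_alias "$demo"
cat "$demo"
rm -f "$demo" "$demo.backup.before.kb.edit"
```

Running it against the real /etc/hosts would still require write access to the file, as described in the notes above, and the result should be reviewed before the permissions are restored.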
2. Next, edit the deployment-set.info file located at /home/ubuntu/build-target/deployment/deployment-set.info on each Platform Node.
- IMPORTANT NOTES:
  - First, create a backup of the existing /home/ubuntu/build-target/deployment/deployment-set.info file, using the following command:
    - cp /home/ubuntu/build-target/deployment/deployment-set.info /home/ubuntu/build-target/deployment/deployment-set.info.before.kb.edit
- Now you can use a text editor such as "vi" to delete the string "platform-infra," from the line in the file /home/ubuntu/build-target/deployment/deployment-set.info on each Platform Node and save the changes using ":wq!".
- For a 3-node cluster, for example, the finished result should look like this:
  - platform1,platform2,platform3
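The deletion can likewise be sketched with sed instead of vi. This assumes the stray entry appears exactly as "platform-infra," (with its trailing comma). strip_infra_entry is a hypothetical helper, and the demo uses a scratch file rather than the real deployment-set.info.

```shell
#!/bin/sh
# strip_infra_entry (hypothetical helper): back up FILE, then delete the
# first "platform-infra," occurrence on each line.
strip_infra_entry() {
    file=$1
    cp "$file" "$file.before.kb.edit" || return 1
    sed 's/platform-infra,//' "$file.before.kb.edit" > "$file"
}

# Demo on a scratch file mimicking a 3-node cluster
demo=$(mktemp)
printf 'platform-infra,platform1,platform2,platform3\n' > "$demo"
strip_infra_entry "$demo"
cat "$demo"   # prints "platform1,platform2,platform3"
rm -f "$demo" "$demo.before.kb.edit"
```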
3. After the changes, verify that the hosts and deployment-set.info files are correct on each Platform Node using the following commands:
- ./run_all.sh sudo cat /etc/hosts | grep localhost
  - The result should look something like the example below (for a 3-node cluster)

--platform1--
127.0.0.1 localhost aria-networks-platform
127.0.0.1 aria-networks-platform
127.0.0.1 localhost aria-networks-platform
###.###.###.### platform1
###.###.###.### platform2
###.###.###.### platform3
--platform2--
127.0.0.1 localhost aria-networks-platform
127.0.0.1 aria-networks-platform
127.0.0.1 localhost aria-networks-platform
###.###.###.### platform1
###.###.###.### platform2
###.###.###.### platform3
--platform3--
127.0.0.1 localhost aria-networks-platform
127.0.0.1 aria-networks-platform
127.0.0.1 localhost aria-networks-platform
###.###.###.### platform1
###.###.###.### platform2
###.###.###.### platform3

- ./run_all.sh sudo cat /home/ubuntu/build-target/deployment/deployment-set.info
  - The result should look something like the example below (for a 3-node cluster)

--platform1--
platform1,platform2,platform3
--platform2--
platform1,platform2,platform3
--platform3--
platform1,platform2,platform3
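The visual check of deployment-set.info can be complemented by a pattern check. The sketch below assumes the file holds a single comma-separated list of names of the form platformN, as in the example output above. is_clean_deployment_set is a hypothetical helper, and it inspects the file itself, not the run_all.sh output with its per-node header lines.

```shell
#!/bin/sh
# is_clean_deployment_set (hypothetical helper): succeeds when every line of
# FILE is a comma-separated list of platformN names and nothing else.
is_clean_deployment_set() {
    # grep -v selects lines that do NOT match; finding one means the file is not clean
    ! grep -Evq '^platform[0-9]+(,platform[0-9]+)*$' "$1"
}

# Demo on a scratch file
demo=$(mktemp)
printf 'platform-infra,platform1,platform2,platform3\n' > "$demo"
is_clean_deployment_set "$demo" || echo "unexpected entry present"
printf 'platform1,platform2,platform3\n' > "$demo"
is_clean_deployment_set "$demo" && echo "format looks clean"
rm -f "$demo"
```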

4. Restart the Flink service on each Platform Node. There is no "restart" step, so first "stop" the service and then "start" it:

- ./run_all.sh sudo systemctl stop flinkjobs.service
- ./run_all.sh sudo systemctl start flinkjobs.service

5. Verify that the services on each Platform Node are running and healthy:

- ./run_all.sh sudo /home/ubuntu/check-service-health.sh -p -d

At this point, you should begin to see the indexer lag gradually return to more typical values.

You may have to wait hours, or even days, depending on how large the lag was when you began this procedure.
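To get a rough feel for the wait, two lag readings taken some time apart can be extrapolated linearly. This is only a back-of-the-envelope sketch: it assumes the lag keeps shrinking at a constant rate, which may not hold in practice. estimate_catchup_minutes is a hypothetical helper, and the sample numbers are invented.

```shell
#!/bin/sh
# estimate_catchup_minutes (hypothetical helper)
# Arguments: earlier lag, later lag, and the time between the two readings,
# all in seconds. Prints the estimated minutes until the lag reaches zero.
estimate_catchup_minutes() {
    before=$1
    after=$2
    interval=$3
    drop=$((before - after))
    if [ "$drop" -le 0 ]; then
        echo "lag is not decreasing"
        return 1
    fi
    # remaining seconds = current lag / (drop per second)
    echo $(( after * interval / drop / 60 ))
}

# Invented example: lag fell from 90000 s to 86400 s over 600 s
estimate_catchup_minutes 90000 86400 600   # prints 240
```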