Issues Found during an OI 22.1>23.1 Upgrade


Article ID: 266835


Products

DX Operational Intelligence

Issue/Introduction

These are technical questions raised and issues found during a recent OI 22.1>23.1 upgrade. They are published to help other customers facing the same issues.

We're looking at the prerequisite steps for upgrading OI (DX Platform) on-premise from 22.1 to 23.1:

Issue #1

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/dx-platform-on-premise/23-1/Upgrade-DX-Platform/upgrade/upgrade-as-cluster-administrator/pre-upgrade-tasks-for-cluster-administrator.html

We're confused about the step "Back Up the Elasticsearch Cluster Snapshot"

1) What does this step do? It looks like it reconfigures es-utils to back up the ao_ indices that weren't being backed up via a snapshot. But when does this happen? It seems we'd have to wait until 11 PM. I'm unclear how this helps our upgrade if we're going to upgrade immediately.

2) One of the steps asks us to query the snapshots from ES. When we did this, the most recent ES snapshot was from Jan. 23rd. In the es-utils pod log we see this repeatedly:

tail jarvis-es-utils.logs:

INFO   [main] UtilityController:165 - Kron service has not started yet. Will try again in 5 sec.

INFO   [main] UtilityController:165 - Kron service has not started yet. Will try again in 5 sec.

INFO   [main] UtilityController:165 - Kron service has not started yet. Will try again in 5 sec.

 

3) When looking at the jarvis-kron pod it "seems" to be running.

Bottom line, I'm not sure a) what we should be doing, b) why, and c) whether their environment is healthy (in this regard, it seems it is not).

Finally, can't we get an ES backup by just backing up the ES directories on NFS via Linux rather than going through all of this?

i.e. this is only a precaution, right?

 

Issue #2

nfs-migration.sh - not sure what this is doing and why we need to make a whole copy of "some" of the data. Is this still needed? It also doesn't make sense to me, and they might not have enough disk space for it.

 

Resolution

Regarding Issue #1:
You are right. The snapshot / backup steps are a precaution only. A file-system backup is generally not recommended for Elasticsearch.
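
To judge how stale the most recent snapshot is before relying on it, something like the following may help. It is only a minimal sketch: it assumes the JSON body returned by the Elasticsearch snapshot listing API (GET _snapshot/<repository>/_all) has already been loaded into a dictionary, and the snapshot names, dates, and helper function names are hypothetical.

```python
from datetime import datetime, timezone


def latest_successful_snapshot(listing):
    """Return the newest snapshot entry whose state is SUCCESS, or None."""
    done = [s for s in listing.get("snapshots", []) if s.get("state") == "SUCCESS"]
    return max(done, key=lambda s: s["end_time_in_millis"], default=None)


def snapshot_age_days(snapshot, now):
    """Age of a snapshot in days, measured from its end time to `now` (UTC)."""
    end = datetime.fromtimestamp(snapshot["end_time_in_millis"] / 1000, tz=timezone.utc)
    return (now - end).total_seconds() / 86400


def _ms(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


# Hypothetical listing, shaped like the "snapshots" array Elasticsearch returns.
sample_listing = {
    "snapshots": [
        {"snapshot": "snap-a", "state": "SUCCESS",
         "end_time_in_millis": _ms(datetime(2023, 1, 22, 23, 0, tzinfo=timezone.utc))},
        {"snapshot": "snap-b", "state": "SUCCESS",
         "end_time_in_millis": _ms(datetime(2023, 1, 23, 23, 0, tzinfo=timezone.utc))},
        {"snapshot": "snap-c", "state": "FAILED",
         "end_time_in_millis": _ms(datetime(2023, 1, 24, 23, 0, tzinfo=timezone.utc))},
    ]
}
```

A snapshot that is many days old, as in the case described above, suggests the scheduled snapshot job is not running, which is consistent with the Kron messages in the es-utils log.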
 
The errors related to Kron and ES Utils communication may indicate that none of the ES Utils functionality (index rollover, purge, etc.) is working at the customer site. There is a knowledge article that addresses this situation: https://knowledge.broadcom.com/external/article/236940
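
A quick way to gauge whether es-utils is stuck waiting on Kron is to check how much of the recent log consists of the retry message. The sketch below is an illustration only: it assumes the message text is exactly as quoted in the log excerpt in this article, and the function names, tail size, and threshold are hypothetical choices rather than product settings.

```python
# Message text taken from the jarvis-es-utils.logs excerpt in this article.
KRON_WAIT_MARKER = "Kron service has not started yet"


def count_kron_retries(log_lines):
    """Count log lines that contain the Kron retry message."""
    return sum(1 for line in log_lines if KRON_WAIT_MARKER in line)


def looks_stuck(log_lines, tail=50, threshold=0.9):
    """Return True when most of the last `tail` non-blank lines are Kron retries."""
    recent = [line for line in list(log_lines)[-tail:] if line.strip()]
    if not recent:
        return False
    return count_kron_retries(recent) / len(recent) >= threshold
```

For example, the lines of jarvis-es-utils.logs can be read with readlines() and passed to looks_stuck(); a True result is only a hint that the linked knowledge article is worth following, not a diagnosis.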
 
 
NOTE: In the upcoming release we plan to refactor the Kron and ES Utils communication mechanism to make it more robust.
 
Regarding Issue #2:
For the 22.1 to 23.1 upgrade, running the nfs-directories-migration.sh script is not required.
 
The purpose of this script is to migrate data to new directories on NFS when the directory layout changes between the old release and the new release (for example, Postgres upgrades or normalization of directories for different components).
 
The main activity is to perform a backup of dxi-postgres prior to migrating to the new structure. This is required for the 22.1 migration, but given that they are already on 22.1, they can skip this activity for the 23.1 upgrade.