Troubleshooting Nolio RA: Essentials

Article ID: 207038

Updated On:

Products

CA Release Automation - Release Operations Center (Nolio)

Issue/Introduction

Investigating problems while using Nolio RA requires certain information.

 

 

Cause

The information needed to investigate a problem is often overlooked while the problem is happening because the focus is on restoring the service. This often leads to frustration later because the problem cannot be explained, usually because the information needed is:

  • erased: log files have rolled over and overwritten the entries from the date/time of the problem.
  • forgotten: details of the exact errors/behaviors observed.

This frustration can be avoided by a little planning and adopting the best practices highlighted throughout this article.

Environment

Release : 6.6, 6.7

Component : CA RELEASE AUTOMATION RELEASE OPERATIONS CENTER

Resolution

Planning

You cannot plan for everything. But you can plan for the following:

  • Know how much data your NAC and NES are logging, and adjust the log settings to retain at least 24 hours' worth of data.
  • Automate the collection of all NAC (management server), NES (execution server) and retrieval/utility agent log files. 
    • If your NAC, NES or retrieval/utility agents are on Linux, then one option for automating log collection is available on the Nolio RA Community, here: Nolio RA Collect Logs Scripts. These scripts help collect potentially everything by running one script while you tend to other tasks.
  • Understand how to collect data manually. You will need these methods when you have to collect data from agents not targeted by your automated log collector scripts/jobs.
  • Be ready with a tool to take screenshots and/or record video. As you're reviewing a problem that you believe might need investigation later, start taking whole window screenshots. 
    • Details such as releaseIds, startTime and endTime are often included in whole window screenshots and are just as valuable as the error message, if not more. It is often quicker to screenshot the whole window too.
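
The first two planning points can be illustrated with a short Python sketch. This is not the Nolio RA Collect Logs Scripts mentioned above; it is a minimal, generic example that assumes you know where your NAC, NES and agent log directories are. The function names, the "*.log*" file pattern and the role labels are all hypothetical. The span check is only a rough estimate based on file modification times.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def log_span_hours(log_dir, pattern="*.log*"):
    """Rough estimate of log coverage: hours between the oldest and
    newest modification times of the files matching the pattern."""
    mtimes = [p.stat().st_mtime for p in Path(log_dir).glob(pattern) if p.is_file()]
    if not mtimes:
        return 0.0
    return (max(mtimes) - min(mtimes)) / 3600.0


def collect_logs(log_dirs, dest_dir, label="nolio"):
    """Archive each log directory into one timestamped .tar.gz file.

    log_dirs maps a role label (for example "nac" or "nes") to a log
    directory. The labels and paths are supplied by the caller; none
    of them are defined by the product.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{label}-logs-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        for role, log_dir in log_dirs.items():
            src = Path(log_dir)
            if src.is_dir():
                # Store each directory under its role label so that two
                # directories with the same name do not collide.
                tar.add(str(src), arcname=role)
    return archive
```

For example, calling log_span_hours on a log directory and getting a value well under 24 suggests the log settings need adjusting, and collect_logs({"nac": "<your NAC log directory>"}, "<destination>") could be run from a scheduled job so that an archive already exists when a problem is reported.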

 

 

Additional Information

When opening an issue, please be as specific as you can. A description of "Deployments hanging" is extremely vague. Is there anything not hanging? Is it happening everywhere (across all applications in Nolio, across all environments in Nolio)? If the answer is yes, we will likely need to focus somewhere different than if the answer is no.

There are several layers in Nolio where a problem might occur. Knowing the who, what, when, where and how (if available) gives insight into where to focus. Examples:

  • If literally everything (at all levels; i.e., submitting deployments, navigation, artifact distribution, steps in the deployment, etc.) is slow, then it makes sense to look at the resources (CPU, memory, I/O, network) of your NAC, database and Execution Servers. Alternatively, review nolio_dm_all.log to see if there are any DB connection errors.
  • If artifact distribution is having errors or hanging then it makes sense to check the NES, Retrieval Agents and the source of artifacts.
  • If deployment steps take longer to complete than usual, then it makes sense to check resource utilization (memory, CPU) of your database and NAC.
  • If deployment steps are failing due to agent connectivity errors, then it's important to understand whether that agent's Execution Server is reachable by the NAC or whether the NES is having problems communicating with the agent(s) for some reason.
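
As a rough illustration of reviewing a log such as nolio_dm_all.log for connection errors, the Python sketch below lists the lines of a file that contain any of a set of search terms. The function name is hypothetical, and the search terms in the usage note are assumptions for illustration only; they are not actual Nolio log messages, so substitute the text of the errors you actually observed.

```python
def find_matching_lines(log_path, keywords):
    """Return (line_number, text) pairs for every line that contains
    at least one keyword. Matching is case-insensitive."""
    lowered = [k.lower() for k in keywords]
    hits = []
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(log_path, "r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            text = line.rstrip("\n")
            if any(k in text.lower() for k in lowered):
                hits.append((number, text))
    return hits
```

A call such as find_matching_lines("nolio_dm_all.log", ["connection refused", "timeout"]) returns the matching lines with their line numbers, which makes it easier to quote the exact error and its timestamp when reporting the problem.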

 

Additional information that is often helpful:

  • The environment where this is happening: Production, development, test, QA?
  • Version of Nolio?
  • Database Vendor/Version?
  • When did the problem first occur? 
  • Where does the error occur? 
  • How often does the problem occur?
  • Have there been any changes? 
  • Have you tried to investigate?
  • If yes, what did you review and what were your findings?
  • What is the urgency level?
  • Are there certain deadlines that should be taken into consideration?
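
One way to stay mindful of the questions above is to keep them as a small checklist that is filled in while the problem is being observed. The Python sketch below is only an illustration; the class and field names are hypothetical and simply mirror the list above.

```python
from dataclasses import dataclass, fields


@dataclass
class IssueReport:
    """Checklist of details to gather before opening an issue."""
    summary: str
    environment: str = ""
    nolio_version: str = ""
    database: str = ""
    first_occurred: str = ""
    where: str = ""
    frequency: str = ""
    recent_changes: str = ""
    investigation: str = ""
    urgency: str = ""
    deadlines: str = ""

    def missing(self):
        """Names of the fields that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self):
        """Plain-text version of the report, one field per line."""
        lines = []
        for f in fields(self):
            label = f.name.replace("_", " ").capitalize()
            value = getattr(self, f.name).strip() or "(not provided)"
            lines.append(f"{label}: {value}")
        return "\n".join(lines)
```

Creating IssueReport(summary="Deployments hanging in one environment", environment="Production") and calling missing() shows at a glance which details still need to be gathered, and render() produces text that can be pasted into the issue.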

Some of this "Additional Information" may not be necessary, and some of it can be collected later during a conversation or Webex. But some of it may be relevant, and the purpose of listing it here is to raise awareness of the things that matter. This way you can plan on being mindful of them while evaluating the state of Nolio and reporting problems.