Working with Automic Support

Products

CA Automic Workload Automation - Automation Engine, CA Automic One Automation, Automic SaaS

Issue/Introduction

This document is intended as a first step when working with Broadcom Support on Automic Workload Automation, Continuous Delivery Automation, and Automic Service Orchestration cases. It is a living document, so check back for updates!

Providing ALL of the details below to Broadcom Support upfront, together with clear communication, reduces the time Broadcom Support needs to provide a resolution or, if they deem it necessary, to engage L2. Insisting on access to L2 will not speed up this process. Every step below needs to be completed, including all appropriate logs, dumps, scripts, exports, screenshots, and reproduction steps as requested by the Broadcom Support and Development teams.

Once exact, detailed steps to reproduce an issue are provided, along with everything mentioned below, Broadcom Support will attempt to reproduce the issue and will offer a workaround or solution, if one can be identified, without engaging L2. On rare occasions, L2 assistance may be necessary.

Resolution

Case Summary Template

Description:

1. Creating the problem statement - Include the following in the problem statement:

1. What:
  1. A brief description of the problem - when x is done, y happens, rather than z
2. Who:
  1. Does it happen to everyone? Or just a single user or subset of users?
3. Where:
  1. Does this happen on all systems/servers? Or just a single system/server?
  2. What is the difference between the affected system and a system that does work?
4. When:
  1. What triggers the issue?
  2. How often does this occur?
  3. Does this only happen at a certain time of day? Does it happen after a certain amount of time?
  4. When was the most recent occurrence?
  5. When is the issue expected to happen again?
  6. Is this still happening?
5. Has this worked before?

2. Steps to reproduce

1. Reproduction (use the step format below)
  1. Step 1
  2. Step 2
  3. Step 3
2. Document expected behavior
  1. Why is this expected?
  2. Does the documentation state this?
  3. Did it work differently before?
3. Document actual observed behavior
  1. Include the full, exact text of any errors here
  2. Include screenshots of the progression
  3. Include command-line entries along with their responses
  4. Include related scripts
4. Can the issue be reproduced at will?
5. If the issue cannot be reproduced, why not?

Even when the steps are mostly unknown, a rough outline such as the one below provides a lot of useful information:

1) Job is started
2) Wait some time
3) Job fails with error xyz

3. Version(s) of affected/involved component(s)

NOTE: If this is the result of an upgrade, include the previous version as well as the current version

1. Automation Engine full version (if applicable)
2. Initial data full version (if applicable)
3. Agent full version (if applicable)
4. AWI full version (if applicable)
5. DB type (Oracle, MSSQL, PostgreSQL, DB2)
6. DB version
7. OS type and kernel version

4. What changes have been made to (or found on) the involved:

1. Product components
2. Environment in which the product/components are running
  1. OS updates/patches
  2. Hardware or virtual hardware updates (CPU, memory, storage)
  3. Increase in server load during specific times (spiking)
  4. Network updates/patches
3. Increase in general system load over the last few weeks/months

5. Is there a current solution or workaround?

1. If this is a root cause request, what resolved the issue when it occurred?
2. Is there a way to get around the issue (or another way to get expected results)?

6. Description of the impact to the business (explaining the chosen priority)

1. What is the impact to business when this problem occurs?
2. Is this blocking an upgrade? If so, what is the timeline? What happens if that timeline is not hit?
3. Are workarounds acceptable? If so, for how long?
4. If there is no business impact, only an inconvenience, please note that

7. What troubleshooting has been done so far

1. What has been tried? Tried: x, y, z
2. What was the result? Result: no change, different error during x, y, z

8. Knowledge search results

1. What keywords were used in the Broadcom Knowledge search?
2. What was returned?

- This information helps Broadcom improve its articles, for example when a relevant article already exists but was missed. Describe the search the way customers would phrase it in the future. “Agent goes down after 10 minutes” is a viable knowledge search.

9. Analysis Documents (object-exports / job-reports / logs / traces / screenshots / command line history, etc.)

NOTE: When a problem is not reproducible, provide as much information as possible (complete logs covering the time of the issue, XML exports, forced traces, etc.). If a problem is reproducible, provide the traces required to analyze the root cause. Gathering the right data is one area that consumes a lot of time before a full analysis can be carried out. If the correct traces are provided upfront, the issue will progress much more quickly.

1. What documents were gathered?
2. What should be looked at?
3. What files contain what information?
4. What kind of traces on which components were gathered?
5. If XML files are provided, what do they contain?

10. RunID / timestamp

1. Which timestamp should be examined in the provided files?
2. What is the RunID for a task that hit the issue?

11. Expectations for the case - if not clear from the above, what is expected from this case?

1. Technical expectations (which of the following fit the situation):
  - Is this behavior as designed?
  - If this is not as designed, is there a product defect?
  - What information is necessary to move forward?
  - Question to answer - is there a way to do this?
  - Is this a known issue?
  - Is there a different way to achieve results?
2. Case expectations
  - Is there a specific timeline expected?
  - Is there specific knowledge needed when troubleshooting?
  - Is there a preferred method of communication?
  - Is there a preferred timezone and time of day to be contacted?

Additional Information

Other info

When transferring logs and traces:

Please do not provide traces with all trace flags set to 9. Create traces using only the lowest trace level required (see the list below).

Trace Level:

Default trace flags:

- For issues where WP traces are indicated, the default is database=3, TCP/IP=2. For general performance analysis scenarios, a TCP/IP=2 and database=2 trace is sufficient.
- For performance analysis of certain actions (time-critical calls or U#3434): TCP/IP=2 and database=3
- AWI: xml=3, plus additional traces (such as CP) as needed
For issues with Agents:

- Default: tcp/ip=9
- In case of failing file transfers, use tcp/ip=9, ft=5
- If a file event is not triggering properly, set event=9 (ft=5 is not needed in this scenario)

The following knowledge articles can help in many cases and should be checked:

How to improve overall Automation Engine performance

Root cause investigation for Automation Engine outage / freeze / unavailability