Tips on setting "Max DB Agents" level in pdm_vdbinfo for SDM/ITSM

Article ID: 127779

Products

SUPPORT AUTOMATION- SERVER
CA Service Desk Manager - Unified Self Service
KNOWLEDGE TOOLS
CA Service Management - Asset Portfolio Management
CA Service Management - Service Desk Manager

Issue/Introduction

Q1) We are running 30 database agents. Is this enough? 
Here is an extract from the pdm_vdbinfo output:

======================================== 
VDBINFO invoked at 02/18/2019 10:28:44 
======================================== 
Min Config Agents = 30 
Max Config Agents = 30 
Max DB Agents = 30 
Tgt num idle = 2 
Num Agents running = 43 
Num Agents starting = 0 
Num Requests pending = 330 
Actual num idle = 0 

Q2) For application layer caching, pdm_vdbinfo has a caching mechanism called “Delayed ID Queue.”
Can this be tuned, and if so, how?
The pdm_vdbinfo command produces output useful in examining how the CA Service Desk Manager (ITSM) agents used in the virtual database layer are running.
 
The virtual database layer exists between the application layer (SDM/ITSM) and the underlying database.

It can be useful to increase the maximum number of database agents if needed.

Environment

CA Service Desk Manager or ITSM. All versions.

Resolution

RULE OF THUMB METHOD
The rule of thumb for setting Max DB Agents is to increase the value in small increments, and then observe the results for a period.

For example, on a working system, moving from 30 agents to 40 could reasonably be expected to provide a low-risk performance improvement, if database agents were previously lacking.
However, you would not jump the value from 30 to 50, 60 or above in one hop, because adding extra agents for no reason has side effects.
A seemingly harmless change to 60 can bring a system down if resources are not adequate, or it may yield no performance improvement at all if the agents were never overloaded in the first place.

Note that any site on ITSM 17.1 or above should be running a minimum of 20 database agents.
30 database agents are a good starting point for any implementation.
The additional agents may not be used on smaller sites, but they also will not consume enough resources to impact the rest of the system by being present. It also allows for some growth in traffic.

Pros and Cons of the Rule of Thumb
This method is best when there are clear issues with regular agent overload from expected sources.
It is weak at handling unusual scenarios: it may provide some relief from the overload, but the root cause may still overwhelm the system.

ANALYSIS OF SYSTEM METHOD
The alternative is to analyse the performance and examine if additional database agents are warranted, or if there are other factors that are inhibiting performance beyond the agents.

Pros and Cons of the Analysis Method
This provides a better understanding of the system, better allocation of resources and is better at finding root causes with long term solutions.
It requires more effort to implement, and a deeper knowledge of the system and its interaction with the environment.

The aim is to differentiate between the following scenarios.
Note that more than one cause can be in play at one time.
It is best to eliminate the worst offenders first, and repeat the process.

1) HIGH BASE LOAD 
The number of database queries in general is consistently high over time, and is overwhelming the number of available agents on a regular basis, because: 

1A)  High Base Load Cause: There are insufficient database agents. 
Eg: The stdlogs are often filled with messages containing the string "millisecond". The pdm_vdbinfo output regularly shows agents overloaded with queries, and when examined, the queries are expected.
Fix: Raise the number of agents, as per the documentation.
Small increments are best (a minimum of 20 agents, then increments of 5 to 10 agents), unless load and environment analysis indicate a higher figure from the outset.
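
The stepping approach described above can be sketched as a small planning helper. This is only an illustration: the function name and its parameters are hypothetical, and the only figures taken from this article are the floor of 20 agents and the step size of 5 to 10. Each value in the returned list is meant to be applied on its own and observed for a period before moving to the next.

```python
def plan_agent_increases(current, target, step=5, floor=20):
    """Return the Max DB Agents values to try, in order, one at a time.

    Hypothetical helper: 'floor' and the 5-10 'step' range follow the
    guidance in this article; everything else is an assumption.
    """
    if not 5 <= step <= 10:
        raise ValueError("step should be between 5 and 10 agents")
    values = []
    value = max(current, floor)
    if value != current:
        # The site is below the recommended minimum, so start there.
        values.append(value)
    while value < target:
        value = min(value + step, target)
        values.append(value)
    return values


print(plan_agent_increases(30, 50, step=10))
print(plan_agent_increases(12, 30, step=5))
```

The plan does not replace the observation period between changes. If the overload disappears after an early step, the remaining steps are not needed.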

1B) High Base Load Cause: Queries that should not be running are flooding the system.
Eg: Poor Scoreboard design, Events, input from email/Web Services/Reporting load etc.
Fix: Minimise the incorrect inbound load.

2) HIGH SPIKE 
The virtual database layer is hit with a spike in volume, which clears itself, but ties up all agents briefly.

2A) High Spike Cause: The spike is understood and is expected in the environment. It needs to be handled.
Eg: Each day at 11:00 am a large Contact load is performed.
Fix: Increase agents as per (1A), but possibly not by as much.
Identify the source and see if it could be handled more effectively.
Otherwise, see if it can be moved to a quieter time, such as overnight processing.

2B) High Spike Cause: The spikes should NOT be occurring.
Eg: A Web Services loop of activity where repeated open calls are made without closing, between two systems, in a feedback loop.
Fix: Identify the cause of the spikes and eliminate it.
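
As a rough aid for telling scenario 1 (high base load) from scenario 2 (high spike), the following sketch classifies a series of "Num Requests pending" readings taken at different times. The function name, the 50% cut-off and the sample figures are all assumptions made for illustration, not values recommended by CA Support. The sketch also cannot decide between the A and B variants, since that depends on whether the queries are expected.

```python
def classify_load(pending_counts, sustained_fraction=0.5):
    """Classify a series of 'Num Requests pending' readings.

    Hypothetical heuristic: if at least 'sustained_fraction' of the
    readings show a backlog, treat it as a high base load; if only a
    few do, treat it as a spike.
    """
    if not pending_counts:
        raise ValueError("need at least one sample")
    backlogged = sum(1 for n in pending_counts if n > 0)
    if backlogged == 0:
        return "no overload observed"
    if backlogged / len(pending_counts) >= sustained_fraction:
        return "high base load"
    return "high spike"


# Made-up readings for illustration only.
print(classify_load([330, 280, 150, 0, 410, 390]))
print(classify_load([0, 0, 0, 330, 0, 0, 0, 0]))
```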

TOOLS TO HELP ANALYSIS

pdm_vdbinfo

The pdm_vdbinfo report shows the virtual database agent state. Monitor especially for agents that take a long time to clear queries. 
For example, you may see: 
* Stuck queries. These are bad in their own right and should be eliminated. 
* Queries that can get their own dedicated agent, for example by configuring the Animator to have its own agent. 

Note that pdm_vdbinfo only gives an indication of agent use at a point in time. There is a natural ebb and flow of volume.
It is natural for the load to change as usage occurs, so you must take several readings at different points. Use Interval Logging (under System in the Administration tab) to gather a sample.

It is not necessary to leave this logging on, as it will consume a small amount of resources periodically for no reason. It is best used as a tool to examine the health of a system, when diagnosing a specific performance problem, or as recommended by CA Support.
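
When several readings have been collected, a short script can help compare them. The sketch below assumes the captured text has the same layout as the extract at the top of this article (a "VDBINFO invoked at" line followed by "name = value" lines). The function names and the overload test (requests pending while no agent is idle) are assumptions for illustration, not an official interpretation of the output.

```python
import re

# Extract copied from the Issue/Introduction section of this article.
SAMPLE = """\
========================================
VDBINFO invoked at 02/18/2019 10:28:44
========================================
Min Config Agents = 30
Max Config Agents = 30
Max DB Agents = 30
Tgt num idle = 2
Num Agents running = 43
Num Agents starting = 0
Num Requests pending = 330
Actual num idle = 0
"""

HEADER = "VDBINFO invoked at"
FIELD = re.compile(r"^\s*([A-Za-z ]+?)\s*=\s*(\d+)\s*$")


def parse_vdbinfo(text):
    """Split captured output into one dict of integer counters per reading."""
    samples = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(HEADER):
            samples.append({"timestamp": stripped[len(HEADER):].strip()})
            continue
        match = FIELD.match(line)
        if match and samples:
            samples[-1][match.group(1)] = int(match.group(2))
    return samples


def looks_overloaded(sample):
    """Assumed heuristic: requests are queued and no agent is idle."""
    pending = sample.get("Num Requests pending", 0)
    idle = sample.get("Actual num idle", 0)
    return pending > 0 and idle == 0


samples = parse_vdbinfo(SAMPLE)
busy = [s["timestamp"] for s in samples if looks_overloaded(s)]
print(f"{len(busy)} of {len(samples)} readings look overloaded: {busy}")
```

A single overloaded reading proves little on its own. The proportion of overloaded readings across a day is more useful when deciding between the scenarios above.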

db_report
The db_report shows the actual database queries. This can be invaluable, but to get the best out of it, it is recommended that the Database Administrators review this information alongside the Service Desk Manager Administrator.

Caution: This report can generate a lot of information. You may wish to take only a few point-in-time samples, and then turn it off. 
How to Identify Performance Problems in CA SDM

stdlog
Open the stdlog with a tool such as Notepad++ and search for the string "milliseconds" to find long running queries.
Cross reference these to the db_report if needed, and possibly link them to the findings from the pdm_vdbinfo output.
The aim is again to identify, understand and reduce the number of long running queries.

Example: This message indicates that the virtual database has consumed all of the agents, and cannot serve any more until the existing agents are processed. This might indicate that the number of database agents should be increased.

11/14 17:08:10.13 MY_SERVER  bpvirtdb_srvr 5884 SIGNIFICANT vdbagent.c 504 Select queue is currently backlogged 23585 millisecond. The NX_VIRTDB_SELECT_QUEUE_WARN is currently set at 3000 milliseconds
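
For larger logs, the same search can be scripted. The sketch below is an illustration only: it matches the wording of the example message above with a regular expression and extracts the backlog time and the configured warning threshold. The function name is hypothetical, and the second log line is made up to show that unrelated lines are skipped.

```python
import re

# Pattern built from the wording of the example message in this article.
BACKLOG = re.compile(
    r"Select queue is currently backlogged (\d+) millisecond.*?"
    r"NX_VIRTDB_SELECT_QUEUE_WARN is currently set at (\d+) milliseconds"
)


def find_backlogs(lines):
    """Yield (backlog_ms, threshold_ms, line) for each backlog warning."""
    for line in lines:
        match = BACKLOG.search(line)
        if match:
            yield int(match.group(1)), int(match.group(2)), line.rstrip()


log = [
    "11/14 17:08:10.13 MY_SERVER  bpvirtdb_srvr 5884 SIGNIFICANT vdbagent.c 504 "
    "Select queue is currently backlogged 23585 millisecond. "
    "The NX_VIRTDB_SELECT_QUEUE_WARN is currently set at 3000 milliseconds",
    "(hypothetical) an unrelated log line that should be ignored",
]

for backlog_ms, threshold_ms, _ in find_backlogs(log):
    ratio = backlog_ms / threshold_ms
    print(f"backlog {backlog_ms} ms is {ratio:.1f}x the {threshold_ms} ms threshold")
```

To run this against a real stdlog, replace the `log` list with the lines of the file. The file location depends on the installation and is not assumed here.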

COMMENTS

The key aim of the analysis is to work out which scenarios are present.
A site may be facing multiple scenarios, and so it is important to work out where the load is coming from, if it is valid, and if it can be minimized, rather than only bumping the agents up.
Note that sometimes it is simply not possible to increase the agents high enough to handle "flawed data" such as the mass, repeated running of queries with long run times, or syntax that cannot be handled by one of the links in the chain.

Another point to consider is Conventional Configuration vs Advanced Configuration. With Advanced Configuration, there are multiple virtual databases all with their own channel to the database, which distributes the load a lot better. You do need more machines, more administration overhead and better specs on the database, but correctly configured it does give better, more stable performance in high end scenarios.

DELAYED QUEUE ID

Search the following document for the string "Delayed Queue ID":
CA Service Desk Manager (SDM) performance is poor when using Oracle Database Management System (DBMS) to host the MDB.

CA Support does not recommend changing any settings that impact this variable.

Additional Information

The Max DB Agents value can be changed using the pdm_options_mgr command, as per this document.
The NX.env variable is "NX_MAX_DBAGENT". 
Note that this value may also be updated directly in the NX.env and the NX.env_nt.tpl files, followed by recycling the service.
However, the pdm_options_mgr command is recommended for all NX.env changes, as it allows better administration and changes by scripting.
Best Practice when doing Changes to NX.ENV: using the 'pdm_options_mgr' command

How does Service Desk communicate with the DBMS where the mdb is installed?

Meaning of "Num Agents running" in pdm_vdbinfo output 

What performance information does the pdm_vdbinfo command give me, and how can I interpret the output?