This TECDOC describes what CA Support looks at when reviewing the CA SDM STDLOGs to troubleshoot an issue.
Disclaimer: This guide provides a general overview of CA SDM log file examination. If you are experiencing problems with the install, it is strongly recommended that you open a CA Support issue and provide the log materials (NX_ROOT\LOG directory) so that CA Support can troubleshoot the issue further.
Whenever a problem arises, a commonly requested item is the set of CA SDM STDLOGs (NX_ROOT\LOG directory). CA Support often asks for the entire set of logs so that it can get an overall view of the install's history, including any prior occurrences of the given problem that went undetected or undiagnosed.
CA Support also asks to receive the logs directly, rather than viewing them over a remote session, so that they can be analyzed more thoroughly. Pitfalls of examining the logs over a remote session include false positives and wading through considerable amounts of irrelevant information. Further, any later examination of the issue by Sustaining Engineering would require that the logs be sent in.
While the tools used to examine the logs may vary, the most frequently used are string and expression searches. These may take the form of the DOS/Windows findstr command or the Unix/Linux/MKS Tools grep command.
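Where neither findstr nor grep is at hand, the same kind of expression search can be approximated in a few lines of Python. The following is only a sketch: the function name search_logs and the file name pattern "stdlog*" are assumptions made for illustration, and the log directory must be supplied by the reader.

```python
import re
from pathlib import Path

def search_logs(log_dir, pattern, filename_glob="stdlog*"):
    """Yield (file name, line number, line) for each line matching pattern.

    filename_glob is an assumed naming pattern; adjust it to the files
    actually present in the NX_ROOT\\LOG directory.
    """
    regex = re.compile(pattern)
    for path in sorted(Path(log_dir).glob(filename_glob)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path.name, number, line.rstrip("\n")
```

Calling list(search_logs(log_dir, "millisecond")) would then collect every matching line together with the file and line number it came from, much as findstr /n or grep -n would.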
Once CA Support receives the logs, the log materials are examined for specific concerns being reported. Techniques and methods may vary depending on the issue. The following are some of the more common concerns:
Depending on the nature of the performance issue, one could search for instances of the word "millisecond", which may turn up entries such as:
SERVER1 web:local 2292 SIGNIFICANT session.c 3888 This request took 1829940 milliseconds to complete. session id:0 login name:User1 htmpl name:detail_cr_ro.htmpl
SERVER1 sqlagt:select2 3128 SIGNIFICANT sqlclass.c 1049 The following statement took 4623 milliseconds: Clause (SELECT act_log.time_stamp...
SERVER1 bpvirtdb_srvr 5880 SIGNIFICANT vdbagent.c 504 Select queue is currently backlogged 3012 millisecond.
SERVER1 sqlagt:select2 3128 SIGNIFICANT sqlclass.c 1230 A FETCH for the following statement took 2040 milliseconds: Clause (SELECT call_req.id...
There may be multiple instances of any of the above messages, and such instances can be considered to form a pattern. For example, is the request for detail_cr_ro.htmpl, or the query on the act_log or call_req table, always running slow? Were any changes made to the given web form? How large are the given tables? If the problem occurs within a specific timeframe, is everyone running a report at the same time or even coming in at once?
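To make such a pattern easier to spot, the durations can be pulled out of the matching entries and ranked. The sketch below assumes the message wording shown in the sample entries above; the names DURATION_RE and slow_entries are hypothetical and not part of CA SDM.

```python
import re

# Matches the duration in messages such as "took 4623 milliseconds" or
# "backlogged 3012 millisecond" (singular, as in the vdbagent.c sample).
DURATION_RE = re.compile(r"(\d+) milliseconds?\b")

def slow_entries(lines, threshold_ms=0):
    """Return (duration_ms, line) pairs at or above threshold_ms, slowest first."""
    found = []
    for line in lines:
        match = DURATION_RE.search(line)
        if match and int(match.group(1)) >= threshold_ms:
            found.append((int(match.group(1)), line.rstrip("\n")))
    return sorted(found, key=lambda pair: pair[0], reverse=True)
```

Run against the four sample entries with threshold_ms=3000, this would keep the session.c, sqlclass.c 1049, and vdbagent.c entries and drop the 2040 millisecond FETCH.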
If there are issues regarding the database, one can look for entries that contain "sqlclass" or "orclclass". For instance, one may find:
SERVER1 sqlagt:select3 3984 ERROR sqlclass.c 996 SQL Execute failed: [Microsoft OLE DB Provider for SQL Server] [ SQL Code=11 SQL State=08S01] [DBNETLIB][ConnectionWrite (send()).]
SERVER1 sqlagt:select0 3640 ERROR sqlclass.c 470 Failed to logon to SQL Server (DBSERVER\INSTANCE) Reason: [Microsoft OLE DB Provider for SQL Server] [ SQL Code=17 SQL State=08001] [DBNETLIB][ConnectionOpen (Connect()).]
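In the above, the logging shows that CA SDM is having problems communicating with the backend SQL Server. Both example entries reference a SQL Code and a SQL State. These are NOT CA SDM specific error values; they are reported by the database layer (here, the Microsoft OLE DB Provider for SQL Server). SQL State values that begin with 08, as both of these do, belong to the standard connection exception class. One may perform a web search, or consult the database vendor's documentation, to find out what a given error code signifies.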
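When many such entries are present, tallying the SQL Code and SQL State pairs shows at a glance which database errors dominate. This sketch assumes the "SQL Code=... SQL State=..." layout seen in the two SQL Server samples above; entries from orclclass may be laid out differently and would then simply not be counted. The names SQL_ERROR_RE and sql_error_summary are hypothetical.

```python
import re
from collections import Counter

# Layout taken from the sample SQL Server entries; an assumption for other databases.
SQL_ERROR_RE = re.compile(r"SQL Code=(-?\d+) SQL State=(\w+)")

def sql_error_summary(lines):
    """Count (SQL Code, SQL State) pairs in sqlclass/orclclass entries."""
    counts = Counter()
    for line in lines:
        if "sqlclass" not in line and "orclclass" not in line:
            continue
        match = SQL_ERROR_RE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts
```

The most frequent pairs, available through sql_error_summary(lines).most_common(), are then the ones worth looking up first.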
Sometimes there is a question about the web interface, where some operations, such as logins or searches, take a long time to complete. One thing that CA Support may do is search for entries from the web process (usually web:local on the primary server, or web:SECONDARY-SERVER on a secondary server), such as:
SERVER1 web:local 3296 SIGNIFICANT session.c 10468 Session 1933497577:0x0EADFB68 login by analyst user1 (cnt:ABCDEFGHIJKLMNOPQRSTUV); session count 78
SERVER1 web:local 3296 SIGNIFICANT session.c 7823 Session User1-218221404:0x8f88838 ended without logout; session count 77
SERVER1 web:local 3296 SIGNIFICANT session.c 6539 Web Statistics - Cumulative Sessions (340) Most Sessions (81) Current Sessions (77)
From the above, if one sees multiple instances of "login by analyst" for the same user in a short time frame, one can infer that the user might be logging in from multiple workstations. If there is an impossibly high number of logins for the same user in a very short amount of time, either the user has loaned out their login, or the login may belong to a third-party monitoring tool that tests CA SDM's availability by logging in.
Similarly, "ended without logout" may point to such a monitoring tool, especially if there is a pattern of the same user's sessions timing out. Likewise, if end users are in the habit of leaving their browsers open or simply closing them without logging out, and web.cfg has a high Timeout value, the lingering sessions may point to a performance issue.
The third line shows the session statistics recorded for the given webengine. A cause for concern is an entry that shows a high value for "Most Sessions" or "Current Sessions" (over 100 sessions), which may point to a heavily taxed environment given the number of sessions the server is handling.
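The checks described for these web entries can be sketched as follows. The regular expressions assume the exact wording of the sample entries above, user names are compared case-insensitively as an assumption (the samples show both "user1" and "User1"), and the function names login_counts and busiest_statistics are hypothetical.

```python
import re
from collections import Counter

# Wording taken from the sample session.c entries above.
LOGIN_RE = re.compile(r"login by (\w+) (\S+)")
STATS_RE = re.compile(
    r"Cumulative Sessions \((\d+)\) Most Sessions \((\d+)\) Current Sessions \((\d+)\)"
)

def login_counts(lines):
    """Count 'login by <type> <user>' entries per lower-cased user name."""
    counts = Counter()
    for line in lines:
        match = LOGIN_RE.search(line)
        if match:
            counts[match.group(2).lower()] += 1
    return counts

def busiest_statistics(lines, limit=100):
    """Return (cumulative, most, current) tuples where most or current exceeds limit."""
    flagged = []
    for line in lines:
        match = STATS_RE.search(line)
        if match:
            cumulative, most, current = map(int, match.groups())
            if most > limit or current > limit:
                flagged.append((cumulative, most, current))
    return flagged
```

A user whose count from login_counts is far higher than anyone else's is a candidate for the shared-login or monitoring-tool cases, and any tuple returned by busiest_statistics marks a moment when the webengine was above the 100-session guideline.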
If there is a specific timeframe of interest, one can obtain a list of all records that fall within a specific date/time range. For instance, if a given issue is known to have occurred at or around Jan 6, 12:25, one could search for all records that start with "01/06 12:2". This shows all activity that occurred from Jan 6, 12:20 to Jan 6, 12:29. This is especially useful when the exact time is not known but a general range is.
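The truncated-timestamp search can be sketched in Python as below. It relies only on what is stated above, namely that records start with the date and time in the form "01/06 12:2..."; the helper names window_prefix and entries_in_window are hypothetical.

```python
def window_prefix(month, day, hour, minute):
    """Build the ten-minute prefix for a time, e.g. Jan 6, 12:25 -> '01/06 12:2'."""
    # Dropping the last digit of the minute widens the match to ten minutes.
    return f"{month:02d}/{day:02d} {hour:02d}:{minute // 10}"

def entries_in_window(lines, prefix):
    """Return the records whose leading timestamp starts with prefix."""
    return [line.rstrip("\n") for line in lines if line.startswith(prefix)]
```

Shortening the prefix further, for example to "01/06 12:", widens the window to the whole hour in the same way.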
For more information on the content of the CA SDM STDLOGs, refer to TEC478009.