Some common troubleshooting advice for when a CA Business Service Insight system is failing.
First, a discussion of scope – for a production BSI system to be failing, it must first have been working; this document does not cover installation or initial configuration problems.
In order for BSI to work in an environment, it requires the systems it communicates with to be working. Therefore, check that each server (App, Web, remote Adapters, Database) and any associated load balancers are running, and that the operating systems can see each other.
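A quick way to confirm that the servers can see each other is to test the relevant TCP ports from each machine. The following is a minimal Python sketch, not part of BSI: the host names are hypothetical placeholders, and the ports assume defaults (80 for the web site, 1521 for the Oracle listener) that may differ in your environment.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Hypothetical host names and assumed default ports; replace with your own.
BSI_ENDPOINTS = {
    "web (HTTP)": ("bsi-web.example.local", 80),
    "database (Oracle listener)": ("bsi-db.example.local", 1521),
}


def report(endpoints):
    """Map each endpoint label to True/False depending on reachability."""
    return {label: can_reach(host, port) for label, (host, port) in endpoints.items()}
```

Calling `report(BSI_ENDPOINTS)` from the App server, then again from the Web server, shows which direction of communication is broken. A False result only says the port is unreachable; it does not distinguish a stopped service from a firewall rule.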
BSI stores the database user/password information in %OG_HOME%\bin\registry.xml – where %OG_HOME% is the environment variable that points to the home directory where BSI is stored. The out of the box default for that is C:\Program Files (x86)\CA\Cloud Insight but it’s quite normal to install elsewhere, for example on a different drive.
However, that’s deliberately obfuscated with encryption for security; if you need to update DB usernames and passwords, use the app %OG_HOME%\Utilities\PassUpdate\PassUpdate.exe – this needs to be run as Administrator because it’s also updating the registry itself. The TNS name for database connection is stored in HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Oblicore\DatabaseConfigurationInfo
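Because %OG_HOME% varies between installations, it can help to resolve the two locations above before going further. This is a small Python sketch under the assumption that OG_HOME is set in the environment of the account running it; the function names are illustrative only, and the fallback is the default install path quoted above.

```python
import ntpath
import os

# Default install location quoted in the text; used only when OG_HOME is not set.
DEFAULT_OG_HOME = r"C:\Program Files (x86)\CA\Cloud Insight"


def bsi_paths(environ=os.environ):
    """Build the credential-related BSI file locations from OG_HOME."""
    og_home = environ.get("OG_HOME", DEFAULT_OG_HOME)
    return {
        "registry_xml": ntpath.join(og_home, "bin", "registry.xml"),
        "pass_update": ntpath.join(og_home, "Utilities", "PassUpdate", "PassUpdate.exe"),
    }


def missing_paths(paths, exists=os.path.exists):
    """Return the names of expected locations that do not exist on this machine."""
    return [name for name, path in paths.items() if not exists(path)]
```

An empty list from `missing_paths(bsi_paths())` on the server suggests OG_HOME points at a complete installation; any name returned is worth checking by hand.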
Once you have confirmed the TNSname used to connect to the database in PassUpdate, check this works from the command line. Run
sqlplus oblicore@tnsname
where ‘oblicore’ is the main OBL user you see in PassUpdate and ‘tnsname’ is the name found. If this, coupled with the correct password, works then you know the database connection itself is correct.
The following SQL query will list all the users in Oracle and their account status; you may find that one or more is locked/expired.
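One way to produce that list is a query against DBA_USERS, Oracle's standard data dictionary view of database accounts. The Python sketch below wraps such a query in a helper that returns only the accounts that are not OPEN. It assumes a DB-API cursor from an Oracle driver, connected as a user allowed to read DBA_USERS; the helper name is illustrative and not part of BSI.

```python
# DBA_USERS is Oracle's data dictionary view listing every database account.
# No trailing semicolon: drivers expect the bare statement (SQL*Plus needs one).
ACCOUNT_STATUS_SQL = """
SELECT username, account_status, lock_date, expiry_date
FROM dba_users
ORDER BY username
"""


def blocked_accounts(cursor):
    """Return (username, account_status) for every account whose status is not OPEN."""
    cursor.execute(ACCOUNT_STATUS_SQL)
    return [(row[0], row[1]) for row in cursor.fetchall() if row[1] != "OPEN"]
```

The same SELECT can be pasted into SQL*Plus with a terminating semicolon. Statuses such as LOCKED or EXPIRED in the result identify the accounts to fix.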
With this info you can then unlock them and reset the password:
ALTER USER user_name IDENTIFIED BY password ACCOUNT UNLOCK;
Once a password has been changed, use PassUpdate on each Web and App server to tell BSI of your changes.
Further info on this is in Article Id 97574
If the browser connection is only getting a 404 error, then either IIS on the Web server isn’t running, or the folder it points to is missing. Launch IIS Manager on the server, and check the Site “Oblicore_Guarantee” still exists. Try a restart here, or preferably open up a command prompt and run
This will force the site to be reloaded.
If you have a page displayed, but that page itself is an error, then check the details – they are usually informative.
If the error is about DLLs, then re-register them on both App and Web servers with the command
There is more detail for this in Article Id 120043
This error screen is also a common way to discover the database connection problems listed in the first section.
If a login screen is visible, but rather than the correct text you see items like [MSG – Login/LOGIN_CAPTION] then the system has lost its strings. You can re-import them with stringsloader. Unless the path to the web server’s Inetpub folder has been changed, this will be
StringsLoader.exe -a C:\Inetpub\wwwroot\Oblicore\App_Data\Resources
Detail for this is in Article Id 135150
You may also see a pop-up saying that the browser is unsupported. BSI requires Internet Explorer 11 in Compatibility Mode. Further client setting prerequisites are in the documentation:
If any other user cannot log in, then log in as the ‘sadmin’ administrator and reset their password. But if you can’t log in as sadmin either, then you will need to reset the sadmin password itself. This is possible directly in the database, with the following SQL query:
update t_users set user_password = fnc_encrypt_password('sadmin','sadmin') where user_id = 100;
More detail is in Article Id 35807.
For anything with the Application server not working, rather than simple UI/login issues, look for errors in the log. As long as the logging service is working on the App server, errors will be loaded into the T_LOG table of the database. The following query will return every message logged at ERROR level for the last three days:
SELECT MESSAGE_ID, to_char(time_stamp, 'dd/mm/yyyy hh24:mi:ss') as times_stamp, user_id, level_id, message, reporter_object, info, file_name, ip_address
FROM t_log where level_id = 'E' and time_stamp > sysdate - 3
ORDER BY 1 desc;
If that log service itself has failed, then messages that could not be stored are written to LogServer.log and LogClient.log in the %OG_HOME%\log folder. If it is still failing after a restart of the ‘Oblicore – LogServer’ service, the cause is almost always that the Database server will not allow the service to save the errors, due to the problems described in section one.
Any error code starting with ORA- is an Oracle error code: the application has tried to perform a database task and has been informed by Oracle that it failed. Many of them are either self-explanatory, or an explanation is available on the wider web. Typically, the failing database query will be visible, indicating the tables at fault or a query memory issue.
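When many errors are logged, counting the distinct ORA- codes shows quickly which database problem dominates. The following Python sketch does this for any collection of message strings, for example the message column returned from T_LOG or lines read from the fallback log files. The sample messages are illustrative, not taken from a real BSI log.

```python
import re
from collections import Counter

# Oracle error codes take the form ORA- followed by five digits, e.g. ORA-28000.
ORA_CODE = re.compile(r"ORA-\d{5}")


def count_ora_codes(messages):
    """Count how often each ORA- code appears across an iterable of log messages."""
    counts = Counter()
    for message in messages:
        # Treat a NULL/empty message as an empty string so it is simply skipped.
        counts.update(ORA_CODE.findall(message or ""))
    return counts


sample = [
    "Login failed: ORA-28000: the account is locked",
    "ORA-28000: the account is locked",
    "ORA-01017: invalid username/password; logon denied",
    None,
]
for code, n in count_ora_codes(sample).most_common():
    print(code, n)
# prints:
# ORA-28000 2
# ORA-01017 1
```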
Messages that show a raw key in the style of [MSG – Login/LOGIN_CAPTION] instead of readable text have the same cause as the Login Page problems discussed in section two: the T_RES_KEY table of Oracle is missing the relevant lookup information for a message, and the same solution of running stringsloader to re-import the strings will solve this.
The journey from a raw event in a source system to data in a report takes two main stages. Working backward through them will show where the failure is.
Firstly, check the ACE Engine (PSLWriter) assigned to the metric is running. If it is, then look for errors in the t_log – the most common cause of failed calculation is memory. Memory tuning for the writers is covered in Article Id 6272 in detail, but the simplest step is to reduce the Max Event Block Size. This is controlled in Administration > Site Settings > Advanced > Calculation Engine
Use of Service Level Mgmt > Business Logic Scope will confirm whether the Events are in the system and can be calculated. Only calculate a small number of events to avoid memory/performance issues.
The other common issue is that data is present, but needs to be recalculated due to changes in the system. Ordinarily this should happen automatically, but you may find you need to force this process. Article Id 10171 has a full description of how to do this at the database level.
If the raw events are not present, then the PSLWriter can’t calculate Business Logic against them. Design > Data Acquisition > Event Management allows viewing of these events.
If they are missing, then the Adapter has had an issue importing them. Each Adapter has its own configuration and log files; typically found in %OG_HOME%\Adapters\<adapter name> - each file in there will be useful for debugging the issue.
<adapter name>Log.log – the log file for the adapter, showing any errors it has hit importing.
<adapter name>Config.xml – the configuration settings of the adapter, including database connection info for SQL type adapters, other log file names in case these have been customized, etc.
Then there is the output subfolder, which contains further important information to review:
AdapterStatistics.txt – as the name suggests, this will show whether recent runs of the adapter have imported any data, how much, whether events were translated or rejected etc.
rejectedEvents.txt – events that have been rejected are stored so they can be checked for why they were rejected; typically, because their identifying data does not map to a registered resource. This file is particularly important because each Adapter has a maximum number of events it will reject before it assumes something is wrong with the definition and stops so the user can fix the problem.
DataSourceControl.xml – this is where the system tracks the last event that was imported. Each data source needs a way of tracking event order so an event isn’t imported twice, e.g. date. If a rogue event came through with an unusually high value in this field, the Adapter may think there is no new data to import and ignore later events.
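The effect of a rogue value on the tracking field can be illustrated with a generic "last imported" marker. The Python sketch below is a simplified model of that idea, assuming a date is used as the ordering field; it is not BSI's actual implementation and does not read DataSourceControl.xml.

```python
from datetime import datetime


def select_new_events(events, last_imported):
    """Keep only events stamped after the stored marker, and return the new marker.

    `events` is a list of (timestamp, payload) tuples. This models a generic
    high-water-mark scheme, not the adapter's real logic.
    """
    fresh = sorted((e for e in events if e[0] > last_imported), key=lambda e: e[0])
    new_marker = fresh[-1][0] if fresh else last_imported
    return fresh, new_marker


marker = datetime(2020, 1, 1)
batch_1 = [
    (datetime(2020, 1, 2), "normal event"),
    (datetime(2099, 1, 1), "rogue event with a far-future date"),
]
imported, marker = select_new_events(batch_1, marker)
print(len(imported), marker.year)  # prints "2 2099": the marker jumps to the rogue date

batch_2 = [(datetime(2020, 1, 3), "genuine new event")]
imported, marker = select_new_events(batch_2, marker)
print(len(imported))  # prints "0": the genuine event is older than the marker
```

In this model nothing more is imported until the marker is corrected, which matches the symptom of an Adapter that runs without errors but brings in no data.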