Common BSI ‘system down’ causes

Article ID: 144764

Products

CA Business Service Insight

Issue/Introduction

This article collects common troubleshooting advice for when a CA Business Service Insight (BSI) system is failing.

First, a note on scope: for a production BSI system to be failing, it must previously have been working, so this document does not cover installation or initial configuration problems.

Resolution

1)    Environmental issues

For BSI to work in an environment, the systems it communicates with must also be working. Therefore, check that every server (App, Web, remote Adapters, Database) and any associated load balancers are running, and that the machines can reach each other over the network.
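A quick way to confirm that the servers can reach each other is a TCP connection test run from one machine against the others. The following is a minimal Python sketch, not a BSI utility. The function names are illustrative, and the host names and ports you pass in are your own (1521 for an Oracle listener and 80 for IIS are only the usual defaults and may differ in your environment).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each label in (label, host, port) tuples to its reachability."""
    return {
        label: can_connect(host, port, timeout)
        for label, host, port in endpoints
    }
```

For example, calling check_endpoints([("Database listener", "bsi-db.example.com", 1521), ("Web server", "bsi-web.example.com", 80)]) from the App server returns a dictionary of True/False values, one per endpoint; the host names here are placeholders. A False result only shows that the port could not be reached, so a firewall rule is as likely a cause as a stopped service.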

Database connection

BSI stores the database user and password information in %OG_HOME%\bin\registry.xml, where %OG_HOME% is the environment variable that points to the BSI installation directory. The out-of-the-box default is C:\Program Files (x86)\CA\Cloud Insight, but it is quite normal to install elsewhere, for example on a different drive.

However, the credentials in that file are deliberately encrypted for security. If you need to update DB usernames and passwords, use the app %OG_HOME%\Utilities\PassUpdate\PassUpdate.exe; this needs to be run as Administrator because it also updates the Windows registry. The TNS name for the database connection is stored in HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Oblicore\DatabaseConfigurationInfo
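Because the installation directory varies between sites, it can help to resolve the two paths mentioned above from OG_HOME rather than assuming the default location. This is a small Python sketch under that assumption; the function name bsi_paths is hypothetical, and it only builds the path strings without reading or changing anything.

```python
import ntpath
import os


def bsi_paths(og_home=None):
    """Build the registry.xml and PassUpdate.exe paths under OG_HOME."""
    og_home = og_home or os.environ.get("OG_HOME")
    if not og_home:
        raise RuntimeError("OG_HOME is not set on this machine")
    # ntpath joins with backslashes, so the result is a Windows-style path
    # even if the sketch is run on another platform.
    return {
        "registry_xml": ntpath.join(og_home, "bin", "registry.xml"),
        "pass_update": ntpath.join(
            og_home, "Utilities", "PassUpdate", "PassUpdate.exe"
        ),
    }
```

If OG_HOME is missing on a server that used to work, that in itself is worth investigating before looking at the database.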

Once you have confirmed in PassUpdate the TNS name used to connect to the database, check that it works from the command line. Run

sqlplus oblicore@tnsname

where ‘oblicore’ is the main OBL user you see in PassUpdate and ‘tnsname’ is the TNS name found there. If this connects when given the correct password, then you know the database connection itself is correct.

The following SQL query will list all the users in Oracle and their account status; you may find that one or more is locked/expired.

SELECT username,
       account_status
  FROM dba_users;

 

With this info you can then unlock them and reset the password:

ALTER USER user_name IDENTIFIED BY password ACCOUNT UNLOCK;
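When the user list is long, the accounts needing attention can be picked out from the query output programmatically. The sketch below assumes the rows have already been fetched as (username, account_status) pairs, and relies on the standard Oracle status strings such as OPEN, LOCKED, EXPIRED and EXPIRED & LOCKED. The function name triage_accounts is illustrative only.

```python
def triage_accounts(rows):
    """Return {username: [actions]} for accounts whose status is not OPEN."""
    actions = {}
    for username, status in rows:
        status = status.upper()
        needed = []
        if "LOCKED" in status:
            # e.g. LOCKED, LOCKED(TIMED), EXPIRED & LOCKED
            needed.append("unlock")
        if "EXPIRED" in status:
            # e.g. EXPIRED, EXPIRED(GRACE), EXPIRED & LOCKED
            needed.append("reset password")
        if needed:
            actions[username] = needed
    return actions
```

An account in EXPIRED(GRACE) can still connect for now, so it is reported as needing a password reset but not an unlock.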

Once a password has been changed, use PassUpdate on each Web and App server to tell BSI of your changes.

Further info on this is in Article Id 97574

2)    Web Interface is just showing an error

If the browser only receives a 404 error, then either IIS on the Web server is not running or the folder the site points to is missing. Launch IIS Manager on the server and check that the site “Oblicore_Guarantee” still exists. Try restarting it there or, preferably, open a command prompt and run

iisreset

This will force the site to be reloaded.

If you have a page displayed, but that page itself is an error, then check the details – they are usually informative.

If the error is about DLLs, then re-register them on both App and Web servers with the command

Regfiles.bat 3

There is more detail for this in Article Id 120043

This error screen is also a common way to discover the database connection problems listed in the first section.
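The distinction drawn in this section between no response, a 404, and an error page can also be checked from a script, which is handy when no supported browser is available on the machine you are working from. This is a minimal sketch using only the Python standard library; the function names are illustrative, the URL you pass in is your own BSI address, and the interpretation strings simply restate the advice above.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return None


def interpret(status):
    """Translate a probe() result into the next troubleshooting step."""
    if status is None:
        return "no response: check the servers and network (section 1)"
    if status == 404:
        return "404: check that IIS is running and the site folder exists"
    if status >= 500:
        return "server error page: read the details shown on the page"
    return "web server is responding"
```

A successful response only shows that IIS is serving the site; it says nothing about whether the login page renders correctly.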

Web interface is present, but login screen is corrupted

If a login screen is visible but, rather than the correct text, you see items like [MSG – Login/LOGIN_CAPTION], then the system has lost its strings. You can re-import them with StringsLoader. Unless the path to the web server’s Inetpub folder has been changed, the command will be

StringsLoader.exe -a C:\Inetpub\wwwroot\Oblicore\App_Data\Resources

Detail for this is in Article Id 135150

You may also see a pop-up saying that the browser is unsupported. BSI requires Internet Explorer 11 in Compatibility Mode. Further client setting prerequisites are in the documentation:

https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/business-management/clarity-business-service-insight/8-3-5/installation/install-the-ca-business-service-insight-client-pack.html

3)    User login issues

If any other user cannot log in, log in as the ‘sadmin’ administrator and reset that user’s password. If you cannot log in as sadmin either, you will need to reset the sadmin password itself. This is possible directly in the database, with the following SQL statement:

UPDATE t_users SET user_password = fnc_encrypt_password('sadmin', 'sadmin') WHERE user_id = 100;

More detail is in Article Id 35807

4)    Checking logging

For problems where the Application server itself is not working, rather than simple UI/login issues, look for errors in the log. As long as the logging service is working on the App server, errors are written to the T_LOG table of the database. The following query returns every message logged at ERROR level in the last three days:

SELECT message_id, to_char(time_stamp, 'dd/mm/yyyy hh24:mi:ss') AS time_stamp, user_id, level_id, message, reporter_object, info, file_name, ip_address
FROM t_log WHERE level_id = 'E' AND time_stamp > sysdate - 3
ORDER BY 1 DESC;

If the log service itself has failed, then messages that could not be stored are written to LogServer.log and LogClient.log in the %OG_HOME%\log folder. However, once the ‘Oblicore – LogServer’ service has been restarted, it is very rare for it to keep failing for any reason other than the database server refusing to let the service save the errors, due to the problems described in section one.

ORA- Errors

Any error message containing a code that starts with ORA- comes from Oracle: the application has tried to perform a database task and has been informed by Oracle that it failed. Many of these codes are either self-explanatory, or an explanation is available on the wider web. Typically, the failing database query will be visible, indicating the tables at fault or a query memory issue.
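When the log contains many errors, tallying the ORA- codes shows quickly which Oracle problem dominates. The following Python sketch assumes the message column of the log query has already been exported as a list of strings; the function name count_ora_codes is illustrative, and the pattern relies on the usual ORA- plus five digits form of Oracle error codes.

```python
import re
from collections import Counter

# Oracle error codes are normally written as ORA- followed by five digits.
ORA_CODE = re.compile(r"ORA-\d{5}")


def count_ora_codes(messages):
    """Count each ORA-nnnnn code found in an iterable of log messages."""
    counts = Counter()
    for message in messages:
        # Treat NULL (None) messages as empty text.
        counts.update(ORA_CODE.findall(message or ""))
    return counts
```

Calling count_ora_codes(messages).most_common(5) then lists the five most frequent codes, which is usually enough to tell a credentials problem (for example ORA-01017 or ORA-28000) from a query or memory problem.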

StringKey Errors

As with the login page problems discussed in section two, these indicate that the T_RES_KEY table in Oracle is missing the relevant lookup information for a message. The same solution applies: run StringsLoader to re-import the strings into the system.

5)    Reports showing empty data

The journey from a raw event in a source system to data in a report has two main stages. Working backward through them will show where the failure is.

PSLWriter calculated data

Firstly, check that the ACE Engine (PSLWriter) assigned to the metric is running. If it is, look for errors in T_LOG; the most common cause of failed calculation is memory. Memory tuning for the writers is covered in detail in Article Id 6272, but the simplest step is to reduce the Max Event Block Size, which is controlled in Administration > Site Settings > Advanced > Calculation Engine. The metrics most likely to be causing such a problem can be found using the database query in Article Id 212727.

Service Level Mgmt > Business Logic Scope will confirm whether the events are in the system and can be calculated. Calculate only a small number of events to avoid memory/performance issues.

The other common issue is that data is present, but needs to be recalculated due to changes in the system. Ordinarily this should happen automatically, but you may find you need to force this process. Article Id 10171 has a full description of how to do this at the database level.

Event data import

If the raw events are not present, then the PSLWriter cannot calculate Business Logic against them. Design > Data Acquisition > Event Management allows viewing of these events.

If they are missing, then the Adapter has had an issue importing them. Each Adapter has its own configuration and log files, typically found in %OG_HOME%\Adapters\<adapter name>. Each file in there is useful for debugging the issue.

<adapter name>Log.log – the log file for the adapter, showing any errors it has hit importing.

<adapter name>Config.xml – the configuration settings of the adapter, including database connection info for SQL type adapters, other log file names in case these have been customized, etc.

Then there is the output subfolder, which contains further important information to review:

AdapterStatistics.txt – as the name suggests, this will show whether recent runs of the adapter have imported any data, how much, whether events were translated or rejected etc.

rejectedEvents.txt – events that have been rejected are stored here so you can check why they were rejected; typically, it is because their identifying data does not map to a registered resource. This file is particularly important because each Adapter has a maximum number of events it will reject before it assumes something is wrong with the definition and stops so that the user can fix the problem.

DataSourceControl.xml – this is where the system tracks what the last imported event was. Each data source needs a way of tracking event order, e.g. a date, so that an event isn’t imported twice. If a rogue event came through with an unusually high value in this field, the Adapter may conclude there is no new data to import and ignore genuine events.
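The effect of a rogue value on the last-imported marker can be illustrated with a short Python sketch. This shows the general high-water-mark idea only, assuming a date is used as the ordering field; it is not the Adapter's actual implementation, it does not read DataSourceControl.xml, and both function names are hypothetical.

```python
from datetime import datetime, timedelta


def events_to_import(event_times, last_imported):
    """Keep only events strictly newer than the last-imported marker."""
    return [t for t in event_times if t > last_imported]


def marker_looks_rogue(last_imported, now, tolerance=timedelta(hours=1)):
    """Return True if the marker lies further in the future than tolerance."""
    return last_imported > now + tolerance
```

With a marker dated years ahead, events_to_import returns an empty list for every genuine event, which matches the symptom of an Adapter that runs without errors but imports nothing. Comparing the stored marker against the current time is therefore a reasonable first check when AdapterStatistics.txt shows runs with no events.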