We are running performance tests in our Production environment and have run into issues: the RiskFort server log is heavily flooded with FATAL errors. A sample from the logs follows, and the full arcotriskfort.log is attached. The CA guide at the following URL says the issue was fixed in the 8.0 release, but we are running 8.2.1 and still see these errors in the logs.
Can you please tell us what is wrong?
Fri Jun 01 00:45:19.297 2018 WARNING: pid 18945 tid 3985636208: 8: 15:13021: ARRFDbOperations::getUserContextFromRFDB. Two parallel request for the USER.
Fri Jun 01 00:45:19.297 2018 FATAL: pid 18945 tid 3985636208: 8: 15:13021: Error in Fetching UserContext from ARRFUSERCONTEXT. DB operation Failed.
Environment
Risk Authentication 8.x, 9.x
Resolution
User context information is stored primarily in the USERCONTEXTINFO column, which can hold a maximum of 4000 characters. If the context exceeds 4000 characters, the remaining characters are stored in the USERCONTEXTINFOEXT1 and USERCONTEXTINFOEXT2 columns. When fetching the user context, we therefore combine all of these columns to get the consolidated user context. The problem occurred while combining the data from these columns, so we changed the way they are combined so that the user context is retrieved correctly. This is handled by the code changes.
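To make the column layout concrete, here is a minimal Python sketch of splitting a long context across three columns and joining them back in order. It is only an illustration, not the product's code: the function names are hypothetical, and it assumes the two overflow columns also hold 4000 characters each, which the text above does not state.

```python
from typing import Optional, Tuple

# Assumption: every column holds up to 4000 characters. The text confirms
# this only for USERCONTEXTINFO; the EXT column sizes are assumed here.
CHUNK = 4000


def split_context(context: str) -> Tuple[str, str, str]:
    """Split a context string into (info, ext1, ext2) column values."""
    if len(context) > 3 * CHUNK:
        raise ValueError("context exceeds the capacity of the three columns")
    return (
        context[:CHUNK],
        context[CHUNK:2 * CHUNK],
        context[2 * CHUNK:3 * CHUNK],
    )


def combine_context(info: Optional[str],
                    ext1: Optional[str],
                    ext2: Optional[str]) -> str:
    """Concatenate the columns in order, treating NULL (None) as empty."""
    return "".join(part or "" for part in (info, ext1, ext2))
```

The point of the sketch is that the columns must be concatenated in a fixed order and that empty overflow columns must be handled, otherwise the consolidated context is corrupted.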
To elaborate on the RiskFort evaluation flow: when a user performs a risk evaluation request, we lock the respective usercontext row, process the risk evaluation, update the usercontext with the latest data, and then unlock the row. If another risk evaluation arrives for the same user during this interval, we fail that transaction with the "two parallel request" error. If, for any reason, a transaction hangs, subsequent transactions for that user would fail because the respective usercontext row would stay locked forever. To avoid this situation, we maintain a time interval of 120 seconds between two successive operations from the same user, after which the row is unlocked regardless of the result.
If that gap is reduced from 120 seconds to 5 seconds (you can also try 1 second), the chance of getting the error is minimized, because only a 5-second gap is then maintained between two successive transactions from the same user.
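The locking behavior described above, and the effect of a shorter interval, can be sketched in Python as follows. This is a simplified in-memory model under stated assumptions, not the actual RiskFort implementation: the class name, method names, and the `stale_after_seconds` parameter are hypothetical, and the real lock lives in the database row rather than in a dictionary.

```python
import time
from typing import Dict, Optional


class UserContextLocks:
    """Hypothetical in-memory model of per-user row locks with a stale timeout."""

    def __init__(self, stale_after_seconds: float = 120.0) -> None:
        # Interval after which a lock is considered stale and may be taken over.
        self.stale_after = stale_after_seconds
        self._locked_at: Dict[str, float] = {}

    def try_lock(self, user: str, now: Optional[float] = None) -> bool:
        """Return True if the lock was acquired, False on a parallel request."""
        now = time.monotonic() if now is None else now
        locked_at = self._locked_at.get(user)
        if locked_at is not None and now - locked_at < self.stale_after:
            # Row is still locked by a recent request: "two parallel request".
            return False
        # Row is free, or the previous lock is stale (for example, a hung request).
        self._locked_at[user] = now
        return True

    def unlock(self, user: str) -> None:
        """Release the lock after the evaluation finishes."""
        self._locked_at.pop(user, None)
```

In this model, a request that hangs without calling `unlock` blocks the same user for the full stale interval. With `UserContextLocks(120)` a retry 5 seconds later is rejected, while with `UserContextLocks(5)` the same retry succeeds, which is why shortening the interval reduces the number of errors seen under load.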
Also note that the earlier patch was delivered to solve the "two parallel requests" error for the devicecontext, where the code did not unlock the row even after the configured interval (default 120 seconds) had elapsed. The usercontext "two parallel requests" code flow, by contrast, works as designed, so the only option available is to tune the value in the ARRFCONFIGURATION database table.