Troubleshoot Job aborts on AIX with Page init error

Article ID: 236690

Products

CA Automic Dollar Universe

Issue/Introduction

We experienced a technical issue with a node: it behaved abnormally, causing some jobs to abort. We had to restart the node to return it to normal.

Please let us know what could have caused this.

Environment

Release : 6.x

Component : DOLLAR UNIVERSE

Operating System: AIX

Cause

During the review, we found the following errors in universe.log:

| 2022-02-22 05:09:28 |ERROR|X|IO |pid=p.t| vhandle_select | Page init error 101 for oex history

| 2022-02-22 05:25:32 |ERROR|X|IO |pid=p.t| owls_oex_exception_select | cannot parse operation exception article [ S000000866000U000002323000U00000001000000002130653XF9999998XF213]

The above errors are linked to a memory issue. Page init error 101 means that DUAS could not allocate enough memory to create a new item for exception data.
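
To locate these entries in a large universe.log, a scan along the following lines may help. This is a minimal sketch, assuming the pipe-delimited layout of the two sample lines above; the function name `find_page_init_errors` and the labels given to each field are illustrative assumptions, not DUAS terminology.

```python
import re

# Matches the numeric code in messages such as "Page init error 101 for oex history".
PAGE_INIT = re.compile(r"Page init error (\d+)")


def find_page_init_errors(lines):
    """Return one dict per ERROR line whose message reports a page init error.

    Assumes the layout of the sample lines:
    | timestamp |LEVEL|X|IO |pid=p.t| function | message
    """
    hits = []
    for line in lines:
        # Split on the first 7 pipes only, so the message text stays intact.
        parts = [p.strip() for p in line.strip().split("|", 7)]
        if len(parts) < 8 or parts[2] != "ERROR":
            continue
        match = PAGE_INIT.search(parts[7])
        if match:
            hits.append({
                "timestamp": parts[1],
                "component": parts[4],   # "IO" in the sample (label is an assumption)
                "function": parts[6],
                "code": int(match.group(1)),
                "message": parts[7],
            })
    return hits


if __name__ == "__main__":
    sample = [
        "| 2022-02-22 05:09:28 |ERROR|X|IO |pid=p.t| vhandle_select | "
        "Page init error 101 for oex history",
    ]
    for hit in find_page_init_errors(sample):
        print(hit["timestamp"], hit["function"], hit["code"])
```

For a real log, the lines can be read with `open("universe.log", errors="replace")`; the path depends on the node installation.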

No specific trace can be set within DUAS for this, because memory management is handled by the OS (AIX); the issue therefore needs to be investigated at the OS level.

The following configuration can be validated at the DUAS level and from the UxTrace:

  • IO memory limits are set in values.xml, for example <var id="UNI_AIX_LDR_CNTRL_MAXDATA_IO">MAXDATA=0x80000000</var>, where MAXDATA is the maximum amount of data the process can use. In this example it is set to 0x80000000, that is, 2 GB.
  • In the UxTrace, look for the uxioserv line in the file ps_efl_01G.txt; the size (SZ) column, 217012 in the example below, shows the memory usage of the IO process.
       40001 A  user-account 4720232        1   0  60 20 90f514480 217012        *   Feb 22      - 275:15 ./uxioserv COMPANY X NODEID
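
The two checks above can be sketched in a few lines. This is an illustration under stated assumptions: the `<var>` element is parsed on its own (the overall structure of values.xml is not shown here), the memory figure is taken to be the SZ column, the tenth whitespace-separated field of standard `ps -elf` output, and the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def maxdata_bytes(setting):
    """Convert a setting such as 'MAXDATA=0x80000000' to a number of bytes."""
    match = re.fullmatch(r"\s*MAXDATA=(0[xX][0-9a-fA-F]+)\s*", setting)
    if match is None:
        raise ValueError(f"unexpected MAXDATA setting: {setting!r}")
    return int(match.group(1), 16)


def ps_sz(ps_line):
    """Return the SZ field of one 'ps -elf' line.

    Assumes the column order F S UID PID PPID C PRI NI ADDR SZ ...
    and a UID without embedded spaces.
    """
    return int(ps_line.split()[9])


if __name__ == "__main__":
    var = ET.fromstring(
        '<var id="UNI_AIX_LDR_CNTRL_MAXDATA_IO">MAXDATA=0x80000000</var>'
    )
    limit = maxdata_bytes(var.text)
    print(var.get("id"), limit, "bytes =", limit / 2**30, "GiB")

    line = ("40001 A  user-account 4720232        1   0  60 20 90f514480 "
            "217012        *   Feb 22      - 275:15 ./uxioserv COMPANY X NODEID")
    print("uxioserv SZ:", ps_sz(line))
```

The SZ value is left as the raw number reported by ps; check the AIX ps documentation for its unit before comparing it with the MAXDATA limit.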

If these errors occur, collect the following information and contact Technical Support for assistance.

  1. AIX version details
  2. Output of the command ps -elf | grep uxioserv
  3. Output of the command errpt -a (see the errpt man page for details).
  4. System memory reports captured at the time of the issue, to check whether the system was short of memory when the problem was observed.
  5. Free memory available on the AIX LPAR (Logical Partition) at the time of the issue.
  6. The object (UPROC/Session) executed in the job that is aborting.
  7. A UxTrace generated at the time of the issue (UVC -> Administration -> Nodes -> Node List -> Select Node -> Right Click -> Trace -> Generate UxTrace) or via the command line (refer to the document in Additional Information below for more detail).
  8. The UPROC script used within the job.
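
The command outputs in items 1 to 3 can be gathered in one pass while the problem is occurring. The sketch below is only an illustration: `ps -elf` and `errpt -a` come from the list above, `oslevel -s` for the AIX version is an assumption not named in this article, and the helper names are hypothetical. The remaining items (memory reports, UxTrace, UPROC objects and scripts) still have to be collected separately.

```python
import subprocess

# "ps -elf" and "errpt -a" are the commands listed above; "oslevel -s" is an
# assumed way to obtain the AIX version and is not named in this article.
DIAGNOSTIC_COMMANDS = {
    "aix_version": ["oslevel", "-s"],
    "ps_elf": ["ps", "-elf"],
    "errpt": ["errpt", "-a"],
}


def collect(commands, timeout=120):
    """Run each command and return {name: stdout or a 'not collected' note}."""
    results = {}
    for name, argv in commands.items():
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True,
                errors="replace", timeout=timeout, check=False,
            )
            results[name] = proc.stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing or hanging command must not stop the other collections.
            results[name] = f"<not collected: {exc}>"
    return results


def uxioserv_lines(ps_output):
    """Keep only the ps lines that mention uxioserv (the 'grep uxioserv' step)."""
    return [line for line in ps_output.splitlines() if "uxioserv" in line]


if __name__ == "__main__":
    report = collect(DIAGNOSTIC_COMMANDS)
    report["ps_uxioserv"] = "\n".join(uxioserv_lines(report.pop("ps_elf")))
    for name, text in report.items():
        print(f"===== {name} =====")
        print(text)
```

Redirecting the script's output to a file gives a single attachment for the support case.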

Resolution

The workaround is to restart the node, which releases the system memory.

Additional Information