We experienced a technical issue with a node: the node was behaving abnormally, causing some jobs to be aborted. We had to restart the node to return it to normal.
Please let us know what could have caused this.
Release: 6.x
Component: DOLLAR UNIVERSE
Operating System: AIX
During the review, we found the following errors in universe.log:
| 2022-02-22 05:09:28 |ERROR|X|IO |pid=p.t| vhandle_select | Page init error 101 for oex history
| 2022-02-22 05:25:32 |ERROR|X|IO |pid=p.t| owls_oex_exception_select | cannot parse operation exception article [ S000000866000U000002323000U00000001000000002130653XF9999998XF213]
The above errors are linked to a memory issue. Page init error 101 means that DUAS could not allocate enough memory to create a new item for exception data.
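To check how often these errors occurred before the restart, the log can be scanned for the two messages quoted above. The following is a minimal sketch, not a DUAS tool: it assumes universe.log lines keep the pipe-delimited layout of the two sample lines, and the names `parse_line`, `find_memory_errors` and `MEMORY_ERROR_MARKERS` are hypothetical helpers introduced only for illustration.

```python
# Assumption: every relevant universe.log line has the same pipe-delimited
# layout as the samples above:
# | timestamp |LEVEL|X|IO |pid=p.t| function | message
MEMORY_ERROR_MARKERS = (
    "Page init error 101",
    "cannot parse operation exception article",
)


def parse_line(line):
    """Split one log line into its fields; return None if the layout differs."""
    parts = line.strip().split("|", 7)
    if len(parts) < 8:
        return None
    return {
        "timestamp": parts[1].strip(),
        "level": parts[2].strip(),
        "function": parts[6].strip(),
        "message": parts[7].strip(),
    }


def find_memory_errors(lines):
    """Return parsed ERROR records whose message contains a known marker."""
    hits = []
    for line in lines:
        record = parse_line(line)
        if record is None or record["level"] != "ERROR":
            continue
        if any(marker in record["message"] for marker in MEMORY_ERROR_MARKERS):
            hits.append(record)
    return hits
```

A typical use would be to open the node's universe.log, pass the file object to `find_memory_errors`, and print the timestamp and function of each hit. The timestamps can then be compared with memory statistics gathered at the OS level for the same period.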
No specific trace can be set within DUAS, because memory management is handled by the operating system (AIX); the issue therefore needs to be investigated at the OS level.
The following configuration can be validated at the DUAS level and from the UXTrace output:
40001 A user-account 4720232 1 0 60 20 90f514480 217012 * Feb 22 - 275:15 ./uxioserv COMPANY X NODEID
If these errors occur, collect the information mentioned below and contact Technical Support for assistance.
The workaround is to restart the node, which releases the system memory.