OPSBCPII dumps and fails to reconnect to all CPCs after an MCL upgrade
During a CPC MCL upgrade, OPSBCPII took a dump and failed to reconnect to any CPC other than its own.
Date  Time     Job Name Job ID Message
06MAR 11:30:26 OPSBCPII S84063 OPBCP080I - OPS BCPii Server topology refresh in progress ...
06MAR 11:30:26 OPSBCPII S84063 OPBCP081I - Purging the OPS BCPii Server topology ...
06MAR 11:30:27 OPSBCPII S84063 OPBCP082I - ( OPIIEVTR ) OPS BCPii Server V1 topology purged
06MAR 11:30:27 OPSBCPII S84063 OPBCP083I - ( OPIIEVTR ) OPS BCPii Server V2 topology purged
06MAR 11:30:27 OPSBCPII S84063 OPBCP999E - ( OPIICAPI ) hwilist CPCs failed - rc=101
06MAR 11:30:27 OPSBCPII S84063 OPBCP099E - (OPIIDRV ) Build Topology failed - rc=257
06MAR 11:30:27 OPSBCPII S84063 OPBCP063I - OPS BCPii Server initialization complete. Waiting for requests...
06MAR 11:30:28 OPSBCPII S84063 OPBCP100I BCPii V2 Topology Display 969
06MAR 11:30:28 OPSBCPII S84063 OPBCP100I LV ENTITY TYPE ENTITY NAME
06MAR 11:30:28 OPSBCPII S84063 OPBCP100I 1 ENTERPRISE company name
06MAR 11:30:28 OPSBCPII S84063 OPBCP100I 2 INSTALLATION installation name
06MAR 11:30:50 OPSBCPII S84063 IDI0001I Fault Analyzer V16R1M00 (UI98875 2024/10/25) invoked by IDIXDCAP using SYS1.xxxx.PARMLIB(IDICNF00) <<--
06MAR 11:30:50 OPSBCPII S84063 IDI0034I Fault analysis skipped due to: IDICNF00 config member EXCLUDE option specification
06MAR 11:30:50 OPSBCPII S84063 IEA995I SYMPTOM DUMP OUTPUT
06MAR 11:30:50 OPSBCPII S84063 USER COMPLETION CODE=0000 REASON CODE=00000000
06MAR 11:30:50 OPSBCPII S84063 TIME=11.30.28 SEQ=00157 CPU=0000 ASID=00D0
06MAR 11:30:50 OPSBCPII S84063 PSW AT TIME OF ERROR 078C1001 8000F8FC ILC 2 INTC 0D
06MAR 11:30:50 OPSBCPII S84063 NO ACTIVE MODULE FOUND
06MAR 11:30:50 OPSBCPII S84063 NAME=UNKNOWN
06MAR 11:30:50 OPSBCPII S84063 DATA AT PSW 0000F8F6 - 00181610 0A0DD7FF 40004000
06MAR 11:30:50 OPSBCPII S84063 AR/GR 0: 00000000/84000000 1: 00000002/84000000
06MAR 11:30:50 OPSBCPII S84063 2: 00000000/00000000 3: 00000000/0000F960
06MAR 11:30:50 OPSBCPII S84063 4: 00000000/208C7470 5: 00000000/00000064
06MAR 11:30:50 OPSBCPII S84063 6: 00000000/208C7598 7: 00000000/208C7428
06MAR 11:30:50 OPSBCPII S84063 8: 00000000/208C7470 9: 00000000/208C430F
06MAR 11:30:50 OPSBCPII S84063 A: 00000000/208C78F0 B: 00000000/0000B000
06MAR 11:30:50 OPSBCPII S84063 C: 00000000/208C73E0 D: 00000000/208C7828
06MAR 11:30:50 OPSBCPII S84063 E: 00000000/208C2CD6 F: 00000000/00000000
06MAR 11:30:50 OPSBCPII S84063 END OF SYMPTOM DUMP
06MAR 11:30:50 OPSBCPII S84063 OPBCP999E - ( OPIICMDP ) Retry after abend !
06MAR 11:30:50 OPSBCPII S84063
06MAR 11:32:54 OPSBCPII S84063 OPBCP080I - OPS BCPii Server topology refresh in progress ...
06MAR 11:32:54 OPSBCPII S84063 OPBCP081I - Purging the OPS BCPii Server topology ...
06MAR 11:32:54 OPSBCPII S84063 OPBCP082I - ( OPIIEVTR ) OPS BCPii Server V1 topology purged
06MAR 11:32:54 OPSBCPII S84063 OPBCP083I - ( OPIIEVTR ) OPS BCPii Server V2 topology purged
06MAR 11:32:56 OPSBCPII S84063 OPBCP036I - hwilist CPCs succeeded, numOfCPCs=1
06MAR 11:32:58 OPSBCPII S84063 OPBCP033I - hwilist CPC 1 = IBM390PS.CPC212
Stopping and restarting OPSBCPII after the MCL upgrade completed fixed the problem, but why did it occur?
Following a name-change event, the CPC that was in SERVICE status later changed to OPERATIONAL. The REFRESH TOPOLOGY command was entered before the OPERATIONAL status appeared. The abend was caused by the server attempting to satisfy the DISPLAY TOPOLOGY command: because the refresh had already failed (OPBCP999E hwilist CPCs failed - rc=101), there was no topology to display.
From the OPSLOG, it appears the CPC was not in a state that allowed the OPSBCPII server to respond to the REFRESH TOPOLOGY commands issued after the name-change events. The message explanation below is from the IBM documentation Base Control Program Internal Interface (BCPii) Services:
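The failure sequence above can be modeled in a few lines. This is a hypothetical illustration, not OPS/MVS code: `TopologyServer`, `refresh()` and `display()` are invented names. The model purges the topology before rebuilding it (as the OPBCP081I/082I/083I messages suggest), leaves it empty when the rebuild hits a communication error, and shows one way a display routine could guard against a topology that was never rebuilt instead of failing on it.

```python
from typing import List, Optional


class TopologyServer:
    """Hypothetical model of a server that caches a CPC topology."""

    def __init__(self) -> None:
        # In this model only the CPC level is rebuilt by refresh();
        # enterprise and installation levels are always displayed.
        self.cpcs: Optional[List[str]] = None

    def refresh(self, reachable_cpcs: List[str], comm_ok: bool) -> int:
        """Purge, then rebuild. Returns 0 on success, 0x101 on a comm error."""
        self.cpcs = None  # the purge happens before the rebuild is attempted
        if not comm_ok:
            return 0x101  # modeled on the BCPii communication error; topology stays empty
        self.cpcs = list(reachable_cpcs)
        return 0

    def display(self) -> List[str]:
        """Return display lines, guarding against a topology that was never rebuilt."""
        lines = ["1 ENTERPRISE", "2 INSTALLATION"]
        if self.cpcs is None:
            # Without this check, iterating over None would raise TypeError,
            # a rough analogue of the abend seen in the log.
            lines.append("CPC topology unavailable; refresh pending")
            return lines
        lines.extend(f"3 CPC {name}" for name in self.cpcs)
        return lines


# Refresh issued while the CPC is still in SERVICE status:
server = TopologyServer()
rc = server.refresh(["IBM390PS.CPC212"], comm_ok=False)
print(hex(rc))               # 0x101
print(server.display()[-1])  # CPC topology unavailable; refresh pending
```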
101 HWI_COMMUNICATION_ERROR
Meaning: A communication error is detected. The hardware management console application API (HWMCA) or the BCPii transport layer has returned with a failing return code.
Action: See the DiagArea for further diagnostic information. The Diag_CommErr indicates the return code that is returned from HWMCA APIs or the BCPii transport layer. HWMCA API and BCPii transport return codes are provided in Appendix A, “BCPii communication error reason codes,” on page 637.
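The two failure messages in the log may be reporting the same condition: OPBCP999E shows `hwilist CPCs failed - rc=101` and OPBCP099E shows `Build Topology failed - rc=257`. Assuming the first value is printed in hexadecimal and the second in decimal (an assumption; neither message states its radix), they are the same number, which a quick check confirms:

```python
# Assumption: OPBCP999E prints the BCPii return code in hex, while
# OPBCP099E prints the same code in decimal. Neither message states its radix.
HWILIST_RC_HEX = "101"   # from "hwilist CPCs failed - rc=101"
BUILD_TOPOLOGY_RC = 257  # from "Build Topology failed - rc=257"


def same_code(hex_text: str, decimal_value: int) -> bool:
    """Return True if a hex string and a decimal integer denote the same value."""
    return int(hex_text, 16) == decimal_value


print(int(HWILIST_RC_HEX, 16))                       # 257
print(same_code(HWILIST_RC_HEX, BUILD_TOPOLOGY_RC))  # True
```

If that reading holds, the Build Topology failure is simply the hwilist communication error propagated upward, not a second, independent problem.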
The server would have refreshed its topology on its own once the polling triggered here was able to determine that the CPCs were available.
11:32:56 HWSHWCOMER HARDWARE COMMUNICATION ERROR - PERMANENT CPC IBM390PS.CPC###
11:32:56 HWSHWCOMER HARDWARE COMMUNICATION ERROR - PERMANENT CPC IBM390PS.CPC###
11:32:56 HWI024I BCPII IS NO LONGER ATTEMPTING COMMUNICATION WITH A CPC
11:32:56 OPBCP303E - ( OPIIEVTX ) Permanent Comm-Error recovery in progress <- polling should have begun
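The recovery behavior described above (poll until the CPC is usable, then refresh) can be sketched as follows. This is a hypothetical sketch, not the OPSBCPII implementation: `poll_then_refresh`, `get_status` and `refresh` are invented names standing in for whatever the server does internally during Permanent Comm-Error recovery.

```python
import time
from typing import Callable


def poll_then_refresh(
    get_status: Callable[[], str],
    refresh: Callable[[], int],
    interval_s: float = 1.0,
    max_polls: int = 10,
) -> int:
    """Poll a CPC status until it is OPERATIONAL, then refresh the topology.

    Hypothetical sketch: returns the refresh return code, or -1 if the CPC
    never became OPERATIONAL within max_polls attempts.
    """
    for attempt in range(max_polls):
        if get_status() == "OPERATIONAL":
            return refresh()
        if attempt < max_polls - 1:
            time.sleep(interval_s)  # wait before the next status check
    return -1


# Example: the CPC reports SERVICE twice, then OPERATIONAL.
statuses = iter(["SERVICE", "SERVICE", "OPERATIONAL"])
rc = poll_then_refresh(lambda: next(statuses), lambda: 0, interval_s=0.0)
print(rc)  # 0
```

Gating the refresh on the OPERATIONAL status avoids the premature REFRESH TOPOLOGY that led to the empty topology and the abend.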