MIMES004I SMS waiting on IGDCDSXS

Article ID: 410057

Products

MIM Resource Sharing (MIM), MIM Data Sharing (MII), MIM Message Sharing (MIC), MIM Tape Sharing (MIA)

Issue/Introduction

In most cases, during an IPL of the DR site, SMS enters a wait state before JES startup and the following messages appear:

MIMES004I SMS      waiting on IGDCDSXS

IGD033I JOB *MASTER* IS WAITING FOR THE SMS ADDRESS SPACE TO INITIALIZE

D SMS shows that SMS is not yet active. The MIMPLEX consists of two LPARs (SYS1 and SYS2) sharing a CAMIMGR connection.

Normally, when a DR site is IPLed, the following command is issued:

F CAMIMGR,FREE 1

This relieves the locking situation, but it does not help with the SMS wait on IGDCDSXS.

Restarting SMS does not help either:

T SMS=GD

IGD024I SMS START IN PROGRESS - UNABLE TO PROCESS SET SMS COMMAND AT THIS TIME

The IPL can continue only after SMS is forced down and restarted:

F CAMIMGR,FREE 1

FORCE SMS,ARM

T SMS=GD

D SMS   <--- This time SMS starts successfully and the IPL can continue

How can this be avoided?

Resolution

Syslog summary of what happened: 

10:42:58.35 - SMS shows it's waiting on IGDCDSXS
10:45:26.56 - MIM synchronization completes for system SYS2
10:45:26.62 - SMS becomes active (just 0.06 seconds later)

The delay of about two and a half minutes (10:42:58 to 10:45:26) was caused by the condition reported in the MIM0062W warning: SYS1 was critically non-responsive. The FREE 1 command resolved it.
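
The elapsed times in the syslog summary can be checked with a short calculation. The following is a minimal Python sketch; the helper name elapsed_seconds is hypothetical, and it assumes all three timestamps fall on the same day and use the HH:MM:SS.hh layout shown in the summary.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Return the seconds between two same-day timestamps (HH:MM:SS.hh)."""
    fmt = "%H:%M:%S.%f"  # %f accepts the two-digit fraction used in the summary
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Timestamps copied from the syslog summary in this article.
sms_wait = elapsed_seconds("10:42:58.35", "10:45:26.56")  # SMS wait until MIM synchronized
sms_lag = elapsed_seconds("10:45:26.56", "10:45:26.62")   # MIM synchronized until SMS active

print(f"SMS waited {sms_wait:.2f} s; SMS became active {sms_lag:.2f} s after MIM synchronized")
```

The first interval works out to 148.21 seconds and the second to 0.06 seconds, which shows that SMS was released almost immediately once MIM synchronization completed.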

What's happening behind the scenes:

During early startup, MIM uses an ENQ exit that triggers MIMES003 and kicks off the CAMIMGR address space. All RESERVE and SYSTEMS ENQ requests get suspended until CAMIMGR is fully up and running.

While some ENQ QNAMEs and job names are on an "allow list" and can complete right away, SMS with QNAME IGDCDSXS isn't one of them. So each RESERVE request sits in a WAIT state until MIM starts up and synchronizes. That's when the ECBs get posted and the waiting requesters, SMS among them, continue.

You know the MIM synchronization process is finished when you see the MIM0023I message. That's exactly why IGD020I (SMS becoming active) appears right after MIM0023I in the log.
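
The message sequence described above can be turned into a rough triage aid for a captured syslog extract. This is only a sketch under stated assumptions: the function name startup_state and the returned wording are hypothetical, the check looks for the message IDs named in this article anywhere in a line, and it does not reproduce real message text other than the MIMES004I line quoted earlier.

```python
def startup_state(syslog_lines):
    """Classify early-IPL progress from the message IDs seen in a syslog extract."""
    watched = ("MIMES004I", "MIM0062W", "MIM0023I", "IGD020I")
    seen = {msg_id for line in syslog_lines for msg_id in watched if msg_id in line}

    # Check the latest stage first: each stage implies the earlier ones are over.
    if "IGD020I" in seen:
        return "SMS is active"
    if "MIM0023I" in seen:
        return "MIM synchronized; suspended requests should be released"
    if "MIM0062W" in seen:
        return "MIM not synchronized; a defined system is non-responsive"
    if "MIMES004I" in seen:
        return "requests suspended until MIM starts and synchronizes"
    return "no relevant messages found"


# Example: only the MIMES004I line quoted in this article has been seen so far.
print(startup_state(["MIMES004I SMS      waiting on IGDCDSXS"]))
```

If the result stays at the MIM0062W stage, that matches the situation in this article, where the wait ended only after the non-responsive system was freed.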

Why the delay from MIM0062W?

With the MIM PROC using FORMAT=NONE, no formatting happens at startup, so MIM picks up an instance of SYS1 because of the DEFSYS statement. SYS1 was not active, hence the MIM0062W warning. Here are three options for your DR testing:

- Do exactly what you did: FREE 1

This works, but someone has to notice that MIM is not synchronizing because of MIM0062W and then intervene manually.


- Use fresh files for DR

Since your PROC uses FORMAT=NONE, bring over brand new (or cleared) MIM control and checkpoint files for DR. Check members ALLOCCF and ALLOCKPT in the MIM samplib for samples of how to allocate and clear the files. Starting the DR system fresh means no MIM0062W issues and no synchronization delays.


- Manual workaround for existing files 

If you want to keep using your current control and checkpoint files as they are, you can avoid MIM0062W delays with some prep work. Before the IPL of the DR systems, change your DEFSYS from:

 

DEFSYS    (SYS1),            /* Development system
          (SYS2),            /* Production system
          (SYS3),            /* Test system
          (SYS4)             /* Test system

to:

DEFSYS    (SYS1,01,SYS1,INITIAL=FREED),   /* Development system
          (SYS2,02,SYS2,INITIAL=FREED),   /* Production system
          (SYS3,03,SYS3,INITIAL=FREED),   /* Test system
          (SYS4,04,SYS4,INITIAL=FREED)    /* Test system

and start the first system with FORMAT=BOTH. Note: This option gets tricky because you would need to update the PROC between the SYS2 and SYS1 startups (to put SYS1 back to FORMAT=NONE), since early start only invokes the PROC (S CAMIMGR,SUB=MSTR) and does not allow PROC parameter overrides.
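
For sites with many systems, the DEFSYS change shown above could be generated rather than typed by hand. The sketch below is illustrative only: the function name defsys_freed is hypothetical, and the two-digit sequential value in the second position simply mirrors the example in this article (01 to 04), so the result should be checked against your own DEFSYS definitions and the product documentation before use.

```python
def defsys_freed(system_names):
    """Build a DEFSYS statement with INITIAL=FREED for each system name.

    Assumption: the second positional value is a two-digit sequence number,
    copied from the pattern in the article's example, not from product rules.
    """
    entries = [
        f"({name},{index:02d},{name},INITIAL=FREED)"
        for index, name in enumerate(system_names, start=1)
    ]
    lines = []
    for position, entry in enumerate(entries):
        prefix = "DEFSYS    " if position == 0 else " " * 10
        # Every entry except the last ends with a comma to continue the statement.
        suffix = "," if position < len(entries) - 1 else ""
        lines.append(prefix + entry + suffix)
    return "\n".join(lines)


statement = defsys_freed(["SYS1", "SYS2", "SYS3", "SYS4"])
print(statement)
```

Generating the statement keeps the entries consistent, but it does not remove the need to switch the PROC between FORMAT=BOTH and FORMAT=NONE as described in the note above.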