How can you identify which records have duplicate keys or other data that is duplicated?
If a file has records with duplicate data, there is frequently a need to identify which records contain the duplicates. As long as the data is sorted, this is easily resolved. Use a temporary field to store the value of the prior record, and compare the value in the record to the temporary field.
We could use this technique to identify keys, but let's say there is another field we want to analyze.
Let's take a simple case where we have these records where the key is bytes 1-8 and the field of interest is the 1 byte field that follows:
RECORD 1 A RECORD 2 A RECORD 3 B RECORD 4 C RECORD 5 D RECORD 6 D RECORD 7 D RECORD 8 E RECORD 9 E
To get this report:
--------------------------------------------------
KEY DATA DUPFLAG
--------------------------------------------------
RECORD 1 A
RECORD 2 A DATA DUPLICATES RECORD 1
RECORD 3 B
RECORD 4 C
RECORD 5 D
RECORD 6 D DATA DUPLICATES RECORD 5
RECORD 7 D DATA DUPLICATES RECORD 6
RECORD 8 E
RECORD 9 E DATA DUPLICATES RECORD 8
** MK4FT03 TYPE 0 END OF REPORT.
We would code this in the ASL freeform syntax:
CONTROL
FILE MASTER INPUT NAME MYDATA
FILE REPORT
;
LOGIC: PROC
;
DUPFLAG: FIELD TYPE C LENGTH 30 INIT ' '
PRIORKEY: FIELD TYPE C LENGTH 10 INIT ' '
TEMP: FIELD TYPE C LENGTH 1 INIT ' '
;
IF DATA = T.TEMP THEN
COMBINE 'DATA DUPLICATES' T.PRIORKEY STORE T.DUPFLAG BLANKS 1
END
;
REPORT KEY, DATA, T.DUPFLAG
FORMAT HEADINGS NAME
END
;
LET T.TEMP = DATA
LET T.PRIORKEY = KEY
LET T.DUPFLAG = ' '
;
END
This is the code in fixed format that is generated as a result of the ASL code above:
--------------------------------------------------------------------
RUN STMT FILE O N S D U T R A B S D S L RB R S M R S
NAME TYPE NAME L E E I P R E U U R L C S PL J O O F S
D W Q R D N P D F T M N T TK T P P O R
--------------------------------------------------------------------
(ASLRC ) (RC) (MYDATA )(S) (S) (R)
*****************************
* PROC NAME - LOGIC *
*****************************
---------------------------------------------------------------
STMT FIELD FIELD FLD DEC OUTPT EDIT INITIAL
TYPE NAME LNGTH TYP PLC EDIT LGTH VALUE
---------------------------------------------------------------
(TF) (DUPFLAG ) ( 30) (C)
(TF) (PRIORKEY) ( 10) (C)
(TF) (TEMP ) ( 1) (C)
(TF) (TEMP___1) ( 15) (C) (DATA DUPLICATES )
------------------------------------------------------------------ ------------------
STMT SEQ LOG CON .....OPERAND-A..... OPER ............OPERAND-B ......RESULT....
TYPE NO. LEV CTR QLF FIELD SEG-LVL ATN QLF FIELD QLF FIELD
------------------------------------------------------------------ ------------------
(PR)(100) (DATA ) 001-1 (EQ) (T,TEMP )
(PR)(101) (NS) (103 )
(PR)(102) (T,TEMP___1) (C1) (T,PRIORKEY ) (T,DUPFLAG )
(PR)(103) (GO) (SUB REPT___2)
(PR)(104) (R ) (DATA ) (T,TEMP )
(PR)(105) (R ) (KEY ) (T,PRIORKEY)
(PR)(106) (R ) (C ) (T,DUPFLAG )
*****************************
* REQUEST NAME - REPT___2 *
*****************************
------------------------------------------------------------------------------ --------
STMT REPORT MAX SEL SUM VERT FORMS PAGE PAGE LINE REQ
TYPE DATE REQUESTOR ID ITEMS CTL RPT SP CNTRL WDTH HGHT NOS? TYP
--------------------------------------------------------------------------------------
(ER)(TODAY ) (S)
------------------------------------------------
ST SUM V 8 PG PG I SP MAXIMUM TB COLUMN
TY RPT S LPI WID HGT M FRM LPP PGS TO HEADING
MP P G LT TYP
------------------------------------------------
(E1) (F)
--------------------------------
SP Q
STMT SEQ BF L FIELD SEG-
TYPE NO. COL F NAME LVL
--------------------------------
(R1) (KEY ) 001-1
(R1) (DATA ) 001-1
(R1) (T,DUPFLAG )
Note that similar logic can be used to find redundant data in dependent segments as long as the data is presented in sequence.