How can you identify which records have duplicate keys or other data that is duplicated?
If a file contains records with duplicate data, you often need to identify which records hold the duplicates. As long as the file is sorted on the field being checked, this is straightforward: keep the value from the prior record in a temporary field, and compare each record's value to that temporary field.
We could use this technique to identify duplicate keys, but let's say there is another field we want to analyze.
Take a simple case in which the key occupies bytes 1-8 and the field of interest is the 1-byte field that follows:
    RECORD 1 A
    RECORD 2 A
    RECORD 3 B
    RECORD 4 C
    RECORD 5 D
    RECORD 6 D
    RECORD 7 D
    RECORD 8 E
    RECORD 9 E
To get this report:
    --------------------------------------------------
    KEY         DATA   DUPFLAG
    --------------------------------------------------
    RECORD 1    A
    RECORD 2    A      DATA DUPLICATES RECORD 1
    RECORD 3    B
    RECORD 4    C
    RECORD 5    D
    RECORD 6    D      DATA DUPLICATES RECORD 5
    RECORD 7    D      DATA DUPLICATES RECORD 6
    RECORD 8    E
    RECORD 9    E      DATA DUPLICATES RECORD 8
    ** MK4FT03 TYPE 0 END OF REPORT.
We would code this in the ASL freeform syntax:
    CONTROL
    FILE MASTER INPUT NAME MYDATA
    FILE REPORT
    ;
    LOGIC: PROC
    ;
    DUPFLAG:  FIELD TYPE C LENGTH 30 INIT ' '
    PRIORKEY: FIELD TYPE C LENGTH 10 INIT ' '
    TEMP:     FIELD TYPE C LENGTH 1  INIT ' '
    ;
    IF DATA = T.TEMP THEN
       COMBINE 'DATA DUPLICATES' T.PRIORKEY STORE T.DUPFLAG BLANKS 1
    END
    ;
    REPORT KEY, DATA, T.DUPFLAG
       FORMAT HEADINGS NAME
    END
    ;
    LET T.TEMP = DATA
    LET T.PRIORKEY = KEY
    LET T.DUPFLAG = ' '
    ;
    END
The ASL code above generates the following fixed-format code:
    --------------------------------------------------------------------
    RUN STMT FILE O N S D U T R A B S D S L RB R S M R S
    NAME TYPE NAME L E E I P R E U U R L C S PL J O O F
    S D W Q R D N P D F T M N T TK T P P O R
    --------------------------------------------------------------------
    (ASLRC ) (RC) (MYDATA )(S) (S) (R)

    *****************************
    * PROC NAME - LOGIC         *
    *****************************
    ---------------------------------------------------------------
    STMT FIELD      FIELD FLD DEC OUTPT EDIT INITIAL
    TYPE NAME       LNGTH TYP PLC EDIT  LGTH VALUE
    ---------------------------------------------------------------
    (TF) (DUPFLAG ) ( 30) (C)
    (TF) (PRIORKEY) ( 10) (C)
    (TF) (TEMP    ) (  1) (C)
    (TF) (TEMP___1) ( 15) (C) (DATA DUPLICATES )

    ------------------------------------------------------------------------------------
    STMT SEQ LOG CON .....OPERAND-A..... OPER ............OPERAND-B ......RESULT....
    TYPE NO. LEV CTR QLF FIELD SEG-LVL ATN QLF FIELD QLF FIELD
    ------------------------------------------------------------------------------------
    (PR)(100) (DATA ) 001-1 (EQ) (T,TEMP )
    (PR)(101) (NS) (103 )
    (PR)(102) (T,TEMP___1) (C1) (T,PRIORKEY ) (T,DUPFLAG )
    (PR)(103) (GO) (SUB REPT___2)
    (PR)(104) (R ) (DATA ) (T,TEMP )
    (PR)(105) (R ) (KEY ) (T,PRIORKEY)
    (PR)(106) (R ) (C ) (T,DUPFLAG )

    *****************************
    * REQUEST NAME - REPT___2   *
    *****************************
    --------------------------------------------------------------------------------------
    STMT REPORT              MAX   SEL SUM VERT FORMS PAGE PAGE LINE REQ
    TYPE DATE   REQUESTOR ID ITEMS CTL RPT SP   CNTRL WDTH HGHT NOS? TYP
    --------------------------------------------------------------------------------------
    (ER)(TODAY ) (S)

    ------------------------------------------------
    ST SUM V  8  PG  PG  I SP  MAXIMUM TB COLUMN
    TY RPT S LPI WID HGT M FRM LPP PGS TO HEADING
    MP     P             G LT             TYP
    ------------------------------------------------
    (E1) (F)

    --------------------------------
              SP  Q
    STMT SEQ  BF  L FIELD SEG-
    TYPE NO.  COL F NAME  LVL
    --------------------------------
    (R1) (KEY ) 001-1
    (R1) (DATA ) 001-1
    (R1) (T,DUPFLAG )
Note that similar logic can be used to find redundant data in dependent segments as long as the data is presented in sequence.
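For readers who do not work in ASL, the prior-record comparison can also be sketched in Python. This is an illustration only, not part of the product: the function name flag_duplicates and the list-of-tuples input are assumptions made for the example, and the input is assumed to be sorted on the data field already, as the technique requires.

```python
def flag_duplicates(records):
    """Return (key, data, dupflag) rows for records already sorted on data.

    Mirrors the ASL logic: compare each record's data to the value saved
    from the prior record, then save the current key and data for the
    next comparison.
    """
    rows = []
    prior_key = ""
    prior_data = None  # assumption: None, so the first record never matches
    for key, data in records:
        dupflag = ""
        if data == prior_data:
            # The flag names the immediately preceding record,
            # not the first record of the run of duplicates.
            dupflag = f"DATA DUPLICATES {prior_key}"
        rows.append((key, data, dupflag))
        prior_key, prior_data = key, data
    return rows


# Sample data from the example above: (key, data) pairs.
records = [
    ("RECORD 1", "A"), ("RECORD 2", "A"), ("RECORD 3", "B"),
    ("RECORD 4", "C"), ("RECORD 5", "D"), ("RECORD 6", "D"),
    ("RECORD 7", "D"), ("RECORD 8", "E"), ("RECORD 9", "E"),
]

for key, data, dupflag in flag_duplicates(records):
    print(f"{key:<12}{data:<7}{dupflag}".rstrip())
```

As in the ASL version, each flagged record points at the record immediately before it, so RECORD 7 is reported as duplicating RECORD 6 rather than RECORD 5.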