When an SQL query is executed against a network IDMS Database, it is possible to get an error DB001025 INVALID DATA. This can occur for a few reasons, including:
1- network data bases do not have the integrity constraints that SQL databases do;
2- network databases allow redefines, so data that could be valid for one representation of a data element may not be valid for the data element that the SQL query is retrieving.
An outer join query may get this error, even when another type of join between the same data sources does not. That is because the outer join is processed in a very specific way. In order to determine which rows of two data sources would satisfy an outer join, we first do a Cartesian join. That means we extract every column and every row from both tables and pair them in every possible combination, regardless of any join criteria. That's our first intermediate result. Then, from that, we project out only the columns named in the query column list, and the rows which satisfy any value-based where clause. Because we're starting with the Cartesian product, we will examine every column in every row of both tables, and it's during that process that we will find any bad data that exists in either data source. We do not do the Cartesian join as a starting place for any other type join. So that's why you can get the data exception on columns that are not named or referenced in any way, in the query.
Now that we’ve established why the outer join is getting the error, to resolve it you will need to identify where the bad data is, and correct it. There are a few possible ways to do this. The first step is to issue a SELECT * against each of the data sources. This will indicate the problematic record. If you produce a DEBUGAMC trace on the query, you can provide that to support and we can help narrow down the columns (record elements) that may be causing the problem. If the record contains only a few data elements, you can simply issue a query
SELECT <column_name) FROM <data-source>;
for each element in the record; any element that contains bad data in any row will fail with the DB001025.
If the record contains many elements, then it may not be possible to immediately identify the problem data element. Naming either the specific column, or all columns, you can then use any network data query tool (OLQ, DMLO, Culprit, a program or dialog, etc) to retrieve all the occurrences of the problematic column (or all columns), along with the calc key or some other key that would uniquely identify the record occurrence that has the problem. Depending on the tool you use, the output will either return all of the rows and you'll be able to see the bad data in the report (usually something non-numeric in a numeric field); or it will return all of the rows with good data, then stop (or abend with a S0C7) on the occurrence with the bad data. In the latter case, the next row to be retrieved using whatever access method is specified (OBTAIN NEXT WITHIN AREA, OBTAIN NEXT IN SET, etc) after the last one shown, would be the one with bad data.
The data can be corrected with DMLO, a user program, or any other tool available to you.