Two File Processing
Assuming max of 1 record per id on each file:
When two files are processed together, there are a variety of ways they can be handled depending on the desired outcome. For sequential processing, both files should be sorted in the match sequence (id in this explanation). For random processing, one file is designated the primary file and read sequentially, and the second file is accessed randomly by id to look for a match. A successful access means there is a match; an invalid key means there is no match. If the output must be in order, the primary file should be sorted. Some of the possibilities are listed below:
- The two files could be merged together and one output file created. The output file would contain the merged records from the two input files; in other words, all of the records from both input files would appear on the output file. This might be information from last year and this year.
- The two files could be merged together with the requirement that every id has a record on both files. Records without a match would be errors, and only the matching records would be merged. This could be merging department expenses from last year and this year.
- The two files could be merged together with the records that share an id combined into one record. The combined record might carry additional fields from the second file or, when dealing with amounts, the sum of the values from the two records. If an input record has no match on the other file, the output record is still written, but certain fields are blank or contain information only from the file that held the record. This could be product sales combined onto one record: if the product does not occur on both files, only part of the information appears on the merged record.
- The two files could be merged together as described in the previous item, but with the requirement that every id has a record on both files. Records without a match would be errors, and only the matching records would be combined. This could be department information by year where information for both years is required.
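The sequential approach behind these four possibilities can be sketched in Python. This is only an illustration under stated assumptions: each file is modelled as an in-memory list of (id, amount) pairs already sorted by id, and the function name `match_merge` and the sample data are hypothetical. The loop shown implements the last possibility (combine matching records, treat unmatched ids as errors).

```python
def match_merge(first, second):
    """Walk two id-sorted lists of (id, data) pairs in step.

    Yields (id, first_data, second_data); the missing side is None
    when an id appears on only one file.
    """
    i = j = 0
    while i < len(first) and j < len(second):
        id1, data1 = first[i]
        id2, data2 = second[j]
        if id1 == id2:            # match: advance both files
            yield id1, data1, data2
            i += 1
            j += 1
        elif id1 < id2:           # id only on the first file
            yield id1, data1, None
            i += 1
        else:                     # id only on the second file
            yield id2, None, data2
            j += 1
    # One file is exhausted; whatever is left on the other has no match.
    for id1, data1 in first[i:]:
        yield id1, data1, None
    for id2, data2 in second[j:]:
        yield id2, None, data2


# Hypothetical data: department expenses for last year and this year.
last_year = [(100, 500), (200, 300), (400, 250)]
this_year = [(100, 550), (300, 120), (400, 275)]

merged = []   # one combined record per id, total of both years
errors = []   # ids missing from one of the files

for rec_id, old, new in match_merge(last_year, this_year):
    if old is not None and new is not None:
        merged.append((rec_id, old + new))
    else:
        errors.append(rec_id)

print(merged)  # [(100, 1050), (400, 525)]
print(errors)  # [200, 300]
```

Only the body of the final loop changes between the possibilities: writing the unmatched records with a blank field instead of rejecting them gives the third possibility, and writing the two input records separately instead of adding them gives the first two.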
There are other scenarios as well, such as writing only the records without a match to the output file, or writing only the records from the first file depending on whether or not they have a match. The four listed above are the more frequently needed processing scenarios.
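The random-processing approach described at the start of this section handles these variations in much the same way. The sketch below is an assumption-laden illustration, not a real file access: a Python dict stands in for the keyed second file, a `KeyError` plays the role of the invalid key condition, and the product data is hypothetical.

```python
# Hypothetical data; the dict stands in for an indexed (keyed) file.
primary = [(100, "Hammer"), (200, "Wrench"), (300, "Pliers")]  # read in order
secondary = {100: 12, 300: 7, 500: 4}                          # accessed by id

matched = []     # primary records that found a partner
unmatched = []   # primary records with no partner

for rec_id, name in primary:
    try:
        qty = secondary[rec_id]      # random read by key
    except KeyError:                 # plays the role of "invalid key"
        unmatched.append((rec_id, name))
    else:
        matched.append((rec_id, name, qty))

print(matched)    # [(100, 'Hammer', 12), (300, 'Pliers', 7)]
print(unmatched)  # [(200, 'Wrench')]
```

Because the primary file drives the loop, id 500 on the second file is never seen. If unmatched records on the second file matter, sequential processing of both sorted files is the better fit.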
Assuming max of 1 record per id on one file (0 or 1) and a variable number of records (0 to...) on the other file:
When these two files are processed together there are a number of ways it could be handled:
- One way is to say 1 record is required on file one but there can be 0, 1, or many on file two. For example, this could be a book club situation where the first file is the customer file and the second file is the book order file. If I have enrolled in the club and bought no books, then I have 1 record on the first file and 0 records on the second file. If I have bought one book, then I have 1 record on each. If I have bought 5 books, I have 1 record on the first file and 5 records on the second file.
- You might have the same scenario as the first item, but you can't belong to the club unless you buy a book, so file two must have 1 record or many but not 0.
- Another way is to allow 0 or 1 records on the first file and 0 to many on the second file and merge them together. The first file might be information on a product for last year (if we didn't carry the product last year there would be 0 records) and the second file might be monthly information from this year. The output should contain information about last year and information about the status so far this year. The information could be on one record or on multiple records.
- You could do a pure merge of the two files and produce one output file with all of the records from both files.
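The book club case can be sketched sequentially as follows. This is a minimal illustration under assumptions: both files are in-memory lists sorted by id, the customer list holds exactly one record per id, and the function name `customers_with_orders` and the sample data are hypothetical.

```python
def customers_with_orders(customers, orders):
    """Pair each customer with its orders; both lists sorted by id.

    Returns (report, orphans): report has one (id, name, titles) entry
    per customer, and orphans holds orders whose id has no customer.
    """
    report = []
    orphans = []
    j = 0
    for cust_id, name in customers:
        # Orders with a smaller id than this customer match nobody.
        while j < len(orders) and orders[j][0] < cust_id:
            orphans.append(orders[j])
            j += 1
        # Collect the run of orders carrying this customer's id.
        titles = []
        while j < len(orders) and orders[j][0] == cust_id:
            titles.append(orders[j][1])
            j += 1
        report.append((cust_id, name, titles))
    orphans.extend(orders[j:])   # orders past the last customer
    return report, orphans


customers = [(1, "Avery"), (2, "Blake"), (3, "Casey")]
orders = [(1, "Dune"), (1, "Emma"), (3, "Ulysses"), (4, "Walden")]

report, orphans = customers_with_orders(customers, orders)
print(report)   # [(1, 'Avery', ['Dune', 'Emma']), (2, 'Blake', []), (3, 'Casey', ['Ulysses'])]
print(orphans)  # [(4, 'Walden')]
```

Under the stricter rule that a member must buy at least one book, a customer with an empty list of titles (Blake here) would be reported as an error rather than written to the output.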
Assuming any number of records per id on each file:
This means any possibility of 0 to many on each file. Some of the ways this could be handled are:
- Merge all of the records from both files together by id and produce one output file with all of the records from both files. You could either require that there be at least one record from both files or make it okay to have a record or records on just one file.
- Take all of the information from both of the files for a particular id and produce one record on the output file that contains all of the information or sums all of the information from both files. You could require that there be at least one record from both files or make it okay to have a record or records on just one file.
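The summing variant can be sketched with the standard library's `heapq.merge` and `itertools.groupby`. As before this is only an illustration: the files are assumed to be in-memory lists of (id, amount) pairs sorted by id, the data is hypothetical, and the source tag (1 or 2) is added here purely so the code can tell whether both files contributed to an id.

```python
from heapq import merge
from itertools import groupby
from operator import itemgetter

# Hypothetical data: any number of records per id on each file.
file_one = [(10, 5), (10, 7), (30, 2)]
file_two = [(10, 1), (20, 4), (20, 6)]

# Tag each record with its source file.
tagged_one = [(rec_id, amount, 1) for rec_id, amount in file_one]
tagged_two = [(rec_id, amount, 2) for rec_id, amount in file_two]

totals = []      # one summed output record per id
one_sided = []   # ids that appear on only one of the files

by_id = itemgetter(0)
# merge() interleaves the two sorted inputs; groupby() gathers each id's run.
for rec_id, group in groupby(merge(tagged_one, tagged_two, key=by_id), key=by_id):
    records = list(group)
    totals.append((rec_id, sum(amount for _, amount, _ in records)))
    sources = {source for _, _, source in records}
    if sources != {1, 2}:
        one_sided.append(rec_id)

print(totals)     # [(10, 13), (20, 10), (30, 2)]
print(one_sided)  # [20, 30]
```

Whether the ids in `one_sided` are errors or acceptable output depends on which of the two rules above is chosen.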