Editing Data
All data should be checked for errors when it is first entered into the system so that bad
data does not get written to permanent disk files.
Remember the rule: "Garbage in, garbage out!"
This process of error checking is called EDITING.
Examples of the types of errors that editing looks for are:
- no data in a field where data is required - for example idno is frequently
required, as are name, hours worked, etc.
- non numeric data in a numeric field - for example a letter or comma in a
numeric field
- non alphabetic data in an alphabetic field - for example a hyphen or apostrophe
in an alphabetic field
- invalid codes - for example if the valid codes are A, B, and C these are the
only entries allowed in the field
- data that is out of range - for example pay rates range from $6.00 per hour to
$25.00 per hour - anything outside this range is invalid
- data for a particular code that is out of range - for example pay rates for
pay code P range from $6.00 per hour to $20.00 per hour - this involves checking first for code P
and then for the range
- a group of fields evaluated together must meet specifications - for example the sum
of regular hours, overtime hours, sick hours and vacation hours must equal 40 or more
- field blank or contains data depending on value in another field - for example
salaried employees cannot have an entry in hours worked or overtime hours while hourly workers
must have an entry in hours worked
- check digit - a check digit is the calculated last digit of an identification
type of number such as employee number or item number - for example, with an eight digit id #
the first seven digits would be assigned and the eighth digit (the check digit) would be
calculated using special formulas designed for this purpose - the eighth digit now becomes part
of the id - any time the id is typed in the calculation can be redone on the first seven digits
to see if the answer is the same as the eighth digit, if it is then the id is considered valid -
this is a great technique for catching transposition of digits etc.
- batch editing - in batch editing a group of transactions is gathered together
as a batch, for example 20 transactions might be called a batch - each batch is given a number -
before data entry, the batch of transactions is gathered and totals are run on significant
numeric fields - this might mean running a total on part number, on hand, cost etc. - as many
totals can be gathered as needed - this total information is entered into a batch header along
with the batch number - when the data is being keyed in, the batch header is keyed in followed
by all of the transactions in the batch and then another batch header followed by its
transactions - in the edit program the batch header is read and the totals on it are stored in
memory, then the transactions are read one at a time and the same totals are accumulated (if you
did part number, on hand and cost you would total the same three fields) - when a new batch
header is read, it is the signal that the old batch is complete and the totals are compared - if
the totals that were accumulated during processing do not match the totals from the batch header,
the batch is considered to be unbalanced and the information is printed out - the advantage of
this system is that the unbalanced batch only involves 20 transactions so finding the error or
errors is a much less significant problem than searching for the errors on thousands of
transactions
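The check digit idea can be sketched in a small program. The text does not say which formula is used, so the weights (8 down to 2) and the modulus 10 calculation below are assumptions chosen only for illustration, as are all of the data names. The sketch is written in fixed format and uses END-IF and END-PERFORM, so it assumes a COBOL 85 or later compiler.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CHKDIGIT.
      * Illustrative only: the weights and the modulus 10 formula
      * are assumptions, not a scheme taken from the text.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  ID-IN                     PIC X(8)  VALUE "12345678".
       01  ID-TABLE REDEFINES ID-IN.
           05  ID-DIGIT              PIC 9     OCCURS 8 TIMES.
       01  WEIGHT-VALUES             PIC X(7)  VALUE "8765432".
       01  WEIGHT-TABLE REDEFINES WEIGHT-VALUES.
           05  ID-WEIGHT             PIC 9     OCCURS 7 TIMES.
       01  WORK-FIELDS.
           05  DIGIT-SUB             PIC 9     VALUE 0.
           05  WEIGHTED-SUM          PIC 9(3)  VALUE 0.
           05  QUOT-WS               PIC 9(3)  VALUE 0.
           05  REM-WS                PIC 9     VALUE 0.
           05  CHECK-DIGIT-WS        PIC 9     VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * The id must be all digits before any arithmetic is done.
           IF ID-IN IS NOT NUMERIC
               DISPLAY "ID " ID-IN " IS NOT NUMERIC"
           ELSE
               PERFORM CALC-CHECK-DIGIT
               IF CHECK-DIGIT-WS = ID-DIGIT (8)
                   DISPLAY "ID " ID-IN " IS VALID"
               ELSE
                   DISPLAY "ID " ID-IN " FAILS THE CHECK DIGIT"
               END-IF
           END-IF
           STOP RUN.
       CALC-CHECK-DIGIT.
      * Weight the first seven digits and add up the products.
           MOVE 0 TO WEIGHTED-SUM
           PERFORM VARYING DIGIT-SUB FROM 1 BY 1
                   UNTIL DIGIT-SUB > 7
               COMPUTE WEIGHTED-SUM = WEIGHTED-SUM
                   + ID-DIGIT (DIGIT-SUB) * ID-WEIGHT (DIGIT-SUB)
           END-PERFORM
      * The check digit brings the total up to a multiple of 10.
           DIVIDE WEIGHTED-SUM BY 10
               GIVING QUOT-WS REMAINDER REM-WS
           IF REM-WS = 0
               MOVE 0 TO CHECK-DIGIT-WS
           ELSE
               COMPUTE CHECK-DIGIT-WS = 10 - REM-WS
           END-IF.
```

For the first seven digits 1234567 the weighted sum is 112, so the calculated check digit is 8 and the id 12345678 passes. If the fourth and fifth digits are transposed (12354678) the weighted sum becomes 113, the calculated check digit is 7, and the id is rejected.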
Edit Program:
Possible input to the EDIT program:
- Transactions coming in on a disk
- Transactions being keyed in
Possible output from the EDIT program:
- Written report of all errors
- Written report of valid transactions
- Disk file of valid transactions
- Disk file of records flagged as invalid - to be processed and corrected - frequently
using on-line processing
When the data on input records is processed in an EDIT program, the errors are reported so that
the data can be corrected and reentered into the system. The input data usually comes in on
either a screen or a disk. The valid records are usually written on a disk and a record is made
of them on a report. The bad records are definitely written on a report and they may also be
written to an error disk where they can be corrected. If you are processing interactively, the bad
records may be displayed on the screen so the user can fix them immediately. After corrections
have been made, the record is rechecked. If the corrections were satisfactory the record gets
written to the permanent disk file, if not it is either redisplayed or written to a print report.
There are two major categories of data entry. The first is data entry by a clerk who does not
know the data and is entering the data as fast and as accurately as possible. Since the clerk
does not know the data, this kind of data entry frequently takes what is keyed and creates a disk
file to be checked in an EDIT program. The second is data entry by a clerk who knows the data
and can make many of the corrections immediately if they are pointed out by an EDIT program.
This kind of data entry is frequently done interactively with the clerk sitting at the computer
and entering the data onto fields on the screen. The EDIT program analyzes the entry and reports
errors that can be interactively fixed by the clerk. Hopeless errors or errors where there is
insufficient data are reported, but the rest are fixed on-line.
As can be seen, reporting is an important part of editing. Both valid and invalid records are
reported - usually on separate reports, but occasionally you will see reports that mix valid and
invalid record reporting. The report can be done using a variety of styles depending on the needs
of the users. The important thing is that on a report of valid transactions the entire record is
printed and on an error report, for each field in error, the user can identify:
- the id# or some other identifying field from the record
- the contents of the field that is in error
- an error message that explains the error
A few possible report styles:
- For a valid transaction report, the most common style is to print each field in a column -
if there are too many fields, then you can stagger the columns (and the headers) so that some
fields appear on the first line and some fields appear on the second line but they are staggered
under the appropriate headers
- For an error report, the easiest style is to write one line for every error that is found
- An error report can be done so that the data appears on one line and the error message
appears on the line beneath it
- An error report can be done so that each error results in a code and the codes are all
printed in the error code column while the data appears in regular columns - the user will
receive a handbook containing all codes and their meanings
COBOL
Before we start to look at the logic of an edit program, there are some COBOL and programming
techniques that you should be aware of.
Writing records to a disk
If you plan to write a record to a disk you will be creating a disk file. Therefore the file
must be defined in both the SELECT statement and in the FILE SECTION with a FD. The record that
you will write must pass through the 01 level of the FD. The programmer can either define the
fields on the record under the 01 level of the FD in the FILE SECTION or set up the record in
WORKING-STORAGE and define the fields there. In the PROCEDURE DIVISION, the file must be OPENed
as an OUTPUT file since you are writing to the disk. When the record is ready to be written,
the WRITE statement will be used. Since you are writing to a disk instead of the printer, there
are no AFTER ADVANCING clauses. When the program is complete, the file must also be CLOSEd.
Example
SELECT NEW-DISK-FILE
ASSIGN TO "A:\NEWFILE.DAT".
FILE SECTION.
FD NEW-DISK-FILE
DATA RECORD IS NEW-DISK-REC.
01 NEW-DISK-REC.
05 IDNO-DSK PIC X(4).
05 NAM-DSK PIC X(20).
....
PROCEDURE DIVISION.
...
A-100-INITIALIZE.
OPEN INPUT...
OUTPUT NEW-DISK-FILE
...
...
B-200-LOOP.
...
code to set up the record by moving data to IDNO-DSK, NAM-DSK etc.
WRITE NEW-DISK-REC.
...
C-100-TERMINATE.
CLOSE ...
NEW-DISK-FILE.
If the record was set up in WORKING-STORAGE, for example using 01 OUTPUT-REC, then the
write statement would read like this:
WRITE NEW-DISK-REC FROM OUTPUT-REC.
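The pieces above can be put together into a small complete program. This is only a sketch: it reuses the names from the example, but the file name NEWFILE.DAT in the current directory, the LINE SEQUENTIAL organization (a common compiler extension rather than standard COBOL) and the two sample records are assumptions made so that the program can run on its own.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. WRITEDSK.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * File name and LINE SEQUENTIAL are assumptions for the demo.
           SELECT NEW-DISK-FILE
               ASSIGN TO "NEWFILE.DAT"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  NEW-DISK-FILE.
       01  NEW-DISK-REC              PIC X(24).
       WORKING-STORAGE SECTION.
      * The record is built here and passed through the FD record.
       01  OUTPUT-REC.
           05  IDNO-DSK              PIC X(4).
           05  NAM-DSK               PIC X(20).
       PROCEDURE DIVISION.
       A-100-INITIALIZE.
           OPEN OUTPUT NEW-DISK-FILE.
       B-200-WRITE-RECS.
      * No AFTER ADVANCING clause: this is a disk, not a printer.
           MOVE "1001" TO IDNO-DSK
           MOVE "ANN SMITH" TO NAM-DSK
           WRITE NEW-DISK-REC FROM OUTPUT-REC
           MOVE "1002" TO IDNO-DSK
           MOVE "JOHN DOE" TO NAM-DSK
           WRITE NEW-DISK-REC FROM OUTPUT-REC.
       C-100-TERMINATE.
           CLOSE NEW-DISK-FILE
           STOP RUN.
```

After the program runs, NEWFILE.DAT should hold two records, each with the id number in the first four positions followed by the name.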
Redefines
The REDEFINES clause is used when you have a field that you want to look at two ways. For
example a field can be given a numeric picture and then redefined and given an alphanumeric
picture:
05 FLDX PIC 9(5).
05 RDF-FLDX REDEFINES FLDX PIC X(5).
If you use the name FLDX you are referring to a numeric field that you may use in a calculation
or move to an edited field. If you use the name RDF-FLDX, you are referring to an alphanumeric
field. This is useful if you are checking an incoming field to see if it is numeric. If it is
you want to print it on the report as an edited numeric, if it isn't you want to print it on the
report as a non-numeric. In other words, you would move FLDX if it passed your editing tests
and you would move RDF-FLDX if it did not pass the tests and was therefore not numeric.
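A minimal sketch of this idea follows. FLDX and RDF-FLDX come from the example above; the print fields FLDX-PR and RDF-FLDX-PR, the paragraph names and the two test values are hypothetical, and DISPLAY stands in for writing a report line.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REDEFDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-AREA.
           05  FLDX                  PIC 9(5).
           05  RDF-FLDX REDEFINES FLDX PIC X(5).
      * Hypothetical print field, also viewed two ways.
       01  PRINT-AREA.
           05  FLDX-PR               PIC ZZ,ZZ9.
           05  RDF-FLDX-PR REDEFINES FLDX-PR PIC X(6).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Load test data through the alphanumeric name so that
      * bad characters can be placed in the field.
           MOVE "00412" TO RDF-FLDX
           PERFORM SHOW-FIELD
           MOVE "4A1 2" TO RDF-FLDX
           PERFORM SHOW-FIELD
           STOP RUN.
       SHOW-FIELD.
           IF FLDX IS NUMERIC
      * Passed the test: move the numeric name to the edited field.
               MOVE FLDX TO FLDX-PR
           ELSE
      * Failed the test: move the characters exactly as keyed.
               MOVE RDF-FLDX TO RDF-FLDX-PR
           END-IF
           DISPLAY "FIELD PRINTS AS: " RDF-FLDX-PR.
```

The first value passes the numeric test and is shown zero suppressed. The second value fails the test and is shown exactly as it was keyed.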
Another way you might use the REDEFINES is on the print line. For example, suppose you want to
print a numeric field in a column if the field passes your input tests and print a message in
the field if it does not pass the tests. This can be done using the REDEFINES or it can be done
by breaking down the field into a numeric sub field. Both ways are illustrated below:
Using the REDEFINES:
05 AMT-PR PIC $ZZZ,ZZZ.99.
05 RDF-AMT-PR REDEFINES AMT-PR PIC X(11).
In the PROCEDURE DIVISION, when the programmer wants to move a number to the area they would code:
MOVE AMT-WS TO AMT-PR.
If instead, they were moving a message they would code either of the following:
MOVE MSG-WS TO RDF-AMT-PR.
MOVE "* INVALID *" TO RDF-AMT-PR.
Breaking up the field:
05 MSG-PR.
10 AMT-PR PIC $ZZZ,ZZZ.99.
10 FILLER PIC X(9).
Two things need to be remembered. First, when a field is broken up into sub fields, the top
field (the one that is being broken up) is not given a PIC. Its length is the sum of the lengths
of the sub fields beneath it. Second, the field that is divided is considered to be alphanumeric even
though the parts may all be numeric. In this case, the parts are a mixture, but MSG-PR is
considered to be alphanumeric. In this example, if there is valid numeric data, the programmer
will move the data to AMT-PR. However, the programmer decided that the error message needed
more room. This caused the addition of the second 10 level which gives MSG-PR an additional
nine characters beyond AMT-PR. The MOVE statements that could be used in this program to move
either a number or a message to the column on the print line are illustrated below. First, to
move a number the following MOVE statement could be used:
MOVE AMT-WS TO AMT-PR.
If instead the programmer wanted to move a message to the field on the print line they would
code either of the following:
MOVE MSG-WS TO MSG-PR.
MOVE "*** INVALID DATA ***" TO MSG-PR.
Before moving on to the next topic, we will examine another use of the REDEFINES that
doesn't relate to editing. In dealing with percents, you want to use the decimal number for
calculations and the whole number to print or display. The REDEFINES lets you set this up
easily:
05 PERC PIC V99.
05 RDF-PERC REDEFINES PERC PIC 99.
When the programmer is using the percent in a calculation they will use the name PERC, which has
the PICTURE of V99. However, when the program wants to move the percent to the print line,
RDF-PERC is the field that will be moved.
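The short sketch below shows both names in use. PERC and RDF-PERC are taken from the example above; the amount, the result fields and the print fields are hypothetical names added for the demonstration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PERCDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  PERCENT-FIELDS.
           05  PERC                  PIC V99   VALUE .15.
           05  RDF-PERC REDEFINES PERC PIC 99.
      * Hypothetical work and print fields.
       01  WORK-FIELDS.
           05  AMT-WS                PIC 9(5)V99 VALUE 200.00.
           05  RESULT-WS             PIC 9(5)V99 VALUE 0.
       01  PRINT-FIELDS.
           05  PERC-PR               PIC Z9.
           05  RESULT-PR             PIC ZZ,ZZ9.99.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * PERC is the decimal .15 in the calculation.
           COMPUTE RESULT-WS = AMT-WS * PERC
           MOVE RESULT-WS TO RESULT-PR
      * RDF-PERC sees the same two digits as the whole number 15.
           MOVE RDF-PERC TO PERC-PR
           DISPLAY PERC-PR "% OF 200.00 IS " RESULT-PR
           STOP RUN.
```

Because the decimal point in V99 is only implied, the two digits stored in the field are the same under either name. Only the way the program interprets them changes.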
Is Numeric or Is Alphabetic Test
COBOL has a numeric or alphabetic test that can be used to test data and make sure that it
contains the expected category of characters. A field can be tested to see if it contains just
numeric digits or just alphabetic characters (spaces are acceptable). The test is a clause that
can be used with the IF statement:
SYNTAX:
IF {fieldname} IS NUMERIC
IF {fieldname} IS ALPHABETIC
EXAMPLE:
IF ONHAND IS NUMERIC
MOVE ONHAND TO ONHAND-PR
ELSE
error processing to handle the non-numeric data.
NOTE: With many compilers you cannot move a non-numeric field to a numeric output field so this
test becomes very important.
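Both tests can be tried out in a small program like the one sketched here. All of the field names, paragraph names and test values are hypothetical. The on hand field is redefined as alphanumeric so that bad characters can be placed in it and displayed.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CLASSTST.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-FIELDS.
           05  NAME-IN               PIC X(20).
           05  ONHAND-IN             PIC 9(4).
           05  RDF-ONHAND-IN REDEFINES ONHAND-IN PIC X(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE "SMITH" TO NAME-IN
           PERFORM CHECK-NAME
      * The apostrophe is not a letter or a space.
           MOVE "O'BRIEN" TO NAME-IN
           PERFORM CHECK-NAME
           MOVE "0125" TO RDF-ONHAND-IN
           PERFORM CHECK-ONHAND
      * The comma makes the field non-numeric.
           MOVE "1,25" TO RDF-ONHAND-IN
           PERFORM CHECK-ONHAND
           STOP RUN.
       CHECK-NAME.
           IF NAME-IN IS ALPHABETIC
               DISPLAY NAME-IN " IS ALPHABETIC"
           ELSE
               DISPLAY NAME-IN " IS NOT ALPHABETIC"
           END-IF.
       CHECK-ONHAND.
           IF ONHAND-IN IS NUMERIC
               DISPLAY RDF-ONHAND-IN " IS NUMERIC"
           ELSE
               DISPLAY RDF-ONHAND-IN " IS NOT NUMERIC"
           END-IF.
```

SMITH and 0125 pass their tests, while O'BRIEN and 1,25 fail. Note that the trailing spaces in NAME-IN do not cause the alphabetic test to fail.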
Indicators
You have already seen an indicator used to tell when the end of the file has been reached.
Indicators can also be used in other ways to make the program work well. For example in an edit
program, records with no errors may get written to a disk while records with errors will be
printed on a report. Since there are many fields on the record, and you want to check every
field for accuracy, checking of the records may involve a lot of code and several different
routines. To make sure you know whether any errors have been found, an indicator can be used.
For example, whenever an error is found the indicator can be set by moving "YES" to the
indicator. It doesn't matter whether you move "YES" to the indicator once because you found
one error or ten times because you found ten errors, the indicator saying YES will indicate
that the record is invalid and therefore should not be written to the disk. The indicator
should be set up in WORKING-STORAGE with the other indicators and it can be given level 88 names
if you want to:
01 INDICATORS.
05 MORE-RECS PIC XXX VALUE "YES".
05 VALID-REC-IND PIC XXX VALUE "YES".
88 VALID-REC VALUE "YES".
88 INVALID-REC VALUE "NO ".
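The sketch below shows the indicator at work, following the VALID-REC-IND definition above: the indicator starts at "YES" for each record and is changed to "NO " whenever an error is found. The transaction fields, the valid codes A, B and C, the pay rate range of 6.00 to 25.00 and the paragraph names are borrowed from the earlier examples or made up for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INDDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical transaction; both fields are deliberately bad.
       01  TRANS-REC.
           05  PAY-CD                PIC X     VALUE "D".
               88  VALID-PAY-CD      VALUES "A" "B" "C".
           05  PAY-RATE              PIC 99V99 VALUE 5.50.
       01  INDICATORS.
           05  VALID-REC-IND         PIC XXX   VALUE "YES".
               88  VALID-REC         VALUE "YES".
               88  INVALID-REC       VALUE "NO ".
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Assume the record is good until a check says otherwise.
           MOVE "YES" TO VALID-REC-IND
           PERFORM CHECK-PAY-CD
           PERFORM CHECK-PAY-RATE
           IF VALID-REC
               DISPLAY "RECORD IS VALID - WRITE IT TO DISK"
           ELSE
               DISPLAY "RECORD IS INVALID - DO NOT WRITE IT"
           END-IF
           STOP RUN.
       CHECK-PAY-CD.
           IF NOT VALID-PAY-CD
               DISPLAY "INVALID PAY CODE: " PAY-CD
               MOVE "NO " TO VALID-REC-IND
           END-IF.
       CHECK-PAY-RATE.
           IF PAY-RATE IS NOT NUMERIC
               DISPLAY "PAY RATE NOT NUMERIC"
               MOVE "NO " TO VALID-REC-IND
           ELSE
               IF PAY-RATE < 6.00 OR PAY-RATE > 25.00
                   DISPLAY "PAY RATE OUT OF RANGE"
                   MOVE "NO " TO VALID-REC-IND
               END-IF
           END-IF.
```

Here the indicator is set to "NO " twice, once for the pay code and once for the pay rate, but the final test only needs to ask whether VALID-REC is still true.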
Sample editing program
The sample edit program (PAYEDIT.CBL) is a simple version of an edit. The input comes in
on a disk, good output is written to a disk, and errors are written to a report (one error per
line). Note that in the real world, good records might get written to a separate report in
addition to being written on the disk. This would involve creating two printer reports and our
sample does not do this.
The sample edit program has the following input and output files:
- disk file input containing the records to be edited
- disk file output containing the records that passed the edit tests and are to become a
permanent part of our system
- printer file containing the records that were found to contain errors - these records were
not written on the disk file output - on our report, each error will be printed on a separate
line
The sample program is editing payroll transactions. Each transaction record is checked for the
following:
- The first character of the employee id must contain an F, P, C, or T - these are set up with
a level 88 for VALID-EMP-CD rather than checking for each letter within the IF statement.
- The rest of the id number (4 characters) must be numeric
- The work hours must be numeric and cannot be greater than 40
- The overtime hours must be numeric and cannot be greater than 20
- The sick hours must be numeric and cannot be greater than 40
- The vacation hours must be numeric and cannot be greater than 40
- The holiday hours must be numeric and cannot be greater than 8
- If holiday hours = 0 then the holiday code must be blank. If there are holiday hours then
the code must be N, K, M, L, T or C
- Bonus pay must be numeric and cannot be greater than 1000
- If employee code is F (full-time) then the sum of the employee hours (regular, overtime,
sick, vacation, and holiday) cannot be less than 40
- The sum of the employee hours (regular, overtime, sick, vacation and holiday) cannot exceed
80
If an error is found a line is printed on the error report. This means if a transaction has 5
errors, there will be 5 lines on the error report. Only records with no errors are written to
the output disk.
The edit program reads the first record (the initializing read). Then it performs the B-200-LOOP. In the
B-200-LOOP the program sets VALID-REC-IND to "YES". The program then PERFORMs the routines
that check each field for accuracy. If a field is in error, the field and a message are
written to the printer and VALID-REC-IND gets set to "NO". Each time an error is found,
another line explaining the error gets written on the report. When all the fields have been
tested, control returns to the B-200-LOOP and the VALID-REC-IND is checked. If the indicator
still contains YES, the record is accurate and gets written to the disk and 1 gets added to the
valid record count. If the indicator has been changed to NO, no record gets written on the
disk and 1 gets added to the invalid record count. Note that nothing is written to the printer
at this point because the error is printed as soon as it is discovered.
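The flow just described can be sketched as a skeleton. This is not the PAYEDIT.CBL listing. To keep the sketch runnable on its own, three in-memory records stand in for the input disk file, DISPLAY stands in for both the output disk and the printer, only three of the checks are shown, and the record layout and every name except B-200-LOOP, VALID-REC-IND and VALID-EMP-CD are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EDITSKEL.
      * Hypothetical skeleton, not the PAYEDIT.CBL listing.
      * Three in-memory records stand in for the input disk file
      * and DISPLAY stands in for the disk and printer output.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  SAMPLE-DATA.
           05  FILLER                PIC X(7)  VALUE "F123440".
           05  FILLER                PIC X(7)  VALUE "X12A455".
           05  FILLER                PIC X(7)  VALUE "P567838".
       01  SAMPLE-TABLE REDEFINES SAMPLE-DATA.
           05  TRANS-REC             PIC X(7)  OCCURS 3 TIMES.
       01  INPUT-REC.
           05  EMP-CD                PIC X.
               88  VALID-EMP-CD      VALUES "F" "P" "C" "T".
           05  EMP-NO                PIC X(4).
           05  WORK-HRS              PIC 99.
           05  RDF-WORK-HRS REDEFINES WORK-HRS PIC XX.
       01  INDICATORS.
           05  VALID-REC-IND         PIC XXX   VALUE "YES".
               88  VALID-REC         VALUE "YES".
       01  COUNTERS.
           05  REC-SUB               PIC 9     VALUE 0.
           05  VALID-CT              PIC 99    VALUE 0.
           05  INVALID-CT            PIC 99    VALUE 0.
       PROCEDURE DIVISION.
       MAINLINE.
           PERFORM B-200-LOOP
               VARYING REC-SUB FROM 1 BY 1 UNTIL REC-SUB > 3
           DISPLAY "VALID RECORDS:   " VALID-CT
           DISPLAY "INVALID RECORDS: " INVALID-CT
           STOP RUN.
       B-200-LOOP.
      * Start each record as valid, then let the checks decide.
           MOVE TRANS-REC (REC-SUB) TO INPUT-REC
           MOVE "YES" TO VALID-REC-IND
           PERFORM B-300-CHECK-FIELDS
           IF VALID-REC
               DISPLAY "TO DISK: " INPUT-REC
               ADD 1 TO VALID-CT
           ELSE
               ADD 1 TO INVALID-CT
           END-IF.
       B-300-CHECK-FIELDS.
      * Each error line shows the id, the bad field and a message.
           IF NOT VALID-EMP-CD
               DISPLAY EMP-CD EMP-NO " " EMP-CD " INVALID CODE"
               MOVE "NO " TO VALID-REC-IND
           END-IF
           IF EMP-NO IS NOT NUMERIC
               DISPLAY EMP-CD EMP-NO " " EMP-NO " ID NOT NUMERIC"
               MOVE "NO " TO VALID-REC-IND
           END-IF
           IF WORK-HRS IS NOT NUMERIC
               DISPLAY EMP-CD EMP-NO " " RDF-WORK-HRS
                   " HOURS NOT NUMERIC"
               MOVE "NO " TO VALID-REC-IND
           ELSE
               IF WORK-HRS > 40
                   DISPLAY EMP-CD EMP-NO " " RDF-WORK-HRS
                       " HOURS OVER 40"
                   MOVE "NO " TO VALID-REC-IND
               END-IF
           END-IF.
```

With this sample data the first and third records pass and are counted as valid. The second record produces three error lines (bad employee code, non-numeric id, hours over 40) but adds only 1 to the invalid record count.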