Editing Data

All data when it is initially entered into the system should be checked for errors so that bad data does not get put onto permanent disk files. Remember the rule: "Garbage in, garbage out!" This process of error checking is called EDITING.

Examples of the types of errors that editing looks for are:

no data in a field where data is required - for example idno is frequently required, as is name, hours worked etc
non numeric data in a numeric field - for example a letter or comma in a numeric field
non alphabetic data in an alphabetic field - for example a hyphen or apostrophe in an alphabetic field
invalid codes - for example if the valid codes are A, B, and C these are the only entries allowed in the field
data that is out of range - for example pay rates range from $6.00 per hour to $25.00 per hour - anything outside this range is invalid
data for a particular code that is out of range - for example pay rates for pay code P range from $6.00 per hour to $20.00 per hour - this involves checking first for code P and then for the range
a group of fields that must meet a specification when evaluated together - for example the sum of regular hours, overtime hours, sick hours and vacation hours must equal 40 or more
field blank or contains data depending on value in another field - for example salaried employees cannot have an entry in hours worked or overtime hours while hourly workers must have an entry in hours worked
check digit - a check digit is the calculated last digit of an identification type of number such as employee number or item number - for example, with an eight digit id # the first seven digits would be assigned and the eighth digit (the check digit) would be calculated using special formulas designed for this purpose - the eighth digit now becomes part of the id - any time the id is typed in the calculation can be redone on the first seven digits to see if the answer is the same as the eighth digit, if it is then the id is considered valid - this is a great technique for catching transposition of digits etc.
batch editing - in batch editing a group of transactions are grouped together as a batch, for example 20 transactions might be called a batch - each batch is given a number - before data entry, the batch of transactions are gathered and totals are run on significant numeric fields - this might mean running a total on part number, on hand, cost etc. - as many totals can be gathered as needed - this total information is entered into a batch header along with the batch number - when the data is being keyed in, the batch header is keyed in followed by all of the transactions in the batch and then another batch header followed by its transactions - in the edit program the batch header is read and the totals on it are stored in memory, then the transactions are read one at a time and the same totals are accumulated (if you did part number, on hand and cost you would total the same three fields) - when a new batch header is read, it is the signal that the old batch is complete and the totals are compared - if the totals that were accumulated during processing do not match the totals from the batch header, the batch is considered to be unbalanced and the information is printed out - the advantage of this system is that the unbalanced batch only involves 20 transactions so finding the error or errors is a much less significant problem than searching for the errors on thousands of transactions
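The check digit idea can be sketched in COBOL. The notes do not say which formula is used, so the sketch below assumes a simple weighted modulus-10 scheme chosen only for illustration: the first seven digits are multiplied by alternating weights of 3 and 1, and the check digit is whatever brings the total up to a multiple of 10. The program name, the sample id, and the -WS field names are all hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CHKDIGIT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical 8 digit id: 7 assigned digits plus a check digit
       01  ID-IN                   PIC 9(8)    VALUE 12345670.
       01  ID-DIGITS REDEFINES ID-IN.
           05  ID-DIGIT            PIC 9       OCCURS 8 TIMES.
       01  WORK-FIELDS.
           05  SUB-WS              PIC 9       VALUE 0.
           05  WEIGHT-WS           PIC 9       VALUE 3.
           05  DIGIT-SUM-WS        PIC 9(3)    VALUE 0.
           05  QUOT-WS             PIC 9(3)    VALUE 0.
           05  REMAIN-WS           PIC 9       VALUE 0.
           05  CALC-DIGIT-WS       PIC 9       VALUE 0.
       PROCEDURE DIVISION.
       A-100-MAIN.
      * Redo the calculation on the first seven digits only
           PERFORM B-100-ADD-DIGIT
               VARYING SUB-WS FROM 1 BY 1 UNTIL SUB-WS > 7.
           DIVIDE DIGIT-SUM-WS BY 10
               GIVING QUOT-WS REMAINDER REMAIN-WS.
           IF REMAIN-WS = 0
               MOVE 0 TO CALC-DIGIT-WS
           ELSE
               SUBTRACT REMAIN-WS FROM 10 GIVING CALC-DIGIT-WS.
      * Compare the answer with the eighth digit that was keyed
           IF CALC-DIGIT-WS = ID-DIGIT (8)
               DISPLAY "ID " ID-IN " IS VALID"
           ELSE
               DISPLAY "ID " ID-IN " FAILS THE CHECK DIGIT TEST".
           STOP RUN.
       B-100-ADD-DIGIT.
      * Assumed weights for this sketch alternate 3, 1, 3, 1 ...
           COMPUTE DIGIT-SUM-WS = DIGIT-SUM-WS
               + ID-DIGIT (SUB-WS) * WEIGHT-WS.
           IF WEIGHT-WS = 3
               MOVE 1 TO WEIGHT-WS
           ELSE
               MOVE 3 TO WEIGHT-WS.
```

With the sample id 12345670 the weighted sum of the first seven digits is 60, so the calculated digit is 0 and the id passes. If the third and fourth digits are transposed (12435670) the sum becomes 62, the calculated digit is 8, and the id is rejected.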

Edit Program:

Possible Input to EDIT Program:

Transactions coming in on a disk
Transactions being keyed in

Possible Output from EDIT Program:

Written report of all errors
Written report of valid transactions
Disk file of valid transactions
Disk file of records flagged as invalid - to be processed and corrected - frequently using on-line processing

When the data on input records is processed in an EDIT program, the errors are reported so that the data can be corrected and reentered into the system. The input data usually comes in on either a screen or a disk. The valid records are usually written on a disk and a record is made of them on a report. The bad records are always written on a report and they may also be written to an error disk where they can be corrected. If you are processing interactively, the bad records may be displayed on the screen so the user can fix them immediately. After corrections have been made, the record is rechecked. If the corrections were satisfactory, the record gets written to the permanent disk file. If not, it is either redisplayed or written to a print report.

There are two major categories of data entry. The first is data entry by a clerk who does not know the data and is entering the data as fast and as accurately as possible. Since the clerk does not know the data, this kind of data entry frequently takes what is keyed and creates a disk file to be checked in an EDIT program. The second is data entry by a clerk who knows the data and can make many of the corrections immediately if they are pointed out by an EDIT program. This kind of data entry is frequently done interactively with the clerk sitting at the computer and entering the data into fields on the screen. The EDIT program analyzes the entry and reports errors that can be interactively fixed by the clerk. Hopeless errors or errors where there is insufficient data are reported, but the rest are fixed on-line.

As can be seen, reporting is an important part of editing. Both valid and invalid records are reported - usually on separate reports, but occasionally you will see reports that mix valid and invalid record reporting. The report can be done using a variety of styles depending on the needs of the users. The important thing is that on a report of valid transactions the entire record is printed and on an error report, for each field in error, the user can identify:

the id# or some other identifying field from the record
the contents of the field that is in error
an error message that explains the error

A few possible error report styles:

For a valid transaction report, the most common style is to print each field in a column - if there are too many fields, then you can stagger the columns (and the headers) so that some fields appear on the first line and some fields appear on the second line but they are staggered under the appropriate headers
For an error report, the easiest style is to write one line for every error that is found
An error report can be done so that the data appears on one line and the error message appears on the line beneath it
An error report can be done so that each error results in a code and the codes are all printed in the error code column while the data appears in regular columns - the user will receive a handbook containing all codes and their meanings

COBOL

Before we start to look at the logic of an edit program, there are some COBOL and programming techniques that you should be aware of.

Writing records to a disk

If you plan to write a record to a disk you will be creating a disk file. Therefore the file must be defined in both the SELECT statement and in the FILE SECTION with a FD. The record that you will write must pass through the 01 level of the FD. The programmer can either define the fields on the record under the 01 level of the FD in the FILE SECTION or set up the record in WORKING-STORAGE and define the fields there. In the PROCEDURE DIVISION, the file must be OPENed as an OUTPUT file since you are writing to the disk. When the record is ready to be written, the WRITE statement will be used. Since you are writing to a disk instead of the printer, there are no AFTER ADVANCING clauses. When the program is complete, the file must also be CLOSEd.

Example

	SELECT  NEW-DISK-FILE
		ASSIGN TO "A:\NEWFILE.DAT".

FILE SECTION.
FD  NEW-DISK-FILE
       DATA RECORD IS NEW-DISK-REC.
01   NEW-DISK-REC.
       05  IDNO-DSK                      PIC X(4).
       05  NAM-DSK		       PIC X(20).
       ....
PROCEDURE DIVISION.
...
A-100-INITIALIZE.
       OPEN INPUT...
	      OUTPUT NEW-DISK-FILE
...
...
B-200-LOOP.
       ...
      code to set up the record by moving data to IDNO-DSK, NAM-DSK etc.
      WRITE NEW-DISK-REC.
       ...

C-100-TERMINATE.
     CLOSE ...
                   NEW-DISK-FILE.
   
If the record was set up in WORKING-STORAGE, for example using 01  OUTPUT-REC, then the write statement would read like this:

	WRITE NEW-DISK-REC FROM OUTPUT-REC.
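The pieces above can be put together in one small program. This is only a sketch: the program name, the file name NEWFILE.DAT, and the sample data are hypothetical, and ORGANIZATION IS LINE SEQUENTIAL is a common compiler extension rather than standard COBOL, so your compiler may need a different clause. The record is built in WORKING-STORAGE and written with WRITE ... FROM.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. WRTDISK.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT NEW-DISK-FILE
               ASSIGN TO "NEWFILE.DAT"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  NEW-DISK-FILE.
      * The record passes through this 01 level on its way to disk
       01  NEW-DISK-REC            PIC X(24).
       WORKING-STORAGE SECTION.
      * The fields are defined here instead of under the FD
       01  OUTPUT-REC.
           05  IDNO-DSK            PIC X(4).
           05  NAM-DSK             PIC X(20).
       PROCEDURE DIVISION.
       A-100-MAIN.
           OPEN OUTPUT NEW-DISK-FILE.
           MOVE "1234" TO IDNO-DSK.
           MOVE "SAMPLE EMPLOYEE" TO NAM-DSK.
      * No AFTER ADVANCING clause because this is a disk file
           WRITE NEW-DISK-REC FROM OUTPUT-REC.
           CLOSE NEW-DISK-FILE.
           STOP RUN.
```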

Redefines

The REDEFINES clause is used when you have a field that you want to look at two ways. For example a field can be given a numeric picture and then redefined and given an alphanumeric picture:

	05  FLDX			PIC 9(5).
	05  RDF-FLDX REDEFINES FLDX	PIC X(5).

If you use the name FLDX you are referring to a numeric field that you may use in a calculation or move to an edited field. If you use the name RDF-FLDX, you are referring to an alphanumeric field. This is useful if you are checking an incoming field to see if it is numeric. If it is you want to print it on the report as an edited numeric, if it isn't you want to print it on the report as a non-numeric. In other words, you would move FLDX if it passed your editing tests and you would move RDF-FLDX if it did not pass the tests and was therefore not numeric.

Another way you might use the REDEFINES is on the print line. For example, suppose you want to print a numeric field in a column if the field passes your input tests and print a message in the field if it does not pass the tests. This can be done using the REDEFINES or it can be done by breaking down the field into a numeric sub field. Both ways are illustrated below:

Using the REDEFINES:

	05   AMT-PR 				PIC $ZZZ,ZZZ.99.
	05   RDF-AMT-PR REDEFINES AMT-PR	PIC X(11).

In the PROCEDURE DIVISION, when the programmer wants to move a number to the area they would code:

				MOVE AMT-WS TO AMT-PR.

If instead, they were moving a message they would code either of the following:

				MOVE MSG-WS TO RDF-AMT-PR.
				MOVE "* INVALID *" TO RDF-AMT-PR.

Breaking up the field:

	 05  MSG-PR.
	     10  AMT-PR		PIC $ZZZ,ZZZ.99.
	     10  FILLER		PIC X(9).

Two things need to be remembered. First, when a field is broken up into sub fields, the top field (the one that is being broken up) is not given a PIC. Its size is the sum of the sizes of the sub fields beneath it. Second, the field that is divided is considered to be alphanumeric even though the parts may all be numeric. In this case, the parts are a mixture, but MSG-PR is considered to be alphanumeric. In this example, if there is valid numeric data, the programmer will move the data to AMT-PR. However, the programmer decided that the error message needed more room than the 11 characters of AMT-PR. This caused the addition of the second 10 level, which gives MSG-PR an additional nine characters beyond AMT-PR for a total of 20. The MOVE statements that could be used in this program to move either a number or a message to the column on the print line are illustrated below. First, to move a number the following MOVE statement could be used:

			MOVE AMT-WS TO AMT-PR.

If instead the programmer wanted to move a message to the field on the print line they would code either of the following:

			MOVE MSG-WS TO MSG-PR.
			MOVE "*** INVALID DATA ***" TO MSG-PR.

Before moving on to the next topic, we will examine another use of the REDEFINES that doesn't relate to editing. In dealing with percents, you want to use the decimal number for calculations and the whole number to print or display. The REDEFINES lets you set this up easily:

		05  PERC					PIC V99.
		05  RDF-PERC REDEFINES PERC 	PIC 99.

When the programmer is using the percent in a calculation they will use the name PERC, which has the PICTURE of V99. However, when the program wants to move the percent to the print line, RDF-PERC is the field that will be moved.
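A short sketch of the percent technique follows. PERC and RDF-PERC come from the definition above. The program name, SALES-WS, COMM-WS, the two -PR fields and the sample values are hypothetical names and data made up for the illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PERCDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  RATE-FIELDS.
           05  PERC                    PIC V99     VALUE .15.
           05  RDF-PERC REDEFINES PERC PIC 99.
      * Hypothetical work and print fields for this sketch
       01  WORK-FIELDS.
           05  SALES-WS                PIC 9(5)V99 VALUE 200.00.
           05  COMM-WS                 PIC 9(5)V99 VALUE 0.
       01  PRINT-FIELDS.
           05  PERC-PR                 PIC Z9.
           05  COMM-PR                 PIC ZZ,ZZ9.99.
       PROCEDURE DIVISION.
       A-100-MAIN.
      * PERC is the decimal form (.15) used in the calculation
           MULTIPLY SALES-WS BY PERC GIVING COMM-WS.
           MOVE COMM-WS TO COMM-PR.
      * RDF-PERC sees the same two digits as the whole number 15
           MOVE RDF-PERC TO PERC-PR.
           DISPLAY "COMMISSION AT " PERC-PR "% IS " COMM-PR.
           STOP RUN.
```

With these sample values the commission works out to 30.00 and the percent is shown as 15.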

Is Numeric or Is Alphabetic Test

COBOL has a numeric or alphabetic test that can be used to test data and make sure that it contains the expected category of characters. A field can be tested to see if it contains just numeric digits or just alphabetic characters (spaces count as alphabetic, but not as numeric). The test is a clause that can be used with the IF statement:

SYNTAX:
		IF {fieldname} IS NUMERIC
			       IS ALPHABETIC

EXAMPLE:
	
	IF ONHAND IS NUMERIC
	     MOVE ONHAND TO ONHAND-PR
	ELSE
             error processing to handle the non-numeric data.	

NOTE:  With many compilers you cannot move a non-numeric field to a numeric output field so this test becomes very important.
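The numeric test and the REDEFINES clause are often used together. The sketch below assumes ONHAND is a five digit field. The program name, the redefining names RDF-ONHAND and RDF-ONHAND-PR, and the two sample values are hypothetical. Good data is moved through the numeric name so it gets edited. Bad data is moved through the alphanumeric name so it prints exactly as it was keyed.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NUMTEST.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-FIELDS.
           05  ONHAND                        PIC 9(5).
           05  RDF-ONHAND REDEFINES ONHAND   PIC X(5).
       01  PRINT-FIELDS.
           05  ONHAND-PR                     PIC ZZ,ZZ9.
           05  RDF-ONHAND-PR REDEFINES ONHAND-PR
                                             PIC X(6).
       PROCEDURE DIVISION.
       A-100-MAIN.
      * Sample data is loaded through the alphanumeric name
           MOVE "00125" TO RDF-ONHAND.
           PERFORM B-100-CHECK-ONHAND.
           MOVE "12A45" TO RDF-ONHAND.
           PERFORM B-100-CHECK-ONHAND.
           STOP RUN.
       B-100-CHECK-ONHAND.
           IF ONHAND IS NUMERIC
               MOVE ONHAND TO ONHAND-PR
           ELSE
               MOVE RDF-ONHAND TO RDF-ONHAND-PR.
           DISPLAY "ON HAND: " RDF-ONHAND-PR.
```

The first value passes the test and is displayed with leading zeros suppressed. The second contains a letter, fails the test, and is displayed unchanged.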

Indicators

You have already seen an indicator used to tell when the end of the file has been reached. Indicators can also be used in other ways to make the program work well. For example in an edit program, records with no errors may get written to a disk while records with errors will be printed on a report. Since there are many fields on the record, and you want to check every field for accuracy, checking of the records may involve a lot of code and several different routines. To make sure you know whether any errors have been found, an indicator can be used. For example, the indicator can be set to "YES" before the record is checked, and whenever an error is found the indicator can be reset by moving "NO " to it. It doesn't matter whether you move "NO " to the indicator once because you found one error or ten times because you found ten errors. An indicator that no longer contains YES shows that the record is invalid and therefore should not be written to the disk. The indicator should be set up in WORKING-STORAGE with the other indicators and it can be given level 88 names if you want to:

	01  INDICATORS.
	    05  MORE-RECS		PIC XXX		VALUE "YES".
	    05  VALID-REC-IND           PIC XXX		VALUE "YES".
		88   VALID-REC				VALUE "YES".
		88    INVALID-REC			VALUE "NO ".
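A minimal sketch of the indicator at work is shown here. VALID-REC-IND and its level 88 names are taken from the definition above. The program name, the transaction fields, their sample values and the paragraph names are hypothetical. Two separate routines each set the indicator to NO when they find an error, and the decision about the record is made once at the end.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INDDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical transaction fields used only for this sketch
       01  TRANS-REC.
           05  EMP-CD              PIC X       VALUE "X".
               88  VALID-EMP-CD                VALUE "F" "P" "C" "T".
           05  WORK-HRS            PIC 99      VALUE 45.
       01  INDICATORS.
           05  VALID-REC-IND       PIC XXX     VALUE "YES".
               88  VALID-REC                   VALUE "YES".
               88  INVALID-REC                 VALUE "NO ".
       PROCEDURE DIVISION.
       A-100-MAIN.
      * Assume the record is good until a test proves otherwise
           MOVE "YES" TO VALID-REC-IND.
           PERFORM B-100-CHECK-CODE.
           PERFORM B-200-CHECK-HOURS.
           IF VALID-REC
               DISPLAY "RECORD WOULD BE WRITTEN TO THE DISK"
           ELSE
               DISPLAY "RECORD REJECTED".
           STOP RUN.
       B-100-CHECK-CODE.
           IF NOT VALID-EMP-CD
               DISPLAY "INVALID EMPLOYEE CODE: " EMP-CD
               MOVE "NO " TO VALID-REC-IND.
       B-200-CHECK-HOURS.
           IF WORK-HRS IS NOT NUMERIC
               DISPLAY "WORK HOURS NOT NUMERIC"
               MOVE "NO " TO VALID-REC-IND
           ELSE
               IF WORK-HRS > 40
                   DISPLAY "WORK HOURS OVER 40: " WORK-HRS
                   MOVE "NO " TO VALID-REC-IND.
```

Both sample fields are in error, so the indicator is set to NO twice and the record is rejected once.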

Sample editing program

The sample edit program (PAYEDIT.CBL) is a simple version of an edit. The input comes in on a disk. Good output is written to a disk and errors are written to a report (one error per line). Note that in the real world, good records might get written to a separate report in addition to being written on the disk. This would involve creating two printer reports and our sample does not do this.

The sample edit program has the following input and output files:

disk file input containing the records to be edited
disk file output containing the records that passed the edit tests and are to become a permanent part of our system
printer file containing the records that were found to contain errors - these records were not written on the disk file output - on our report, each error will be printed on a separate line

The sample program is editing payroll transactions. Each transaction record is checked for the following:

The first character of the employee id must contain an F, P, C, or T - these are set up with a level 88 for VALID-EMP-CD rather than checking for each letter within the IF statement.
The rest of the id number (4 characters) must be numeric
The work hours must be numeric and cannot be greater than 40
The overtime hours must be numeric and cannot be greater than 20
The sick hours must be numeric and cannot be greater than 40
The vacation hours must be numeric and cannot be greater than 40
The holiday hours must be numeric and cannot be greater than 8
If holiday hours = 0 then the holiday code must be blank. If there are holiday hours then the code must be N, K, M, L, T or C
Bonus pay must be numeric and cannot be greater than 1000
If employee code is F (full-time) then the sum of the employee hours (regular, overtime, sick, vacation, and holiday) cannot be less than 40
The sum of the employee hours (regular, overtime, sick, vacation and holiday) cannot exceed 80
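The checks that involve more than one field (the holiday code rules and the two rules on total hours) might be coded along the following lines. This is a sketch and not the code from PAYEDIT.CBL: the program name, all field names, paragraph names and sample values are hypothetical, and it assumes the hours fields have already passed their numeric tests.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HRSCHECK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical field names and sample values
       01  TRANS-REC.
           05  EMP-CD              PIC X       VALUE "F".
               88  FULL-TIME                   VALUE "F".
           05  WORK-HRS            PIC 99      VALUE 24.
           05  OT-HRS              PIC 99      VALUE 0.
           05  SICK-HRS            PIC 99      VALUE 8.
           05  VAC-HRS             PIC 99      VALUE 0.
           05  HOL-HRS             PIC 99      VALUE 0.
           05  HOL-CD              PIC X       VALUE "N".
               88  VALID-HOL-CD        VALUE "N" "K" "M" "L" "T" "C".
       01  TOTAL-HRS-WS            PIC 9(3)    VALUE 0.
       PROCEDURE DIVISION.
       A-100-MAIN.
           PERFORM B-100-CHECK-HOLIDAY.
           PERFORM B-200-CHECK-TOTAL.
           STOP RUN.
       B-100-CHECK-HOLIDAY.
      * The rule for the code depends on the holiday hours field
           IF HOL-HRS = 0 AND HOL-CD NOT = SPACE
               DISPLAY "HOLIDAY CODE MUST BE BLANK: " HOL-CD.
           IF HOL-HRS > 0 AND NOT VALID-HOL-CD
               DISPLAY "INVALID HOLIDAY CODE: " HOL-CD.
       B-200-CHECK-TOTAL.
      * The five hours fields are evaluated together
           ADD WORK-HRS OT-HRS SICK-HRS VAC-HRS HOL-HRS
               GIVING TOTAL-HRS-WS.
           IF FULL-TIME AND TOTAL-HRS-WS < 40
               DISPLAY "FULL-TIME HOURS UNDER 40: " TOTAL-HRS-WS.
           IF TOTAL-HRS-WS > 80
               DISPLAY "TOTAL HOURS OVER 80: " TOTAL-HRS-WS.
```

The sample record has a holiday code but no holiday hours, and its hours total only 32 for a full-time employee, so two error lines are displayed.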

If an error is found a line is printed on the error report. This means if a transaction has 5 errors, there will be 5 lines on the error report. Only records with no errors are written to the output disk.

The edit program reads the first record as part of its initialization. Then it performs the B-200-LOOP. In the B-200-LOOP the program sets VALID-REC-IND to "YES". The program then PERFORMs the routines that check each field for accuracy. If a field is in error, the field and a message are written to the printer and VALID-REC-IND gets set to "NO". Each time an error is found, another line explaining the error gets written on the report. When all the fields have been tested, control returns to the B-200-LOOP and VALID-REC-IND is checked. If the indicator still contains YES, the record is accurate and gets written to the disk and 1 gets added to the valid record count. If the indicator has been changed to NO, no record gets written on the disk and 1 gets added to the invalid record count. Note that nothing is written to the printer at this point because the error is printed as soon as it is discovered.