Programming in COBOL

COBOL as a language

COBOL stands for COmmon Business Oriented Language, and that is what it is. It is a language designed around 1960 by a collaboration of business, government, and universities for business use, and it has standardization rules that make one implementation work essentially like all others. The government started the pressure for a common business language by saying it would only buy computers from companies that supported the language. As a result, COBOL became, and still is, one of the most widely used business languages. In 1968, the American National Standards Institute (ANSI) approved COBOL as a standard. COBOL 74 can still be seen at some installations, but most companies use COBOL 85 and many use the upgraded version of COBOL 85. The next standard, published in 2002, incorporates object oriented abilities into the language; some manufacturers released object oriented versions of COBOL before that standard was finalized.

COBOL was designed to be, and is, largely machine independent, which means that with minor modifications, usually to the way input and output paths are given, COBOL programs can be run on most computers today. COBOL is a high level language: the inner workings of the machine are hidden from the programmer. As a high level language, COBOL must be translated before it can run; COBOL programs are compiled, which essentially means translated into a language that the computer can understand and execute. Different COBOL compilers may allow certain extras or be more forgiving about the interpretation of some rules, but all ANSI compilers support the official COBOL standards.

COBOL handles large amounts of data exceptionally well and excels at the type of processing that involves reading files one record at a time and producing output. This is because the detailed COBOL DATA DIVISION lays out the format of each record that is read and each record that is written, which makes tape and disk files easy to use. COBOL also supports screen input and output, so it really is a versatile language. Processing in which files of records are read and written is classified as record I/O; processing in which data is keyed in and assigned to variables is classified as stream I/O. Many languages handle one or the other well. COBOL handles both, but its handling of record I/O sets it apart from many other languages.
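A minimal sketch of record I/O, assuming a compiler such as GnuCOBOL: the record layout is described once in the DATA DIVISION, and each WRITE or READ then moves a whole record. The file name `EMPFILE.DAT`, the field names, and the sample values are hypothetical, and the form of the ASSIGN clause is one of the things that varies between compilers. The DISPLAY statements show the stream side of I/O.

```cobol
      * Sketch: record I/O (WRITE/READ of fixed-layout records) plus
      * stream-style output (DISPLAY). The file name in ASSIGN is an
      * assumption; how files are named differs between compilers.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RECIODEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT EMP-FILE ASSIGN TO "EMPFILE.DAT"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  EMP-FILE.
      * Every record has the same layout: 5 + 10 + 2 = 17 characters.
       01  EMP-REC.
           05  EMP-ID          PIC X(5).
           05  EMP-NAME        PIC X(10).
           05  EMP-HRS         PIC 9(2).
       WORKING-STORAGE SECTION.
       01  EOF-IND             PIC X(3)  VALUE "NO".
       PROCEDURE DIVISION.
       A-100-MAIN.
      *    Record output: fill the fields, then write the whole record.
           OPEN OUTPUT EMP-FILE
           MOVE "00001"  TO EMP-ID
           MOVE "ADAMS"  TO EMP-NAME
           MOVE 40       TO EMP-HRS
           WRITE EMP-REC
           MOVE "00002"  TO EMP-ID
           MOVE "BAKER"  TO EMP-NAME
           MOVE 35       TO EMP-HRS
           WRITE EMP-REC
           CLOSE EMP-FILE
      *    Record input: each READ fills every field of EMP-REC.
           OPEN INPUT EMP-FILE
           PERFORM UNTIL EOF-IND = "YES"
               READ EMP-FILE
                   AT END
                       MOVE "YES" TO EOF-IND
                   NOT AT END
                       DISPLAY EMP-ID " " EMP-NAME " " EMP-HRS
               END-READ
           END-PERFORM
           CLOSE EMP-FILE
           STOP RUN.
```

Because EMP-NAME is always 10 characters, every line of output lines up in the same columns, which is exactly the property that makes fixed record layouts convenient for reports.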

Writing a program

The job of writing a well designed and effective program requires that the programmer follow these steps:

Understanding the objective of the project that has been assigned. This means developing an understanding of what the program is supposed to accomplish and all of the problems associated with meeting this objective.
Analyzing the problem and developing a solution that is appropriate for computer implementation. Programmers are developing approaches to solving problems that will eventually be turned into computer programs, so they must consider not only the logical solution but also the technical aspects of the solution. (More about this in the section called Programming Logic.)
Develop a logical solution to the problem. This means developing a detailed plan involving the logical steps that the program must follow. Frequently, the programmer uses a logic flowchart, pseudocode or some other logic tool to develop the logic of the program.
Write the program. The programmer must now write the program using an appropriate programming language. Obviously, in this course that means COBOL. In writing the program, the logical solution developed above will be the framework from which the programmer works. The program is simply a way to express the logical solution in a computer language. Frequently the language choice is dictated by the company that the program is being written for. Today, there are many computerized programming tools available to assist the programmer with this stage.
Some programmers write the program entirely on paper. In writing a COBOL program, the layout rules must be followed. Columns 1-6 are reserved for a sequence number (1-3 for the page and 4-6 for the line number). Usually the COBOL programmer today makes no entry in these columns. Column 7 is reserved for special symbols: * for a remark and - for continuation. Column 8 is the beginning of Margin A (Margin A includes columns 8, 9, 10 and 11). Certain COBOL entries are required to begin in Margin A, for example DIVISIONs, SECTIONs, paragraph names, FDs, and 01s. Column 12 is the beginning of Margin B. COBOL commands, data items with a level number other than 01, and many other COBOL entries start in Margin B. Margin B ends at column 72. Columns 73 through 80 are ignored by the compiler; officially they were used for the program name, to identify the program. My theory is that in the days of programs written on card decks, if you dropped the deck of cards the sequence numbers in 1-6 helped you reorganize the deck, and if you dropped two decks at once the identification in 73-80 helped you determine which cards went with which program.
Today, more programmers write the program as they key it into a text editor. If they are using a COBOL text editor, it is frequently set up to start in column 8 and not allow entries after column 72. Whatever approach is used, the entire program must eventually be entered into the computer. This is usually accomplished through the keyboard. When the programmer wants to key in the program, they load a text editor from the disk into memory. An editor is a program like a word processor, but designed to help the programmer key in the instructions in the correct format. The programmer keys in the program as input to the editor and saves the program as a disk file output from the editor. This saved program is called the SOURCE PROGRAM.
COMPILE STAGE: The SOURCE PROGRAM must now be translated into machine language. The machine language version of the program is called the OBJECT PROGRAM, and the process of translating the COBOL program into an object program is called the COMPILE stage. The SOURCE PROGRAM is written in an English-like high level language that is easy for the programmer to work with. However, the computer needs the program in machine language. This translation is done by a compiler, which reads the high level COBOL language of the SOURCE PROGRAM and converts each high level COBOL instruction into one or more machine language instructions. The OBJECT PROGRAM is the result. As the compiler compiles the program, it checks for misuse of the language, known as syntax errors. If serious syntax errors are encountered, the source program cannot be translated. In this case the programmer must look at the errors the compiler generates, go back to the COBOL SOURCE PROGRAM, and make the corrections. The corrected program is then recompiled. This process continues until the programmer gets a clean compile, which generates the OBJECT PROGRAM. (See below for details.)
LINK STAGE: Once the program has been successfully compiled, it must be linked. The link stage creates executable programs from OBJECT programs. The link stage essentially resolves addresses and combines code to produce an EXECUTABLE OBJECT PROGRAM. (see below for details) Today the link stage is frequently invisible. The programmer only sees compile and execute/run.
EXECUTE STAGE: The EXECUTABLE OBJECT PROGRAM can now be executed, or RUN. This means that the executable object program itself is loaded into memory and the computer steps through its instructions one at a time, executing them. If the program is reading a file, that file will be read from the disk. If the program is writing a report, that report will be sent to the printer. The program is in control, so whatever processing the programmer coded into it is now being executed. The programmer now needs to examine the results. The program executed the commands as they were coded, but there may be problems in the logic of the program or careless mistakes that result in incorrect output. If the output is incorrect, the programmer must determine the reasons, go back to the text editor, bring up the program, make modifications to the COBOL code, and then recompile, link and execute the program and hope. Needless to say, the results must again be examined and errors must be handled in the manner described above. (See below for more details.)
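The column rules from the write-the-program step can be seen in a very small program. This sketch leaves columns 1-6 and 73-80 empty, as most programmers do today, puts the remark asterisks in column 7, starts the division, section, paragraph and 01 entries in Margin A (column 8), and starts everything else in Margin B (column 12). The program and file names are hypothetical. With GnuCOBOL, one freely available compiler, a command such as `cobc -x layout.cob` compiles and links in one step, which is an example of the link stage being invisible; running the resulting executable is the execute stage.

```cobol
      * Column 7 holds the * that makes this whole line a remark.
      * Columns 1-6 (sequence numbers) are left blank, as is usual.
      * Margin A (columns 8-11): DIVISION, SECTION and paragraph
      * names, FD entries and 01-level items.
      * Margin B (columns 12-72): statements and other levels.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LAYOUT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-MESSAGE.
           05  WS-TEXT         PIC X(9) VALUE "LAYOUT OK".
       PROCEDURE DIVISION.
       A-100-MAIN.
           DISPLAY WS-TEXT
           STOP RUN.
```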

Program Documentation: Program documentation falls into two groups: internal documentation, which IS included as remarks within the program, and external documentation, which gives a complete write-up of the program. External documentation (frequently a manual) includes:

overview write-up of what the program does
layouts of the files, screens and reports
detailed description of what the program does
logic flowchart or pseudocode
program printout (of the COBOL code)
sample files, screens and outputs generated in test runs
controls in the program
operating description - when run, files needed, special paper etc.
documentation of all modifications made to the program

SUMMARY:

Understand the project.
Analyze the problem and develop a solution that is appropriate for computer implementation.
Develop the logic of the program.
Write the program using a high level language and key it into the computer using a text editor. The result is a SOURCE PROGRAM stored on disk. This source program can be modified.
COMPILE the SOURCE PROGRAM to produce an OBJECT PROGRAM. If syntax errors are discovered during the compile, they must be fixed and the program recompiled until an object program is produced.
Link the OBJECT PROGRAM and produce an EXECUTABLE OBJECT PROGRAM.
EXECUTE the EXECUTABLE OBJECT PROGRAM. If the results are correct, the project is complete. If not, the programmer must return to step 1, 2, 3, or 4, depending on the complexity of the problem.
DOCUMENT the program

Compiling

A COBOL program is written in an English-like language that cannot be understood by the machine, so it has to be translated using a compiler. A compiler translates the code all at once and produces a machine version of the program. (Contrast this with an interpreter, used by some other languages, which translates one command, executes it, and then moves on to the next.) The machine version is produced only when the program is free of significant errors. TO DO A COMPILE: The compiler is loaded into memory from the disk, and it is the compiler that is being executed:

INPUT	The SOURCE PROGRAM (high level language program) that has been keyed in and stored as a file on a disk is read as input to the compiler
PROCESSING	The compiler attempts to translate the program.
OUTPUT	A listing of the program including any syntax errors that were discovered or an on screen listing of the program with errors. If the compile was "clean", meaning no significant errors were encountered, the OBJECT PROGRAM (machine language version of the program) is produced and stored as a file on the disk.

If the attempt at a compile resulted in significant syntax errors (not warnings), they must be corrected. The programmer will look at the list of errors, determine the problem and determine how to correct it. Then the programmer will bring up the text editor at the computer, bring up the SOURCE PROGRAM file from the disk and use the text editor to make the corrections. The corrected SOURCE PROGRAM will be saved and a new compile will be tried. This process continues until a clean compile is achieved.

On some systems the link must be executed between the compile of the program and the execution of the program. In other systems the link is invisible and is done automatically. If your system requires the link step, the following will be done. TO DO A LINK: The LINK program is loaded into memory from the hard disk and then it is executed.

INPUT	Object program
PROCESSING	The link resolves addresses, combines code, etc.
OUTPUT	Executable OBJECT program

Executing

Executing is running the executable object program, which is a compiled and linked translation of your COBOL program. When a program is executed, it is loaded into memory and the computer steps through the instructions one at a time, executing each one. The program that the programmer wrote is now being executed. After execution is complete, it is critical that the programmer carefully check the output to make sure it is exactly what was intended. Frequently, test data that contains a lot of purposeful problems is used to make sure that the program handles all of these problems correctly. Usually, the program does not work correctly the first time or, for that matter, the first couple of times that it is executed. Sometimes the program does nothing, sometimes it processes the first few records from the file it is reading and then stops, sometimes the whole report it is producing is "garbage", and other times there are problems with a couple of columns on the report. Other problems might be incorrect calculations, empty columns, missing or incorrect totals, etc. These errors are LOGIC ERRORS. It is the programmer's job to examine what is happening and then go back to the program and determine what is causing the problem. Sometimes the problem is easy to solve. For example, the missing column might mean the programmer simply forgot to move anything to that column, and the missing totals might simply mean the programmer forgot to write the totals on the report. Other times the problem is complex and requires going back and analyzing the problem, rethinking some of the logic and making significant modifications to the instructions (code) of the program.
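A hypothetical illustration of the "missing column" case: the program below sets up a print line with one MOVE per column. It is shown in its correct form, with DISPLAY standing in for a WRITE to a printer file (an assumption made so it runs without one). If the MOVE to AMT-PR were left out, the program would still compile cleanly and run; only the blank amount column in the output would reveal the logic error.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LOGICERR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical input values.
       01  NAME-IN             PIC X(5)    VALUE "ADAMS".
       01  AMT-IN              PIC 9(3)V99 VALUE 12.50.
      * Print line: each column is filled by its own MOVE.
       01  PRINTZ.
           05  NAME-PR         PIC X(5).
           05  FILLER          PIC X(3)    VALUE SPACES.
           05  AMT-PR          PIC ZZ9.99.
       PROCEDURE DIVISION.
       A-100-MAIN.
           MOVE SPACES TO PRINTZ
           MOVE NAME-IN TO NAME-PR
      *    Leaving out the next MOVE still compiles cleanly, but the
      *    amount column prints blank: a logic error, not a syntax
      *    error, so only checking the output reveals it.
           MOVE AMT-IN TO AMT-PR
           DISPLAY PRINTZ
           STOP RUN.
```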

The systems analyst or someone else in charge of the project should also carefully review the output to make sure it is what is needed. This is very important, because sometimes a "second set of eyes" will catch something that the programmer missed! If the program works correctly, it is usually tested more completely on live data and system tested, which means that output from one program is read as input to another program to make sure that data is being processed correctly (note that your program could be the one producing the output or the one reading the input). When the program has successfully passed all of its tests, it is put into production.

If the program does not work correctly, the programmer has to figure out what is wrong, go back to the text editor and bring up the source program, fix it, recompile it (hopefully without errors, but if not they will have to be fixed), and then test the program again to see if the desired output has been produced. Honest, programming is fun!!!

TO EXECUTE A PROGRAM: The executable program is loaded into memory from the disk.

INPUT	The input to this process is whatever input is called for by the program. It may involve reading one or more disk files, accepting data from the keyboard etc.
PROCESSING	The executable object program is executed one instruction at a time until the program is complete (machine translation of STOP RUN is encountered)
OUTPUT	The output from this process is whatever output the program generates. It might be a report, it might be displays on the screen, it might be one or more disk files etc.

Programming Logic

Program Design:

Program design is a multiple step process. One approach that works well is:

Analyze the problem to make sure you understand what your program is supposed to accomplish. Talk to the systems analyst and/or the program requestor for clarification.
Analyze the data. The programmer needs to look at the specifications for the output the program is supposed to produce, look at the input files or specifications, and truly understand the layout. If the data being read is on existing files, the programmer should thoroughly understand what is contained in each field within the file. This includes understanding what is numeric, what is alphanumeric, what is coded, the length of the data, whether it is packed, display or binary, whether it is signed, whether it is grouped and whether the programmer needs to access only certain parts of the data in a field. The programmer should understand what data is valid in each field, and whether the data that the program will read has already been edited or whether invalid data needs to be screened. Sometimes the input data can be used directly to create the output, sometimes calculations need to be done or decisions need to be made, and sometimes after analyzing the data it is clear that the data is insufficient to produce the desired output.
Analyze the processing to determine exactly what needs to be done to produce the desired output from the given input. This process is called FUNCTIONAL DECOMPOSITION (explained in the next section).
Design the logic of the program. This means laying out a step by step plan using a logic flowchart, pseudocode or some other logic tool that the programmer finds of value.
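The data questions in the analysis step map directly onto COBOL PICTURE and USAGE clauses. This sketch uses a hypothetical 17-character input record: alphanumeric (PIC X), unsigned display numeric (PIC 9), and a grouped date whose year can be used on its own. It also has a signed packed field (COMP-3) and a signed binary field (COMP). The NUMERIC class test shows one way invalid data can be screened before it is used in arithmetic. All names and values are made up for illustration; the MOVE of a literal into IN-REC stands in for reading a record from a file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DATAKINDS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical input record: 6 + 3 + 8 = 17 characters.
       01  IN-REC.
           05  IN-CUST-NO          PIC X(6).
           05  IN-QTY              PIC 9(3).
      *    A group field whose parts can be used separately.
           05  IN-HIRE-DATE.
               10  IN-HIRE-YYYY    PIC 9(4).
               10  IN-HIRE-MM      PIC 9(2).
               10  IN-HIRE-DD      PIC 9(2).
      * Signed packed decimal and signed binary working fields.
       01  WS-BALANCE              PIC S9(5)V99 COMP-3 VALUE -125.50.
       01  WS-ERR-CT               PIC S9(4)    COMP   VALUE ZERO.
      * Display (edited) versions used for output.
       01  WS-BALANCE-ED           PIC -9(5).99.
       01  WS-ERR-CT-ED            PIC 9(4).
       PROCEDURE DIVISION.
       A-100-MAIN.
      *    Simulate reading a record whose quantity is not numeric.
           MOVE "C123450A519980715" TO IN-REC
      *    Screen invalid data before using it in arithmetic.
           IF IN-QTY IS NUMERIC
               DISPLAY "QTY OK"
           ELSE
               DISPLAY "QTY INVALID"
               ADD 1 TO WS-ERR-CT
           END-IF
      *    Access only the year portion of the grouped date.
           DISPLAY "HIRE YEAR: " IN-HIRE-YYYY
      *    Packed field used in arithmetic, then edited for display.
           SUBTRACT 100 FROM WS-BALANCE
           MOVE WS-BALANCE TO WS-BALANCE-ED
           DISPLAY "BALANCE: " WS-BALANCE-ED
           MOVE WS-ERR-CT TO WS-ERR-CT-ED
           DISPLAY "ERRORS: " WS-ERR-CT-ED
           STOP RUN.
```

Packed and binary fields are moved to display or edited fields before printing because their internal form is not directly readable.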

Functional Decomposition:

Functional decomposition is breaking down a large task/function/program into smaller parts, which are known as subfunctions. To start the process of functional decomposition, the programmer must identify the inputs, outputs and major tasks to be accomplished by the program. Next, the programmer logically groups the required tasks as functions. In doing functional decomposition, the programmer focuses on the tasks that are necessary to turn the given input into the required output. The focus here is on the process that will have to be accomplished, not on how many times the task will be done.

Therefore, the starting point in functional decomposition is to examine the output and determine what is needed. Then the programmer looks at the input to determine if the data includes everything that will be needed to produce the output. Finally, the programmer decides on the processing that is necessary to turn the given input into the required output. The questions to ask are:

Exactly what output is required?
What input is needed to produce the required output?
What processing is needed to turn the given input into the required output?

Once the necessary functions have been defined, the programmer must break the functions down into activities. A function frequently includes multiple interrelated activities. For example, if the function is to calculate net pay, then the activities might include checking whether the person is salaried or hourly, calculating gross pay if they are hourly, and then subtracting deductions and taxes from gross pay to get net pay. Each of these decisions and mathematical calculations is an activity within the calculate net pay function. Functional decomposition is the essence behind the concept of top-down design.
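The calculate net pay function can be sketched as one paragraph whose activities appear in order. This is an illustration, not a payroll design: the field names, the flat 20% tax rate, the deduction amount, and the paragraph name B-310-CALC (borrowed from the naming scheme in the top-down section) are all assumptions, and the sketch assumes net pay is never negative.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NETPAY.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical employee data and rates; not from the text.
       01  EMP-CODE            PIC X       VALUE "H".
       01  EMP-SALARY          PIC 9(4)V99 VALUE 900.00.
       01  EMP-PAY-RATE        PIC 9(3)V99 VALUE 20.00.
       01  EMP-HRS-WK          PIC 9(2)    VALUE 40.
       01  EMP-DEDUCTIONS      PIC 9(3)V99 VALUE 50.00.
       01  TAX-RATE            PIC V99     VALUE .20.
       01  GROSS-PAY-WS        PIC 9(4)V99.
       01  TAX-WS              PIC 9(4)V99.
       01  NET-PAY-WS          PIC 9(4)V99.
       01  NET-PAY-ED          PIC ZZZ9.99.
       PROCEDURE DIVISION.
       A-100-MAIN.
           PERFORM B-310-CALC
           DISPLAY "NET PAY: " NET-PAY-ED
           STOP RUN.
      * The "calculate net pay" function, broken into its activities.
       B-310-CALC.
      *    Activity 1: salaried or hourly? Hourly needs gross pay.
           IF EMP-CODE = "S"
               MOVE EMP-SALARY TO GROSS-PAY-WS
           ELSE
               MULTIPLY EMP-PAY-RATE BY EMP-HRS-WK
                   GIVING GROSS-PAY-WS
           END-IF
      *    Activity 2: calculate the tax on gross pay.
           MULTIPLY GROSS-PAY-WS BY TAX-RATE GIVING TAX-WS ROUNDED
      *    Activity 3: subtract deductions and taxes from gross pay.
           SUBTRACT EMP-DEDUCTIONS TAX-WS FROM GROSS-PAY-WS
               GIVING NET-PAY-WS
           MOVE NET-PAY-WS TO NET-PAY-ED.
```

With these values, gross pay is 20.00 x 40 = 800.00, tax is 160.00, and net pay is 800.00 - 50.00 - 160.00 = 590.00.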

Top-down design is starting with the whole problem and then breaking the problem down into a series of tasks or functions, which can then be broken down into detailed activities; this is essentially functional decomposition. Frequently the resulting breakdown is represented using a HIERARCHY CHART. The next step is to take the functional decomposition or top-down design and proceed to the programming stage.

TOP-DOWN PROGRAMMING

Top-down programming is very compatible with structured programming. Essentially, top-down programming calls for the program to be broken up into modules, with the high-level modules containing the programming elements that control the program and the low-level modules containing the details. In the sample programs, the program is first broken up into three modules: A-100-INITIALIZE, B-100-PROCESS, and C-100-TERMINATE. These modules determine when the next level down, the 200 level, will be executed, and in turn the 200 level determines when the 300 level will be executed. For example, B-100-PROCESS performs B-200-LOOP, and B-200-LOOP determines when to perform B-300-HDR-ROUT. In this series of modules, B-100-PROCESS contains the big picture and controls the entire processing element of the program. B-200-LOOP controls how each individual record is processed. The details may appear in B-200-LOOP or, if the coding is too complex, B-200-LOOP may have a command to perform B-310-CALC, where the calculations are done, or B-320-SETUP, where the printline is set up. If a report is being produced, B-200-LOOP will probably at least have the command to perform B-300-HDR-ROUT, where the headers for each page are done.

When writing a program the functions are grouped into three categories:

Initialization
Processing
Termination

The mainline (MAINLINE) of the program is the first level of your hierarchy chart.

INITIALIZING
- open the files
- get the date if needed
PROCESSING
- read the initializing/first record
- set any hold areas or indicators, if needed
- perform a processing loop (B-200-LOOP) until a certain condition is met - frequently the condition is end of file
  - the processing loop - B-200-LOOP includes the instructions to process the record, write it and then read another record - it may execute other paragraphs such as a B-300-ROUT and if needed a B-310-CALC-ROUT
- after the B-200-LOOP processing is complete, do any wrapup processing such as writing final total lines in a B-210-FINAL-TOTALS
TERMINATION
- close files
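The outline above can be sketched as a runnable skeleton. To keep it self-contained, a small in-memory table stands in for the input file, and B-900-READ is a hypothetical paragraph playing the role of READ ... AT END; a real program would OPEN, READ and CLOSE actual files, and would also perform a header routine such as B-300-HDR-ROUT for each page. The mainline performs the three first-level modules, and the priming read, the loop, and the final totals sit under B-100-PROCESS, as in the outline.

```cobol
      * Sketch of the INITIALIZE / PROCESS / TERMINATE structure.
      * ASSUMPTION: an in-memory table stands in for the input file
      * so the sketch runs without one; B-900-READ is hypothetical.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TOPDOWN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  IN-DATA.
           05  FILLER          PIC X(5) VALUE "ADAMS".
           05  FILLER          PIC X(5) VALUE "BAKER".
           05  FILLER          PIC X(5) VALUE "CLARK".
       01  IN-TABLE REDEFINES IN-DATA.
           05  IN-NAME         PIC X(5) OCCURS 3 TIMES.
       01  IN-SUB              PIC 9    VALUE 0.
       01  NAME-IN             PIC X(5).
       01  EOF-IND             PIC X(3) VALUE "NO".
       01  REC-CT              PIC 9(3) VALUE 0.
       PROCEDURE DIVISION.
      * Mainline: the first level of the hierarchy chart.
       MAINLINE.
           PERFORM A-100-INITIALIZE
           PERFORM B-100-PROCESS
           PERFORM C-100-TERMINATE
           STOP RUN.
       A-100-INITIALIZE.
      *    A real program would OPEN its files here.
           DISPLAY "REPORT START".
       B-100-PROCESS.
      *    Read the first record, then loop until end of file.
           PERFORM B-900-READ
           PERFORM B-200-LOOP
               UNTIL EOF-IND = "YES"
           PERFORM B-210-FINAL-TOTALS.
       B-200-LOOP.
      *    Process one record (here: count and show it), then read
      *    the next record at the bottom of the loop.
           ADD 1 TO REC-CT
           DISPLAY "DETAIL: " NAME-IN
           PERFORM B-900-READ.
       B-210-FINAL-TOTALS.
           DISPLAY "RECORDS: " REC-CT.
       B-900-READ.
           ADD 1 TO IN-SUB
           IF IN-SUB > 3
               MOVE "YES" TO EOF-IND
           ELSE
               MOVE IN-NAME (IN-SUB) TO NAME-IN
           END-IF.
       C-100-TERMINATE.
      *    A real program would CLOSE its files here.
           DISPLAY "REPORT END".
```

Reading the first record before the loop, and the next record at the bottom of B-200-LOOP, means the UNTIL test always sees the result of the most recent read.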

The structure of these modules is shown in a HIERARCHY CHART. The hierarchy chart shows the modules by name, but does not give any information about the processing that takes place in the module. Its function is to show the control, which module executes which module and where the control will return after the module has been executed. To understand the detail in each module, the programmer can check the flowchart, the pseudocode or the actual COBOL code.

Structured Programming

Structured programming is a method of programming that helps the programmer develop clean, well written programs. Structured programs are generally more readable, easier to test and debug, easier to maintain, more reliable, and more effective. Structured programming is built from three logical structures: sequence, selection and iteration.

Sequence structure: In a sequential structure, the commands are executed in sequence. The flow of the program is to complete one instruction, then drop down and execute the next instruction and then the next, until something terminates the sequence, such as the end of a paragraph.

Example:

                MOVE NAME-IN TO NAME-PR.
                MULTIPLY AMT-IN BY 1.5
                    GIVING AMT-PLUS-WS.
                MOVE AMT-PLUS-WS TO AMT-PLUS-PR.
                WRITE PRINTZ
                    AFTER ADVANCING 1 LINES.

Selection structure: In a selection structure the processing is dependent on a condition that is being tested. In COBOL, the selection structure is usually accomplished with an IF or an EVALUATE (the implementation of the case structure in COBOL) or with an implied IF such as the AT END clause in the READ statement.

Example #1:

                IF EMPLOYEE-CODE = "S"
                    MOVE SALARY TO SALARY-PR
                ELSE
                    MULTIPLY PAY-HRS BY HRS-WK
                        GIVING WEEK-PAY-WS
                    MOVE WEEK-PAY-WS TO WEEK-PAY-PR.

Note: The ELSE in this example also uses the sequence structure since there are two commands - this example indicates the combination of the two structures.

Example #2: (implied if)

                READ INPUT-FILE
                    AT END
                        MOVE "YES" TO EOF-IND.
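The EVALUATE statement mentioned above is COBOL's form of the case structure: each WHEN tests one value, and WHEN OTHER catches everything not listed. This sketch runs three hypothetical employee codes through one EVALUATE, with "X" deliberately invalid; the DISPLAY messages stand in for the MOVE and MULTIPLY processing a real program would do in each branch.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EVALDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Three sample codes to test; "X" is deliberately invalid.
       01  CODE-LIST           PIC X(3)    VALUE "SHX".
       01  CODE-SUB            PIC 9.
       01  EMPLOYEE-CODE       PIC X.
       PROCEDURE DIVISION.
       A-100-MAIN.
           PERFORM VARYING CODE-SUB FROM 1 BY 1
                   UNTIL CODE-SUB > 3
               MOVE CODE-LIST (CODE-SUB:1) TO EMPLOYEE-CODE
               PERFORM B-100-DESCRIBE
           END-PERFORM
           STOP RUN.
      * EVALUATE: one WHEN per case, WHEN OTHER catches the rest.
       B-100-DESCRIBE.
           EVALUATE EMPLOYEE-CODE
               WHEN "S"
                   DISPLAY "S: SALARIED"
               WHEN "H"
                   DISPLAY "H: HOURLY"
               WHEN OTHER
                   DISPLAY EMPLOYEE-CODE ": INVALID CODE"
           END-EVALUATE.
```

Compared with nested IFs, EVALUATE keeps each case at the same level, which makes adding a new code a matter of adding one more WHEN.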

Iteration structure: (LOOP STRUCTURE) The iteration structure causes something to be executed over and over again until some condition terminates the repetition. This structure is essentially the looping structure that has been used in all of the programs. When defining iteration, there are two basic structures that a language may implement: Do-While and Do-Until. The difference between the two structures is when the condition is tested. In the Do-While structure the condition is tested before the loop is executed while in the Do-Until structure the condition is tested after the loop has been executed. This means that with the Do-While structure there is a possibility that the loop will never be executed. The PERFORM...UNTIL used in the sample programs is an example of the Do-While structure because the condition is tested before the loop is executed. For example in the PERFORM B-200-LOOP UNTIL EOF-IND = "YES", if the EOF-IND is already set to YES the B-200-LOOP will never be executed. In the Do-Until structure, since the test is done after the loop has been executed the loop will always be executed at least once. COBOL allows the programmer to modify the PERFORM...UNTIL to conform to the Do-Until structure by adding the clause WITH TEST AFTER.

Example of COBOL PERFORM using the standard DO-WHILE structure:

                PERFORM B-200-LOOP
                    UNTIL EOF-IND = "YES".

Example of COBOL PERFORM using the DO-UNTIL structure:

                PERFORM B-300-CALC
                    WITH TEST AFTER
                    UNTIL CT-FLD > 12.

In this example, B-300-CALC will be executed before the CT-FLD > 12 test is done, therefore it will always be executed once. After B-300-CALC has been executed once, the test will be made to determine if the paragraph should be executed again.
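The difference between the two forms is easiest to see when the condition is already true before the loop starts. In this sketch CT-FLD begins at 13, so CT-FLD > 12 is true immediately; EXEC-CT (a hypothetical counter) records how many times B-300-CALC actually ran under each form.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LOOPTEST.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  CT-FLD              PIC 9(2) VALUE 0.
       01  EXEC-CT             PIC 9(2) VALUE 0.
       PROCEDURE DIVISION.
       A-100-MAIN.
      *    Make the UNTIL condition true before either loop starts.
           MOVE 13 TO CT-FLD
      *    Do-While (the default, TEST BEFORE): body never runs.
           MOVE 0 TO EXEC-CT
           PERFORM B-300-CALC
               UNTIL CT-FLD > 12
           DISPLAY "TEST BEFORE RUNS: " EXEC-CT
      *    Do-Until (WITH TEST AFTER): body runs once, then the test.
           MOVE 0 TO EXEC-CT
           PERFORM B-300-CALC
               WITH TEST AFTER
               UNTIL CT-FLD > 12
           DISPLAY "TEST AFTER RUNS: " EXEC-CT
           STOP RUN.
       B-300-CALC.
           ADD 1 TO EXEC-CT
           ADD 1 TO CT-FLD.
```

When the condition starts out false, for example with CT-FLD starting at 1, both forms run B-300-CALC the same number of times; they differ only in whether the body is guaranteed to run at least once.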