INTRODUCTION
Design of the course:
In this course, you will learn to program through example and expand upon
the examples given to produce your own COBOL code. We will start out by
examining a simple COBOL program and add the more subtle elements of the
language when needed to write our sample programs.
Structure of a COBOL Program:
The structure of the COBOL program makes COBOL a relatively easy language to
learn and work with, a fact which in some way explains its popularity and
longevity. The COBOL program is set up to allow for very flexible data
handling and very organized processing and therefore fits in well with the
concepts of STRUCTURED programming.
The COBOL program is made up of four DIVISIONS:
- IDENTIFICATION DIVISION
- ENVIRONMENT DIVISION
- DATA DIVISION
- PROCEDURE DIVISION
Within a DIVISION, there can be further breakdowns which are called
SECTIONS.
Each of these four divisions has specific words or clauses that are
classified as reserved words because they have a specific function.
The reserved words have a specific meaning when the program is being compiled
and it is important that they be used correctly. Frequently, in conjunction
with reserved words, there are entries that the programmer must make which are
specific to the program. In addition, the programmer must be careful to
observe the margin or column structure of the program. We will look at this
in more detail, but for now note that COBOL has two margins: margin A and
margin B. Margin A starts in column 8 and goes through column 11 while
margin B starts in column 12. Many COBOL entries are required to start in
either margin A or margin B, and the programmer must be aware of the rules.
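The margins described above can be pictured with a small sketch. The program name MARGINS and the DISPLAY statement are made up for illustration, and the comment lines rely on the usual fixed-format convention that an asterisk in column 7 marks a comment.

```cobol
      *A   B
      *|   +-- column 12: margin B begins (statements)
      *+-- column 8: margin A begins (division, section, paragraph)
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MARGINS.
       PROCEDURE DIVISION.
       MAIN-PROGRAM.
           DISPLAY "PARAGRAPH NAMES START IN A, STATEMENTS IN B".
           STOP RUN.
```

Here the division headers and the paragraph name MAIN-PROGRAM begin in column 8, while the DISPLAY and STOP RUN statements begin in column 12.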
FIRST PROGRAM
The program that is being illustrated here reads a record from a disk and
prints the information out on a printed report.
IDENTIFICATION DIVISION: On its simplest level, this DIVISION
identifies the name of the program and its author. Other identifying
information can be added for documentation purposes.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SAMPLE1.
       AUTHOR. GROCER.
The words IDENTIFICATION DIVISION define the current division. On the next
line, the word PROGRAM-ID specifies that the word entered here by the
programmer will give the program a name. The rules of COBOL specify that
the name the programmer enters must be made up of letters and numbers and be
8 characters or less. Finally the word AUTHOR calls for the programmer to
identify themselves as the author of the program. In looking at this
program, note the use of the periods after the reserved words and after the
programmer's entries. It should be noted that these three lines start in
margin A which for practical purposes means they start in column 8
(technically, they can start anywhere in margin A which runs from 8 through
11).
ENVIRONMENT DIVISION: This DIVISION defines and identifies the
environment in which the program will be run. At the beginning level, one
of its primary functions is to provide the link between the physical file
that is being read and/or written and the logical file name that is used
internally within the program.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT CUSTOMER-FILE
               ASSIGN TO "C:\MFCOBOL\SOURCEPG\C12FIRST.DAT".
           SELECT CUSTOMER-REPORT
               ASSIGN TO PRINTER.
This is the ENVIRONMENT DIVISION which contains an INPUT-OUTPUT SECTION which
defines the files that are being read or written. Beneath the section
statement you see the reserved clause FILE-CONTROL which indicates that the
next lines will specifically define the files being used in the program.
Each file is defined with a SELECT statement followed by the programmer's
logical file name. The logical file name is the name the
programmer will use in the program when referring to the file. It must follow
the standard naming conventions: the name must be from 1 to 30 characters
long, contain only letters, numbers and hyphens and contain no embedded
spaces. The ASSIGN clause ties the logical file name to the
physical file name and location of the file. In the case of the disk
file, the physical file name includes the path name on my disk, the
name of the file C12FIRST and the file extension
.DAT. In the case of the print file, the output is to go directly to the
printer so it is assigned to PRINTER. Notice that the SELECT clauses start
in margin B and the second line of each is indented by custom to make it
clear that it is part of the statement. The other clauses in the ENVIRONMENT
DIVISION illustrated all start in margin A.
DATA DIVISION: This division defines the data that is being used by
the program, including data that is being read or written, and data that is
being used in work areas.
The structure of the COBOL DATA DIVISION gives the programmer tremendous
flexibility and power in the way data is defined and used. It is one of the
most powerful features of the COBOL program and a lot of time should be spent
understanding the wide range of possibilities it offers.
In our simple beginning program, we will see two SECTIONS within the DATA
DIVISION: the FILE SECTION and the WORKING-STORAGE SECTION.
The FILE SECTION is used to define the files that will be read and
written. All data that is being read from files to be processed or written
to files (including the print file) must pass through the FILE SECTION.
       DATA DIVISION.
       FILE SECTION.
       FD  CUSTOMER-FILE
           DATA RECORD IS CUSTOMER-RECORD.
       01  CUSTOMER-RECORD.
           05  CUSTOMER-ID         PIC X(4).
           05  CUSTOMER-NAME       PIC X(20).
           05  CUSTOMER-STREET     PIC X(20).
           05  CUSTOMER-CITY       PIC X(15).
           05  CUSTOMER-STATE      PIC X(2).
           05  CUSTOMER-ZIP        PIC X(5).
           05  FILLER              PIC X(10).
The code above describes the layout of the input file that is being read by
this program. First, we see the clause DATA DIVISION to tell us what
division we are in followed by the clause FILE SECTION to tell us we are in
the section of the data division that describes input and output files.
(Note that both of these start in margin A.) The following line starts the
specific definition - it too starts in margin A. The letters FD stand for
File Description and are followed by two spaces to move us into margin B and
then the name of the file. This is the logical file name that was
used in the SELECT statement above. The next line (which starts in margin B)
gives the name of the individual record in the file. The clause DATA RECORD
IS is a reserved word clause for this function. The clause is followed by
the name the programmer is assigning to each record on the file. (Note: this
clause is not required but is here because it provides good
documentation). The record name must again follow the naming conventions
for a COBOL data name:
- the length of the name must be from 1 to 30 characters
- the name must be composed of letters, numbers and hyphens
- the name must contain no embedded blanks
On the next line, starting in margin A, there is 01 followed by the record
name defined in the DATA RECORD IS clause. When COBOL programmers lay out
the file, they use an outline setup where 01 is the whole, in this case the
whole record, and the numbers beneath break the whole record down into its
parts which are frequently fields. The common convention, when breaking down
the whole 01 level, is to use 05 levels for the parts or fields. The reason
for this is to leave room so that at some later time the programmer can
decide to group several 05s together into a group and use a number between
01 and 05 to designate this group (more about this later). Note that on this
file, all records have the same layout (in more advanced programs, this will
not always be true).
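As a sketch of the grouping idea mentioned above, the address fields could later be gathered under a group that uses level 03. The group names CUSTOMER-KEY-DATA and CUSTOMER-ADDRESS are hypothetical and are not part of the sample program. To keep every item directly under the 01 at the same level, this sketch also groups the first two fields and gives the FILLER an 03.

```cobol
       01  CUSTOMER-RECORD.
           03  CUSTOMER-KEY-DATA.
               05  CUSTOMER-ID     PIC X(4).
               05  CUSTOMER-NAME   PIC X(20).
           03  CUSTOMER-ADDRESS.
               05  CUSTOMER-STREET PIC X(20).
               05  CUSTOMER-CITY   PIC X(15).
               05  CUSTOMER-STATE  PIC X(2).
               05  CUSTOMER-ZIP    PIC X(5).
           03  FILLER              PIC X(10).
```

The 05 entries did not have to be renumbered, which is the reason for leaving the gap between 01 and 05 in the first place. The record is still 76 characters long, and the name CUSTOMER-ADDRESS now refers to all 42 characters of the street, city, state and zip fields taken together.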
The first field on all the records in this file is the customer
identification number which is a character or alphanumeric field that is
four characters long. A character or alphanumeric field can contain
anything: letters, numbers, special characters. The programmer starts the
line by placing an 05 to designate the field in margin B. Then the
programmer defines a name for the field using the COBOL data name
conventions. In this case, I named the field CUSTOMER-ID (Note the use of
the hyphen where I might have wanted to put a space. Since spaces are not
allowed in a COBOL dataname, the programmer frequently uses the hyphen
instead of the space). Next, the programmer needs to define the attributes
of the field which in this case are that it is a 4 character, alphanumeric
field. This is done using the picture clause. PIC or PICTURE are reserved
words used to designate the attributes of the field. The clause X(4) that
follows the reserved word PIC designates this field as an alphanumeric field
(the X stands for alphanumeric) and the 4 in parentheses tells the length.
Note that the picture clause fields line up under each other - this is
convention and not a rule. Usually the programmer moves over to a column
such as 32 and starts to code the picture clauses. After the picture clause
has been coded, the entry is terminated with a period to indicate that the
definition of this field is now complete.
Legal COBOL data types:
- X - for alphanumeric data (includes letters, numbers, and special
characters as well as spaces in the field)
- 9 - for numeric data (includes digits)
- A - for alphabetic data (includes letters of the alphabet and spaces) -
note this type is rarely used
Length can be shown using either the type followed by the length in
parentheses or the type character repeated once for each character in the
field:
- X(4) can also be written as XXXX
- X(1) can also be written as X
- X(20) can also be written as XXXXXXXXXXXXXXXXXXXX
- 9(3) can also be written as 999
- A(5) can also be written as AAAAA
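To see the three data types side by side, here is a small sketch of a record that uses each of them. The record and field names are invented for this illustration and do not appear in the sample program.

```cobol
       01  SAMPLE-FIELDS.
           05  PART-CODE           PIC X(6).
           05  QUANTITY-ON-HAND    PIC 9(5).
           05  MIDDLE-INITIAL      PIC A.
           05  AREA-CODE           PIC 999.
```

PART-CODE can hold any six characters, QUANTITY-ON-HAND holds five digits, MIDDLE-INITIAL holds a single letter or a space, and AREA-CODE shows the repeated form of 9(3).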
Continuing with the field layout, the second field on all the records is
customer name, which is a 20 character alphanumeric field. It is shown by:
           05  CUSTOMER-NAME       PIC X(20).
Following the customer name, there is a 20 character customer street field,
a 15 character customer city field, a 2 character customer state field and a
five character customer zip code field. Finally in the layout of the file,
the programmer was told that the last 10 characters of the record contained
data that will not be used in this program. The programmer acknowledges the
existence of this data by setting up a field with no name and a PIC of X(10).
The field can either be set up by simply omitting a name or the reserved word
FILLER can be used to indicate a field that will not be used in this program.
The field needed to be listed in the layout because the entire length of
the record must be described when doing the layout. The record is 76
characters long and each of these characters must be accounted for in the
record layout. You should note that COBOL record layouts are positional and
accumulative. The first field was the customer identification number with a
PIC X(4) which meant the data was stored in positions 1, 2, 3, and 4. The
second field was the customer name with a PIC X(20) which means that the
customer name started in position 5 and went for the next 20 characters
ending in position 24. The third field was the customer street which
started in position 25 and went for 20 characters ending in position 44.
Let's say for example that the customer street started in position 30 and
that characters 25, 26, 27, 28 and 29 contained data that was not being used
in this program. In that case the layout of the file would have looked like
this:
           05  CUSTOMER-ID         PIC X(4).
           05  CUSTOMER-NAME       PIC X(20).
           05  FILLER              PIC X(5).
           05  CUSTOMER-STREET     PIC X(20).
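Because the layout is positional and accumulative, the position of every field in the original 76 character record can be worked out by adding up the lengths. The sketch below repeats the original CUSTOMER-RECORD layout with the positions written in as comment lines (an asterisk in column 7). The comments are my annotation and are not part of the sample program.

```cobol
       01  CUSTOMER-RECORD.
      *    positions  1 -  4
           05  CUSTOMER-ID         PIC X(4).
      *    positions  5 - 24
           05  CUSTOMER-NAME       PIC X(20).
      *    positions 25 - 44
           05  CUSTOMER-STREET     PIC X(20).
      *    positions 45 - 59
           05  CUSTOMER-CITY       PIC X(15).
      *    positions 60 - 61
           05  CUSTOMER-STATE      PIC X(2).
      *    positions 62 - 66
           05  CUSTOMER-ZIP        PIC X(5).
      *    positions 67 - 76 (not used by this program)
           05  FILLER              PIC X(10).
```

The lengths add up to 4 + 20 + 20 + 15 + 2 + 5 + 10 = 76, which accounts for every character in the record.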
Continuing the FILE SECTION with the second file layout:
       FD  CUSTOMER-REPORT
           DATA RECORD IS PRINTZ.
       01  PRINTZ.
           05  FILLER                  PIC X.
           05  CUSTOMER-ID-PR          PIC X(4).
           05  FILLER                  PIC X(2).
           05  CUSTOMER-NAME-PR        PIC X(20).
           05  FILLER                  PIC X(2).
           05  CUSTOMER-STREET-PR      PIC X(20).
           05  FILLER                  PIC X(2).
           05  CUSTOMER-CITY-PR        PIC X(15).
           05  FILLER                  PIC X(2).
           05  CUSTOMER-STATE-PR       PIC X(2).
           05  FILLER                  PIC X(2).
           05  CUSTOMER-ZIP-PR         PIC X(5).
           05  FILLER                  PIC X(3).
The second FD starts the File Description of the second file described in the
SELECT statements in the ENVIRONMENT DIVISION. Notice that the name used
after the FD is the logical file name used in the SELECT statement. Again
the File Description is followed by the DATA RECORD IS clause which gives the
name PRINTZ to each line being printed on the printer. The line is then
defined starting with 01 PRINTZ. Each field on the line has an 05 in front
of it. Notice that every other field has the name FILLER. This is because
it looks nice to leave a couple of blank characters between each field on
the printline. The fillers serve this purpose (Note: the word FILLER is
not required on these lines).
It is a rule in COBOL that all datanames must be unique. This is done
so there will be no confusion. When the programmer refers to a name, the
program will understand exactly what field the programmer is referring to.
To establish uniqueness, the names on the printline must be different from
the names on the input record. Since this program is simply going to take
the data that is read and print it out, the programmer modified the input
names by adding the -PR to the dataname when it was used on the printline.
This is a convention that I frequently use. If a field is going to be moved
to the printer and printed, I use the original name and append the -PR,
making the dataname unique.
When setting up the printline, I knew that the printer I was working with
supported 80 characters across, so my printline is 80 characters. The
purpose of the program was to see the name and address information on the
input record so I set up a field for each piece of data on the output print
line. Between each piece of data I used the FILLER to leave some blank space.
In addition, I used a FILLER at the beginning of the record and a FILLER at
the end of the record. I do this because different versions of COBOL carry
the carriage control character as the first character on the line or the
last character on the line. By leaving them both blank, I assure myself
that the carriage control character will be correctly handled.
The WORKING-STORAGE SECTION contains only one piece of information in this
program, a field to indicate that the end of the file has been reached.
       WORKING-STORAGE SECTION.
       01  INDICATORS.
           05  END-OF-FILE         PIC XXX   VALUE "NO ".
In the WORKING-STORAGE SECTION, there may be more than one indicator so the
01 designation for INDICATORS is where they will all be listed. Since this
program only has one indicator, there is only one entry and that entry has
been given the dataname: END-OF-FILE. It has been set up as a three
character field in memory and the VALUE clause has been used to give it an
initial value of NO followed by a space. The VALUE clause initializes the
three characters called END-OF-FILE to whatever is programmed in the VALUE
clause. Note that the word NO is enclosed in quotes. When giving a
character or alphanumeric field an initial VALUE, it has to be enclosed in
quotes.
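For contrast, a numeric field is given its initial VALUE without quotes. The sketch below adds a hypothetical counter, CUSTOMER-COUNT, under a hypothetical 01 named COUNTERS. Neither is part of the sample program; they are here only to show the two forms of the VALUE clause next to each other.

```cobol
       WORKING-STORAGE SECTION.
       01  INDICATORS.
           05  END-OF-FILE         PIC XXX   VALUE "NO ".
       01  COUNTERS.
           05  CUSTOMER-COUNT      PIC 9(5)  VALUE 0.
```

Notice that the quoted value "NO " is exactly three characters long, matching the PIC XXX, while the numeric value 0 is written as a plain number.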
PROCEDURE DIVISION: The PROCEDURE DIVISION is where the actual
processing is done. The PROCEDURE DIVISION is broken up into PARAGRAPHS
each of which contains instructions that will be executed when the program
is run.
The PROCEDURE DIVISION for a structured program is set up with a main
paragraph that controls all processing. This main paragraph performs other
paragraphs where the work is actually done. There are a wide variety of
styles that are used to number the paragraphs in a meaningful way. We will
start out using one style and explore others as the course continues.
There are three major processing components in most programs:
- the initialization or setup
- the actual processing that has to be done - frequently this is a
repetitive loop that processes each record in a file
- the termination or wrapup
Each of these has a specific function within the program. The processing of
these procedures is controlled in the main paragraph of the program. Please
note that in the PROCEDURE DIVISION, paragraph names start in margin A and
commands or instructions start in margin B.
       PROCEDURE DIVISION.
       MAIN-PROGRAM.
           PERFORM A-100-INITIALIZATION.
           PERFORM B-100-PROCESS-FILE.
           PERFORM C-100-WRAP-UP.
           STOP RUN.
Using this style, the line directly after the words PROCEDURE DIVISION
contains just a paragraph name, MAIN-PROGRAM. This is the control
paragraph that directs the processing. It controls the execution
of three other paragraphs: A-100-INITIALIZATION, B-100-PROCESS-FILE, and
C-100-WRAP-UP. The style that we are using names each paragraph with
a letter that designates the type of processing being done in the
paragraph (any paragraph used for initialization will start with an A, any
paragraph used for processing will start with a B and any paragraph used for
termination will start with a C). This letter is followed by a hyphen and
then a number designating the paragraph's place in the hierarchy. The first
paragraph in each type of processing will be given the number 100. Finally,
there is another hyphen and then a name the programmer makes up that explains
the intent of the paragraph.
Each of the paragraphs in the programming segment above must be executed and
to do this the programmer uses the PERFORM verb. The PERFORM verb has
many variations, but for now we are looking at the most straightforward
version of the command which says to go out and do a particular paragraph
and then return and move on to the next instruction. This format for the
PERFORM verb is:
           PERFORM paragraph-name.
When COBOL encounters this command, the processing moves to the paragraph
that is being performed and the instructions in that paragraph are executed.
COBOL knows that the paragraph is over when it encounters another paragraph
name or the end of the program. At this point, COBOL returns from the
perform and executes the next sequential instruction.
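The way control leaves for the performed paragraph and then comes back can be watched with a tiny program. This is only a sketch: the program name PERFDEMO and the paragraph A-100-SAY-HELLO are made up, and it uses the DISPLAY verb, which we have not covered, simply to show messages on the screen.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PERFDEMO.
       PROCEDURE DIVISION.
       MAIN-PROGRAM.
           DISPLAY "BEFORE PERFORM".
           PERFORM A-100-SAY-HELLO.
           DISPLAY "AFTER PERFORM".
           STOP RUN.
       A-100-SAY-HELLO.
           DISPLAY "INSIDE A-100-SAY-HELLO".
```

When this runs, the three messages appear in the order BEFORE PERFORM, INSIDE A-100-SAY-HELLO, AFTER PERFORM. The performed paragraph ends at the end of the program, and control returns to the statement that follows the PERFORM.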
In the programming segment illustrated, the paragraph named
A-100-INITIALIZATION will be executed first. When the execution of that
paragraph is complete, control will return to the MAIN-PROGRAM paragraph
and drop through to the next instruction which says to perform the paragraph
named B-100-PROCESS-FILE. That paragraph will now be executed. When the
execution is done (note that processing frequently contains a lot to be done
and takes a significant amount of time), control will return to the
MAIN-PROGRAM paragraph and drop down to perform the paragraph C-100-WRAP-UP. After
this paragraph has been executed, control again returns to the MAIN-PROGRAM
paragraph where the instruction STOP RUN is encountered. The STOP RUN
statement terminates processing. The format as we are using it is simply:
           STOP RUN.
The initialization paragraph is where the files are opened so that processing can begin.
       A-100-INITIALIZATION.
           OPEN INPUT CUSTOMER-FILE
                OUTPUT CUSTOMER-REPORT.
In our simple example, the only command in this paragraph is the OPEN command.
The file that will be read is opened as an INPUT file and the file that will
be written is opened as an OUTPUT file. The file names that are used are the
names that were originally defined in the SELECT statement in the ENVIRONMENT
DIVISION and the FILE SECTION of the DATA DIVISION. The simple format for
the OPEN verb, as used in this program, is:
           OPEN (INPUT/OUTPUT) file-name.
Note that all files being used in the program can be opened with one OPEN
statement.
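For comparison, the same two files could be opened with two separate OPEN statements. This fragment is a sketch of the alternative and would replace the single OPEN inside the A-100-INITIALIZATION paragraph.

```cobol
           OPEN INPUT CUSTOMER-FILE.
           OPEN OUTPUT CUSTOMER-REPORT.
```

Both forms open the same files in the same modes. The single statement used in the sample program is simply shorter.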
The processing portion of the program is really the heart of the program.
Usually it is a loop that is done repeatedly until all of the
data has been processed.
       B-100-PROCESS-FILE.
           READ CUSTOMER-FILE
               AT END
                   MOVE "YES" TO END-OF-FILE.
           PERFORM B-200-PROCESS-RECORD
               UNTIL END-OF-FILE = "YES".
The paragraph above is the first paragraph in the processing portion of the
program. In this program, the first function of this paragraph is to read
the first record in the file (called the INITIALIZING READ). The READ
statement moves a record from the disk file into memory where it can be
processed. The format of the READ statement is:
           READ file-name
               AT END
                   processing to be done if there is no more data
The READ statement reads the file-name which was defined in the SELECT
statement (the logical file name) and the FD portion of the FILE SECTION
and then opened as an input file with the OPEN statement. The AT END clause
tells what processing is to be done when there is no more data in the file.
In the program being illustrated, when there is no more data the word YES is
moved to the memory location defined in the WORKING-STORAGE SECTION as the
END-OF-FILE indicator. Remember, this field was given an initial value of
NO and now the program is changing the value to YES. The use of YES and NO
are convenient, but the important thing is that the value of the indicator
has been changed.
Following the READ statement is a PERFORM statement which sets up a loop that
will be done over and over until there is no more data (shown by the value
of END-OF-FILE being YES). The format of this version of the PERFORM
statement is:
           PERFORM paragraph-name
               UNTIL a condition has been met
The PERFORM statement in the sample program will perform a paragraph named
B-200-PROCESS-RECORD over and over again until the condition END-OF-FILE
indicator equals "YES" is met. The processing works this way:
- before the paragraph is executed the condition is checked
- if the condition is not true, then the paragraph is executed
- control then returns to the statement and the condition is checked again
- if the condition is not true, then the paragraph is executed
- if at any time the condition is true, the paragraph will not be executed
and control will drop through to the next statement - in this case there is
no next statement which means the paragraph is complete and control will
return to the MAIN-PROGRAM paragraph where it will drop through and execute
C-100-WRAP-UP.
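The steps above can be tried out without any files by looping on a counter instead of an end of file indicator. This is a sketch under stated assumptions: the program name LOOPDEMO, the field LOOP-COUNT and the paragraph B-200-COUNT-UP are invented for the illustration, and the ADD and DISPLAY verbs are used here ahead of their introduction in the course.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LOOPDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  COUNTERS.
           05  LOOP-COUNT          PIC 9     VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PROGRAM.
           PERFORM B-200-COUNT-UP
               UNTIL LOOP-COUNT = 3.
           DISPLAY "LOOP FINISHED".
           STOP RUN.
       B-200-COUNT-UP.
           ADD 1 TO LOOP-COUNT.
           DISPLAY "PASS " LOOP-COUNT.
```

The condition is checked before each pass, so the paragraph runs three times (PASS 1, PASS 2, PASS 3). On the fourth check LOOP-COUNT equals 3, the paragraph is skipped, and control drops through to the DISPLAY of LOOP FINISHED. If the ADD statement were left out, the condition could never become true and the loop would never end.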
The paragraph that is being executed by the PERFORM...UNTIL, is the
B-200-PROCESS-RECORD paragraph shown below. The one critical thing that
must be included in this paragraph is the code to change the answer to the
condition UNTIL END-OF-FILE = "YES". If there is no way to change this
condition so that the UNTIL condition will finally be met, the paragraph
will be executed indefinitely and the program will be in a never ending loop.
       B-200-PROCESS-RECORD.
           MOVE SPACES TO PRINTZ.
           MOVE CUSTOMER-ID TO CUSTOMER-ID-PR.
           MOVE CUSTOMER-NAME TO CUSTOMER-NAME-PR.
           MOVE CUSTOMER-STREET TO CUSTOMER-STREET-PR.
           MOVE CUSTOMER-CITY TO CUSTOMER-CITY-PR.
           MOVE CUSTOMER-STATE TO CUSTOMER-STATE-PR.
           MOVE CUSTOMER-ZIP TO CUSTOMER-ZIP-PR.
           WRITE PRINTZ
               AFTER ADVANCING 1 LINES.
           READ CUSTOMER-FILE
               AT END
                   MOVE "YES" TO END-OF-FILE.
The code in this paragraph is set up to process one record and read another.
The first code that we see is the processing code. This program simply
moves the information from the record that was read to the print line and
then writes the line. Looking at the specifics of the code, the first
statement says:
           MOVE SPACES TO PRINTZ.
The word SPACES is a reserved word that means fill the area with blanks.
By moving SPACES TO PRINTZ, the programmer has cleaned out the whole print
record that was defined in the FILE SECTION as 01 PRINTZ. After the area
has been cleaned out, the programmer moves the fields on the input record
to fields on the print line using the MOVE statement. The format of the
MOVE statement is:
           MOVE field name TO field name.
In this program we are moving fields that occur on the input record to fields
that occur on the output record. After all of the fields have been filled,
it is time to WRITE the line. The format of the WRITE statement to write a
line on the printer is:
           WRITE record-name
               AFTER ADVANCING (the number of lines to move before writing) LINES.
The line that is to be written is the line that is defined in the 01 level
of the FD for the print file. All data must pass through the FILE SECTION
as it is read or written, so when a line is being written it must be the line
that was defined as the record to be written in the FILE SECTION. Please
note that in COBOL you READ files (defined in the FD) and you WRITE records
(defined in the 01 level of the FD).
Specifically, the sample program wants to move to a new line before writing
and it wants the report to be single spaced. Therefore the specific WRITE
statement will say:
           WRITE PRINTZ
               AFTER ADVANCING 1 LINES.
Again, it should be noted that the MOVE statements put the data on the line
so that when it is written, the data will be printed.
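If the report were meant to be double spaced instead, only the number in the ADVANCING clause would change. This fragment is a sketch of that variation and is not part of the sample program.

```cobol
           WRITE PRINTZ
               AFTER ADVANCING 2 LINES.
```

With a 2 in place of the 1, the printer moves down two lines before each line is written, which leaves one blank line between the printed lines.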
After the record is written, it is time to read the next record. This is
done with a READ statement that is identical to the READ statement that
read the initial record in the B-100-PROCESS-FILE paragraph.
           READ CUSTOMER-FILE
               AT END
                   MOVE "YES" TO END-OF-FILE.
Please pay special attention to the MOVE statement that happens when there
is no more data. Remember that in the WORKING-STORAGE SECTION we set up the
indicator END-OF-FILE with an initial VALUE of "NO ". Then in the PERFORM
statement that controls the repetitive looping in the B-100-PROCESS-FILE,
we said to PERFORM B-200-PROCESS-RECORD where the actual processing of the
record is done until END-OF-FILE = "YES". Now we are providing the means to
change that initial value to the "YES" that will terminate processing. If
the READ statement encounters the end of the file, a MOVE statement will be
executed that moves "YES" to the indicator, END-OF-FILE. When this happens
the repetitive processing of B-200-PROCESS-RECORD will terminate.
The logic of the initializing READ statement and other reads being done at
the bottom of the loop is very effective. First, before
PERFORMing the loop, there is an INITIALIZING READ that will read the first
record on the file. If there is no first record, the END-OF-FILE indicator
will be immediately set to "YES" and the PERFORM of the loop will never be
executed. Assuming there was a record, after the INITIALIZING READ the
repetitive processing of the B-200-PROCESS-RECORD paragraph will begin. In
this paragraph the data from the record that is currently in memory will be
moved to fields on the line to be printed and then the line will be written.
After the line has been written, the READ will be executed (Note that all
records except the first record on the data file are read with this READ).
If there is another record, the END-OF-FILE indicator will remain with a
value of "NO ". When the READ has been executed the B-200-PROCESS-RECORD
paragraph is done and control will return to the PERFORM where the UNTIL
clause will check the END-OF-FILE indicator to determine if the
B-200-PROCESS-RECORD paragraph should be performed again. Since the value
of the END-OF-FILE indicator is still "NO " the processing will continue.
The data from the record that was just read will be moved to the print line,
the line will be written and another record will be read. By positioning
the READ as the last statement in the paragraph, everything moves smoothly.
After the READ is executed, the control returns to the PERFORM which checks
the results of the READ as it affects the END-OF-FILE indicator to determine
if the paragraph should be executed again. The end result
of this logic is clean processing that works!
After the END-OF-FILE indicator gets changed to "YES", indicating that
there is no more data, the B-200-PROCESS-RECORD paragraph will not be executed
again and the PERFORM is done. Since the PERFORM was the last
statement in the B-100-PROCESS-FILE paragraph, that paragraph too is complete
and control returns to the MAIN-PROGRAM paragraph. At this point control
drops through to the next PERFORM instruction which says
           PERFORM C-100-WRAP-UP.
       C-100-WRAP-UP.
           CLOSE CUSTOMER-FILE
                 CUSTOMER-REPORT.
In this paragraph, the two files that were opened in the
A-100-INITIALIZATION are closed. It should be noted that while you have to
specify INPUT or OUTPUT when you are OPENing files, you do not have to
specify this when you CLOSE the files. The file names that are CLOSEd
here are the ones that appeared in the SELECT, were defined in the FD, were
OPENed, and, in the case of the input file, were READ.
When the C-100-WRAP-UP paragraph has been executed, control again returns to
the MAIN-PROGRAM paragraph and drops through to the next instruction:
           STOP RUN.
This statement terminates the execution of the program.
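For reference, here are the pieces discussed above assembled into one listing, with division headers and paragraph names in margin A and statements in margin B. Treat it as a sketch: the disk path in the first SELECT is the one on my disk, and ASSIGN TO PRINTER depends on the compiler being used, so both ASSIGN clauses may need to be changed for your environment.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SAMPLE1.
       AUTHOR. GROCER.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *    Both ASSIGN clauses are environment specific.
           SELECT CUSTOMER-FILE
               ASSIGN TO "C:\MFCOBOL\SOURCEPG\C12FIRST.DAT".
           SELECT CUSTOMER-REPORT
               ASSIGN TO PRINTER.
       DATA DIVISION.
       FILE SECTION.
       FD  CUSTOMER-FILE
           DATA RECORD IS CUSTOMER-RECORD.
       01  CUSTOMER-RECORD.
           05  CUSTOMER-ID             PIC X(4).
           05  CUSTOMER-NAME           PIC X(20).
           05  CUSTOMER-STREET         PIC X(20).
           05  CUSTOMER-CITY           PIC X(15).
           05  CUSTOMER-STATE          PIC X(2).
           05  CUSTOMER-ZIP            PIC X(5).
           05  FILLER                  PIC X(10).
       FD  CUSTOMER-REPORT
           DATA RECORD IS PRINTZ.
       01  PRINTZ.
           05  FILLER                  PIC X.
           05  CUSTOMER-ID-PR          PIC X(4).
           05  FILLER                  PIC X(2).
           05  CUSTOMER-NAME-PR        PIC X(20).
           05  FILLER                  PIC X(2).
           05  CUSTOMER-STREET-PR      PIC X(20).
           05  FILLER                  PIC X(2).
           05  CUSTOMER-CITY-PR        PIC X(15).
           05  FILLER                  PIC X(2).
           05  CUSTOMER-STATE-PR       PIC X(2).
           05  FILLER                  PIC X(2).
           05  CUSTOMER-ZIP-PR         PIC X(5).
           05  FILLER                  PIC X(3).
       WORKING-STORAGE SECTION.
       01  INDICATORS.
           05  END-OF-FILE             PIC XXX   VALUE "NO ".
       PROCEDURE DIVISION.
       MAIN-PROGRAM.
           PERFORM A-100-INITIALIZATION.
           PERFORM B-100-PROCESS-FILE.
           PERFORM C-100-WRAP-UP.
           STOP RUN.
       A-100-INITIALIZATION.
           OPEN INPUT CUSTOMER-FILE
                OUTPUT CUSTOMER-REPORT.
       B-100-PROCESS-FILE.
      *    Initializing read, then loop until end of file.
           READ CUSTOMER-FILE
               AT END
                   MOVE "YES" TO END-OF-FILE.
           PERFORM B-200-PROCESS-RECORD
               UNTIL END-OF-FILE = "YES".
       B-200-PROCESS-RECORD.
           MOVE SPACES TO PRINTZ.
           MOVE CUSTOMER-ID TO CUSTOMER-ID-PR.
           MOVE CUSTOMER-NAME TO CUSTOMER-NAME-PR.
           MOVE CUSTOMER-STREET TO CUSTOMER-STREET-PR.
           MOVE CUSTOMER-CITY TO CUSTOMER-CITY-PR.
           MOVE CUSTOMER-STATE TO CUSTOMER-STATE-PR.
           MOVE CUSTOMER-ZIP TO CUSTOMER-ZIP-PR.
           WRITE PRINTZ
               AFTER ADVANCING 1 LINES.
           READ CUSTOMER-FILE
               AT END
                   MOVE "YES" TO END-OF-FILE.
       C-100-WRAP-UP.
           CLOSE CUSTOMER-FILE
                 CUSTOMER-REPORT.
```

Reading the listing from top to bottom follows the same order as the discussion: the program is identified, the files are tied to logical names, the records and the indicator are laid out, and the PROCEDURE DIVISION opens the files, processes each record and closes the files.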