Indexed Files

Section 1

Introduction:

Much of the power in computers comes from the ability to randomly access a record from a file containing thousands of records. We see this power when the computer registers a price for something we are buying, when we call for information about airplane flights and when we call the doctor for an appointment. With sequential access, the processing would involve starting with record one and proceeding sequentially through the file until the record that contains the needed information is found. With random access, the processing takes advantage of an index structure and retrieves the record containing the needed information.

Indexed files are created with an index or indexes specified by the programmer. The most common indexed method is VSAM (Virtual Storage Access Method). In VSAM the indexed file has a prime index, which is usually the identification number, and optional alternate indexes. For example, in a personnel file, the social security number would frequently be the prime index and the employee name the alternate index. Prime indexes must be unique, while alternate indexes can have duplicates. Frequently when you call for information about yourself or someone else, you are asked for the social security number. If you don't have the social security number, you will then be asked for the name of the person. In other words, the first attempt to locate the record uses the prime key (the social security number); if that is not available, the name serves as an alternate key.

When specifying a file in COBOL, there are two pieces of important information that need to be provided in the SELECT statement: the organization of the file and the access method that will be used.

Chart of allowed organization and access methods:

                  Organization
Access:           Sequential       Indexed

Sequential        Allowed          Allowed
Random            Not Allowed      Allowed
Dynamic           Not Allowed      Allowed


SELECT statement clauses:

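The SELECT statement is used to name the file within the program, assign it to a physical file, and specify its organization, access method and keys. The general format is: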


SELECT logical file name
    ASSIGN TO "physical file name"
    ORGANIZATION IS INDEXED
    ACCESS IS {SEQUENTIAL}
              {RANDOM    }
              {DYNAMIC   }
    RECORD KEY IS field name
    ALTERNATE RECORD KEY IS field name WITH DUPLICATES.

Example #1:

SELECT INVEN-FILE
    ASSIGN TO "C:\DATA\INVEN.DAT"
    ORGANIZATION IS INDEXED
    ACCESS IS SEQUENTIAL
    RECORD KEY IS ITEM-NUM.

In this example the file is INVEN-FILE and is stored on the hard disk in the DATA directory as INVEN.DAT. I set this up as an example of a file that is being created in sequential order with an index ITEM-NUM.

Example #2:

SELECT INVEN-FILE
    ASSIGN TO "C:\DATA\INVEN.DAT"
    ORGANIZATION IS INDEXED
    ACCESS IS RANDOM
    RECORD KEY IS ITEM-NUM.

When the file created in Example #1 is read the index has already been established so the programmer can choose to access the file sequentially, randomly or dynamically. If the programmer wanted the program to access the file randomly by ITEM-NUM, the SELECT statement coded above would be used.

 

Example #3:

SELECT PAYROLL-FILE
    ASSIGN TO "A:PAYROLL.DAT"
    ORGANIZATION IS INDEXED
    ACCESS IS RANDOM
    RECORD KEY IS SOC-SEC-NUM
    ALTERNATE RECORD KEY IS EMP-NAME WITH DUPLICATES.

In this example, the payroll file has the logical name PAYROLL-FILE, which is the name used within the program. The physical name on the disk is PAYROLL.DAT. The file is an indexed file that is going to be randomly accessed in this program. The primary key used to randomly access the file is SOC-SEC-NUM. If the social security number is unknown, the alternate key to the file is EMP-NAME. Since more than one employee might have the same name, duplicates are allowed in the alternate key.

Invalid key clause with I/O statements:

When the programmer reads or writes an indexed file, there should be a way to catch index problems. One way of doing this is with the INVALID KEY clause. A second way is with FILE STATUS (this will be covered later). The invalid key clause can be used with all I/O statements connected with an indexed file. When the file is being created, the invalid key clause is used with the write statement to catch duplicate or out of sequence records. When the file is being read randomly, the invalid key clause is used with the read statement to give the program a way of knowing when the record is not there.

Working with indexed files:

When the programmer is working with indexed files, there are several tasks that frequently have to be accomplished. Standard examples include:

Creating an Indexed File:

When creating an indexed file, the input is a set of transactions/records sorted by the field you plan to use as the key of the new indexed file. Frequently the transactions are records on a disk or transactions keyed in through a screen. The program is a simple read/write program that reads the transactions and creates an indexed output file. The SELECT statement for the indexed output file defines the organization as indexed and the access as sequential, since the program reads a record from the input file and writes it to the output file one record at a time. The SELECT statement must also name the field that the file is being indexed by in the RECORD KEY clause. The record key must be defined under the 01 level of the FD. If the file has alternate keys, they must also be defined when the file is created. This is done by using the ALTERNATE RECORD KEY clause in the SELECT statement and defining the field or fields under the 01 level of the FD. When writing the indexed output file, the INVALID KEY clause is used to catch records that are duplicates or records that are out of sequence.

SELECT INDEXED-FILE
    ASSIGN TO "C:\COBOL\INDEXED.DAT"
    ORGANIZATION IS INDEXED
    ACCESS IS SEQUENTIAL
    RECORD KEY IS ID-NO-KEY.

DATA DIVISION.
FILE SECTION.
FD  INDEXED-FILE...
01  INDEXED-RECORD.
    05  ID-NO-KEY        PIC 9(5).

Again note that the record key and alternate keys must be defined in the file section.

The WRITE statement to write the record on the indexed file is:

WRITE INDEXED-RECORD
    INVALID KEY
       PERFORM B-400-INVALID-REC.

If the record has a duplicate ID-NO-KEY or the record is out of sequence, the record will not be written, the INVALID KEY clause will be triggered, and B-400-INVALID-REC will be performed. This routine can write an error message or handle the error in any way the programmer finds appropriate. Sometimes the programmer will choose to terminate processing if an error is encountered. More frequently, an error message is printed saying that a certain record is not being written to the file.
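Putting the pieces above together, a minimal sketch of a complete creation program might look like the following. It is an illustration under stated assumptions, not the course's own program: the file names, the input file TRANS-FILE with its fields, the INDEXED-DATA field, and the record lengths are hypothetical, and ORGANIZATION IS LINE SEQUENTIAL for the input is a common compiler extension rather than standard COBOL. The code is laid out in fixed format (Area A starts in column 8).

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MAKEINDX.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *    Input sorted by ID (hypothetical name and layout).
           SELECT TRANS-FILE
               ASSIGN TO "TRANS.DAT"
               ORGANIZATION IS LINE SEQUENTIAL.
      *    Indexed output, written in key order.
           SELECT INDEXED-FILE
               ASSIGN TO "INDEXED.DAT"
               ORGANIZATION IS INDEXED
               ACCESS IS SEQUENTIAL
               RECORD KEY IS ID-NO-KEY.
       DATA DIVISION.
       FILE SECTION.
       FD  TRANS-FILE.
       01  TRANS-RECORD.
           05  TRANS-ID          PIC 9(5).
           05  TRANS-DATA        PIC X(20).
       FD  INDEXED-FILE.
       01  INDEXED-RECORD.
           05  ID-NO-KEY         PIC 9(5).
           05  INDEXED-DATA      PIC X(20).
       WORKING-STORAGE SECTION.
       01  EOF-IND               PIC X(3) VALUE "NO ".
       PROCEDURE DIVISION.
       A-000-MAINLINE.
           OPEN INPUT TRANS-FILE
                OUTPUT INDEXED-FILE.
           PERFORM B-100-READ-TRANS.
           PERFORM B-200-LOOP
               UNTIL EOF-IND = "YES".
           CLOSE TRANS-FILE
                 INDEXED-FILE.
           STOP RUN.
       B-100-READ-TRANS.
           READ TRANS-FILE
               AT END
                   MOVE "YES" TO EOF-IND
           END-READ.
       B-200-LOOP.
      *    Build the output record, write it, read the next input.
           MOVE TRANS-ID TO ID-NO-KEY.
           MOVE TRANS-DATA TO INDEXED-DATA.
           WRITE INDEXED-RECORD
               INVALID KEY
                   PERFORM B-400-INVALID-REC
           END-WRITE.
           PERFORM B-100-READ-TRANS.
       B-400-INVALID-REC.
      *    Duplicate or out-of-sequence key: report and continue.
           DISPLAY "RECORD NOT WRITTEN FOR ID " TRANS-ID.
```

In this sketch a rejected record is reported and processing continues with the next transaction, which is the more frequent choice described above.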

Sequentially reading an indexed file:

Even when sequentially processing an indexed file, the programmer uses the ORGANIZATION IS INDEXED clause in the SELECT statement and, for documentation, includes the ACCESS IS SEQUENTIAL clause (even though sequential access is the default). The programmer also includes the RECORD KEY IS clause.

SELECT MASTER-FILE
    ASSIGN TO "C:\PCOBWIN\VSAM\VSAM1.DAT"
    ORGANIZATION IS INDEXED
    ACCESS IS SEQUENTIAL
    RECORD KEY IS MID.

Because the read is being done sequentially the standard READ AT END statement is used. Looking at the program, the only sign that this is an indexed file rather than a sequential file is in the SELECT statement.
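As a rough illustration of the point above, the procedure code for a sequential pass over MASTER-FILE could be sketched as follows. This is a fragment in the same layout as the other examples here, not a complete program. It assumes EOF-IND is defined in WORKING-STORAGE as PIC X(3) with an initial value of "NO ", and the DISPLAY stands in for whatever processing is actually needed.

```cobol
A-000-MAINLINE.
    OPEN INPUT MASTER-FILE.
    READ MASTER-FILE
        AT END
            MOVE "YES" TO EOF-IND
    END-READ.
    PERFORM B-200-LOOP
        UNTIL EOF-IND = "YES".
    CLOSE MASTER-FILE.
    STOP RUN.
B-200-LOOP.
*> Placeholder processing: show the key of the current record.
    DISPLAY MID.
    READ MASTER-FILE
        AT END
            MOVE "YES" TO EOF-IND
    END-READ.
```

Records come back in ascending order of the record key MID, and nothing in this code mentions the index.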

If you want to sequentially read the file starting at a point other than the beginning, the START verb can be used to locate a particular starting point, and then the program can sequentially process from that point forward. Note that the START verb only locates a point; it does not actually read a record. A READ statement has to be issued to read the record.

START VERB:

START filename
    KEY IS {EQUAL TO                }
           {=                       }
           {GREATER THAN            }
           {>                       }  key-field
           {NOT LESS THAN           }
           {NOT <                   }
           {GREATER THAN OR EQUAL TO}
           {>=                      }
    [INVALID KEY statement-set-1]
    [NOT INVALID KEY statement-set-2]
[END-START]


To sequentially read from a given point, the starting point would be moved to the prime key and the START verb issued prior to the READ. If the START verb is testing for equality, code must be included to handle the situation where the start point is not a valid record on the file. To avoid this, frequently the GREATER THAN OR EQUAL TO option is used. This will find the first record that matches or is greater than the starting point that was entered. The code below tests for equality since that is the more complex code.

B-100-PROCESS.
    DISPLAY "ENTER START POINT - ENTER 999 TO END".
    ACCEPT START-PT.
    IF START-PT = 999
        MOVE "YES" TO EOF-IND
    ELSE
        PERFORM U-000-START-FILE
            UNTIL FOUND-IND = "YES" OR EOF-IND = "YES"
    END-IF.
    IF FOUND-IND = "YES"
        READ MASTER-FILE
            AT END
                MOVE "YES" TO EOF-IND
        END-READ
        PERFORM B-200-LOOP
            UNTIL EOF-IND = "YES"
    END-IF.
B-200-LOOP.
    *> Process the current record here, then read the next record.
    READ MASTER-FILE
        AT END
            MOVE "YES" TO EOF-IND
    END-READ.
U-000-START-FILE.
    MOVE START-PT TO MID.
    START MASTER-FILE
        KEY EQUAL TO MID
        INVALID KEY
            DISPLAY "RECORD NOT FOUND FOR " START-PT
            DISPLAY "ENTER START POINT"
            DISPLAY "ENTER 999 TO END"
            ACCEPT START-PT
            IF START-PT = 999
                MOVE "YES" TO EOF-IND
            END-IF
        NOT INVALID KEY
            MOVE "YES" TO FOUND-IND
    END-START.
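For comparison, the simpler GREATER THAN OR EQUAL TO approach mentioned earlier could be sketched as below. This is an illustrative variant of U-000-START-FILE, not code from the original program. Here the INVALID KEY clause is triggered only when no record has a key equal to or greater than the start point, so the sketch simply ends processing in that case instead of asking for another value.

```cobol
U-000-START-FILE.
    MOVE START-PT TO MID.
    START MASTER-FILE
        KEY IS NOT LESS THAN MID
        INVALID KEY
            *> No record at or beyond the requested start point.
            DISPLAY "NO RECORD AT OR AFTER " START-PT
            MOVE "YES" TO EOF-IND
        NOT INVALID KEY
            MOVE "YES" TO FOUND-IND
    END-START.
```

NOT LESS THAN is used here as one of the equivalent spellings listed in the START format; GREATER THAN OR EQUAL TO or >= would serve the same purpose.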

Sequentially reading an indexed file using an alternate key:

To sequentially read an indexed file using an alternate key, the file has to have been created with both a prime and an alternate key. The SELECT statement has to contain the ALTERNATE RECORD KEY clause, and if the file allowed for duplicates when it was created, then the SELECT used when reading the file also has to allow for duplicates. Remember, both the record key and the alternate record key must be defined under the 01 level of the FD.

SELECT MASTER-FILE
    ASSIGN TO "C:\PCOBWIN\VSAM\VSAMALT.DAT"
    ORGANIZATION IS INDEXED
    ACCESS IS SEQUENTIAL
    RECORD KEY IS MID
    ALTERNATE RECORD KEY IS MITEM-NAME WITH DUPLICATES.
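
A matching FD for this SELECT might be sketched as follows. Only the names MASTER-FILE, MID and MITEM-NAME come from the example; the record name MASTER-RECORD, the PIC sizes and the FILLER are assumptions made for illustration.

```cobol
DATA DIVISION.
FILE SECTION.
FD  MASTER-FILE.
01  MASTER-RECORD.
    *> Prime key: must be unique.
    05  MID              PIC 9(5).
    *> Alternate key: duplicates allowed by the SELECT.
    05  MITEM-NAME       PIC X(20).
    *> Remaining data in the record (hypothetical length).
    05  FILLER           PIC X(55).
```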

When you are reading an indexed file, the default is that you will be reading using the PRIME KEY. When you want to use the alternate key, you must establish that using the START verb. The START verb is used to position a logical pointer at a particular record in the file using the alternate key path. This is done by moving the start point to the alternate key (in this example MITEM-NAME) and using the alternate key name in the KEY clause of the START verb. If you want to start at the beginning of the file, you move all zeros to a numeric key or LOW-VALUES to a non-numeric key.

    MOVE LOW-VALUES TO MITEM-NAME.
    START MASTER-FILE
        KEY GREATER THAN MITEM-NAME
        INVALID KEY
            DISPLAY "RECORD NOT FOUND"
            MOVE "YES" TO EOF-IND
    END-START.
    IF EOF-IND = "NO "
        READ MASTER-FILE
            AT END
                MOVE "YES" TO EOF-IND
        END-READ
        PERFORM B-200-LOOP
            UNTIL EOF-IND = "YES".
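
Because the alternate key allows duplicates, the same technique can be used to step through every record that shares one name. The following is a sketch under assumptions, not part of the original program: RETR-NAME is assumed to hold the name being searched for (defined like MITEM-NAME), EOF-IND is assumed to be PIC X(3), and B-210-SHOW-RECORD is a hypothetical paragraph.

```cobol
    MOVE "NO " TO EOF-IND.
    MOVE RETR-NAME TO MITEM-NAME.
    START MASTER-FILE
        KEY IS EQUAL TO MITEM-NAME
        INVALID KEY
            DISPLAY "NO RECORD FOR " RETR-NAME
            MOVE "YES" TO EOF-IND
    END-START.
    PERFORM UNTIL EOF-IND = "YES"
        READ MASTER-FILE
            AT END
                MOVE "YES" TO EOF-IND
            NOT AT END
                *> Stop as soon as the name changes.
                IF MITEM-NAME = RETR-NAME
                    PERFORM B-210-SHOW-RECORD
                ELSE
                    MOVE "YES" TO EOF-IND
                END-IF
        END-READ
    END-PERFORM.

B-210-SHOW-RECORD.
    DISPLAY MID " " MITEM-NAME.
```

The test against RETR-NAME is needed because the sequential READ keeps going in alternate key order and would otherwise continue into the records for the next name.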

Randomly reading an indexed file using the primary key:

One of the main reasons for indexing a file is to be able to quickly access information using a random read. The information you want to retrieve could come from a transaction file, but today it will most likely be input at a screen. The programmer will put up a screen asking for the prime key (usually some kind of identification number) of the record that needs to be retrieved. The information that is keyed in will be used to establish the key. That is, the number that is keyed in will be moved to the field which has been defined as the prime key or, if a screen section is in use, it can be transferred to that field directly using the TO clause.

Once the key has been established (that is the prime key contains data), a random read will be executed. If the read is successful the information will be processed. If the read is unsuccessful either an error message will be printed or the user will be informed that the attempted retrieval was unsuccessful.

Example #1:

    MOVE RETR-ID TO MID.
    READ MASTER-FILE
       INVALID KEY
          PERFORM B-310-INVALID
       NOT INVALID KEY
          PERFORM B-300-PROCESS
    END-READ.

In this example, RETR-ID is the name of the field that the user keyed in or the name of the field on a transaction that was read. RETR-ID is moved to MID which has been defined as the RECORD KEY. The READ is executed. If the INVALID KEY is triggered it means the READ was unsuccessful so B-310-INVALID will be processed to deal with the error. If the READ was successful then the NOT INVALID KEY clause will cause B-300-PROCESS to be performed. END-READ terminates the read.

Note that if the data was keyed in through a screen section that has the clause TO MID, then the MOVE statement would not be needed.

Example #2:

    MOVE RETR-ID TO MID.
    MOVE "YES" TO MSTR-FOUND.
    READ MASTER-FILE
        INVALID KEY
            MOVE "NO " TO MSTR-FOUND.
    IF MSTR-FOUND = "YES"
        PERFORM B-300-PROCESS
    ELSE
        PERFORM B-310-INVALID.

This example accomplishes the same thing by using an indicator instead of directing the processing from within the READ.

Randomly reading an indexed file using the alternate key:

If you want to read the file using the alternate key, the value you are looking for has to be moved to the field that was defined as the alternate key, and the READ statement needs a KEY clause that names the alternate key. The default when a random read is done is the primary or RECORD KEY, so if the alternate key is to be used, the READ has to say so. Again, the key comes in either from a screen or from a record on a transaction file. If the screen TO puts the input directly into MITEM-NAME, then the MOVE is not required.

    MOVE RETR-NAME TO MITEM-NAME.
    READ MASTER-FILE
        KEY IS MITEM-NAME
        INVALID KEY
           PERFORM B-410-INVALID
        NOT INVALID KEY
           PERFORM B-400-PROCESS.

Randomly reading an indexed file using either the prime key or the alternate key:

Frequently the user wants to choose whether to use the prime or the alternate key. For example, if a customer or patient cannot give their identification number, then you would want to try the retrieval by name. In this case identification number would be the prime or RECORD KEY and name would be the ALTERNATE KEY. The user would be given a menu choice asking whether they wanted to retrieve by identification number or by name. If they selected identification number, processing would go to the paragraph that does a random read using the RECORD KEY. If they selected name, processing would go to a paragraph that does a random read using the ALTERNATE KEY.
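
The menu logic just described could be sketched roughly as follows. This is a fragment under assumptions rather than a finished program: it assumes MASTER-FILE is defined with ACCESS IS RANDOM, RECORD KEY IS MID and ALTERNATE RECORD KEY IS MITEM-NAME WITH DUPLICATES, and that the file is already open for input. RETR-CHOICE (PIC X) and the paragraph names B-100-RETRIEVE, B-200-READ-BY-ID, B-300-READ-BY-NAME and B-500-SHOW-RECORD are hypothetical.

```cobol
B-100-RETRIEVE.
    DISPLAY "RETRIEVE BY: 1 = ID NUMBER, 2 = NAME".
    ACCEPT RETR-CHOICE.
    EVALUATE RETR-CHOICE
        WHEN "1"
            PERFORM B-200-READ-BY-ID
        WHEN "2"
            PERFORM B-300-READ-BY-NAME
        WHEN OTHER
            DISPLAY "INVALID CHOICE"
    END-EVALUATE.
B-200-READ-BY-ID.
    *> Random read on the prime key (the default key).
    DISPLAY "ENTER ID NUMBER".
    ACCEPT MID.
    READ MASTER-FILE
        INVALID KEY
            DISPLAY "NO RECORD FOR ID " MID
        NOT INVALID KEY
            PERFORM B-500-SHOW-RECORD
    END-READ.
B-300-READ-BY-NAME.
    *> Random read on the alternate key, named in the KEY clause.
    DISPLAY "ENTER NAME".
    ACCEPT MITEM-NAME.
    READ MASTER-FILE
        KEY IS MITEM-NAME
        INVALID KEY
            DISPLAY "NO RECORD FOR NAME " MITEM-NAME
        NOT INVALID KEY
            PERFORM B-500-SHOW-RECORD
    END-READ.
B-500-SHOW-RECORD.
    DISPLAY MID " " MITEM-NAME.
```

When several records share a name, a random read on the alternate key returns only one of them, so a real program would usually need an extra step to let the user pick among the duplicates.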