COBOL Data Definition Entry Processing

Use a COBOL Record Layout to Process an EBCDIC File

Steven F. Lott

Introduction

When dealing with "Flat Files" from legacy COBOL programs, there are several problems that need to be solved.

The files have a fixed field layout, without delimiters. This means that the offset of each field must be used to decompose the record into its individual elements.
Developing the offsets to each field is tedious and error-prone work. Accounting for signs, alignment, and redefines makes this difficult. It's necessary to parse the "Copy Book", which has the original COBOL source definition for the file.
Numeric fields can have an implied decimal point, making it difficult to determine the value of a string of digits. The "Copy Book" is essential for parsing the file contents.
COBOL can make use of numeric data represented in a variety of "Computational" forms. The "Computational-3" ("COMP-3") form is particularly complex because decimal digits are packed two per byte and the trailing half-byte encodes sign information.
The string data may be encoded in EBCDIC.
COBOL encourages the use of data aliases (or "unions") via the REDEFINES clause. Without the entire suite of COBOL programs, the handling of REDEFINES can become an insoluble problem.
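
To make the first few problems concrete, here is a minimal sketch of decomposing a fixed-layout EBCDIC record by offset. It is written for Python 3 and uses only the standard library. The layout, the field names and the choice of the cp037 code page are assumptions made for illustration; a real layout comes from the copy book.

```python
from decimal import Decimal

# Hypothetical layout: (name, offset, size, implied decimal places or None for text).
LAYOUT = [
    ("CUST-NAME", 0, 10, None),
    ("BALANCE", 10, 7, 2),      # PIC 9(5)V99: seven digits, two implied decimals
]

def decode_record(record, layout=LAYOUT, encoding="cp037"):
    """Slice a fixed-layout record and interpret each field."""
    row = {}
    for name, offset, size, places in layout:
        text = record[offset:offset + size].decode(encoding)
        if places is None:
            row[name] = text.rstrip()
        else:
            # Shift the digits to honor the implied decimal point.
            row[name] = Decimal(text).scaleb(-places)
    return row

# Build a sample EBCDIC record so the example is self-contained.
sample = "SMITH     0012345".encode("cp037")
row = decode_record(sample)
```

The digits `0012345` carry no decimal point of their own; only the layout says that the value is 123.45.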

Generally, COBOL files are defined by a "Data Description Entry" (DDE) that provides the record layout.

This library helps parse DDE's to determine the offset, size, and encoding of each field. This information can be used by Python programs to process files that originate from COBOL systems.

Applications

There are two common applications for a Python-based analysis of COBOL data.

Extract, Transform and Load (ETL).
Data Profiling.

ETL. Extract Transform and Load (ETL) processing is a common pipeline between legacy COBOL ("Flat File") applications and relational database (or Object Database) applications. A Python application can be used to Extract data from COBOL flat files and either load a database or create a file in a more usable notation (e.g., XML, JSON or CSV).

The cobol_dde module helps to create an ETL application.

It parses the COBOL "Copy Book" source to permit interpretation of the file.
Given a block of bytes, it can extract fields as byte strings or as interpreted values. It will produce int or Decimal values. (Currently, floating-point "Comp-2" fields are not supported.)
Python application logic can be used to disentangle the various record types encoded via REDEFINES clauses.
With the copy book and the DDE module, a Python program can then rewrite a COBOL file.

Profiling. Data Warehouse processing requires complete understanding of source application data domains. This complete understanding of a domain is part of analyzing data quality. Data domains, when not formalized by a programming language or database design, tend to grow in sometimes obscure ways, including bad data, special-purpose data and undocumented data.

Bad data are simply invalid values used in application files. These can be tolerated either because of programming bugs or dependencies that permit invalid data under certain circumstances. Typically, the latter case indicates a normalization issue. The domain values are illegal.

Special-purpose data is often a "patch" or "hack" to work around a problem. It may be documented, but rarely used, and unexpected by ETL programmers. Sometimes the special-purpose data represents an operational hack to work around a problem without changing the programming. Irrespective of the origin of the hack, the domain values are legal, but unexpected and uncommon.

Undocumented data consists of ordinary operational values that are widely used but unexpected because they are undocumented. Often this is data that is not part of the essential use cases for an application, but is additional data with an obscure final destination. The domain values are legal, unexpected but common.

Flat files processed by COBOL programs are very common and often suffer from the above data quality problems. The language imposes few rules on data domains. Programs can be easily modified to extend a domain. Processing rules can be quite obscure, making it necessary to analyze actual data rather than program source.

Requirements

This application produces simple reports on the range of values found for particular fields in a file. This information is used to understand data quality and the actual values in a domain, and to support reverse engineering of software.

The primary use cases are described below.

Analyze a File

The user is given a file and the associated record definition (also known as a "Copy Book").

The user creates an application program based on data_profile to name the fields of interest. The user may also include programming for the following: (1) to separate occurrences or variant record types when the file is not in first normal form, (2) to conditionally process fields when the file is not in second or third normal form.

The user runs the profiler with the file, driver and copybook.

The application produces a summary report showing each of the named fields, their complete domain of values, and the occurrence count for each value.
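
The summary report amounts to a frequency table per field. The following is a minimal sketch in Python 3, assuming the records have already been decoded into dictionaries. The function names `profile` and `report` and the sample rows are hypothetical and are not part of data_profile.

```python
from collections import Counter

def profile(rows, field_names):
    """Count how often each distinct value occurs for each named field."""
    counts = {name: Counter() for name in field_names}
    for row in rows:
        for name in field_names:
            counts[name][row.get(name)] += 1
    return counts

def report(counts):
    """Format the domain of each field, most frequent values first."""
    lines = []
    for name, counter in counts.items():
        lines.append(name)
        for value, n in counter.most_common():
            lines.append("    %r: %d" % (value, n))
    return "\n".join(lines)

# Hypothetical decoded rows.
rows = [
    {"STATUS": "A", "YEAR": 1999},
    {"STATUS": "A", "YEAR": 2000},
    {"STATUS": "X", "YEAR": 1999},
]
counts = profile(rows, ["STATUS", "YEAR"])
print(report(counts))
```

Rare values such as the single `"X"` above are the ones that usually deserve a closer look.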

Extract Data

The user is given a file and the associated record definition (also known as a "Copy Book").

The user creates an application program based on cobol_dde to extract the fields of interest. The user may also include programming for the following: (1) to separate occurrences or variant record types when the file is not in first normal form, (2) to conditionally process fields when the file is not in second or third normal form. The application loads a database or writes an output file in a more usable format (e.g., JSON).

The user installs the application for operational use.

Design Overview

The cobol_dde module has two distinct phases of operation. The first phase parses a COBOL Data Description Entry (DDE) to understand the record layout of the file. The second phase uses the parsed DDE to extract fields from a record.

The data_profile module is a framework for building data profiling applications based on the cobol_dde module. The data profiler uses a COBOL Data Description Entry (DDE) to understand the record layout of the file. The profiling then uses the DDE object to examine selected fields of the file.

DDE Structure

The DDE class is a recursive definition of a COBOL group-level DDE. There are two basic species of COBOL DDE's: elementary items, which have a picture clause, and group-level items, which contain lower-level items. There are several optional features of every DDE, including an occurs clause and a redefines clause. In addition to a picture clause, elementary items can also have an optional usage clause and an optional sign clause.

The picture clause specifies how to interpret a sequence of bytes. The picture clause interacts with the optional usage clause, sign clause and synchronized clause to fully define the bytes. The picture clause uses a complex format of code characters to define either individual character bytes (when the usage is display) or dual decimal digit bytes (when the usage is computational).

The occurs clause specifies an array of elements. If the occurs clause appears on a group level item, the sub-record is repeated. If the occurs clause appears on an elementary item, that item is repeated.
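
Locating one occurrence is simple arithmetic: the base offset plus the size of a single occurrence times a zero-based index. A minimal Python 3 sketch follows; the record content and helper names are hypothetical.

```python
def occurrence_offset(base_offset, occur_size, index):
    """Offset of the index-th occurrence (0-based) of a repeated item."""
    return base_offset + occur_size * index

def occurrences(record, base_offset, occur_size, count):
    """Slice every occurrence of a repeated item out of a record."""
    result = []
    for i in range(count):
        start = occurrence_offset(base_offset, occur_size, i)
        result.append(record[start:start + occur_size])
    return result

# Hypothetical layout: a 4-byte key, then a 3-byte item that OCCURS 3 TIMES.
record = b"KEY1AAABBBCCC"
items = occurrences(record, base_offset=4, occur_size=3, count=3)
```

COBOL subscripts start at 1, so an application that accepts COBOL-style indexes needs to subtract one before applying this arithmetic.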

The redefines clause defines an alias for input bytes. When field R redefines a previously defined field F, the storage bytes are used for both R and F. The record structure does not provide a way to disambiguate the interpretation of the bytes. Program logic must be examined to determine which interpretation is valid.
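
The sketch below (Python 3) shows the same six bytes read under two interpretations. The copy book fragment and the rule that record type "A" selects the numeric view are invented for illustration; in practice that rule has to be recovered from the COBOL program logic.

```python
from decimal import Decimal

# Hypothetical copy book fragment, shown as comments:
#   05 REC-TYPE     PIC X.
#   05 PAY-AMOUNT   PIC 9(4)V99.
#   05 PAY-NOTE     REDEFINES PAY-AMOUNT PIC X(6).

def interpret(record):
    """Pick one interpretation of the shared bytes using a record-type code."""
    rec_type = record[0:1].decode("ascii")
    shared = record[1:7].decode("ascii")
    if rec_type == "A":
        # Numeric view: six digits with two implied decimal places.
        return ("PAY-AMOUNT", Decimal(shared).scaleb(-2))
    # Text view of the very same bytes.
    return ("PAY-NOTE", shared.rstrip())

amount = interpret(b"A012345")
note = interpret(b"NREFUND")
```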

DDE Class. The parent class, DDE, defines the features of a group-level item. It supports the occurs and redefines features. It can contain a number of DDE items. The leaves of the tree, DDEElement, define the features of an elementary item. It adds support for the picture clause, but removes support for lower-level items.

The optional clauses are handled using a variety of design patterns. The usage information, for instance, is used to create a Strategy object that is used to extract a field from a record's bytes.

The redefines information is used to create a Strategy object that computes the offset to a field. There are two variant strategies: locate the basis field and use that field's offset or use the end of the previous element as the offset.

The Visitor pattern is used to traverse a DDE structure to write reports on the structure. The data_profile module uses a Visitor to write a detailed dump of a given record.

DDE Parser

The RecordFactory object reads a file of text and either creates a DDE or raises an exception. If the text is a valid COBOL record definition, a DDE is created. If there are syntax errors, an exception is raised.

The RecordFactory depends on a Lexer instance to do lexical scanning of COBOL source. The lexical scanner can be subclassed to pre-process COBOL source. This is necessary because of the variety of source formats that are permitted. Shop standards may include or exclude features like program identification, line numbers, format control and other decoration of the input.
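
As an illustration of the kind of pre-processing a scanner subclass might perform, here is a standalone Python 3 sketch that strips the sequence-number and identification columns of fixed-format COBOL source. The function `strip_reference_columns` is hypothetical and does not use the Lexer API of this module; it assumes the conventional layout of columns 1-6 for sequence numbers, column 7 for the indicator, and columns 73-80 for identification.

```python
def strip_reference_columns(source):
    """Keep only columns 8-72 of each line and drop comment lines."""
    cleaned = []
    for line in source.splitlines():
        padded = line.ljust(80)
        indicator = padded[6]
        if indicator in "*/":
            continue  # comment or page-eject line
        cleaned.append(padded[7:72].rstrip())
    return "\n".join(cleaned)

# Sample source built programmatically so the column positions are exact.
raw = "\n".join([
    "000100 " + "01  CUSTOMER-RECORD.".ljust(65) + "CUST0001",
    "000200* A COMMENT LINE",
    "000300 " + "    05  CUST-NAME   PIC X(10).",
])
text = strip_reference_columns(raw)
```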

The makeRecord() method of the RecordFactory class does the parsing of the record definition. Each individual DDE statement is parsed. The level number information is used to define the correct grouping of elements. When the structure is parsed, it is decorated with size and offset information for each element.
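
The grouping by level number can be sketched with a stack: each entry becomes a child of the nearest preceding entry that has a lower level number. This Python 3 example is an illustration of the idea only, not the code of makeRecord(). It uses plain dictionaries instead of DDE objects and ignores the special levels 66 and 88.

```python
def build_tree(entries):
    """Nest (level, name) pairs into a tree using COBOL level numbers."""
    root = {"level": 0, "name": None, "children": []}
    stack = [root]
    for level, name in entries:
        node = {"level": level, "name": name, "children": []}
        # Pop back to the nearest enclosing group (a strictly lower level).
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]

# Hypothetical record definition, already reduced to level and name.
entries = [
    (1, "CUSTOMER-RECORD"),
    (5, "CUST-NAME"),
    (5, "CUST-ADDRESS"),
    (10, "STREET"),
    (10, "CITY"),
    (5, "BALANCE"),
]
tree = build_tree(entries)
record = tree[0]
```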

Field Values

There are two broad types of character interpretation:

Character ("usage is display").
Numeric ("usage is computational"). There are several subtypes of computational. The most common computational form is "COMP-3", which is a packed decimal (binary-coded decimal) format.

These require different strategies for decoding the input bytes. Note that the COBOL language, and IBM's extensions, provide for a number of usage options. In this application, three basic types of usage strategies are supported: display, comp and comp-3.

Display. These are bytes, one per character, described by the picture clause. They can be EBCDIC or ASCII. Python offers a codecs module to convert EBCDIC characters to Unicode for further processing.
COMP. These are binary fields of 2, 4 or 8 bytes, with the size implied by the picture clause.
COMP-3. These are packed decimal fields, with the size derived from the picture clause; there are two digits packed into each byte, with an extra half-byte for a sign.
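
The two computational forms can be decoded with a few lines each. The following Python 3 sketch is independent of the Usage classes defined later; the function names are hypothetical, and COMP data is assumed to be big-endian as on the mainframe.

```python
import struct
from decimal import Decimal

def decode_comp(data):
    """Decode a big-endian signed binary field of 2, 4 or 8 bytes."""
    fmt = {2: ">h", 4: ">i", 8: ">q"}[len(data)]
    return struct.unpack(fmt, data)[0]

def decode_comp3(data, places=0):
    """Decode packed decimal: two digits per byte, sign in the last half-byte."""
    nibbles = []
    for byte in data:              # iterating over bytes yields ints in Python 3
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()
    value = Decimal("".join(str(n) for n in nibbles)).scaleb(-places)
    # 0xD (and 0xB) mark a negative value; the other sign codes are positive.
    return -value if sign in (0x0D, 0x0B) else value

binary = decode_comp(b"\x01\x00")                 # 256
packed = decode_comp3(b"\x12\x34\x5D", places=2)  # Decimal('-123.45')
```

The `places` argument plays the role of the implied decimal point from the picture clause.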

Sample Data Profile Application

A typical data profiling application program has the following general form.

import cobol_dde
import data_profile

rf= cobol_dde.RecordFactory()
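# CleanupLexer stands for an application-specific Lexer subclass (not shown); aDef names the copybook file.
dde= rf.makeRecord( CleanupLexer(open(aDef,"r").read()) )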

# Create a FieldScan for the three fields we care about
fieldList= data_profile.FieldScan( [ data_profile.NumFieldValue( dde, 'MCUDBI-DATA-ITEM' ),
    data_profile.NumFieldValue( dde, 'MCUDBI-YR' ),
    data_profile.NumFieldValue( dde, 'MCUDBI-VALUE-LENGTH' )
] )

# Create a FileScan for the file, using the given FieldScan list of fields
fs= data_profile.FileScan( dde, fieldList, aFileName )

# Process through the given ending record
fs.process( end )

Future Directions

Is EBCDIC->ASCII conversion a feature of DDE? May need subclass or strategy for conversion.

Consider combining PIC, USAGE, and SIGN information into a single data type specification.

Add capability to search using a path string instead of individual get() calls in DDE

Create subclass of DDE for non-group-level items that adds PICTURE and USAGE features and removes the container.

Implementation

We'll define the cobol_dde module and a test_dde unit test for this module.

We'll rely on the decimal module to do fixed-precision decimal arithmetic.

Note

The legacy implementation was the FixedPoint module. While the FixedPoint module is handy, it is not as robust as the decimal module.

The cobol_dde module provides the DDE record definition and Lexical scanning capability.

Separately, we'll look at the data_profile module. This defines the scanning and analyzing features.

cobol_dde

The cobol_dde module has the following structure.

cobol_dde.py (1)

→ DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (2)    → (3)    → (4)    → (5)    → (6)
→ DDE Exception Definitions (7)

# 1. Basic class definitions
→ DDE Visitor base class - to analyze a complete DDE tree structure (8)
→ DDE Usage Strategy class hierarchy - to extract data from input buffers (9)
→ DDE Redefines Strategy class hierarchy - to define offsets to DDE elements (10)

# 2. DDE class definition
→ DDE Class Hierarchy - defines group and elementary data descriptions elements (11)

# 3. Some utility classes for reporting
→ DDE Common Visitors for reporting on a DDE structure (15)

# 4. The Lexical Scanning and Parsing of an input record layout
→ DDE Lexical Scanner base class provides the default lexical scanner implementation (16)
→ DDE RecordFactory parses a record clause to create a DDE instance (17)

Overheads

Overheads includes the following: the shell escape, the doc string, imports and any CVS cruft.

The shell escape line allows this module to be run as a stand-alone application.

DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (2)

#!/usr/bin/env python

Used by: cobol_dde.py (1)

The doc string provides documentation embedded within this module.

DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (3)

"""COBOL Data Description Entries (a/k/a Record-Layout Objects)

A COBOL Record is a collection of data description entries.
Each entry is either a simple field (with a PICTURE) or a group of fields.
Each field has an optional occurs clause, or redefines clause.
Each field has a usage (DISPLAY or COMP or COMP-3).
Each field is assigned an offset, size and data type (numeric or alpha).

This module includes the following class definitions:

DDE
    Defines a COBOL record layout object.
    Each record layout object has operations to locate individual
    fields or occurrence instances.

Usage
UsageDisplay
UsageComp
UsageComp3
    Various USAGE clauses; these classes provide a valueOf() method
    which decodes record bytes to a proper value.

Redefines
NonRedefines
    Two strategies for computing a field's offset - either it is after
    the previous field in memory, or it redefines another field's location
    in memory.

RecordFactory
    Parses a COBOL copybook to
    create the DDE structure used to parse a character string into record fields.

Lexer
    A COBOL lexical scanner.  If necessary, this can be
    subclassed to handle unusual file formats or other
    record definition copy book problems.

Visitor
Source
Report
Dump
    A Visitor can traverse the DDE hierarchy.
    Each DDE has a visit() method that applies the visitor to
    the parent and each child in order.
    Source displays canonical source from the original input.
    Report displays the fields including size and offset information.
    Dump is used by visitOccurance() to dump each occurrence
    of each field of a record.

SyntaxError
    Raised for a COBOL syntax error.
UnsupportedError
    Raised for a COBOL feature that is not supported by this module.
UsageError
    Raised for a DDE that is not used properly;
    e.g., occurs-clause out of range.
"""

Used by: cobol_dde.py (1)

The following imports are used by this module.

DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (4)

import re
import struct
import string
import decimal

Used by: cobol_dde.py (1)

The CVS cruft provides a place for CVS or other version control tool to place the revision number information within this module.

DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (5)

__version__ = """$Revision$"""

Used by: cobol_dde.py (1)

We also place a pyweb warning in the overheads. This reminds anyone reading the .py file that it is generated from a pyweb .w source.

DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (6)

### DO NOT EDIT THIS FILE!
### It was created by /Users/slott/Documents/Projects/pyWeb-2.1/pyweb/pyweb.py, __version__='$Revision$'.
### From source DDE.w modified Sun Mar 14 10:46:18 2010.
### In working directory '/Users/slott/Documents/Projects/COBOL_DDE-1.2'.

Used by: cobol_dde.py (1)

Exception Definitions

The SyntaxError exception is raised during parsing for a few egregious COBOL syntax problems. One presumption underlying this program is that all copybooks are from production source programs, and have no syntax errors.

The UnsupportedError exception is raised during parsing for features of COBOL DDE's that are not supported by this program. These include features like the OCCURS a TO b variation on the occurs clause, the OCCURS DEPENDING ON clause, the RENAMES clause, the SIGN clause, and the SYNCHRONIZED clause.

The UsageError exception is raised during analysis of a field when something invalid has happened during field extract.

DDE Exception Definitions (7)

class SyntaxError( Exception ):
    """COBOL syntax error."""
    pass
class UnsupportedError( Exception ):
    """A COBOL DDE has features not supported by this module."""
    pass
class UsageError( Exception ):
    """A COBOL DDE is not used properly, e.g., occurs-clause out of range."""
    pass

Used by: cobol_dde.py (1)

Base Class Definitions

The Base Class definitions can be separated into four high-level subject areas: (1) some basic definitions, (2) the DDE class hierarchy, (3) utility classes for reporting, and (4) the lexical scanning and parsing classes.

The basic definitions include the Visitor base class, the Usage strategy class hierarchy and the Redefines strategy class hierarchy.

The DDE class hierarchy is the DDE class and the DDEElement class.

The utility classes include a number of common Visitor subclasses. These include Source, Report and Dump. Source produces a canonical report on the COBOL source. Report produces an analysis of the fields, their sizes and offsets. Dump can be used to dump all fields of a record.

The lexical scanning and parsing classes are Lexer and RecordFactory.

Visitor Class

The Visitor design pattern is used to simplify recursive-descent depth-first in-order traversal of the parse tree. A subclass of this class must provide a dde() method definition. Each individual element is passed to this method from the top of the DDE structure down each branch in depth-first order.

A subclass of Visitor may provide an __init__() method that can be used to initialize any internal data structures. A subclass may also provide a finish() method that can be called at the end of a traversal to write a summary of the structure.

DDE Visitor base class - to analyze a complete DDE tree structure (8)

class Visitor( object ):
    """Visits each node of a DDE, doing a depth-first traversal of the structure."""
    def __init__( self ):
        self.indent= 0
    def enterSub( self ):
        self.indent += 1
    def exitSub( self ):
        self.indent -= 1
    def dde( self, aDDE ):
        """Given a DDE, perform the requested process."""
        pass
    def finish( self ):
        """Any summary information at the end of the visit."""
        pass

Used by: cobol_dde.py (1)
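
A concrete visitor only needs to override dde() and, optionally, finish(). The Python 3 sketch below is self-contained, so it repeats a stand-in for the Visitor base class and uses a hypothetical `Node` class in place of a real DDE. `Node.visit()` follows the same traversal pattern as the DDE visit() method.

```python
class Visitor(object):
    """Stand-in that mirrors the Visitor base class shown above."""
    def __init__(self):
        self.indent = 0
    def enterSub(self):
        self.indent += 1
    def exitSub(self):
        self.indent -= 1
    def dde(self, aDDE):
        pass
    def finish(self):
        pass

class Node(object):
    """Hypothetical stand-in for a DDE: a name plus contained nodes."""
    def __init__(self, name, contains=()):
        self.myName = name
        self.contains = list(contains)
    def visit(self, visitor):
        visitor.dde(self)
        if self.contains:
            visitor.enterSub()
            for child in self.contains:
                child.visit(visitor)
            visitor.exitSub()

class Outline(Visitor):
    """Collect an indented outline of the structure."""
    def __init__(self):
        Visitor.__init__(self)
        self.lines = []
    def dde(self, aDDE):
        self.lines.append("  " * self.indent + aDDE.myName)
    def finish(self):
        return "\n".join(self.lines)

tree = Node("REC", [Node("NAME"), Node("ADDR", [Node("CITY")]), Node("BAL")])
outline = Outline()
tree.visit(outline)
print(outline.finish())
```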

Usage Strategy Hierarchy

Usage is used to combine information in the picture, usage, sign and synchronized clauses.

The Strategy design pattern allows a DDE element to delegate the size() and valueOf() operations to this class.

The size() method returns the number of bytes used by the data element. For usage display, the size can be computed from the picture clause. For usage computational, the size is 2, 4 or 8 bytes. For usage computational-3, the picture clause digits are packed two per byte with an extra half-byte for sign information.
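
The size rules can be sketched as a single function. This Python 3 example is an approximation under stated assumptions, not the module's size() method: it counts only the `9` positions of a numeric picture, treats `S` and `V` as taking no storage (that is, no separate sign), and ignores `P` scaling positions. The helper names are hypothetical.

```python
import re

def expand_picture(picture):
    """Expand repeat counts, e.g. '9(5)V99' becomes '99999V99'."""
    return re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), picture)

def digit_count(picture):
    """Count the digit positions in a numeric picture."""
    return expand_picture(picture).count("9")

def field_size(picture, usage="DISPLAY"):
    """Estimate the bytes of storage for a picture under a given usage."""
    digits = digit_count(picture)
    if usage == "COMP":
        return 2 if digits <= 4 else 4 if digits <= 9 else 8
    if usage == "COMP-3":
        return digits // 2 + 1      # two digits per byte plus the sign half-byte
    # DISPLAY: one byte per character position.
    return len(expand_picture(picture).replace("S", "").replace("V", ""))

sizes = {
    "display": field_size("X(10)"),
    "comp": field_size("S9(9)", "COMP"),
    "comp3": field_size("S9(5)V99", "COMP-3"),
}
```

Seven digits plus a sign half-byte make eight half-bytes, which is why the COMP-3 field above occupies four bytes.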

The valueOf() method returns a usable Python value extracted from the record's bytes. The UsageDisplay subclass does numeric conversion for numeric pictures, otherwise the data is left as a string. The UsageComp subclass does numeric conversion for binary coded data. This handles the mainframe endian conversion. The UsageComp3 subclass unpacks the digits into a character string and then does character-to-numeric conversion.

DDE Usage Strategy class hierarchy - to extract data from input buffers (9)

class Usage( object ):
    """Convert numeric data based on Usage clause."""
    def __init__( self, name_ ):
        self.myName= name_
        self.numeric= None
        self.originalSize= None
        self.scale= None
        self.precision= None
        self.signed= None
    def setTypeInfo( self, **typeInfo ):
        """After parsing a PICTURE clause, provide additional usage information."""
        self.numeric = typeInfo['numeric']
        self.originalSize = typeInfo['length']
        self.scale = typeInfo['scale']
        self.precision = typeInfo['precision']
        self.signed = typeInfo['signed']
        self.decimal = typeInfo['decimal']
    def valueOf( self, buffer ):
        """Convert this data to a decimal number."""
        return None
    def size( self, picture ):
        """Return the actual size of this data, based on PICTURE and SIGN."""
        return len(picture)

class UsageDisplay( Usage ):
    """Convert from ordinary character data to numeric."""
    # NOTE: EBCDIC->ASCII conversion handled by the DDE as a whole.
    def __init__( self ):
        Usage.__init__( self, "DISPLAY" )
    def valueOf( self, buffer ):
        if self.numeric and self.precision != 0:
            if self.decimal == '.':
                return decimal.Decimal( buffer )
            # Insert the implied decimal point.
            return decimal.Decimal( buffer[:-self.precision]+"."+buffer[-self.precision:] )
        elif self.numeric and self.precision == 0:
            return int(buffer)
        return buffer

class UsageComp( Usage ):
    """Convert from COMP data to numeric.

    This may need to be overridden to handle little-endian data."""
    def __init__( self ):
        Usage.__init__( self, "COMP" )
    def valueOf( self, buffer ):
        n= struct.unpack( self.sc, buffer )
        return decimal.Decimal( n[0] )
    def size( self, picture ):
        if len(picture) <= 4:
            self.sc= '>h'
            return 2
        elif len(picture) <= 9:
            self.sc= '>i'
            return 4
        else:
            self.sc= '>q'
            return 8

class UsageComp3( Usage ):
    """Convert from COMP-3 data to numeric."""
    def __init__( self ):
        Usage.__init__( self, "COMP-3" )
    def valueOf( self, buffer ):
        display= []
        for c in buffer:
            n= struct.unpack( "B", c )
            display.append( str(n[0]/16) )
            display.append( str(n[0]%16) )
        #print repr(buffer), repr(display)
        # The last half-byte holds the sign: 0xD is negative, 0xC is positive, 0xF is unsigned.
        f= decimal.Decimal( "".join(display[:-1]) )
        # Entries in display are strings, so compare against '13' (0xD), not the integer 13.
        if display[-1]=='13': return -f
        return f
    def size( self, picture ):
        return int((len(picture)+2)/2)

Used by: cobol_dde.py (1)

Redefines Strategy Pattern

Redefines is used to reset the offset to a specific group or elementary item. There are only two cases, modeled by two classes: Redefines and NonRedefines. An element can redefine another element, in which case the two elements have the same offset; this is handled by the Redefines class. Alternatively, an element can be independent, in which case it begins after the end of the lexically preceding element and its offset is computed as the previous element's offset plus size; this is handled by the NonRedefines class.

The Strategy design pattern allows an element to delegate the offset(), indexedOffset() and size() methods. The Redefines subclass uses the redefines name to look up the offset and size information. The NonRedefines subclass uses the offset and size information currently being computed during the visit loop.

DDE Redefines Strategy class hierarchy - to define offsets to DDE elements (10)

class Redefines( object ):
    """Look up size and offset from the field we redefine."""
    def __init__( self, name_=None ):
        self.myName= name_
    def offset( self, offset, aDDE ):
        return aDDE.top.get( self.myName ).offset
    def indexedOffset( self, offset, aDDE ):
        return aDDE.top.get( self.myName ).indexedOffset
    def size( self, aDDE ):
        return 0

class NonRedefines( Redefines ):
    """More typical case is that we have our own size and offset."""
    def offset( self, offset, aDDE ):
        return offset
    def indexedOffset( self, offset, aDDE ):
        return offset + aDDE.occurSize*aDDE.currentIndex
    def size( self, aDDE ):
        return aDDE.size

Used by: cobol_dde.py (1)
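
The effect of the two strategies on a single group can be sketched without the class machinery. In this Python 3 illustration, a redefining field reuses the offset of its basis field and contributes zero to the running size, while every other field starts where the previous one ended. The tuple format and field names are hypothetical.

```python
def assign_offsets(fields):
    """Assign an offset to each (name, size, redefines) entry of one group."""
    offsets = {}
    position = 0
    for name, size, redefines in fields:
        if redefines is None:
            # Independent field: starts at the end of the previous field.
            offsets[name] = position
            position += size
        else:
            # Redefining field: shares the basis field's offset, adds no size.
            offsets[name] = offsets[redefines]
    return offsets, position

# Hypothetical group: DATE-TEXT redefines the same 8 bytes as DATE-NUM.
fields = [
    ("REC-TYPE", 1, None),
    ("DATE-NUM", 8, None),
    ("DATE-TEXT", 8, "DATE-NUM"),
    ("AMOUNT", 5, None),
]
offsets, total = assign_offsets(fields)
```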

DDE Class

The DDE class itself defines a single element (group or elementary) of a record. There are several broad areas of functionality for a DDE: (1) construction, (2) reporting, (3) record scanning.

The class definition includes the attributes determined at parse time, attributes added during decoration time and attributes used during decoration processing.

Note that group-level and item-level definitions can be separate subclasses of DDE. An item-level definition has a picture clause; a group-level definition does not. A simple Visitor, then, can accumulate all item-level fields.

DDE Class Hierarchy - defines group and elementary data descriptions elements (11)

class DDE( object ):
    """A Data Description Entry.

    This is either a group-level item, which contains DDE's, or
    it is a lowest-level DDE, defined by a PICTURE clause.
    All higher-level DDE's are effectively string-type data.
    A lowest-level DDE with a numeric PICTURE is numeric-type data.
    Occurs and Redefines can occur at any level.  Almost anything
    can be combined with anything else.

    Each entry is defined by the following attributes
        level       COBOL level number 01 to 49, 66 or 88.
        myName      COBOL variable name
        occurs      the number of occurrences (default is 1)
        picture     the exploded picture clause, with ()'s expanded
        initValue   any initial value provided
        offset      offset to this field from start of record
        size        overall size of this item, including all occurrences
        occurSize   the size of an individual occurrence
        sizeScalePrecision  ( numeric, size, scale (# of P's), precision)
        redefines   an instance of Redefines used to compute the offset
        usage       an instance of Usage used to do data conversions
        contains    the list of contained fields
        parent      the immediate parent DDE
        top         the overall record definition DDE
        currentIndex    the current index values used for locating data
        indexedOffset   the current offset based on current index values

    The primary interface is get(), setIndex(), of() and valOf().
    get('dataname') returns the DDE for the given dataname
    setIndex(x,...) sets the current indexes for the various occurs clauses
    of(record) locates this DDE's bytes within the given record
    valOf(record) locates this DDE's bytes and interprets them as a number
    """
    def __init__( self, level, name_, usage=None, pic=None, occurs=None, redefines=None, ssp=(None,None,None,None), initValue=None ):
        self.level= level
        self.myName= name_
        self.offset= 0
        self.size= 0
        self.occurs= occurs
        self.occurSize= None
        self.picture= pic
        self.sizeScalePrecision= ssp
        self.redefines= redefines
        self.usage= usage
        self.initValue= initValue
        self.contains= []
        self.parent= None
        self.top= None
        self.currentIndex= 0
        self.indexedOffset= None
    def __repr__( self ):
        return "%s %s %s" % ( self.level, self.myName, [str(c) for c in self.contains] )
    def __str__( self ):
        oc= ""
        pc= ""
        rc= ""
        if self.occurs > 1: oc= " OCCURS %s" % self.occurs
        if self.picture: pc= " PIC %s USAGE %s" % ( self.picture, self.usage.myName )
        if self.redefines.myName: rc= " REDEFINES %s" % ( self.redefines.myName )
        return "%-2s %-20s%s%s%s." % ( self.level, self.myName, rc, oc, pc )
        → DDE Class Construction methods (12)
        → DDE Class Reporting methods (13)
        → DDE Class Record Scanning methods (14)

Used by: cobol_dde.py (1)

Construction occurs in three general steps: (1) the DDE is created, (2) source attributes are set, (3) the DDE is decorated with size, offset and other details.

DDE Class Construction methods (12)

def append( self, aDDE ):
    """Add a substructure to this DDE.

    This is used by RecordFactory to assemble the DDE."""
    self.contains.append( aDDE )
    aDDE.parent= self
def setTop( self, topDDE ):
    """Set the immediate parentage and top-level record for this DDE.

    Used by RecordFactory to assemble the DDE.
    Required before setSizeAndOffset()."""
    self.top= topDDE
    for f in self.contains:
        f.parent= self
        f.setTop( topDDE )
def setSizeAndOffset( self, offset=0 ):
    """Compute the size and offset for each field of this DDE.

    Used by RecordFactory to assemble the DDE.
    Requires setTop be done first.

    Note: 88-level items inherit attributes from their parent.
    """
    # Wire in a single occurrence, it simplifies the math, below.
    if not self.occurs: self.occurs= 1
    # If this is a redefines, get a different offset, otherwise use this offset
    self.offset= self.redefines.offset( offset, self )
    # Set the default indexedOffset
    self.indexedOffset= self.offset
    # PICTURE - elementary item; otherwise group-level item
    if self.picture:
        # Get the correct size based on USAGE
        self.occurSize= self.usage.size(self.picture)
        self.size= self.occurSize * self.occurs
        # Any contained items?  These would be 88-level items.
        for f in self.contains:
            assert '88' == f.level, "Unexpected Level {0!r}".format(f.level)
            f.setSizeAndOffset(self.offset)
    elif self.level == '88':
        self.occurSize= self.parent.occurSize
        self.size = self.parent.size
        self.usage= self.parent.usage
    else:
        # Get the correct size based on each element of the group
        s= 0 # Was self.offset???? Wasn't That Funny?
        for f in self.contains:
            # Element size and offset
            f.setSizeAndOffset(s)
            # non-redefines add to the size; redefines add 0 to the size
            s += f.redefines.size( f )
        self.occurSize= s
        # Multiply by the number of occurrences to get the total size
        self.size= self.occurSize * self.occurs

Used by: DDE Class Hierarchy - defines group and elementary data descriptions elements (11); cobol_dde.py (1)

The Visitor design pattern requires that each DDE have a method that is used to implement the visitor traversal. The visit() method visits each element. The visitOccurance() method visits each occurrence of each element.

DDE Class Reporting methods (13)

def visit( self, visitor ):
    """Visit this DDE and each element."""
    visitor.dde( self )
    if self.contains:
        visitor.enterSub()
        for f in self.contains:
            f.visit( visitor )
        visitor.exitSub()
def visitOccurance( self, visitor ):
    """Visit each occurrence of this DDE
    and each occurrence of each element."""
    if not self.occurs: return
    for self.currentIndex in range(0,self.occurs):
        self.top.setIndexedOffset(0)    # compute offsets for this new index value
        visitor.dde( self )
        if self.contains:
            visitor.enterSub()
            for f in self.contains:
                f.visitOccurance( visitor )
            visitor.exitSub()

Used by: DDE Class Hierarchy - defines group and elementary data descriptions elements (11); cobol_dde.py (1)

The process of scanning a record involves methods to locate a specific field, set the occurrence index of a field, and pick bytes from a record input buffer.

DDE Class Record Scanning methods (14)

def pathTo( self ):
    """Return the complete path to this DDE."""
    if self.parent: return self.parent.pathTo() + "." + self.myName
    return self.myName
def get( self, name_ ):
    """Find the named field, and return the substructure.

    If necessary, search down through levels."""
    for c in self.contains:
        if c.myName == name_:
            return c
    for c in self.contains:
        try:
            f= c.get(name_)
            if f: return f
        except UsageError, e:
            pass
    raise UsageError( "Field %s unknown in this record" % name_ )
def setIndex( self, *occurance ):
    """Set the index values for locating specific data bytes."""
    # Handles multi-dimensional short-cut syntax.
    # Work up through parentage to locate occurs clauses and pop off indexes
    if self.occurs > 1:
        if self.occurs < occurance[-1] or occurance[-1] <= 0:
            raise UsageError( "Occurs value %r out of bounds %r" % ( occurance, self ) )
        self.currentIndex= occurance[-1]-1
        #print self.myName, 'occurs', self.occurs, 'index', self.currentIndex+1
        # Recursive call to setIndex for all remaining index values.
        if occurance[:-1]:
            self.parent.setIndex( *occurance[:-1] )
    else:
        #print self.myName, 'search upward',repr(occurance)
        self.parent.setIndex( *occurance )
    # Compute offsets for these new index values
    self.top.setIndexedOffset(0)
    return self
def setIndexedOffset( self, offset=0 ):
    """Given index values, compute the indexed offsets into occurs clauses.

    Used by setIndex to compute indexed offsets."""
    # TODO: may be able to eliminate this if-statement!
    if self.occurSize:
        # Redefines will use an offset from another field, otherwise use the offset provided
        self.indexedOffset= self.redefines.indexedOffset( offset, self )
        s= self.indexedOffset
        for f in self.contains:
            # Update elements within this group
            f.setIndexedOffset( s )
            # Redefines add zero to the size, otherwise increment offset with the size
            s += f.redefines.size( f )
def of( self, aString ):
    """Pick the data bytes out of an input string.

    TODO: May require EBCDIC->ASCII conversion.

    Requires setIndexedOffset() call if indexes were changed without calling setIndex()
    Use valOf to handle packed decimal data (USAGE COMP-3).
    """
    b= self.indexedOffset
    return aString[b:b+self.occurSize]
def valOf( self, aString ):
    """Pick the data bytes out of an input string and interpret as a number."""
    bytes= self.of( aString )
    return self.usage.valueOf( bytes )

Used by: DDE Class Hierarchy - defines group and elementary data descriptions elements (11); cobol_dde.py (1)
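As a rough illustration of what of() and valOf() accomplish, the sketch below slices fields out of a byte buffer and decodes them. It is written for Python 3 and is not the library's code: pick() and unpack_comp3() are hypothetical helpers, the single-level "offset + index * size" arithmetic is a simplification of setIndexedOffset(), and the EBCDIC code page is assumed to be cp037.

```python
from decimal import Decimal

def pick(record, offset, occur_size, index=0):
    """Slice one occurrence of a field out of a record (index is zero-based)."""
    start = offset + index * occur_size
    return record[start:start + occur_size]

def unpack_comp3(data, precision=0):
    """Decode packed decimal: two digits per byte, sign in the last half-byte."""
    digits = []
    for b in data[:-1]:
        digits.append(b >> 4)
        digits.append(b & 0x0F)
    digits.append(data[-1] >> 4)
    sign_nibble = data[-1] & 0x0F
    value = 0
    for d in digits:
        value = value * 10 + d
    if sign_nibble in (0x0B, 0x0D):      # B and D mark negative values
        value = -value
    return Decimal(value).scaleb(-precision)

# Assumed layout: NAME PIC X(3), then AMOUNT PIC S9(3)V99 COMP-3 OCCURS 2.
record = b"\xC1\xC2\xC3" + b"\x12\x34\x5C" + b"\x06\x78\x9D"
name = pick(record, 0, 3).decode("cp037")
first = unpack_comp3(pick(record, 3, 3, index=0), precision=2)
second = unpack_comp3(pick(record, 3, 3, index=1), precision=2)
```

The implied decimal point never appears in the data; the precision must come from the picture clause in the copybook.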

Common Visitors

Two common visitor needs are: (1) visit all elements, producing a listing that is a canonical version of the original source; (2) visit all elements, producing additional details (e.g., size, offset, data type). Additionally, when examining actual data values, it is necessary to visit each element and display its current value. This traversal must also visit each occurrence, so it depends on the visitOccurance() method of a DDE.

DDE Common Visitors for reporting on a DDE structure (15)

class Source( Visitor ):
    """Display canonical source from copybook parsing."""
    def dde( self, aDDE ):
        print self.indent*'  ', aDDE

class Report( Visitor ):
    """Report on copybook structure."""
    def dde( self, aDDE ):
        numeric,size,scale,precision= aDDE.sizeScalePrecision
        if numeric:
            nSpec= '%d.%d' % ( size, precision )
        else:
            nSpec= ""
        print "%-65s %3d %3d %5s" % (self.indent*'  '+str(aDDE), aDDE.offset, aDDE.size, nSpec)

class Dump( Visitor ):
    """Dump the data values of this structure."""
    def __init__( self, data ):
        Visitor.__init__( self )
        self.data= data
    def dde( self, aDDE ):
        db= aDDE.of(self.data)
        dstr= []
        for c in db:
            dstr.append( "%2s"%hex( ord(c) )[2:] )
        r= " ".join(dstr) # or r=db
        if aDDE.occurs > 1:
            print "%-65s %3d %3d %3d '%s'" % (self.indent*'  '+str(aDDE), aDDE.indexedOffset, aDDE.size, aDDE.currentIndex+1, r)
        elif aDDE.picture and aDDE.myName != "FILLER":
            print "%-65s %3d %3d '%s'" % (self.indent*'  '+str(aDDE), aDDE.indexedOffset, aDDE.size, r)
        else:
            print "%-65s %3d %3d" % (self.indent*'  '+str(aDDE), aDDE.indexedOffset, aDDE.size)

Used by: cobol_dde.py (1)

Lexical Scanning

The lexical scanner can be subclassed to extend its capability. The default lexical scanner provides a lineClean() method that drops the sequence number area (positions 1-6) and skips comment lines. This may need to be overridden to remove program identification (from positions 73-80) and format control directives.

DDE Lexical Scanner base class provides the default lexical scanner implementation (16)

class Lexer( object ):
    """Lexical scanner for COBOL.

    Given a block of text, this scanner will remove comment lines.
    next() will step through the tokens
    unget(token) will back up a token
    """
    def __init__( self, text ):
        """Initialize the scanner by cleaning the text."""
        self.lines= self.lineClean( text )
        self.backup= []
        self.separator= re.compile( r'[.,;]?\s' )
        self.quote1= re.compile( r"'[^']*'" )
        self.quote2= re.compile( r'"[^"]*"' )
    def lineClean( self, text ):
        """Default cleaner skips comments."""
        return [ l[6:]+' ' for l in text.split('\n') if len(l) > 6 and l[6] not in ('*','/') ]
    def next( self ):
        """Locate the next token in the input stream."""
        if self.backup:
            return self.backup.pop()
        #print "self.lines=", self.lines
        if self.lines and not self.lines[0]:
            self.lines.pop(0)
        if not self.lines:
            print "EOF"
            return None
        while self.lines and self.lines[0] and self.lines[0][0] in string.whitespace:
            self.lines[0]= self.lines[0].lstrip()
            if not self.lines[0]:
                self.lines.pop(0)
            if not self.lines:
                return None
        if self.lines[0][0] == "'":
            # quoted string, break on balancing quote
            match= self.quote1.match( self.lines[0] )
            space= match.end()
        elif self.lines[0][0] == '"':
            # quoted string, break on balancing quote
            match= self.quote2.match( self.lines[0] )
            space= match.end()
        else:
            match= self.separator.search( self.lines[0] )
            space= match.start()
            if space == 0: # starts with separator
                space= match.end()-1
        token, self.lines[0] = self.lines[0][:space], self.lines[0][space:]
        #print token
        return token
    def unget( self, token ):
        """Push one token back into the input stream."""
        self.backup.append( token )

Used by: cobol_dde.py (1)
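A line cleaner for copybooks that still carry text in the identification area might look like the following Python 3 sketch. The line_clean() function is hypothetical; its body could serve as an overriding lineClean() method in a Lexer subclass, assuming the standard fixed-format layout in which position 7 is the indicator and positions 73-80 are not source text.

```python
def line_clean(text):
    """Keep positions 7-72 of each line, skipping comment lines."""
    cleaned = []
    for line in text.split("\n"):
        # Position 7 (index 6) holds '*' or '/' on comment lines.
        if len(line) > 6 and line[6] not in ("*", "/"):
            # Slice off positions 1-6 and anything past position 72.
            cleaned.append(line[6:72].rstrip() + " ")
    return cleaned

sample = "\n".join([
    "000100* A COMMENT LINE",
    "000200 01  CUSTOMER-REC.".ljust(72) + "CUSTREC1",
    "000300     05  NAME PIC X(20).",
])
lines = line_clean(sample)
```

The trailing space on each cleaned line matches the default cleaner, so that every token is followed by a separator.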

Parsing

The RecordFactory class is the parser for record definitions. The parser has three basic sets of methods: (1) clause parsing methods, (2) element parsing methods, and (3) complete record layout parsing methods.

Parsing a record layout involves parsing a sequence of elements and assembling them into a proper structure. Each element consists of a sequence of individual clauses.

DDE RecordFactory parses a record clause to create a DDE instance (17)

class RecordFactory( object ):
    """Parse a copybook, creating a DDE structure."""
    def __init__( self ):
        self.lex= None
        self.token= None
        self.context= []
        self.noisewords= ("WHEN","IS","TIMES")
        self.keywords= ("BLANK","ZERO","ZEROS","ZEROES",
            "DATE","FORMAT","EXTERNAL","GLOBAL",
            "JUST","JUSTIFIED","LEFT","RIGHT",
            "OCCURS",
            "PIC","PICTURE",
            "REDEFINES","RENAMES",
            "SIGN","LEADING","TRAILING","SEPARATE","CHARACTER",
            "SYNCH","SYNCHRONIZED",
            "USAGE","DISPLAY","COMP-3",
            "VALUE",".")
        → DDE Picture Clause Parsing (18)
        → DDE Blank When Zero Clause Parsing (19)
        → DDE Justified Clause Parsing (20)
        → DDE Occurs Clause Parsing (21)
        → DDE Redefines Clause Parsing (22)
        → DDE Renames Clause Parsing (23)
        → DDE Sign Clause Parsing (24)
        → DDE Synchronized Clause Parsing (25)
        → DDE Usage Clause Parsing (26)
        → DDE Value Clause Parsing (27)

        → DDE Element Parsing (28)
        → DDE Record Parsing (29)

Used by: cobol_dde.py (1)
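Assembling parsed elements into a proper structure is driven by the COBOL level numbers. The following Python 3 sketch shows one way a stack of open groups (comparable in spirit to the context list above) can do this; the nest() function and its dictionary nodes are hypothetical, and the special level numbers 66 and 77 are ignored.

```python
def nest(entries):
    """Turn (level, name) pairs into nested dicts using a stack of open groups."""
    root = {"level": 0, "name": None, "contains": []}
    context = [root]
    for level, name in entries:
        node = {"level": level, "name": name, "contains": []}
        # Close every open group at the same or a deeper level.
        while context[-1]["level"] >= level:
            context.pop()
        context[-1]["contains"].append(node)
        context.append(node)
    return root["contains"]

tree = nest([
    (1, "REC"),
    (5, "A"),
    (10, "A1"),
    (10, "A2"),
    (5, "B"),
])
rec = tree[0]
names_under_rec = [n["name"] for n in rec["contains"]]
names_under_a = [n["name"] for n in rec["contains"][0]["contains"]]
```

Only the relative order of level numbers matters, so a copybook numbered 01, 03, 07 nests the same way as one numbered 01, 05, 10.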

DDE Picture Clause Parsing (18)

def picParse( self, pic ):
    """Rewrite a picture clause to eliminate ()'s, S's, V's, P's, etc.

    Returns expanded, normalized picture and (type,length,scale,precision,signed) information."""
    out= []
    scale, precision, signed, decimal = 0, 0, False, None
    while pic:
        c= pic[:1]
        if c in ('A','B','X','Z','9','0','/',',','+','-','*','$'):
            out.append( c )
            if decimal: precision += 1
            pic= pic[1:]
        elif pic[:2] in ('DB','CR'):
            out.append( pic[:2] )
            pic= pic[2:]
        elif c == '(':
            irpt= 0
            pic= pic[1:]
            # A regular expression may be quicker and simpler!
            while pic and pic[:1].isdigit():
                irpt = 10*irpt+int( pic[:1] )
                pic= pic[1:]
            # A repeat count needs a preceding symbol, a positive count and a closing ")"
            if not out or irpt < 1 or pic[:1] != ')':
                raise SyntaxError( "picture error in %r"%pic )
            out.append( (irpt-1)*out[-1] )
            # Repeated positions after the decimal point also count toward precision
            if decimal: precision += irpt-1
            pic= pic[1:]
        elif c == 'S':
            # silently drop an "S".
            # Note that 'S' plus a SIGN SEPARATE option increases the size of the picture!
            signed= True
            pic= pic[1:]
        elif c  == 'P':
            # silently drop a "P", since it just sets scale and isn't represented.
            scale += 1
            pic= pic[1:]
        elif c  == "V":
            decimal= "V"
            pic= pic[1:]
        elif c  == ".":
            decimal= "."
            out.append( "." )
            pic= pic[1:]
        else:
            raise SyntaxError( "picture error in %s"%pic )

    final= "".join( out )
    alpha= ('A' in final) or ('X' in final) or ('/' in final)
    #print pic, final, alpha, scale, precision
    # Note: Actual size depends on len(final) and usage!
    return dict(
        final=final, alpha=alpha, numeric=not alpha,
        length=len(final), scale=scale,
        precision= precision, signed=signed,
        decimal=decimal)
def picture( self ):
    """Parse a PICTURE clause."""
    if self.token == "IS":
        self.token= self.lex.next()
    pic= self.lex.next()
    self.token= self.lex.next()
    return self.picParse(pic)
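The comment in picParse() suggests that a regular expression may be simpler for expanding repeat counts. The following Python 3 sketch explores that idea for simple pictures only (S, V, 9, X, A); expand_picture() and summarize() are hypothetical names, and editing symbols and the P scaling position are not handled.

```python
import re

def expand_picture(pic):
    """Expand repeat counts, so that 9(5) becomes 99999."""
    return re.sub(r"(.)\((\d+)\)",
                  lambda m: m.group(1) * int(m.group(2)),
                  pic)

def summarize(pic):
    """Return the expanded picture with sign, length and precision."""
    expanded = expand_picture(pic.upper())
    signed = expanded.startswith("S")
    body = expanded[1:] if signed else expanded
    # "V" marks the implied decimal point and takes no space in the data.
    whole, _, fraction = body.partition("V")
    final = whole + fraction
    return dict(final=final, signed=signed,
                length=len(final), precision=len(fraction))

amount = summarize("S9(5)V99")
label = summarize("X(10)")
```

Expanding the repeat counts first means the precision is simply the number of positions after the V, with no separate bookkeeping for repeated symbols.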