COBOL Data Definition Entry Processing

Use a COBOL Record Layout to Process an EBCDIC File

Steven F. Lott

Introduction

When dealing with "Flat Files" from legacy COBOL programs, there are several problems that need to be solved.

  1. The files have a fixed field layout, without delimiters. This means that the offset of each field must be used to decompose the record into its individual elements.
  2. Developing the offsets to each field is tedious and error-prone work. Accounting for signs, alignment, and redefines makes this difficult. It's necessary to parse the "Copy Book", which has the original COBOL source definition for the file.
  3. Numeric fields can have an implied decimal point, making it difficult to determine the value of a string of digits. The "Copy Book" is essential for parsing the file contents.
  4. COBOL can make use of numeric data represented in a variety of "Computational" forms. The "Computational-3" ("COMP-3") form is particularly complex because decimal digits are packed two per byte and the trailing half-byte encodes sign information.
  5. The string data may be encoded in EBCDIC.
  6. COBOL encourages the use of data aliases (or "unions") via the REDEFINES clause. Without the entire suite of COBOL programs, the handling of REDEFINES can become an insoluble problem.
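
As a rough illustration of problems 1, 3 and 5, the following Python 3 sketch slices a fixed-layout record by offset, decodes EBCDIC text, and applies an implied decimal point. The layout table, field names and sample record are invented for this example, and code page 037 is only an assumption about which EBCDIC variant a file uses. It does not use the cobol_dde module.

```python
from decimal import Decimal

# Hypothetical layout: (name, offset, size, implied decimal places or None for text)
LAYOUT = [
    ("CUST-NAME", 0, 10, None),
    ("BALANCE", 10, 7, 2),   # like PIC 9(5)V99: seven digits, two after the implied point
]

def decode_record(record, encoding="cp037"):
    """Slice each field out of a fixed-layout record and interpret it."""
    row = {}
    for name, offset, size, places in LAYOUT:
        text = record[offset:offset + size].decode(encoding)
        if places is None:
            row[name] = text.rstrip()
        else:
            # Shift the digits right by the implied number of decimal places.
            row[name] = Decimal(text).scaleb(-places)
    return row

# Build a sample EBCDIC record so the example is self-contained.
sample = "SMITH     0012345".encode("cp037")
print(decode_record(sample))
```

In practice the offsets and sizes in such a table are exactly what has to be derived from the copy book.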

Generally, COBOL files are defined by a "Data Definition Entry" (DDE) that provides the record layout.

This library helps parse DDE's to determine the offset, size, and encoding of each field. This information can be used by Python programs to process files that originate from COBOL systems.

Applications

There are two common applications for a Python-based analysis of COBOL data.

  • Extract, Transform and Load (ETL).
  • Data Profiling.

ETL. Extract Transform and Load (ETL) processing is a common pipeline between legacy COBOL ("Flat File") applications and relational database (or Object Database) applications. A Python application can be used to Extract data from COBOL flat files and either load a database or create a file in a more usable notation (e.g., XML, JSON or CSV).

The cobol_dde module helps to create an ETL application.

  1. It parses the COBOL "Copy Book" source to permit interpretation of the file.
  2. Given a block of bytes, it can extract fields as byte strings or as interpreted values. It will produce int or Decimal values. (Currently, floating-point "Comp-2" fields are not supported.)
  3. Python application logic can be used to disentangle the various record types encoded via REDEFINES clauses.
  4. With the copy book and the DDE module, a Python program can then rewrite a COBOL file.
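
The "load" end of such a pipeline can be sketched in Python 3 as follows. The row contents are invented, and the helper name to_json_line is hypothetical; the only point is that Decimal values need an explicit conversion when written as JSON.

```python
import json
from decimal import Decimal

def to_json_line(row):
    """Serialize one extracted record; Decimal values are written as strings."""
    return json.dumps(row, default=str, sort_keys=True)

# Hypothetical extracted record.
row = {"CUST-NAME": "SMITH", "BALANCE": Decimal("123.45")}
print(to_json_line(row))
```

Writing decimals as strings avoids any loss of precision that a conversion to float could introduce.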

Profiling. Data Warehouse processing requires complete understanding of source application data domains. This complete understanding of a domain is part of analyzing data quality. Data domains, when not formalized by a programming language or database design, tend to grow in sometimes obscure ways, including bad data, special-purpose data and undocumented data.

Bad data are simply invalid values used in application files. These can be tolerated either because of programming bugs or dependencies that permit invalid data under certain circumstances. Typically, the latter case indicates a normalization issue. The domain values are illegal.

Special-purpose data is often a "patch" or "hack" to work around a problem. It may be documented, but rarely used, and unexpected by ETL programmers. Sometimes the special-purpose data represents an operational hack to work around a problem without changing the programming. Irrespective of the origin of the hack, the domain values are legal, but unexpected and uncommon.

Undocumented data is ordinary operational values that are actually widely used but unexpected because they are undocumented. Often this is data that is not part of the essential use cases for an application, but is additional data with an obscure final destination. The domain values are legal, unexpected but common.

Flat files processed by COBOL programs are very common and often suffer from the above data quality problems. The language imposes few rules on data domains. Programs can be easily modified to extend a domain. Processing rules can be quite obscure, making it necessary to analyze actual data rather than program source.

Requirements

This application produces simple reports on the range of values found for particular fields in a file. This information is used to understand data quality and the actual values in a domain, and to support reverse engineering of software.

The primary use cases are described below.

Analyze a File

The user is given a file and the associated record definition (also known as a "Copy Book").

The user creates an application program based on data_profile to name the fields of interest. The user may also include programming for the following: (1) to separate occurrences or variant record types when the file is not in first normal form, (2) to conditionally process fields when the file is not in second or third normal form.

The user runs the profiler with the file, driver and copybook.

The application produces a summary report showing each of the named fields, their complete domain of values, and the occurrence count for each value.
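
The heart of such a summary can be sketched in a few lines of Python 3. This is not the data_profile implementation; the function name profile, the field table and the sample records are assumptions made for illustration.

```python
from collections import Counter

def profile(records, fields):
    """Count how often each distinct value occurs for each named field."""
    counts = {name: Counter() for name in fields}
    for record in records:
        for name, (offset, size) in fields.items():
            counts[name][record[offset:offset + size]] += 1
    return counts

# Hypothetical two-field layout (name -> (offset, size)) and sample records.
fields = {"STATUS": (0, 1), "YEAR": (1, 4)}
records = ["A2009", "A2010", "X2010"]
result = profile(records, fields)
for name, counter in result.items():
    print(name, counter.most_common())
```

A rare value such as the single "X" status is the kind of special-purpose or bad data the report is meant to expose.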

Extract Data

The user is given a file and the associated record definition (also known as a "Copy Book").

The user creates an application program based on cobol_dde to extract the fields of interest. The user may also include programming for the following: (1) to separate occurrences or variant record types when the file is not in first normal form, (2) to conditionally process fields when the file is not in second or third normal form. The application loads a database or writes an output file in a more usable format (e.g., JSON).

The user installs the application for operational use.

Design Overview

The cobol_dde module has two distinct phases of operation. The first phase parses a COBOL Data Description Entry (DDE) to understand the record layout of the file. The second phase uses the parsed DDE to extract fields from a record.

The data_profile module is a framework for building data profiling applications based on the cobol_dde module. The data profiler uses a COBOL Data Description Entry (DDE) to understand the record layout of the file. The profiling then uses the DDE object to examine selected fields of the file.

DDE Structure

The DDE class is a recursive definition of a COBOL group-level DDE. There are two basic species of COBOL DDE's: elementary items, which have a picture clause, and group-level items, which contain lower-level items. There are several optional features of every DDE, including an occurs clause and a redefines clause. In addition to a picture clause, elementary items can also have an optional usage clause and an optional sign clause.

The picture clause specifies how to interpret a sequence of bytes. The picture clause interacts with the optional usage clause, sign clause and synchronized clause to fully define the bytes. The picture clause uses a complex format of code characters to define either individual character bytes (when the usage is display) or decimal digits packed two per byte (when the usage is computational-3).
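
To make the picture notation concrete, here is a minimal Python 3 sketch that expands repeat counts and derives a few facts from a simple picture. It covers only the code characters S, 9, V and X; the function names and the returned keys are assumptions for this example and are not the parser used by this module.

```python
import re

def expand_picture(picture):
    """Expand repeat counts, e.g. 'S9(5)V99' becomes 'S99999V99'."""
    return re.sub(r"(.)\((\d+)\)",
                  lambda m: m.group(1) * int(m.group(2)),
                  picture.upper())

def picture_info(picture):
    """Summarize a simple picture made of S, 9, V and X characters."""
    expanded = expand_picture(picture)
    fraction = expanded.partition("V")[2]      # digits after the implied point
    return {
        "signed": expanded.startswith("S"),
        "numeric": set(expanded) <= set("S9V"),
        "digits": expanded.count("9"),
        "precision": fraction.count("9"),
    }

print(expand_picture("S9(5)V99"))
print(picture_info("S9(5)V99"))
print(picture_info("X(10)"))
```

Edited pictures (with characters such as Z, comma or a literal period) need more rules than this sketch provides.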

The occurs clause specifies an array of elements. If the occurs clause appears on a group level item, the sub-record is repeated. If the occurs clause appears on an elementary item, that item is repeated.
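
Locating one occurrence is simple arithmetic: the base offset plus the size of a single occurrence times a zero-based index. The Python 3 sketch below shows this on an invented record; the helper name occurrence_slice is hypothetical.

```python
def occurrence_slice(record, base_offset, occur_size, index):
    """Return the bytes of the zero-based index-th occurrence of a repeated item."""
    start = base_offset + occur_size * index
    return record[start:start + occur_size]

# Hypothetical record: a 3-byte header followed by an item that occurs 3 times.
record = "HDR" + "AAA1" + "BBB2" + "CCC3"
items = [occurrence_slice(record, 3, 4, i) for i in range(3)]
print(items)
```

For a repeated group, the same arithmetic applies with the size of the whole sub-record as the occurrence size.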

The redefines clause defines an alias for input bytes. When field R redefines a previously defined field F, the storage bytes are used for both R and F. The record structure does not provide a way to disambiguate the interpretation of the bytes. Program logic must be examined to determine which interpretation is valid.
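
The Python 3 sketch below shows what that program logic often looks like: some other field selects the interpretation of the shared bytes. The one-character record-type code, the field positions and the function name interpret are all assumptions for illustration; a real file may have no such discriminator.

```python
def interpret(record):
    """Choose one interpretation of the shared bytes using a record-type code."""
    rec_type = record[0:1]
    shared = record[1:9]           # the same eight bytes back both fields
    if rec_type == "N":
        return ("NAME", shared.rstrip())
    if rec_type == "D":
        return ("DATE", (int(shared[0:4]), int(shared[4:6]), int(shared[6:8])))
    raise ValueError("unknown record type %r" % rec_type)

print(interpret("NSMITH   "))
print(interpret("D20100314"))
```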

DDE Class. The parent class, DDE, defines the features of a group-level item. It supports the occurs and redefines features. It can contain a number of DDE items. The leaves of the tree, DDEElement, define the features of an elementary item. It adds support for the picture clause, but removes support for lower-level items.

The optional clauses are handled using a variety of design patterns. The usage information, for instance, is used to create a Strategy object that is used to extract a field from a record's bytes.

The redefines information is used to create a Strategy object that computes the offset to a field. There are two variant strategies: locate the basis field and use that field's offset or use the end of the previous element as the offset.
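
The effect of the two strategies can be sketched over a flat list of items in Python 3. This is a simplification, not the module's class hierarchy; the tuple layout and the function name assign_offsets are assumptions for this example.

```python
def assign_offsets(items):
    """items: list of (name, size, redefines) tuples in source order.

    Returns a name -> offset mapping and the total record size."""
    offsets = {}
    next_offset = 0
    for name, size, redefines in items:
        if redefines is None:
            # Independent item: starts at the end of the previous element.
            offsets[name] = next_offset
            next_offset += size
        else:
            # Redefining item: shares the basis field's offset and adds no size.
            offsets[name] = offsets[redefines]
    return offsets, next_offset

# Hypothetical record: BODY-ALT redefines BODY.
items = [("KEY", 4, None), ("BODY", 8, None), ("BODY-ALT", 8, "BODY"), ("TRAILER", 2, None)]
offsets, total = assign_offsets(items)
print(offsets, total)
```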

The Visitor pattern is used to traverse a DDE structure to write reports on the structure. The data_profile module uses a Visitor to write a detailed dump of a given record.

DDE Parser

The RecordFactory object reads a file of text and either creates a DDE or raises an exception. If the text is a valid COBOL record definition, a DDE is created. If there are syntax errors, an exception is raised.

The RecordFactory depends on a Lexer instance to do lexical scanning of COBOL source. The lexical scanner can be subclassed to pre-process COBOL source. This is necessary because of the variety of source formats that are permitted. Shop standards may include or exclude features like program identification, line numbers, format control and other decoration of the input.
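
As an illustration of the kind of pre-processing involved, this Python 3 sketch reduces fixed-format COBOL source to its program text area (columns 8 to 72), dropping sequence numbers, the identification area and comment lines. The function name clean_source is hypothetical and is not part of the Lexer interface; a real subclass would apply whatever rules the shop's source format requires.

```python
def clean_source(text):
    """Keep only columns 8-72 of each non-comment line of fixed-format COBOL."""
    kept = []
    for line in text.splitlines():
        line = line.ljust(7)               # make sure column 7 exists
        if line[6] in "*/":                # column 7 indicator marks a comment line
            continue
        kept.append(line[7:72].rstrip())
    return "\n".join(kept)

# Hypothetical copybook lines with sequence numbers and an identification area.
source = "\n".join([
    "000100 " + "01  CUST-REC.".ljust(65) + "CUSTREC1",
    "000200*    A COMMENT LINE",
    "000300 " + "    05  CUST-ID  PIC 9(5).".ljust(65) + "CUSTREC1",
])
print(clean_source(source))
```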

The makeRecord() method of the RecordFactory class does the parsing of the record definition. Each individual DDE statement is parsed. The level number information is used to define the correct grouping of elements. When the structure is parsed, it is decorated with size and offset information for each element.
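
The grouping by level number can be sketched with a stack, as in the Python 3 example below. It assumes ordinary level numbers only (special levels such as 66 are not treated), uses plain dictionaries rather than DDE objects, and the name build_tree is hypothetical.

```python
def build_tree(entries):
    """entries: list of (level, name) pairs in source order; returns a root node."""
    root = {"level": 0, "name": "<root>", "children": []}
    stack = [root]
    for level, name in entries:
        node = {"level": level, "name": name, "children": []}
        # Pop until the top of the stack has a lower level number (the enclosing group).
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root

# Hypothetical record definition.
entries = [(1, "REC"), (5, "HEADER"), (10, "ID"), (10, "DATE"), (5, "AMOUNT")]
tree = build_tree(entries)
rec = tree["children"][0]
print([child["name"] for child in rec["children"]])
```

Once the tree exists, a second pass can assign sizes and offsets from the leaves upward.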

Field Values

There are two broad types of character interpretation:

  • Character ("usage is display").
  • Numeric ("usage is computational"). There are several subtypes of computational. The most common computational form is "COMP-3", which is a packed (binary-coded) decimal format.

These require different strategies for decoding the input bytes. Note that the COBOL language, and IBM's extensions, provide for a number of usage options. In this application, three basic types of usage strategies are supported: display, comp and comp-3.

  • Display. These are bytes, one per character, described by the picture clause. They can be EBCDIC or ASCII. Python offers a codecs module to convert EBCDIC characters to Unicode for further processing.
  • COMP. These are binary fields of 2, 4 or 8 bytes, with the size implied by the picture clause.
  • COMP-3. These are packed decimal fields, with the size derived from the picture clause; there are two digits packed into each byte, with an extra half-byte for a sign.
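
The first two cases can be tried directly with the standard library, as in this Python 3 sketch. The sample bytes are invented, and code page 037 is an assumption; other EBCDIC code pages exist.

```python
import struct

# Display: one byte per character, here EBCDIC code page 037.
ebcdic_name = b"\xe2\xd4\xc9\xe3\xc8"
name = ebcdic_name.decode("cp037")
print(name)

# COMP: a two-byte signed binary field in big-endian (mainframe) byte order.
comp_halfword = b"\xff\xfe"
(value,) = struct.unpack(">h", comp_halfword)
print(value)
```

The struct format codes ">h", ">i" and ">q" correspond to the 2, 4 and 8 byte sizes.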

Sample Data Profile Application

A typical data profiling application program has the following general form.

import cobol_dde
import data_profile

rf= cobol_dde.RecordFactory()
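# CleanupLexer is assumed to be an application-defined Lexer subclass; aDef names the copybook file.
dde= rf.makeRecord( CleanupLexer(open(aDef,"r").read()) )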

# Create a FieldScan for the three fields we care about
fieldList= data_profile.FieldScan( [ data_profile.NumFieldValue( dde, 'MCUDBI-DATA-ITEM' ),
    data_profile.NumFieldValue( dde, 'MCUDBI-YR' ),
    data_profile.NumFieldValue( dde, 'MCUDBI-VALUE-LENGTH' )
] )

# Create a FileScan for the file, using the given FieldScan list of fields
fs= data_profile.FileScan( dde, fieldList, aFileName )

# Process through the given ending record
fs.process( end )

Future Directions

Is EBCDIC->ASCII conversion a feature of DDE? May need subclass or strategy for conversion.

Consider combining PIC, USAGE, and SIGN information into a single data type specification.

Add capability to search using a path string instead of individual get() calls in DDE.

Create subclass of DDE for non-group-level items that adds PICTURE and USAGE features and removes the container.

Implementation

We'll define the cobol_dde module and a test_dde unit test for this module.

We'll rely on the decimal module to do fixed-precision decimal arithmetic.

Note

The legacy implementation was the FixedPoint module. While the FixedPoint module is handy, it is not as robust as the decimal module.

The cobol_dde module provides the DDE record definition and Lexical scanning capability.

Separately, we'll look at the data_profile module. This defines the scanning and analyzing features.

cobol_dde

The cobol_dde module has the following structure.

cobol_dde.py (1)

→ DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (2)    → (3)    → (4)    → (5)    → (6)
→ DDE Exception Definitions (7)

# 1. Basic class definitions
→ DDE Visitor base class - to analyze a complete DDE tree structure (8)
→ DDE Usage Strategy class hierarchy - to extract data from input buffers (9)
→ DDE Redefines Strategy class hierarchy - to define offsets to DDE elements (10)

# 2. DDE class definition
→ DDE Class Hierarchy - defines group and elementary data descriptions elements (11)

# 3. Some utility classes for reporting
→ DDE Common Visitors for reporting on a DDE structure (15)

# 4. The Lexical Scanning and Parsing of an input record layout
→ DDE Lexical Scanner base class provides the default lexical scanner implementation (16)
→ DDE RecordFactory parses a record clause to create a DDE instance (17)

Overheads

Overheads includes the following: the shell escape, the doc string, imports and any CVS cruft.

The shell escape line allows this module to be run as a stand-alone application.

DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (2)

#!/usr/bin/env python

Used by: cobol_dde.py (1)

The doc string provides documentation embedded within this module.

DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (3)

"""COBOL Data Description Entries (a/k/a Record-Layout Objects)

A COBOL Record is a collection of data description entries.
Each entry is either a simple field (with a PICTURE) or a group of fields.
Each field has an optional occurs clause, or redefines clause.
Each field has a usage (DISPLAY or COMP or COMP-3).
Each field is assigned an offset, size and data type (numeric or alpha).

This module includes the following class definitions:

DDE
    Defines a COBOL record layout object.
    Each record layout object has operations to locate individual
    fields or occurance instances.

Usage
UsageDisplay
UsageComp
UsageComp3
    Various USAGE clauses; these classes provide a valueOf() method
    which decodes record bytes to a proper value.

Redefines
NonRedefines
    Two strategies for computing a field's offset - either it is after
    the previous field in memory, or it redefines another field's location
    in memory.

RecordFactory
    Parses a COBOL copybook to
    create the DDE structure used to parse a character string into record fields.

Lexer
    A COBOL lexical scanner.  If necessary, this can be
    subclassed to handle unusual file formats or other
    record definition copy book problems.

Visitor
Source
Report
Dump
    A Visitor can traverse the DDE hierarchy.
    Each DDE has a visit() method that applies the visitor to
    the parent and each child in order.
    Source displays canonical source from the original input.
    Report displays the fields including size and offset information.
    Dump is used by visitOccurance() to dump each occurance
    of each field of a record.

SyntaxError
    Raised for a COBOL syntax error.
UnsupportedError
    Raised for a COBOL feature that is not supported by this module.
UsageError
    Raised for a DDE that is not used properly;
    e.g., occurs-clause out of range.
"""

Used by: cobol_dde.py (1)

The following imports are used by this module.

DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (4)

import re
import struct
import string
import decimal

Used by: cobol_dde.py (1)

The CVS cruft provides a place for CVS or other version control tool to place the revision number information within this module.

DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (5)

__version__ = """$Revision$"""

Used by: cobol_dde.py (1)

We also place a pyweb warning in the overheads. This reminds anyone reading the .py file that it is generated from a pyweb .w source.

DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (6)

### DO NOT EDIT THIS FILE!
### It was created by /Users/slott/Documents/Projects/pyWeb-2.1/pyweb/pyweb.py, __version__='$Revision$'.
### From source DDE.w modified Sun Mar 14 10:46:18 2010.
### In working directory '/Users/slott/Documents/Projects/COBOL_DDE-1.2'.

Used by: cobol_dde.py (1)

Exception Definitions

The SyntaxError exception is raised during parsing for a few egregious COBOL syntax problems. One presumption underlying this program is that all copybooks are from production source programs, and have no syntax errors.

The UnsupportedError exception is raised during parsing for features of COBOL DDE's that are not supported by this program. These include the OCCURS a TO b variation on the occurs clause, the OCCURS DEPENDING ON clause, the RENAMES clause, the SIGN clause, and the SYNCHRONIZED clause.

The UsageError exception is raised during analysis of a field when something invalid has happened during field extract.

DDE Exception Definitions (7)

class SyntaxError( Exception ):
    """COBOL syntax error."""
    pass
class UnsupportedError( Exception ):
    """A COBOL DDE has features not supported by this module."""
    pass
class UsageError( Exception ):
    """A COBOL DDE is not used properly, e.g., occurs-clause out of range."""
    pass

Used by: cobol_dde.py (1)

Base Class Definitions

The Base Class definitions can be separated into four high-level subject areas: (1) some basic definitions, (2) the DDE class hierarchy, (3) utility classes for reporting, and (4) the lexical scanning and parsing classes.

The basic definitions include the Visitor base class, the Usage strategy class hierarchy and the Redefines strategy class hierarchy.

The DDE class hierarchy is the DDE class and the DDEElement class.

The utility classes include a number of common Visitor subclasses. These include Source, Report and Dump. Source produces a canonical report on the COBOL source. Report produces an analysis of the fields, their sizes and offsets. Dump can be used to dump all fields of a record.

The lexical scanning and parsing classes are Lexer and RecordFactory.

Visitor Class

The Visitor design pattern is used to simplify recursive-descent depth-first in-order traversal of the parse tree. A subclass of this class must provide a dde() method definition. Each individual element is passed to this method from the top of the DDE structure down each branch in depth-first order.

A subclass of Visitor may provide an __init__() method that can be used to initialize any internal data structures. A subclass may also provide a finish() method that can be called at the end of a traversal to write a summary of the structure.

DDE Visitor base class - to analyze a complete DDE tree structure (8)

class Visitor( object ):
    """Visits each node of a DDE, doing a depth-first traversal of the structure."""
    def __init__( self ):
        self.indent= 0
    def enterSub( self ):
        self.indent += 1
    def exitSub( self ):
        self.indent -= 1
    def dde( self, aDDE ):
        """Given a DDE, perform the requested process."""
        pass
    def finish( self ):
        """Any summary information at the end of the visit."""
        pass

Used by: cobol_dde.py (1)
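
A concrete visitor might look like the following Python 3 sketch. To keep it self-contained it uses a stand-in Node class in place of DDE and repeats the enterSub(), exitSub(), dde() and finish() protocol rather than importing the module; the names Node and Outline are hypothetical.

```python
class Node:
    """Stand-in for a DDE: a name plus a list of contained nodes."""
    def __init__(self, name, *contains):
        self.name = name
        self.contains = list(contains)
    def visit(self, visitor):
        """Apply the visitor to this node, then to each child in order."""
        visitor.dde(self)
        if self.contains:
            visitor.enterSub()
            for child in self.contains:
                child.visit(visitor)
            visitor.exitSub()

class Outline:
    """Visitor that collects an indented outline of the structure."""
    def __init__(self):
        self.indent = 0
        self.lines = []
    def enterSub(self):
        self.indent += 1
    def exitSub(self):
        self.indent -= 1
    def dde(self, node):
        self.lines.append("  " * self.indent + node.name)
    def finish(self):
        return "\n".join(self.lines)

tree = Node("REC", Node("HEADER", Node("ID"), Node("DATE")), Node("AMOUNT"))
outline = Outline()
tree.visit(outline)
print(outline.finish())
```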

Usage Strategy Hierarchy

Usage is used to combine information in the picture, usage, sign and synchronized clauses.

The Strategy design pattern allows a DDE element to delegate the size() and valueOf() operations to this class.

The size() method returns the number of bytes used by the data element. For usage display, the size can be computed from the picture clause. For usage computational, the size is 2, 4 or 8 bytes. For usage computational-3, the picture clause digits are packed two per byte with an extra half-byte for sign information.

The valueOf() method returns a usable Python value extracted from the record's bytes. The UsageDisplay subclass does numeric conversion for numeric pictures, otherwise the data is left as a string. The UsageComp subclass does numeric conversion for binary coded data. This handles the mainframe endian conversion. The UsageComp3 subclass unpacks the digits into a character string and then does character-to-numeric conversion.
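
The packed-decimal case is the least obvious, so here is a stand-alone Python 3 sketch of the size rule and the unpacking. It is not the UsageComp3 class: the function names are hypothetical, the scale argument is an assumption about how implied decimal places would be supplied, valid digit half-bytes are assumed, and only a sign half-byte of 0xD is treated as negative.

```python
from decimal import Decimal

def comp3_size(digits):
    """Bytes needed for a packed field: one half-byte per digit plus a sign half-byte."""
    return (digits + 2) // 2

def unpack_comp3(data, scale=0):
    """Decode packed-decimal bytes; scale is the number of implied decimal places."""
    nibbles = []
    for byte in data:                  # iterating over bytes yields ints in Python 3
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()               # the trailing half-byte carries the sign
    number = Decimal("".join(str(n) for n in nibbles)).scaleb(-scale)
    return -number if sign == 0x0D else number

print(comp3_size(5))
print(unpack_comp3(b"\x12\x34\x5d", scale=2))
print(unpack_comp3(b"\x01\x2c"))
```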

DDE Usage Strategy class hierarchy - to extract data from input buffers (9)

class Usage( object ):
    """Convert numeric data based on Usage clause."""
    def __init__( self, name_ ):
        self.myName= name_
        self.numeric= None
        self.originalSize= None
        self.scale= None
        self.precision= None
        self.signed= None
    def setTypeInfo( self, **typeInfo ):
        """After parsing a PICTURE clause, provide additional usage information."""
        self.numeric = typeInfo['numeric']
        self.originalSize = typeInfo['length']
        self.scale = typeInfo['scale']
        self.precision = typeInfo['precision']
        self.signed = typeInfo['signed']
        self.decimal = typeInfo['decimal']
    def valueOf( self, buffer ):
        """Convert this data to a decimal number."""
        return None
    def size( self, picture ):
        """Return the actual size of this data, based on PICTURE and SIGN."""
        return len(picture)

class UsageDisplay( Usage ):
    """Convert from ordinary character data to numeric."""
    # NOTE: EBCDIC->ASCII conversion handled by the DDE as a whole.
    def __init__( self ):
        Usage.__init__( self, "DISPLAY" )
    def valueOf( self, buffer ):
        if self.numeric and self.precision != 0:
            if self.decimal == '.':
                return decimal.Decimal( buffer )
            # Insert the implied decimal point.
            return decimal.Decimal( buffer[:-self.precision]+"."+buffer[-self.precision:] )
        elif self.numeric and self.precision == 0:
            return int(buffer)
        return buffer

class UsageComp( Usage ):
    """Convert from COMP data to numeric.

    This may need to be overridden to handle little-endian data."""
    def __init__( self ):
        Usage.__init__( self, "COMP" )
    def valueOf( self, buffer ):
        n= struct.unpack( self.sc, buffer )
        return decimal.Decimal( n[0] )
    def size( self, picture ):
        if len(picture) <= 4:
            self.sc= '>h'
            return 2
        elif len(picture) <= 9:
            self.sc= '>i'
            return 4
        else:
            self.sc= '>q'
            return 8

class UsageComp3( Usage ):
    """Convert from COMP-3 data to numeric."""
    def __init__( self ):
        Usage.__init__( self, "COMP-3" )
    def valueOf( self, buffer ):
        display= []
        for c in buffer:
            # Split each byte into its high and low half-bytes.
            n= struct.unpack( "B", c )
            display.append( str(n[0]//16) )
            display.append( str(n[0]%16) )
        #print repr(buffer), repr(display)
        # Last position has sign information: 'd' (13) is <0, 'f' is unsigned, and 'c' is >=0.
        f= decimal.Decimal( "".join(display[:-1]) )
        # The half-bytes are stored as strings, so compare against the string '13'.
        if display[-1]=='13': return -f
        return f
    def size( self, picture ):
        return int((len(picture)+2)/2)

Used by: cobol_dde.py (1)

Redefines Strategy Pattern

Redefines is used to reset the offset to a specific group or elementary item. There are only two cases, modeled by two classes: Redefines and NonRedefines. An element can redefine another element, in which case the two elements have the same offset; this is handled by the Redefines class. Or an element can be independent, in which case it begins after the end of the lexically preceding element and its offset is computed as the previous element's offset plus size; this is handled by the NonRedefines class.

The Strategy design pattern allows an element to delegate the offset(), indexedOffset() and size() methods. The Redefines subclass uses the redefines name to look up the offset and size information. The NonRedefines subclass uses the offset and size information currently being computed during the visit loop.

DDE Redefines Strategy class hierarchy - to define offsets to DDE elements (10)

class Redefines( object ):
    """Lookup size and offset from the field we redefine."""
    def __init__( self, name_=None ):
        self.myName= name_
    def offset( self, offset, aDDE ):
        return aDDE.top.get( self.myName ).offset
    def indexedOffset( self, offset, aDDE ):
        return aDDE.top.get( self.myName ).indexedOffset
    def size( self, aDDE ):
        return 0

class NonRedefines( Redefines ):
    """More typical case is that we have our own size and offset."""
    def offset( self, offset, aDDE ):
        return offset
    def indexedOffset( self, offset, aDDE ):
        return offset + aDDE.occurSize*aDDE.currentIndex
    def size( self, aDDE ):
        return aDDE.size

Used by: cobol_dde.py (1)

DDE Class

The DDE class itself defines a single element (group or elementary) of a record. There are several broad areas of functionality for a DDE: (1) construction, (2) reporting, (3) record scanning.

The class definition includes the attributes determined at parse time, attributes added during decoration time and attributes used during decoration processing.

Note that group-level and item-level entries can be separate subclasses of DDE. An item-level definition has a picture clause; a group-level definition does not. A simple Visitor, then, can accumulate all item-level fields.

DDE Class Hierarchy - defines group and elementary data descriptions elements (11)

class DDE( object ):
    """A Data Description Entry.

    This is either a group-level item, which contains DDE's, or
    it is a lowest-level DDE, defined by a PICTURE clause.
    All higher-level DDE's are effectively string-type data.
    A lowest-level DDE with a numeric PICTURE is numeric-type data.
    Occurs and Redefines can occur at any level.  Almost anything
    can be combined with anything else.

    Each entry is defined by the following attributes
        level       COBOL level number 01 to 49, 66 or 88.
        myName      COBOL variable name
        occurs      the number of occurrences (default is 1)
        picture     the exploded picture clause, with ()'s expanded
        initValue   any initial value provided
        offset      offset to this field from start of record
        size        overall size of this item, including all occurrences
        occurSize   the size of an individual occurrence
        sizeScalePrecision  ( numeric, size, scale (# of P's), precision)
        redefines   an instance of Redefines used to compute the offset
        usage       an instance of Usage used to do data conversions
        contains    the list of contained fields
        parent      the immediate parent DDE
        top         the overall record definition DDE
        currentIndex    the current index values used for locating data
        indexedOffset   the current offset based on current index values

    The primary interface is get(), setIndex(), of() and valOf().
    get('dataname') returns the DDE for the given dataname
    setIndex(x,...) sets the current indexes for the various occurs clauses
    of(record) locates this DDE's bytes within the given record
    valOf(record) locates this DDE's bytes and interprets them as a number
    """
    def __init__( self, level, name_, usage=None, pic=None, occurs=None, redefines=None, ssp=(None,None,None,None), initValue=None ):
        self.level= level
        self.myName= name_
        self.offset= 0
        self.size= 0
        self.occurs= occurs
        self.occurSize= None
        self.picture= pic
        self.sizeScalePrecision= ssp
        self.redefines= redefines
        self.usage= usage
        self.initValue= initValue
        self.contains= []
        self.parent= None
        self.top= None
        self.currentIndex= 0
        self.indexedOffset= None
    def __repr__( self ):
        return "%s %s %s" % ( self.level, self.myName, map(str,self.contains) )
    def __str__( self ):
        oc= ""
        pc= ""
        rc= ""
        if self.occurs > 1: oc= " OCCURS %s" % self.occurs
        if self.picture: pc= " PIC %s USAGE %s" % ( self.picture, self.usage.myName )
        if self.redefines.myName: rc= " REDEFINES %s" % ( self.redefines.myName )
        return "%-2s %-20s%s%s%s." % ( self.level, self.myName, rc, oc, pc )
        → DDE Class Construction methods (12)
        → DDE Class Reporting methods (13)
        → DDE Class Record Scanning methods (14)

Used by: cobol_dde.py (1)

Construction occurs in three general steps: (1) the DDE is created, (2) source attributes are set, (3) the DDE is decorated with size, offset and other details.

DDE Class Construction methods (12)

def append( self, aDDE ):
    """Add a substructure to this DDE.

    This is used by RecordFactory to assemble the DDE."""
    self.contains.append( aDDE )
    aDDE.parent= self
def setTop( self, topDDE ):
    """Set the immediate parentage and top-level record for this DDE.

    Used by RecordFactory to assemble the DDE.
    Required before setSizeAndOffset()."""
    self.top= topDDE
    for f in self.contains:
        f.parent= self
        f.setTop( topDDE )
def setSizeAndOffset( self, offset=0 ):
    """Compute the size and offset for each field of this DDE.

    Used by RecordFactory to assemble the DDE.
    Requires setTop be done first.

    Note: 88-level items inherit attributes from their parent.
    """
    # Wire in a single occurrence; it simplifies the math below.
    if not self.occurs: self.occurs= 1
    # If this is a redefines, get a different offset, otherwise use this offset
    self.offset= self.redefines.offset( offset, self )
    # Set the default indexedOffset
    self.indexedOffset= self.offset
    # PICTURE - elementary item; otherwise group-level item
    if self.picture:
        # Get the correct size based on USAGE
        self.occurSize= self.usage.size(self.picture)
        self.size= self.occurSize * self.occurs
        # Any contained items?  These would be 88-level items.
        for f in self.contains:
            assert '88' == f.level, "Unexpected Level {0!r}".format(f.level)
            f.setSizeAndOffset(self.offset)
    elif self.level == '88':
        self.occurSize= self.parent.occurSize
        self.size = self.parent.size
        self.usage= self.parent.usage
    else:
        # Get the correct size based on each element of the group
        s= 0 # Was self.offset???? Wasn't That Funny?
        for f in self.contains:
            # Element size and offset
            f.setSizeAndOffset(s)
            # non-redefines add to the size; redefines add 0 to the size
            s += f.redefines.size( f )
        self.occurSize= s
        # Multiply by the number of occurrences to get the total size
        self.size= self.occurSize * self.occurs

Used by: DDE Class Hierarchy - defines group and elementary data descriptions elements (11); cobol_dde.py (1)

The Visitor design pattern requires that each DDE have a method that is used to implement the visitor traversal. The visit() method visits each element. The visitOccurance() method visits each occurrence of each element.

DDE Class Reporting methods (13)

def visit( self, visitor ):
    """Visit this DDE and each element."""
    visitor.dde( self )
    if self.contains:
        visitor.enterSub()
        for f in self.contains:
            f.visit( visitor )
        visitor.exitSub()
def visitOccurance( self, visitor ):
    """Visit each occurrence of this DDE
    and each occurrence of each element."""
    if not self.occurs: return
    for self.currentIndex in range(0,self.occurs):
        self.top.setIndexedOffset(0)    # compute offsets for this new index value
        visitor.dde( self )
        if self.contains:
            visitor.enterSub()
            for f in self.contains:
                f.visitOccurance( visitor )
            visitor.exitSub()

Used by: DDE Class Hierarchy - defines group and elementary data descriptions elements (11); cobol_dde.py (1)

The process of scanning a record involves methods to locate a specific field, set the occurrence index of a field, and pick bytes out of a record input buffer.

DDE Class Record Scanning methods (14)

def pathTo( self ):
    """Return the complete path to this DDE."""
    if self.parent: return self.parent.pathTo() + "." + self.myName
    return self.myName
def get( self, name_ ):
    """Find the named field, and return the substructure.

    If necessary, search down through levels."""
    for c in self.contains:
        if c.myName == name_:
            return c
    for c in self.contains:
        try:
            f= c.get(name_)
            if f: return f
        except UsageError, e:
            pass
    raise UsageError( "Field %s unknown in this record" % name_ )
def setIndex( self, *occurance ):
    """Set the index values for locating specific data bytes."""
    # Handles multi-dimensional short-cut syntax.
    # Work up through parentage to locate occurs clauses and pop off indexes
    if self.occurs > 1:
        if self.occurs < occurance[-1] or occurance[-1] <= 0:
            raise UsageError( "Occurs value %r out of bounds %r" % ( occurance, self ) )
        self.currentIndex= occurance[-1]-1
        #print self.myName, 'occurs', self.occurs, 'index', self.currentIndex+1
        # Recursive call to setIndex for all remaining index values.
        if occurance[:-1]:
            self.parent.setIndex( *occurance[:-1] )
    else:
        #print self.myName, 'search upward',repr(occurance)
        self.parent.setIndex( *occurance )
    # Compute offsets for these new index values
    self.top.setIndexedOffset(0)
    return self
def setIndexedOffset( self, offset=0 ):
    """Given index values, compute the indexed offsets into occurs clauses.

    Used by setIndex to compute indexed offsets."""
    # TODO: may be able to eliminate this if-statement!
    if self.occurSize:
        # Redefines will use an offset from another field, otherwise use the offset provided
        self.indexedOffset= self.redefines.indexedOffset( offset, self )
        s= self.indexedOffset
        for f in self.contains:
            # Update elements within this group
            f.setIndexedOffset( s )
            # Redefines add zero to the size, otherwise increment offset with the size
            s += f.redefines.size( f )
def of( self, aString ):
    """Pick the data bytes out of an input string.

    TODO: May require EBCDIC->ASCII conversion.

    Requires setIndexedOffset() call if indexes were changed without calling setIndex()
    Use valOf to handle packed decimal data (USAGE COMP-3).
    """
    b= self.indexedOffset
    return aString[b:b+self.occurSize]
def valOf( self, aString ):
    """Pick the data bytes out of an input string and interpret as a number."""
    bytes= self.of( aString )
    return self.usage.valueOf( bytes )

Used by: DDE Class Hierarchy - defines group and elementary data descriptions elements (11); cobol_dde.py (1)
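
The arithmetic behind setIndex() and setIndexedOffset() can be sketched without the class machinery: each enclosing OCCURS clause contributes (index - 1) times the size of one occurrence. The function below is a hypothetical illustration in Python 3 syntax, not the library's method. The figures come from the nested-OCCURS test copybook later in this document, where one EMPLOYEE-TABLE occurrence is 292 bytes and one WEEK-RECORD occurrence is 5 bytes.

```python
def indexed_offset(base, dims):
    """Offset of one occurrence of a field nested inside OCCURS groups.

    base -- offset of the field when every index is 1
    dims -- (index, occurs, occur_size) for each enclosing OCCURS clause,
            using COBOL-style 1-based indexes
    """
    offset = base
    for index, occurs, occur_size in dims:
        if not 1 <= index <= occurs:
            raise IndexError("index %d outside 1..%d" % (index, occurs))
        offset += (index - 1) * occur_size
    return offset

# LATE-ARRIVALS sits at offset 36 in the first occurrence of both groups.
first = indexed_offset(36, [(1, 10, 292), (1, 52, 5)])
second_week = indexed_offset(36, [(1, 10, 292), (2, 52, 5)])
second_employee = indexed_offset(36, [(2, 10, 292), (1, 52, 5)])
```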

Common Visitors

Two common visitor needs are: (1) visit all elements, producing a listing that is a canonical version of the original source; (2) visit all elements, producing additional details (e.g., size, offset, data type). Additionally, when examining actual data values, it is necessary to visit each element and display its current value. This traversal must also visit each occurrence, so it depends on the visitOccurance() method of a DDE.

DDE Common Visitors for reporting on a DDE structure (15)

class Source( Visitor ):
    """Display canonical source from copybook parsing."""
    def dde( self, aDDE ):
        print self.indent*'  ', aDDE

class Report( Visitor ):
    """Report on copybook structure."""
    def dde( self, aDDE ):
        numeric,size,scale,precision= aDDE.sizeScalePrecision
        if numeric:
            nSpec= '%d.%d' % ( size, precision )
        else:
            nSpec= ""
        print "%-65s %3d %3d %5s" % (self.indent*'  '+str(aDDE), aDDE.offset, aDDE.size, nSpec)

class Dump( Visitor ):
    """Dump the data values of this structure."""
    def __init__( self, data ):
        Visitor.__init__( self )
        self.data= data
    def dde( self, aDDE ):
        db= aDDE.of(self.data)
        dstr= []
        for c in db:
            dstr.append( "%2s"%hex( ord(c) )[2:] )
        r= " ".join(dstr) # or r=db
        if aDDE.occurs > 1:
            print "%-65s %3d %3d %3d '%s'" % (self.indent*'  '+str(aDDE), aDDE.indexedOffset, aDDE.size, aDDE.currentIndex+1, r)
        elif aDDE.picture and aDDE.myName != "FILLER":
            print "%-65s %3d %3d '%s'" % (self.indent*'  '+str(aDDE), aDDE.indexedOffset, aDDE.size, r)
        else:
            print "%-65s %3d %3d" % (self.indent*'  '+str(aDDE), aDDE.indexedOffset, aDDE.size)

Used by: cobol_dde.py (1)

Lexical Scanning

The lexical scanner can be subclassed to extend its capability. The default lexical scanner provides a lineClean() function that drops the first six positions of each line (the sequence-number area) and removes comment lines, which carry a * or / in position 7. It may need to be overridden to remove program identification text (positions 73-80) and format control directives.

DDE Lexical Scanner base class provides the default lexical scanner implementation (16)

class Lexer( object ):
    """Lexical scanner for COBOL.

    Given a block of text, this scanner will remove comment lines.
    next() will step through the tokens
    unget(token) will back up a token
    """
    def __init__( self, text ):
        """Initialize the scanner by cleaning the text."""
        self.lines= self.lineClean( text )
        self.backup= []
        self.separator= re.compile( r'[.,;]?\s' )
        self.quote1= re.compile( r"'[^']*'" )
        self.quote2= re.compile( r'"[^"]*"' )
    def lineClean( self, text ):
        """Default cleaner skips comments."""
        return [ l[6:]+' ' for l in text.split('\n') if len(l) > 6 and l[6] not in ('*','/') ]
    def next( self ):
        """Locate the next token in the input stream."""
        if self.backup:
            return self.backup.pop()
        #print "self.lines=", self.lines
        # Check for an empty list first so that calling next() after EOF is safe.
        if self.lines and not self.lines[0]:
            self.lines.pop(0)
        if not self.lines:
            print "EOF"
            return None
        while self.lines and self.lines[0] and self.lines[0][0] in string.whitespace:
            self.lines[0]= self.lines[0].lstrip()
            if not self.lines[0]:
                self.lines.pop(0)
            if not self.lines:
                return None
        if self.lines[0][0] == "'":
            # quoted string, break on balancing quote
            match= self.quote1.match( self.lines[0] )
            space= match.end()
        elif self.lines[0][0] == '"':
            # quoted string, break on balancing quote
            match= self.quote2.match( self.lines[0] )
            space= match.end()
        else:
            match= self.separator.search( self.lines[0] )
            space= match.start()
            if space == 0: # starts with separator
                space= match.end()-1
        token, self.lines[0] = self.lines[0][:space], self.lines[0][space:]
        #print token
        return token
    def unget( self, token ):
        """Push one token back into the input stream."""
        self.backup.append( token )

Used by: cobol_dde.py (1)
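
As a compact illustration of the same tokenizing rules, the following standalone sketch uses a single regular expression instead of the stateful next() method. It is written in Python 3 syntax; the `tokens` generator and the `TOKEN` pattern are hypothetical and are not part of the Lexer class. It also assumes that positions 73-80 can be discarded, which the default lineClean() does not do.

```python
import re

TOKEN = re.compile(
    r"'[^']*'"        # literal in single quotes
    r'|"[^"]*"'       # literal in double quotes
    r"|[^\s,;]+"      # any other run of non-separator characters
)

def tokens(text):
    """Yield COBOL tokens from fixed-format source text."""
    for line in text.split("\n"):
        # Skip short lines and comment lines (* or / in position 7).
        if len(line) <= 6 or line[6] in "*/":
            continue
        # Keep positions 7-72 only (an assumption of this sketch).
        for word in TOKEN.findall(line[6:72]):
            if len(word) > 1 and word.endswith("."):
                # Split the entry-terminating period into its own token.
                yield word[:-1]
                yield "."
            else:
                yield word

sample = (
    "      * COPY2.COB\n"
    "       01  WORK-AREAS.\n"
    "           05  ARE-THERE-MORE-RECORDS      PIC X(3)    VALUE 'YES'.\n"
)
result = list(tokens(sample))
```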

Parsing

The RecordFactory class is the parser for record definitions. The parser has three basic sets of methods: (1) clause parsing methods, (2) element parsing methods, and (3) complete record layout parsing methods.

Parsing a record layout involves parsing a sequence of elements and assembling them into a proper structure. Each element consists of a sequence of individual clauses.

DDE RecordFactory parses a record clause to create a DDE instance (17)

class RecordFactory( object ):
    """Parse a copybook, creating a DDE structure."""
    def __init__( self ):
        self.lex= None
        self.token= None
        self.context= []
        self.noisewords= ("WHEN","IS","TIMES")
        self.keywords= ("BLANK","ZERO","ZEROS","ZEROES",
            "DATE","FORMAT","EXTERNAL","GLOBAL",
            "JUST","JUSTIFIED","LEFT","RIGHT",
            "OCCURS",
            "PIC","PICTURE",
            "REDEFINES","RENAMES",
            "SIGN","LEADING","TRAILING","SEPARATE","CHARACTER",
            "SYNCH","SYNCHRONIZED",
            "USAGE","DISPLAY","COMP-3",
            "VALUE",".")
        → DDE Picture Clause Parsing (18)
        → DDE Blank When Zero Clause Parsing (19)
        → DDE Justified Clause Parsing (20)
        → DDE Occurs Clause Parsing (21)
        → DDE Redefines Clause Parsing (22)
        → DDE Renames Clause Parsing (23)
        → DDE Sign Clause Parsing (24)
        → DDE Synchronized Clause Parsing (25)
        → DDE Usage Clause Parsing (26)
        → DDE Value Clause Parsing (27)

        → DDE Element Parsing (28)
        → DDE Record Parsing (29)

Used by: cobol_dde.py (1)

DDE Picture Clause Parsing (18)

def picParse( self, pic ):
    """Rewrite a picture clause to eliminate ()'s, S's, V's, P's, etc.

    Returns expanded, normalized picture and (type,length,scale,precision,signed) information."""
    out= []
    scale, precision, signed, decimal = 0, 0, False, None
    while pic:
        c= pic[:1]
        if c in ('A','B','X','Z','9','0','/',',','+','-','*','$'):
            out.append( c )
            if decimal: precision += 1
            pic= pic[1:]
        elif pic[:2] in ('DB','CR'):
            out.append( pic[:2] )
            pic= pic[2:]
        elif c == '(':
            irpt= 0
            pic= pic[1:]
            # A regular expression may be quicker and simpler!
            try:
                while pic and pic[:1].isdigit():
                    irpt = 10*irpt+int( pic[:1] )
                    pic= pic[1:]
            except ValueError, t:
                raise SyntaxError( "picture error in %r"%pic )
            out.append( (irpt-1)*out[-1] )
            assert pic[0] == ')', SyntaxError( "picture error in %r"%pic )
            pic= pic[1:]
        elif c == 'S':
            # silently drop an "S".
            # Note that 'S' plus a SIGN SEPARATE option increases the size of the picture!
            signed= True
            pic= pic[1:]
        elif c  == 'P':
            # silently drop a "P", since it just sets scale and isn't represented.
            scale += 1
            pic= pic[1:]
        elif c  == "V":
            decimal= "V"
            pic= pic[1:]
        elif c  == ".":
            decimal= "."
            out.append( "." )
            pic= pic[1:]
        else:
            raise SyntaxError( "picture error in %s"%pic )

    final= "".join( out )
    alpha= ('A' in final) or ('X' in final) or ('/' in final)
    #print pic, final, alpha, scale, precision
    # Note: Actual size depends on len(final) and usage!
    return dict(
        final=final, alpha=alpha, numeric=not alpha,
        length=len(final), scale=scale,
        precision= precision, signed=signed,
        decimal=decimal)
def picture( self ):
    """Parse a PICTURE clause."""
    if self.token == "IS":
        self.token= self.lex.next()
    pic= self.lex.next()
    self.token= self.lex.next()
    return self.picParse(pic)

Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
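
The comment inside picParse() suggests that a regular expression might be simpler for the repeat counts. The sketch below explores that idea for plain pictures only: it handles repeat counts, a leading S and an implied decimal point V, and it ignores P scaling, editing characters and the DB/CR suffixes. It is written in Python 3 syntax, and the names `expand_picture` and `picture_info` are hypothetical.

```python
import re

# One picture character followed by a repeat count, e.g. X(7) or 9(4).
REPEAT = re.compile(r"(.)\((\d+)\)")

def expand_picture(pic):
    """Expand repeat counts, e.g. '9(4)V99' becomes '9999V99'."""
    return REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), pic)

def picture_info(pic):
    """Summarise a simple PICTURE string (no P scaling, no editing)."""
    expanded = expand_picture(pic.upper())
    signed = expanded.startswith("S")
    body = expanded.lstrip("S")
    # Digits after the implied decimal point give the precision.
    whole, _, fraction = body.partition("V")
    final = whole + fraction
    return {"final": final, "length": len(final),
            "precision": len(fraction), "signed": signed}

wage = picture_info("999V999")
balance = picture_info("S9(5)V99")
```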

DDE Blank When Zero Clause Parsing (19)

def blankWhenZero( self ):
    """Gracefully skip over a BLANK WHEN ZERO clause."""
    self.token= self.lex.next()
    if self.token == "WHEN":
        self.token= self.lex.next()
    if self.token in ("ZERO","ZEROES","ZEROS"):
        self.token= self.lex.next()

Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)

DDE Justified Clause Parsing (20)

def justified( self ):
    """Gracefully skip over a JUSTIFIED clause."""
    self.token= self.lex.next()
    if self.token == "RIGHT":
        self.token= self.lex.next()

Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)

DDE Occurs Clause Parsing (21)

def occurs( self ):
    """Parse an OCCURS clause."""
    occurs= self.lex.next()
    if occurs == "TO":
        # format 2 - occurs depending on with assumed 1 for the lower limit
        # TODO - parse the Occurs Depending On clause
        raise UnsupportedError( "Occurs depending on" )
    self.token= self.lex.next()
    if self.token == "TO":
        # format 2 - occurs depending on
        # TODO - parse the Occurs Depending On clause
        raise UnsupportedError( "Occurs depending on" )
    else:
        # format 1 - fixed-length
        if self.token == "TIMES":
            self.token= self.lex.next()
        if self.token in ("ASCENDING","DESCENDING"):
            self.token= self.lex.next()
        if self.token == "KEY":
            self.token= self.lex.next()
        if self.token == "IS":
            self.token= self.lex.next()
        # get key data names
        while self.token not in self.keywords:
            self.token= self.lex.next()
        if self.token == "INDEXED":
            self.token= self.lex.next()
        if self.token == "BY":
            self.token= self.lex.next()
        # get indexed data names
        while self.token not in self.keywords:
            self.token= self.lex.next()
        return int(occurs)

Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)

DDE Redefines Clause Parsing (22)

def redefines( self ):
    """Parse a REDEFINES clause."""
    redef= self.lex.next()
    self.token= self.lex.next()
    return Redefines(redef)

Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)

DDE Renames Clause Parsing (23)

def renames( self ):
    """Raise an exception on a RENAMES clause."""
    ren1= self.lex.next()
    self.token= self.lex.next()
    if self.token in ("THRU","THROUGH"):
        ren2= self.lex.next()
        self.token= self.lex.next()
    raise UnsupportedError( "Renames clause" )

Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)

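There are two variations on the SIGN clause syntax: the full form that begins with the SIGN keyword, handled by sign1(), and the abbreviated form that begins directly with LEADING or TRAILING, handled by sign2().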

DDE Sign Clause Parsing (24)

def sign1( self ):
    """Raise an exception on a SIGN clause."""
    self.token= self.lex.next()
    if self.token == "IS":
        self.token= self.lex.next()
    if self.token in ("LEADING","TRAILING"):
        self.sign2()
    # TODO: this may change the size to add a sign byte
    raise UnsupportedError( "Sign clause" )
def sign2( self ):
    """Raise an exception on a SIGN clause."""
    self.token= self.lex.next()
    if self.token == "SEPARATE":
        self.token= self.lex.next()
    if self.token == "CHARACTER":
        self.token= self.lex.next()
    raise UnsupportedError( "Sign clause" )

Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)

DDE Synchronized Clause Parsing (25)

def synchronized( self ):
    """Raise an exception on a SYNCHRONIZED clause."""
    self.token= self.lex.next()
    if self.token == "LEFT":
        self.token= self.lex.next()
    if self.token == "RIGHT":
        self.token= self.lex.next()
    raise UnsupportedError( "Synchronized clause" )

Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)

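There are two variations on the USAGE clause syntax: the full form that begins with the USAGE keyword, handled by usage(), and the abbreviated form that names the usage directly, handled by usage2().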

DDE Usage Clause Parsing (26)

def usage( self ):
    """Parse a USAGE clause."""
    self.token= self.lex.next()
    if self.token == "IS":
        self.token= self.lex.next()
    use= self.token
    self.token= self.lex.next()
    return self.usage2( use )
def usage2( self, use ):
    """Create a correct Usage instance based on the USAGE clause."""
    if use == "DISPLAY": return UsageDisplay()
    elif use == "COMPUTATIONAL": return UsageComp()
    elif use == "COMP": return UsageComp()
    elif use == "COMPUTATIONAL-3": return UsageComp3()
    elif use == "COMP-3": return UsageComp3()
    else: raise SyntaxError( "Unknown usage clause %r" % use )

Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
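
The usage2() method maps COMP-3 to a UsageComp3 instance, whose implementation lies outside this section. As background, the following sketch decodes packed decimal data as described in the introduction: two decimal digits per byte, with the trailing half-byte holding the sign. It is written in Python 3 syntax; the function name `unpack_comp3` is hypothetical, and treating 0xB and 0xD as negative and every other sign value as non-negative is an assumption of this sketch rather than a statement about the library.

```python
from decimal import Decimal

def unpack_comp3(raw, precision=0):
    """Decode packed-decimal bytes into a Decimal.

    raw       -- the bytes of the field
    precision -- number of digits after the implied decimal point
    """
    digits = []
    for byte in bytearray(raw):
        digits.append(byte >> 4)       # high half-byte
        digits.append(byte & 0x0F)     # low half-byte
    sign = digits.pop()                # trailing half-byte is the sign
    if any(d > 9 for d in digits):
        raise ValueError("invalid packed-decimal digit in %r" % (raw,))
    value = int("".join(str(d) for d in digits) or "0")
    if sign in (0x0B, 0x0D):
        value = -value
    # Shift the implied decimal point into place.
    return Decimal(value).scaleb(-precision)

amount = unpack_comp3(b"\x12\x34\x5C", precision=2)
debit = unpack_comp3(b"\x00\x12\x3D")
```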

DDE Value Clause Parsing (27)

def value( self ):
    """Parse a VALUE clause."""
    if self.token == "IS":
        self.token= self.lex.next()
    lit= self.lex.next()
    self.token= self.lex.next()
    return lit

Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)

DDE Element Parsing (28)

def makeDDE( self ):
    """Create a single DDE from an entry of clauses."""
    # Pick off the level
    level= self.token
    # Pick off a name, if present
    name_= self.lex.next()
    if name_ in self.keywords:
        self.lex.unget( name_ )
        name_= "FILLER"
    # Accumulate the relevant clauses, dropping noise words and irrelevant clauses.
    usage= UsageDisplay()
    pic, typeInfo= None, None
    occurs= None
    redefines= NonRedefines()
    self.token= self.lex.next()
    while self.token and self.token != '.':
        if self.token == "BLANK":
            self.blankWhenZero()
        elif self.token in ("EXTERNAL","GLOBAL"):
            self.token= self.lex.next()
        elif self.token in ("JUST","JUSTIFIED"):
            self.justified()
        elif self.token == "OCCURS":
            occurs= self.occurs()
        elif self.token in ("PIC","PICTURE"):
            self.typeInfo= self.picture()
            pic= self.typeInfo['final']
        elif self.token == "REDEFINES":
            redefines= self.redefines()
        elif self.token == "RENAMES":
            self.renames()
        elif self.token == "SIGN":
            self.sign1()
        elif self.token in ("LEADING","TRAILING"):
            self.sign2()
        elif self.token == "SYNCHRONIZED":
            self.synchronized()
        elif self.token == "USAGE":
            usage= self.usage()
        elif self.token == "VALUE":
            self.value()
        else:
            try:
                # Keyword USAGE is optional
                usage= self.usage2( self.token )
                self.token= self.lex.next()
            except SyntaxError, e:
                raise SyntaxError( "%s unrecognized" % self.token )
    # Create the DDE and return it
    # TODO: Add a subclass for elementary items different from group-level items
    if pic:
        usage.setTypeInfo(**self.typeInfo)
        return DDE( level, name_, pic=pic, usage=usage, occurs=occurs, redefines=redefines )
    else:
        return DDE( level, name_, occurs=occurs, redefines=redefines )

Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)

DDE Record Parsing (29)

def makeRecord( self, lex ):
    """Parse an entire copybook block of text."""
    self.lex= lex
    self.token= self.lex.next()
    # Parse the first DDE and establish the context stack.
    self.context= [ self.makeDDE() ]
    self.token= self.lex.next()
    while self.token:
        # Parse the next DDE
        dde= self.makeDDE()
        #print dde, ":", self.context[-1]
        # If a lower level # or same level #, pop context
        while dde.level <= self.context[-1].level:
            self.context.pop()
        # Make this DDE part of the parent DDE at the top of the context stack
        self.context[-1].append( dde )
        # Push this DDE onto the context stack
        self.context.append( dde )
        # Get the first token of the next DDE or find the end of the file
        self.token= self.lex.next()
    # Decorate the parse tree with parentage and basic size/offset information
    rec= self.context[0]
    rec.setTop( rec )
    rec.setSizeAndOffset(0)
    return rec

Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
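
The context stack in makeRecord() can be seen in isolation in the sketch below, which nests a flat list of (level, name) pairs. It is written in Python 3 syntax with a hypothetical `build_tree` function and plain dictionaries instead of DDE objects. Unlike makeRecord(), it compares level numbers as integers and never pops the root; both are choices made for this sketch.

```python
def build_tree(entries):
    """Nest (level, name) pairs by level number using a context stack.

    Returns the root as a dict with "level", "name" and "children" keys.
    """
    entries = iter(entries)
    level, name = next(entries)
    root = {"level": int(level), "name": name, "children": []}
    context = [root]
    for level, name in entries:
        node = {"level": int(level), "name": name, "children": []}
        # An equal or lower level number closes the groups above it.
        while len(context) > 1 and node["level"] <= context[-1]["level"]:
            context.pop()
        # The top of the stack is now this node's parent.
        context[-1]["children"].append(node)
        context.append(node)
    return root

tree = build_tree([("01", "REDEFINES-RECORD"), ("05", "A"), ("05", "B"),
                   ("10", "B-1"), ("10", "B-2"), ("05", "C")])
top_names = [child["name"] for child in tree["children"]]
b_names = [child["name"] for child in tree["children"][1]["children"]]
```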

cobol_dde Unit Test

The unit tests are not exhaustive, but they exercise a number of key features.

test_dde.py (30)

#!/usr/bin/env python
import unittest
from cobol_dde import *

class DDE_Test( unittest.TestCase ):
    def setUp( self ):
        # Create a Report() visitor to write a report on a structure
        self.rpt= Report()

        # Create a RecordFactory() to create DDE record definitions
        self.rf= RecordFactory()

→ DDE Test copybook 1 with basic features (31)
→ DDE Test copybook 2 with 88-level item (32)
→ DDE Test copybook 3 with nested occurs level (33)
→ DDE Test copybook from page 174 with nested occurs level (34)
→ DDE Test copybook from page 195 with simple redefines (35)
→ DDE Test copybook from page 197 with another redefines (36)
→ DDE Test copybook from page 198, example a (37)
→ DDE Test copybook from page 198, example b (38)

if __name__ == "__main__":
    unittest.main()

DDE Test copybook 1 with basic features (31)

copy1= """
      * COPY1.COB
       01  DETAIL-LINE.
           05                              PIC X(7).
           05  QUESTION                    PIC ZZ.
           05                              PIC X(6).
           05  PRINT-YES                   PIC ZZ.
           05                              PIC X(3).
           05  PRINT-NO                    PIC ZZ.
           05                              PIC X(6).
           05  NOT-SURE                    PIC ZZ.
           05                              PIC X(7).
"""
class Test_Copybook_1( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_1, self ).setUp()
    def test_should_parse( self ):
        dde1 = self.rf.makeRecord( Lexer(copy1) )
        #dde1.visit( self.rpt )
        self.assertEquals( 7, dde1.get( "QUESTION" ).offset )
        self.assertEquals( 2, dde1.get( "QUESTION" ).size )
        self.assertEquals( "ZZ", dde1.get( "QUESTION" ).picture )
        self.assertEquals( "DISPLAY", dde1.get( "QUESTION" ).usage.myName )
        self.assertEquals( 15, dde1.get( "PRINT-YES" ).offset )
        self.assertEquals( 2, dde1.get( "PRINT-YES" ).size )
        self.assertEquals( "ZZ", dde1.get( "PRINT-YES" ).picture )
        self.assertEquals( 20, dde1.get( "PRINT-NO" ).offset )
        self.assertEquals( 2, dde1.get( "PRINT-NO" ).size )
        self.assertEquals( "ZZ", dde1.get( "PRINT-NO" ).picture )
        self.assertEquals( 28, dde1.get( "NOT-SURE" ).offset )
        self.assertEquals( 2, dde1.get( "NOT-SURE" ).size )
        self.assertEquals( "ZZ", dde1.get( "NOT-SURE" ).picture )
        data= "ABCDEFG01HIJKLM02OPQ03RSTUVW04YZabcde"
        #d= Dump( data )
        #dde1.visitOccurance( d )
        self.assertEquals( "01", dde1.get('QUESTION').of(data) )
        self.assertEquals( "02", dde1.get('PRINT-YES').of(data) )
        self.assertEquals( "03", dde1.get('PRINT-NO').of(data) )
        self.assertEquals( "04", dde1.get('NOT-SURE').of(data) )

Used by: test_dde.py (30)

Future Expansion: the VALUE literal provided with an 88-level item should be used to create a boolean function that tests whether the parent field holds that value.

DDE Test copybook 2 with 88-level item (32)

copy2= """
      * COPY2.COB
       01  WORK-AREAS.
           05  ARE-THERE-MORE-RECORDS      PIC X(3)    VALUE 'YES'.
               88  NO-MORE-RECORDS                     VALUE 'NO '.
           05  ANSWER-SUB                  PIC 99.
           05  QUESTION-SUB                PIC 99.
"""
class Test_Copybook_2( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_2, self ).setUp()
    def test_should_parse( self ):
        dde2= self.rf.makeRecord( Lexer(copy2) )
        #dde2.visit( self.rpt )
        self.assertEquals( 0, dde2.get("ARE-THERE-MORE-RECORDS").offset )
        self.assertEquals( 3, dde2.get("ARE-THERE-MORE-RECORDS").size )
        self.assertEquals( "XXX", dde2.get("ARE-THERE-MORE-RECORDS").picture )
        self.assertEquals( 0, dde2.get("NO-MORE-RECORDS").offset )
        self.assertEquals( 3, dde2.get("NO-MORE-RECORDS").size )
        self.assertEquals( 3, dde2.get("ANSWER-SUB").offset )
        self.assertEquals( 5, dde2.get("QUESTION-SUB").offset )
        data= "NO 4567"
        d= Dump( data )
        #print dde2.visitOccurance( d )
        self.assertEquals( "NO ", dde2.get("ARE-THERE-MORE-RECORDS").of(data) )
        self.assertEquals( "NO ", dde2.get("NO-MORE-RECORDS").valOf(data) )

Used by: test_dde.py (30)
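
One possible shape for the boolean function mentioned as a future expansion is sketched below. It is only an illustration in Python 3 syntax: the `condition_name` function is hypothetical, it works on plain strings rather than DDE objects, and it supports a single VALUE literal, not the lists or THRU ranges that COBOL also allows.

```python
def condition_name(offset, size, literal):
    """Build a predicate for an 88-level item over a record string.

    offset, size -- location of the parent elementary item
    literal      -- the VALUE literal of the 88-level item, without quotes
    """
    # Pad short literals with spaces to the size of the parent field.
    expected = literal.ljust(size)

    def test(record):
        return record[offset:offset + size] == expected

    return test

# NO-MORE-RECORDS shares offset 0 and size 3 with ARE-THERE-MORE-RECORDS.
no_more_records = condition_name(0, 3, "NO ")
at_end = no_more_records("NO 4567")
not_at_end = no_more_records("YES4567")
```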

DDE Test copybook 3 with nested occurs level (33)

copy3= """
      * COPY3.COB
       01  SURVEY-RESPONSES.
           05  QUESTION-NUMBER         OCCURS 10 TIMES.
               10  RESPONSE-CATEGORY     OCCURS 3 TIMES.
                   15  ANSWER                          PIC 99.
"""
class Test_Copybook_3( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_3, self ).setUp()
    def test_should_parse( self ):
        dde3= self.rf.makeRecord( Lexer(copy3) )
        #dde3.visit( self.rpt )
        data = "111213212223313233414243515253616263717273818283919293010203"
        d= Dump(data)
        #dde3.visitOccurance( d )
        self.assertEquals( 12, dde3.get('ANSWER').setIndex(1,2).valOf(data) )
        self.assertEquals( 21, dde3.get('ANSWER').setIndex(2,1).valOf(data) )
        try:
            self.assertEquals( 21, dde3.get('ANSWER').setIndex(1,4).valOf(data) )
            self.fail()
        except UsageError, e:
            pass

Used by: test_dde.py (30)

The following test copybooks come from the IBM COBOL Language Reference Manual, fourth edition (SC26-9046-03).

DDE Test copybook from page 174 with nested occurs level (34)

page174= """
       01 TABLE-RECORD.
          05 EMPLOYEE-TABLE OCCURS 10 TIMES
                ASCENDING KEY IS WAGE-RATE EMPLOYEE-NO
                INDEXED BY A, B.
             10 EMPLOYEE-NAME PIC X(20).
             10 EMPLOYEE-NO PIC 9(6).
             10 WAGE-RATE PIC 9999V99.
             10 WEEK-RECORD OCCURS 52 TIMES
                   ASCENDING KEY IS WEEK-NO INDEXED BY C.
                15 WEEK-NO PIC 99.
                15 AUTHORIZED-ABSENCES PIC 9.
                15 UNAUTHORIZED-ABSENCES PIC 9.
                15 LATE-ARRIVALS PIC 9.
"""
class Test_Copybook_4( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_4, self ).setUp()
    def test_should_parse( self ):
        dde4= self.rf.makeRecord( Lexer(page174) )
        #dde4.visit( self.rpt )
        self.assertEquals( 2920, dde4.size )
        self.assertEquals( 0, dde4.offset )
        self.assertEquals( 10, dde4.get("EMPLOYEE-TABLE" ).occurs )
        self.assertEquals( 52, dde4.get("WEEK-RECORD" ).occurs )
        self.assertEquals( 5, dde4.get("WEEK-RECORD" ).occurSize )
        self.assertEquals( "999999", dde4.get("EMPLOYEE-NO").picture )
        self.assertEquals( 36,
            dde4.get("LATE-ARRIVALS" ).setIndex(1,1).indexedOffset )
        self.assertEquals( 41,
            dde4.get("EMPLOYEE-TABLE").setIndex(1).get("LATE-ARRIVALS" ).setIndex(2).indexedOffset )

Used by: test_dde.py (30)

DDE Test copybook from page 195 with simple redefines (35)

page195= """
       01  REDEFINES-RECORD.
           05  A PICTURE X(6).
           05  B REDEFINES A.
               10  B-1 PICTURE X(2).
               10  B-2 PICTURE 9(4).
           05  C PICTURE 99V99.
"""
class Test_Copybook_5( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_5, self ).setUp()
    def test_should_parse( self ):
        dde5= self.rf.makeRecord( Lexer(page195) )
        #dde5.visit( self.rpt )
        self.assertEquals( 10, dde5.size )
        self.assertEquals( 6, dde5.get("A").size )
        self.assertEquals( 0, dde5.get("A").offset )
        self.assertEquals( 6, dde5.get("B").size )
        self.assertEquals( 0, dde5.get("B").offset )
        self.assertEquals( 2, dde5.get("B-1").size )
        self.assertEquals( 0, dde5.get("B-1").offset )
        self.assertEquals( 4, dde5.get("B-2").size )
        self.assertEquals( 2, dde5.get("B-2").offset )
        self.assertEquals( "9999", dde5.get("B-2").picture )
        self.assertEquals( 4, dde5.get("C").size )
        self.assertEquals( 6, dde5.get("C").offset )
        data = "AB12345678"
        d= Dump(data)
        #dde5.visitOccurance( d )
        self.assertEquals( "AB1234", dde5.get("A").of(data) )
        self.assertEquals( "AB1234", dde5.get("B").of(data) )
        self.assertEquals( "AB", dde5.get("B-1").of(data) )
        self.assertEquals( "1234", dde5.get("B-2").of(data) )
        self.assertEquals( "5678", dde5.get("C").of(data) )

Used by: test_dde.py (30)

DDE Test copybook from page 197 with another redefines (36)

page197= """
       01  REDEFINES-RECORD.
           05 NAME-2.
              10 SALARY PICTURE XXX.
              10 SO-SEC-NO PICTURE X(9).
              10 MONTH PICTURE XX.
           05 NAME-1 REDEFINES NAME-2.
              10 WAGE PICTURE 999V999.
              10 EMP-NO PICTURE X(6).
              10 YEAR PICTURE XX.
"""
class Test_Copybook_6( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_6, self ).setUp()
    def test_should_parse( self ):
        dde6= self.rf.makeRecord( Lexer(page197) )
        #dde6.visit( self.rpt )
        self.assertEquals( 3, dde6.get("SALARY").size )
        self.assertEquals( 0, dde6.get("SALARY").offset )
        self.assertEquals( 9, dde6.get("SO-SEC-NO").size )
        self.assertEquals( 3, dde6.get("SO-SEC-NO").offset )
        self.assertEquals( 2, dde6.get("MONTH").size )
        self.assertEquals( 12, dde6.get("MONTH").offset )
        self.assertEquals( 6, dde6.get("WAGE").size )
        self.assertEquals( 0, dde6.get("WAGE").offset )
        self.assertEquals( "999999", dde6.get("WAGE").picture )
        self.assertEquals( 3, dde6.get("WAGE").usage.precision )
        self.assertEquals( 6, dde6.get("EMP-NO").size )
        self.assertEquals( 6, dde6.get("EMP-NO").offset )
        self.assertEquals( 2, dde6.get("YEAR").size )
        self.assertEquals( 12, dde6.get("YEAR").offset )

        data1= "ABC123456789DE"
        d1= Dump(data1)
        #dde6.visitOccurance( d1 )
        self.assertEquals( "ABC", dde6.get("SALARY").of( data1 ) )
        self.assertEquals( "123456789", dde6.get("SO-SEC-NO").of( data1 ) )
        self.assertEquals( "DE", dde6.get("MONTH").of( data1 ) )

        data2= "123456ABCDEF78"
        d2= Dump(data2)
        #dde6.visitOccurance( d2 )
        self.assertAlmostEquals( 123.456, float(dde6.get("WAGE").valOf( data2 )) )
        self.assertEquals( "ABCDEF", dde6.get("EMP-NO").of( data2 ) )
        self.assertEquals( "78", dde6.get("YEAR").of( data2 ) )

Used by: test_dde.py (30)

DDE Test copybook from page 198, example a (37)

page198A= """
       01  REDEFINES-RECORD.
           05 REGULAR-EMPLOYEE.
              10 LOCATION PICTURE A(8).
              10 GRADE PICTURE X(4).
              10 SEMI-MONTHLY-PAY PICTURE 9999V99.
              10 WEEKLY-PAY REDEFINES SEMI-MONTHLY-PAY
                  PICTURE 999V999.
           05 TEMPORARY-EMPLOYEE REDEFINES REGULAR-EMPLOYEE.
              10 LOCATION PICTURE A(8).
              10 FILLER PICTURE X(6).
              10 HOURLY-PAY PICTURE 99V99.
"""
class Test_Copybook_7( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_7, self ).setUp()
    def test_should_parse( self ):
        dde7= self.rf.makeRecord( Lexer(page198A) )
        #dde7.visit( self.rpt )
        self.assertEquals( 18, dde7.get("REGULAR-EMPLOYEE").size )
        self.assertEquals( 18, dde7.get("TEMPORARY-EMPLOYEE").size )
        self.assertEquals( 6, dde7.get("SEMI-MONTHLY-PAY").size )
        self.assertEquals( 6, dde7.get("WEEKLY-PAY").size )

        data1= "ABCDEFGHijkl123456"
        d1= Dump(data1)
        #dde7.visitOccurance( d1 )
        self.assertEquals( '1234.56', str(dde7.get("SEMI-MONTHLY-PAY").valOf( data1 )) )
        data2= "ABCDEFGHijklmn1234"
        d2= Dump(data2)
        #dde7.visitOccurance( d2 )
        self.assertEquals( '12.34', str(dde7.get("HOURLY-PAY").valOf( data2 ) ) )

Used by: test_dde.py (30)

DDE Test copybook from page 198, example b (38)

page198B= """
       01  REDEFINES-RECORD.
           05 REGULAR-EMPLOYEE.
               10 LOCATION PICTURE A(8).
               10 GRADE PICTURE X(4).
               10 SEMI-MONTHLY-PAY PICTURE 999V999.
           05 TEMPORARY-EMPLOYEE REDEFINES REGULAR-EMPLOYEE.
               10 LOCATION PICTURE A(8).
               10 FILLER PICTURE X(6).
               10 HOURLY-PAY PICTURE 99V99.
               10 CODE-H REDEFINES HOURLY-PAY PICTURE 9999.
"""
class Test_Copybook_8( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_8, self ).setUp()
    def test_should_parse( self ):
        dde8= self.rf.makeRecord( Lexer(page198B) )
        #dde8.visit( self.rpt )
        self.assertEquals( 18, dde8.get("REGULAR-EMPLOYEE").size )
        self.assertEquals( 18, dde8.get("TEMPORARY-EMPLOYEE").size )
        self.assertEquals( 6, dde8.get("SEMI-MONTHLY-PAY").size )
        self.assertEquals( 4, dde8.get("HOURLY-PAY").size )
        self.assertEquals( 4, dde8.get("CODE-H").size )

        rec1= "ABCDEFGHijkl123456"
        d1= Dump(rec1)
        #dde8.visitOccurance( d1 )
        self.assertAlmostEquals( 123.456,
            float( dde8.get('REGULAR-EMPLOYEE')
                .get('SEMI-MONTHLY-PAY').valOf(rec1) ) )

        rec2= "ABCDEFGHijklmn1234"
        d2= Dump(rec2)
        #dde8.visitOccurance( d2 )
        self.assertEquals( 12.34,
            float( dde8.get('TEMPORARY-EMPLOYEE')
                .get('HOURLY-PAY').valOf(rec2) ) )
        self.assertEquals( 1234,
            dde8.get('TEMPORARY-EMPLOYEE').get('CODE-H').valOf(rec2) )
        self.assertEquals( "REDEFINES-RECORD.TEMPORARY-EMPLOYEE.HOURLY-PAY",
            dde8.get('HOURLY-PAY').pathTo() )

Used by: test_dde.py (30)

data_profile Application

This application performs simple data profiling: it discovers the range of values that actually occur in selected fields of a file.

It can also be modified to profile and document relationships among data elements.
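
As a rough illustration of what "discovering the range of values" means, the following sketch counts the distinct values found at a fixed position in a handful of records. It is written for Python 3 and uses hard-coded offsets as an assumption; the real application takes offsets and sizes from a parsed DDE instead. The function name profile is hypothetical.

```python
from collections import Counter

def profile(records, start, size):
    """Count the distinct values found in records[start:start+size].

    Illustrative only: offsets are supplied by hand here, where the
    data_profile classes obtain them from the DDE.
    """
    return Counter(rec[start:start + size] for rec in records)

sample = [
    "ABCDEFG11HIJKLM12",
    "ABCDEFG22HIJKLM22",
    "ABCDEFG11HIJKLMX2",
    "ABCDEFG44HIJKLMX2",
    "ABCDEFG11HIJKLM12",
]

question = profile(sample, 7, 2)     # a two-character field after a 7-byte filler
answer = profile(sample, 15, 2)      # a second two-character field
```

An unexpected value such as "X2" in a numeric field is exactly the kind of finding a profiling run is meant to surface.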

Design

TBD.

Implementation

The data_profile application has the following structure.

data_profile.py (39)

→ DProfile Shell Escape (42)
→ DProfile DOC String (40)
→ DProfile CVS Cruft and pyweb generator warning (43)
→ DProfile Imports (41)
→ DProfile Utility Functions (44)
→ DProfile Class Definitions (45)

DOC string

DProfile DOC String (40)

"""data_profile - use a cobol_dde to analyze a file.

Given a DDE instance, and a file, either dump fields of records
or accumulate distinct values of fields of a record.

HexDump
    Display a record much as TSO users see files
    on the File-Aid screens.

FieldValue
NumFieldValue
    Support gathering the actual domain for a field in a data file.

FieldDump
FieldScan
    Examine all FieldValue instances for a particular record layout.
    Either dump each FieldValue or scan to gather domain values.

FileScan
    A standardized class for scanning a file to accumulate frequency
    tables for selected fields using a FieldDump or FieldScan instance.


This module includes the following utility functions:

E2A
    Convert EBCDIC characters to ASCII characters.
"""

Used by: data_profile.py (39)

Imports

The data_profile module depends on the cobol_dde module.

DProfile Imports (41)

from cobol_dde import *

Used by: data_profile.py (39)

Other Overheads

DProfile Shell Escape (42)

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

Used by: data_profile.py (39)

DProfile CVS Cruft and pyweb generator warning (43)

__version__ = """$Revision$"""

### DO NOT EDIT THIS FILE!
### It was created by /Users/slott/Documents/Projects/pyWeb-2.1/pyweb/pyweb.py, __version__='$Revision$'.
### From source DDE.w modified Sun Mar 14 10:46:18 2010.
### In working directory '/Users/slott/Documents/Projects/COBOL_DDE-1.2'.

Used by: data_profile.py (39)

Utility Functions

DProfile Utility Functions (44)

import codecs

# Static sequence of ASCII character codes that should be used for each
# EBCDIC character.
# See http://www.natural-innovations.com/boo/asciiebcdic.html
# for the source of this mapping.  Note that unassigned EBCDIC characters
# are assigned ASCII 0xA4 (164, §).
# Unicode Technical Report 16 has a reversible mapping, but it doesn't
# seem to handle some EBCDIC characters correctly, notably ¢ and ¬.
EBCDIC2ASCII= map( chr, [
    0x00,0x01,0x02,0x03,0xA3,0x09,0x97,0x7F,0xA4,0xA4,0x01,0x0B,0x0C,0x0D,0x0E,0x0F,
    0x10,0x11,0x12,0x16,0xAE,0x15,0x08,0x2D,0x18,0x19,0xA9,0xA9,0x2D,0x2D,0x2D,0x2D,
    0xD0,0x01,0x21,0xA4,0xA6,0x0A,0x17,0x1B,0xA4,0xA4,0x3B,0xA9,0xA4,0x05,0x06,0x07,
    0xA4,0xA4,0x16,0xA4,0xA3,0xBA,0x1F,0x04,0xA4,0xA4,0xA4,0xA9,0x14,0x15,0xA4,0x1A,
    0x20,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA2,0x2E,0x3C,0x28,0x2B,0x2E,
    0x26,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0x21,0x24,0x2A,0x29,0x3B,0xAC,
    0x2D,0x2F,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0x2C,0x25,0x5F,0x9B,0x3F,
    0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0x3A,0x23,0x40,0x27,0x3D,0x22,
    0xA4,0x61,0x62,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0xA4,0x6A,0x6B,0x6C,0x6D,0x6E,0x6F,0x70,0x71,0x72,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0xA4,0xA4,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7A,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0x60,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0xA4,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0xA4,0x4A,0x4B,0x4C,0x4D,0x4E,0x4F,0x50,0x51,0x52,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0xA4,0xA4,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5A,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4
    ] )

def E2A_str( string ):
    """Return the ASCII version of this EBCDIC string."""
    #     r= StringIO.StringIO()
    #     for c in string:
    #         r.write( EBCDIC2ASCII[ ord(c) ] )
    #     s= r.getvalue()
    #     r.close()
    chars= [ EBCDIC2ASCII[ ord(c) ] for c in string ]
    return "".join( chars )

def E2A( string ):
    """Return UNICODE version of this EBCDIC string."""
    chars, used= codecs.getdecoder('cp037')(string)
    assert used == len(string)
    return chars

Used by: data_profile.py (39)
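
The E2A function above relies on the standard cp037 codec. For readers on Python 3, where file data arrives as bytes, the same conversion can be sketched with bytes.decode. This is an assumption about how the function might be ported, not part of the module; the name e2a is hypothetical.

```python
def e2a(data):
    """Decode EBCDIC (code page 037) bytes into a Unicode string."""
    return data.decode("cp037")

# The eleven EBCDIC bytes used in the unit test: a-i, then 0x5F and 0x4A
sample = bytes([0x81, 0x82, 0x83, 0x84, 0x85, 0x86,
                0x87, 0x88, 0x89, 0x5F, 0x4A])
text = e2a(sample)

# EBCDIC digits occupy 0xF0 through 0xF9
digits = e2a(bytes([0xF1, 0xF2, 0xF3]))
```

Only character (DISPLAY) fields should be decoded this way; packed or binary fields must be left as raw bytes.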

Base Class Definitions

DProfile Class Definitions (45)

→ DProfile Hex Dump Class to do raw dump of a record (46)
→ DProfile Field Value Class Hierarchy to accumulate distinct values (47)
→ DProfile Field and Record Scanning does either dumps or disctinct value processing (48)

Used by: data_profile.py (39)

HexDump Class

DProfile Hex Dump Class to do raw dump of a record (46)

# A handy hex dump printer class

class HexDump:
    """Create a Hex Dump object that can dump records from a file."""
    def __init__( self, aFile=None, rowSize=64 ):
        self.theFile= None
        if aFile:
            self.theFile= file(aFile,"rb")
        self.rows= 0
        self.hex= '0123456789ABCDEF'
        self.rowSize= rowSize
        self.positions= "".join([ ("----+----%d"%(i+1))[:10] for i in range(self.rowSize/10) ]) + "----+-----"[:self.rowSize%10]
    def hexPrint( self, row, data ):
        """Print a row of data in two-line hex format."""
        cha= []
        top= []
        bot= []
        for c in data:
            if c in ('\n','\r','\f','\t','\x00'): cha.append('.')
            else: cha.append( c )
            top.append( self.hex[ ord(c)/16 ] )
            bot.append( self.hex[ ord(c)%16 ] )
        print '%3d|' % (row*self.rowSize+1), "".join( cha )
        print "   |", "".join(top)
        print "   |", "".join(bot)
    def dump( self, bytes=64 ):
        """Dump a record of a given length."""
        self.rows += 1
        data= self.theFile.read(bytes)
        if not data: return None
        print "record %d (%d bytes)" % (self.rows, len(data))
        print "   |",self.positions
        rows= len(data)/self.rowSize
        for i in range(rows):
            self.hexPrint( i, data[i*self.rowSize:(i+1)*self.rowSize] )
        self.hexPrint( rows, data[rows*self.rowSize:] )
        print
        return self
    def dumpAll( self, bytes=64 ):
        """Dump all records in the file."""
        while self.dump(bytes):
            pass

Used by: DProfile Class Definitions (45); data_profile.py (39)
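
The expression that builds self.positions in HexDump is dense. The following sketch, written for Python 3, isolates it as a standalone function so the resulting column ruler can be inspected. The function name ruler is hypothetical.

```python
def ruler(row_size):
    """Build a column ruler such as '----+----1----+----2' for row_size columns."""
    # One ten-column group per full ten columns; the label is truncated
    # so that group 10 and beyond still occupy exactly ten characters.
    groups = [("----+----%d" % (i + 1))[:10] for i in range(row_size // 10)]
    # A partial group covers whatever columns remain.
    return "".join(groups) + "----+-----"[:row_size % 10]

short = ruler(25)
default = ruler(64)
```

The ruler is always exactly row_size characters wide, so it lines up with the character and hex rows printed beneath it.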

FieldValue Class Hierarchy

DProfile Field Value Class Hierarchy to accumulate distinct values (47)

# Two handy classes for examining individual fields

class FieldValue:
    """Accumulate unique values for a named field of a DDE.

    This will have to be subclassed for indexes of occurs clauses.
    """
    def __init__( self, dde, cobolName ):
        """Given a DDE and a COBOL name, set up a field extractor and frequency mapping."""
        self.cobolName= cobolName
        self.usage = dde.get(cobolName).usage
        self.get_field= dde.get(cobolName)
        self.domain= {}
    def getFrom( self, data ):
        """Get the value from the field, then accumulate in the frequency mapping."""
        v= self.get_field.of( data )
        self.domain[v]= self.domain.setdefault(v,0) + 1
    def fqTable( self ):
        """Return a sequence of tuples with value and frequency count, sorted."""
        val_count= self.domain.items()
        # Sort descending by second field (count), ascending by first field (value)
        val_count.sort( lambda a,b: cmp(b[1],a[1]) or cmp(a[0],b[0]) )
        return val_count

class NumFieldValue( FieldValue ):
    """Accumulate unique values for a named field of a DDE that is numeric.

    This will have to be subclassed for indexes of occurs clauses.
    """
    def fqTable( self ):
        """Return a sequence of tuples with value and frequency count, sorted."""
        val_count= [ (self.usage.valueOf(v),c) for v,c in self.domain.items() ]
        # Sort descending by second field (count), ascending by first field (value)
        val_count.sort( lambda a,b: cmp(b[1],a[1]) or cmp(a[0],b[0]) )
        return val_count

Used by: DProfile Class Definitions (45); data_profile.py (39)
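
The fqTable methods sort with a cmp-style comparison function, which is a Python 2 idiom. A possible Python 3 equivalent, shown here only as a sketch, expresses the same ordering (count descending, then value ascending) with a key function. The function name fq_table is hypothetical.

```python
def fq_table(domain):
    """Return (value, count) pairs sorted by descending count, then ascending value."""
    return sorted(domain.items(), key=lambda vc: (-vc[1], vc[0]))

domain = {"12": 2, "22": 1, "X2": 2}
table = fq_table(domain)
```

Negating the count in the key gives the descending order without a second sort pass.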

Field and Record Scanning

FieldScan accumulates distinct values in a list of fields. FieldDump dumps each individual field. A FileScan uses either a FieldScan or a FieldDump to accumulate or dump fields.

DProfile Field and Record Scanning does either dumps or disctinct value processing (48)

# Handy classes for examining all fields of all records of a file.

class FieldScan:
    def __init__( self, aFieldList ):
        self.fieldList= aFieldList
    def process( self, recno, data ):
        for f in self.fieldList:
            f.getFrom( data )
    def final( self, records ):
        print "\n%d Records" % ( records )
        for f in self.fieldList:
            print "\n%-10s %7s" % ( f.cobolName, "count" )
            for di,c in f.fqTable():
                print "%-10s %7d" % ( di,c )

class FieldDump( FieldScan ):
    def process( self, recno, data ):
        print "\nRecord %d" % (recno)
        for f in self.fieldList:
            v= f.get_field.of( data )
            print " ", f.cobolName, f.usage.valueOf( v )
    def final( self, records ):
        pass

class FileScan:
    """Basic file scanning operation."""
    def __init__( self, aDDE, aFieldProcess, aFileName ):
        self.dde= aDDE
        self.fieldProcess= aFieldProcess
        self.theFile= file( aFileName, "rb" )
        self.record= 0
    def reclen( self ):
        return self.dde.size
    def process( self, end=-1 ):
        data= self.theFile.read( self.reclen() )
        while data:
            self.record += 1
            self.fieldProcess.process( self.record, data )
            if self.record == end: break
            data= self.theFile.read( self.reclen() )
        self.theFile.close()
        self.fieldProcess.final(self.record)

Used by: DProfile Class Definitions (45); data_profile.py (39)
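
The read loop in FileScan.process can also be expressed as a generator over fixed-length records. The sketch below is written for Python 3 and uses an in-memory stream in place of a real file; the function name fixed_records is hypothetical and is not part of data_profile.

```python
import io

def fixed_records(stream, reclen, end=-1):
    """Yield (record number, bytes) pairs for fixed-length records.

    Stops at end of file, or after record number `end` when end is positive.
    """
    recno = 0
    data = stream.read(reclen)
    while data:
        recno += 1
        yield recno, data
        if recno == end:
            break
        data = stream.read(reclen)

all_records = list(fixed_records(io.BytesIO(b"AAAABBBBCCCCDD"), 4))
first_two = list(fixed_records(io.BytesIO(b"AAAABBBBCCCCDD"), 4, end=2))
```

As in FileScan, a short final record is still delivered; the caller decides whether a truncated record is an error.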

Data Profiling Unit Test

test_data_profile.py (49)

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import unittest
from cobol_dde import *
from data_profile import *
import collections

→ DProfile Test 1 (50)
→ DProfile Test 2 (51)

if __name__ == "__main__":
    unittest.main()

DProfile Test 1 (50)

copy1= """
      * COPY1.COB
       01  DETAIL-LINE.
           05                              PIC X(7).
           05  QUESTION                    PIC ZZ.
           05                              PIC X(6).
           05  PRINT-YES                   PIC ZZ.
           05                              PIC X(3).
           05  PRINT-NO                    PIC ZZ.
           05                              PIC X(6).
           05  NOT-SURE                    PIC ZZ.
           05                              PIC X(7).
"""
dataset1= (
"ABCDEFG11HIJKLM12NOP13QRSTUV14WXYZabc\n",
"ABCDEFG22HIJKLM22NOP23QRSTUV24WXYZabc\n",
"ABCDEFG11HIJKLM12NOP33QRSTUV34WXYZabc\n",
"ABCDEFG44HIJKLMX2NOP13QRSTUV44WXYZabc\n",
"ABCDEFG11HIJKLMX2NOP23QRSTUV54WXYZabc\n" )
class Test_DProfile_1( unittest.TestCase ):
    def setUp( self ):
        # Create a Report() visitor to write a report on a structure
        rpt= Report()

        # Create a RecordFactory() to create DDE record definitions
        rf= RecordFactory()

        # copy1= open("copy1.cob","r").read()
        self.dde1= rf.makeRecord( Lexer(copy1) )
        #self.dde1.visit( rpt )

    def test_should_dump( self ):
        # dataset1= open("dataset.dat","r").readlines()
        question_domain= collections.defaultdict( int )
        yes_domain= collections.defaultdict( set )
        for record in dataset1:
            question= self.dde1.get('QUESTION').valOf(record)
            yes= self.dde1.get('PRINT-YES').of(record)
            question_domain[question] += 1
            yes_domain[yes].add( yes )
            #print record.rstrip()
            #print question,yes,no,notsure
        self.assertEquals( 3, len(question_domain ) )
        self.assertEquals( 3, question_domain[11] )
        self.assertEquals( 1, question_domain[22] )
        self.assertEquals( 1, question_domain[44] )
        self.assertEquals( set(['12', '22', 'X2']), set(yes_domain) )

Used by: test_data_profile.py (49)

DProfile Test 2 (51)

class Test_DProfile_2( unittest.TestCase ):
    def setUp( self ):
        self.dataset2="".join( map( chr,
            [ 0x81, 0x82, 0x83, 0x84, 0x85, 0x86,
            0x87, 0x88, 0x89, 0x5f, 0x4a ] ) )
    def test_should_convert( self ):
        self.assertEquals( u"abcdefghi\xac\xa2", E2A(self.dataset2) )
        self.assertEquals( "abcdefghi\xac\xa2",  E2A_str(self.dataset2) )

Used by: test_data_profile.py (49)

TODO

Rewrite hexPrint so we can perform the following test:

def test_should_format_dump( self ):
    print "EBCDIC Data"
    HexDump().hexPrint( 0, self.dataset2 )
    print "ASCII Conversion"
    HexDump().hexPrint( 0, E2A(self.dataset2) )
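
One way to make hexPrint testable, offered here as an assumption rather than the author's plan, is to separate formatting from printing so that a test can compare strings. The sketch below is written for Python 3 and works on bytes; the function name hex_lines is hypothetical.

```python
HEX = "0123456789ABCDEF"

def hex_lines(data):
    """Return (characters, high nibbles, low nibbles) for a bytes object."""
    chars, top, bot = [], [], []
    for b in data:                      # iterating bytes yields integers
        c = chr(b)
        # Same substitutions as hexPrint: a few control characters become '.'
        chars.append("." if c in "\n\r\f\t\x00" else c)
        top.append(HEX[b // 16])
        bot.append(HEX[b % 16])
    return "".join(chars), "".join(top), "".join(bot)

lines = hex_lines(b"AB\n1")
```

A printing method can then write these three strings with the row prefix, and a unit test can assert on the returned tuple directly.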

Complete Test Suite

A combined test suite.

test.py (52)

#!/usr/bin/env python
"""Combined tests."""
from __future__ import print_function
import unittest
import test_dde
import test_data_profile
import logging

def suite():
    s= unittest.TestSuite()
    for m in ( test_dde, test_data_profile ):
        s.addTests( unittest.defaultTestLoader.loadTestsFromModule( m ) )
    return s


if __name__ == "__main__":
    import sys
    logging.basicConfig( stream=sys.stdout, level=logging.CRITICAL )
    tr= unittest.TextTestRunner()
    result= tr.run( suite() )
    logging.shutdown()
    sys.exit( len(result.failures) + len(result.errors) )

Demonstration Main Programs

Design

Five Demos.

Implementation

demo1.py (53)

→ Demo Shell Escape (56)
→ Demo DOC String (54)
→ Demo CVS Cruft and pyweb generator warning (57)
→ Demo Imports (55)
→ Demo Subclass Definitions (58)
→ Demo 1 - complete, detailed examination of a file (59)
→ Demo 2 - low-level hex dump of a file (60)
→ Demo 3 - detailed, field-by-field occurance dump of a record (61)
→ Demo 4 - detailed, field-by-field scan of distinct values of a record (62)
→ Demo 5 - detailed, field-by-field occurance dump of a record (63)
→ Demo Main (64)

Overheads

DOC string

Demo DOC String (54)

"""Examine a sample COBOL file.

This requires that files be transferred in strictly BINARY mode from
the mainframe.  Any EBCDIC to ASCII conversion during the transfer
is a bad thing, because it corrupts packed and binary fields.

Performance: 15,000 field values per second.

There are five demos:
demo1 collects the ranges of data values from a file
demo2 does low-level hex dumps of records in a file
demo3 does detailed structure dumps of records in a file
demo4 shows the FieldScan and FileScan classes to examine a file (similar to demo1)
demo5 shows the FieldDump and FileScan classes to examine selected records (similar to demo3)
"""

Used by: demo1.py (53)

Imports

This demo application uses the cobol_dde module to parse a record layout and the data_profile module to analyze data in a file defined by the record layout.

Demo Imports (55)

import os, time
import cobol_dde, data_profile

Used by: demo1.py (53)

Other Overheads

Demo Shell Escape (56)

#!/usr/bin/env python

Used by: demo1.py (53)

Demo CVS Cruft and pyweb generator warning (57)

__version__ = """$Revision$"""

### DO NOT EDIT THIS FILE!
### It was created by /Users/slott/Documents/Projects/pyWeb-2.1/pyweb/pyweb.py, __version__='$Revision$'.
### From source DDE.w modified Sun Mar 14 10:46:18 2010.
### In working directory '/Users/slott/Documents/Projects/COBOL_DDE-1.2'.

Used by: demo1.py (53)

Demo Subclasses

Demo Subclass Definitions (58)

# Extend the cobol_dde module's ``Lexer`` class to override how the lines are cleaned prior to parsing.
class CleanupLexer( cobol_dde.Lexer ):
    """Cleanup as part of Lexing: drop ID from 0:6 and sequence from [72:].
       Also drop "SKIP" commands."""
    def lineClean( self, text ):
        lines= [ l[6:72].rstrip()+' ' for l in text.split('\n') if len(l) > 6 and l[6] not in ('*','/') ]
        lines= [ l for l in lines if l.strip() != 'SKIP1' ]
        return lines

# Extend the FieldDump class to dump OVRMCUDB record instances.
class OVRMCUDBdump( data_profile.FieldDump ):
    """Dump OVRMCUDB record instances."""
    def __init__( self, dde ):
        data_profile.FieldDump.__init__( self, None )
        self.dde= dde
    def process( self, recno, data ):
        """Use the value-length field to decode records of OVRMCUDB file."""
        print "\nRecord %d:" % recno
        print ' CUST-NO', self.dde.get('MCUDBI-CUST-NO').valOf( data )
        print ' DATA-ITEM', self.dde.get('MCUDBI-DATA-ITEM').valOf( data )
        print ' YR', self.dde.get('MCUDBI-YR').valOf( data )
        value_length= self.dde.get('MCUDBI-VALUE-LENGTH').valOf( data )
        print ' VALUE-LENGTH', value_length
        if value_length in range(1,10):
            # The value length (1 to 9) selects which of the nine
            # MCUDBI-VALUE-FIELD-n layouts applies; each has 12 occurrences.
            fieldName= 'MCUDBI-VALUE-FIELD-%d' % value_length
            for i in range(1,13):
                print ' VALUE-FIELD', i, self.dde.get(fieldName).setIndex(i).valOf(data)
        else:
            print "Invalid record"
            data_profile.HexDump().hexPrint(recno,data)

# Extend the FileScan class to handle a damaged OVRMCUDB file where record 1 is damaged.
class OVRMCUDBfile( data_profile.FileScan ):
    """Special-purpose FileScan to handle damaged record in OVRMCUDB."""
    def reclen( self ):
        if self.record == 0: return 92
        return 97

Used by: demo1.py (53)
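
The effect of CleanupLexer.lineClean is easiest to see on a small sample. The sketch below, written for Python 3, reproduces the same logic as a standalone function so it can be run without cobol_dde; the function name line_clean and the sample copybook lines are hypothetical.

```python
def line_clean(text):
    """Drop columns 1-6 and 73-80, comment lines, and SKIP1 directives."""
    lines = [
        line[6:72].rstrip() + " "
        for line in text.split("\n")
        # Column 7 (index 6) holds '*' or '/' on comment lines
        if len(line) > 6 and line[6] not in ("*", "/")
    ]
    return [line for line in lines if line.strip() != "SKIP1"]

source = "\n".join([
    "000100* COMMENT LINE",
    "000200 01  DETAIL-LINE.",
    "000300     SKIP1",
    # Pad to 72 columns, then add an identification field in columns 73-80
    "000400     05  QUESTION PIC ZZ.".ljust(72) + "SEQ00004",
])

cleaned = line_clean(source)
```

Only the two data description lines survive, each with a single trailing space so that tokens on adjacent lines do not run together.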

Demo Functions

The first demo function shows a relatively complex program to dump one record and summarize other records.

  1. Print a simple heading.
  2. Create a RecordFactory, rf. Create a CleanupLexer to process the definition file.
  3. Pass the CleanupLexer to rf to create a DDE, called dde.
  4. Create a Report, rpt. Pass this Visitor to dde to write a report on the parsed record structure.
  5. Create a list, fieldList, of NumFieldValue instances.
  6. Get the record size from dde. Open a file, assigning it to theFile.
  7. Dump record number 1. In this case, the record is damaged and is only 92 bytes long. The first record is read into data. An instance of the OVRMCUDBdump class, dump1Rec, is created from the record definition in dde. The process method of dump1Rec produces a dump of the record in data.
  8. Scan the remaining records. Read a 97-byte record into data. While there is content in data, increment the record counter, recno; for each field f in fieldList, call the field's getFrom() method to extract the appropriate bytes from data and count the value. If the requested number of records has been processed, break from the loop; otherwise read the next 97-byte record into data.
  9. Produce a final report from each NumFieldValue instance in fieldList. For each field f in fieldList, print an appropriate heading. The fqTable method of f returns a frequency table; for each value di and count c in the table, print di and c.
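
The handling of the short first record can be sketched on its own. The following Python 3 fragment reads one 92-byte record followed by 97-byte records from an in-memory stream; the function name read_records and the synthetic data are hypothetical and stand in for the real OVRMCUDB file.

```python
import io

def read_records(stream, first_len=92, rest_len=97):
    """Yield a damaged, shorter first record, then full-length records."""
    data = stream.read(first_len)
    while data:
        yield data
        data = stream.read(rest_len)

# Synthetic stand-in for the data file: one short record, two full ones
stream = io.BytesIO(b"x" * 92 + b"y" * 97 + b"z" * 97)
lengths = [len(record) for record in read_records(stream)]
```

The OVRMCUDBfile class achieves the same result by overriding reclen so that it returns 92 only before the first record has been counted.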

Demo 1 - complete, detailed examination of a file (59)

def demo1( aDef, aFileName, end=10 ):
    """Complete examination of a file."""

    # Heading
    print "Copybook: %s\nFile:     %s" % ( aDef, aFileName )

    # Create a RecordFactory() to parse copy books and create DDE record definitions
    rf= cobol_dde.RecordFactory()
    dde= rf.makeRecord( CleanupLexer(file(aDef,"r").read()) )

    # Use the Report() visitor to produce a report on the record structure
    print "\nRecord Layout"
    rpt= cobol_dde.Report()
    dde.visit( rpt )

    # Identify the fields to be examined
    fieldList= [ data_profile.NumFieldValue( dde, 'MCUDBI-DATA-ITEM' ),
        data_profile.NumFieldValue( dde, 'MCUDBI-YR' ),
        data_profile.NumFieldValue( dde, 'MCUDBI-VALUE-LENGTH' )
    ]

    # Get the record size, open the input file, read and dump the records
    reclen= dde.size
    theFile= file(aFileName,"rb")

    # Dump record 1 (92 bytes)
    recno= 1
    data= theFile.read(92)
    dump1Rec= OVRMCUDBdump(dde)
    dump1Rec.process( recno, data )

    # Scan all remaining records (97 bytes each)
    data= theFile.read(97)
    while data:
        recno += 1
        for f in fieldList:
            f.getFrom( data )
        if recno == end: break
        data= theFile.read(97)

    # Final report
    for f in fieldList:
        print "\n%-10s %7s" % ( f.cobolName, "count" )
        for di,c in f.fqTable():
            print "%-10s %7d" % ( di,c )

    theFile.close()

Used by: demo1.py (53)

The second demo function shows a simple program to produce a hex dump of the damaged first record and the hundred records that follow it.

Set h to a HexDump instance defined on aFileName, with 80 bytes per display row. Use the dump method of h to dump the 92-byte first record. Then, for i from 0 to 99, use the dump method of h to dump the next 97-byte record.

Demo 2 - low-level hex dump of a file (60)

def demo2( aDef, aFileName ):
    """Low-level hex dump of a file."""
    print "\nDump of 100 records"
    h= data_profile.HexDump( aFileName, 80 )
    h.dump(92)
    for i in range(100):
        h.dump(97)

Used by: demo1.py (53)

The third demo function shows a simple program to produce a detailed field-by-field dump of one record.

  1. Print a simple heading.
  2. Create a RecordFactory, rf. Create a CleanupLexer to process the definition file.
  3. Pass the CleanupLexer to rf to create a DDE, called dde.
  4. Get the record size from dde. Open a file, assigning it to theFile.
  5. Dump record number 1. In this case, the record is damaged and is only 92 bytes long. The first record is read into data. An instance of the Dump Visitor subclass, fd, is created from the record data. The visitOccurance method of dde uses fd to produce a dump of each occurrence of each field in the record.
  6. Dump record number 2. Set data to the next 97 bytes; set fd to a new instance of the Dump subclass of Visitor; use the visitOccurance method of dde to dump the record.

Demo 3 - detailed, field-by-field occurance dump of a record (61)

def demo3( aDef, aFileName ):
    """Detailed field-by-field occurrence dump of a record."""
    # Heading
    print "Copybook: %s\nFile:     %s" % ( aDef, aFileName )

    # Create a RecordFactory() to parse copy books and create DDE record definitions
    rf= cobol_dde.RecordFactory()
    dde= rf.makeRecord( CleanupLexer(file(aDef,"r").read()) )

    # Get the record size, open the input file, read and dump the first few records
    reclen= dde.size
    theFile= file(aFileName,"rb")

    # Dump record 1 (92 bytes)
    recno= 1
    data= theFile.read(92)

    # Detailed occurrence-by-occurrence dump of the most recent record
    print "\nField Dump"
    fd= cobol_dde.Dump( data )
    dde.visitOccurance( fd )

    # Dump record 2 (97 bytes)
    data= theFile.read(97)
    fd= cobol_dde.Dump( data )
    dde.visitOccurance( fd )

Used by: demo1.py (53)

The fourth demo function shows a simple program to scan selected fields of a file.

  1. Print a simple heading.
  2. Create a RecordFactory, rf. Create a CleanupLexer to process the definition file.
  3. Pass the CleanupLexer to rf to create a DDE, called dde.
  4. Define fieldList as an instance of FieldScan built from three instances of NumFieldValue.
  5. Define fs as an instance of OVRMCUDBfile that uses fieldList to scan the file.
  6. Use the process method of fs to examine each record, using fieldList to examine the selected fields.

Demo 4 - detailed, field-by-field scan of distinct values of a record (62)

def demo4( aDef, aFileName, end=10 ):
    """Detailed field-by-field scan of a record."""
    # Heading
    print "Copybook: %s\nFile:     %s" % ( aDef, aFileName )

    # Create a RecordFactory() to parse copy books and create DDE record definitions
    rf= cobol_dde.RecordFactory()
    dde= rf.makeRecord( CleanupLexer(file(aDef,"r").read()) )

    # Create a FieldScan for the three fields we care about
    fieldList= data_profile.FieldScan( [ data_profile.NumFieldValue( dde, 'MCUDBI-DATA-ITEM' ),
        data_profile.NumFieldValue( dde, 'MCUDBI-YR' ),
        data_profile.NumFieldValue( dde, 'MCUDBI-VALUE-LENGTH' )
    ] )

    # Create a FileScan for the file, using the given FieldScan list of fields
    fs= OVRMCUDBfile( dde, fieldList, aFileName )

    # Process through the given ending record
    fs.process( end )

Used by: demo1.py (53)

The fifth demo function shows a simple program to dump selected records of a file.

  1. Print a simple heading.
  2. Create a RecordFactory, rf. Create a CleanupLexer to process the definition file.
  3. Pass the CleanupLexer to rf to create a DDE, called dde.
  4. Define fieldList as an instance of OVRMCUDBdump, based on dde.
  5. Define fs as an instance of FileScan that uses fieldList to scan the file.
  6. Use the process method of fs to examine 4 records, using fieldList to dump all fields.

Demo 5 - detailed, field-by-field occurance dump of a record (63)

def demo5( aDef, aFileName ):
    """Detailed field-by-field occurance dump of a record."""
    # Heading
    print "Copybook: %s\nFile:     %s" % ( aDef, aFileName )

    # Create a RecordFactory() to parse copy books and create DDE record definitions
    rf= cobol_dde.RecordFactory()
    dde= rf.makeRecord( CleanupLexer(file(aDef,"r").read()) )

    # Create a special FieldDump that can separate the variant record formats
    # for the OVRMCUDB file.
    fieldList= OVRMCUDBdump(dde)

    # Create a FileScan for the file, using the given FieldDump list of fields
    fs= data_profile.FileScan( dde, fieldList, aFileName )

    # Process through the given ending record
    fs.process( 4 )

Used by: demo1.py (53)

Demo Main

The demo main calls a selected function on a selected file and copybook. This can be replaced with a function that uses the getopt module to parse the command-line arguments.
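
A command-line front end could look like the following sketch. It uses argparse in place of the getopt approach mentioned above, and it is written for Python 3. The option names --demo and --end, and the function name parse_args, are assumptions made for illustration; nothing in demo1.py defines them.

```python
import argparse

def parse_args(argv=None):
    """Parse the copybook path, data file path, demo number, and record limit."""
    parser = argparse.ArgumentParser(
        description="Examine a COBOL data file using a copybook.")
    parser.add_argument("copybook", help="path to the COBOL copybook")
    parser.add_argument("datafile", help="path to the binary data file")
    # Hypothetical options: which demo to run and how many records to read
    parser.add_argument("--demo", type=int, choices=range(1, 6), default=4,
                        help="demo function to run (1-5)")
    parser.add_argument("--end", type=int, default=100,
                        help="last record number to process")
    return parser.parse_args(argv)

args = parse_args(["MCUDBIW.TXT", "OVRMCUDB_bin.txt", "--demo", "1", "--end", "10"])
defaults = parse_args(["MCUDBIW.TXT", "OVRMCUDB_bin.txt"])
```

The main program would then dispatch on args.demo to call the chosen demo function with args.copybook and args.datafile.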

Demo Main (64)

if __name__ == "__main__":
    start= time.clock()
    copyBook= r"J:Appl-DevFinanceMISMCUDBIW.TXT"
    dataFile= r"J:Appl-DevFinanceMISOVRMCUDB_bin.txt"
    #demo1( copyBook, dataFile, 100 )
    demo4( copyBook, dataFile, 100000 )
    print "Run Time: %7.4f" % (time.clock()-start)

Used by: demo1.py (53)

Packaging

The following additional elements are part of a complete package.

README (65)

##############################################
COBOL DDE (Data Definition Element) Processing
##############################################

This module parses COBOL copybooks (DDE's)
to help write ETL and Data Profiling applications.

Installation
------------

Install with the following command::

    python setup.py install

Usage
-----

See demo1.py for demonstration applications built with these
tools.

Documentation
-------------

See `dde.html <dde.html>`_ for the detailed documentation of this application.

Build
-----

The source and documentation are built via the pyWeb tool from ``DDE.w``.

For information on pyWeb, see http://sourceforge.net/projects/pywebtool/.

setup.py (66)

#!/usr/bin/env python

from distutils.core import setup

setup(name="DDE",
      version="1.2",
      description="COBOL Data Definition Element Processing",
      author="Steven F. Lott",
      author_email="s_lott@yahoo.com",
      url="https://sourceforge.net/projects/cobol-dde/",
      py_modules=['cobol_dde', 'data_profile']
     )

MANIFEST.in (67)

include *.w *.html *.css *.py *.tex *.pdf

Indices

Files

MANIFEST.in:→ (67)
README:→ (65)
cobol_dde.py:→ (1)
data_profile.py:
 → (39)
demo1.py:→ (53)
setup.py:→ (66)
test.py:→ (52)
test_data_profile.py:
 → (49)
test_dde.py:→ (30)

Macros

DDE Blank When Zero Clause Parsing:
 → (19)
DDE Class Construction methods:
 → (12)
DDE Class Hierarchy - defines group and elementary data descriptions elements:
 → (11)
DDE Class Record Scanning methods:
 → (14)
DDE Class Reporting methods:
 → (13)
DDE Common Visitors for reporting on a DDE structure:
 → (15)
DDE Element Parsing:
 → (28)
DDE Exception Definitions:
 → (7)
DDE Justified Clause Parsing:
 → (20)
DDE Lexical Scanner base class provides the default lexical scanner implementation:
 → (16)
DDE Occurs Clause Parsing:
 → (21)
DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft:
 → (2) → (3) → (4) → (5) → (6)
DDE Picture Clause Parsing:
 → (18)
DDE Record Parsing:
 → (29)
DDE RecordFactory parses a record clause to create a DDE instance:
 → (17)
DDE Redefines Clause Parsing:
 → (22)
DDE Redefines Strategy class hierarchy - to define offsets to DDE elements:
 → (10)
DDE Renames Clause Parsing:
 → (23)
DDE Sign Clause Parsing:
 → (24)
DDE Synchronized Clause Parsing:
 → (25)
DDE Test copybook 1 with basic features:
 → (31)
DDE Test copybook 2 with 88-level item:
 → (32)
DDE Test copybook 3 with nested occurs level:
 → (33)
DDE Test copybook from page 174 with nested occurs level:
 → (34)
DDE Test copybook from page 195 with simple redefines:
 → (35)
DDE Test copybook from page 197 with another redefines:
 → (36)
DDE Test copybook from page 198, example a:
 → (37)
DDE Test copybook from page 198, example b:
 → (38)
DDE Usage Clause Parsing:
 → (26)
DDE Usage Strategy class hierarchy - to extract data from input buffers:
 → (9)
DDE Value Clause Parsing:
 → (27)
DDE Visitor base class - to analyze a complete DDE tree structure:
 → (8)
DProfile CVS Cruft and pyweb generator warning:
 → (43)
DProfile Class Definitions:
 → (45)
DProfile DOC String:
 → (40)
DProfile Field Value Class Hierarchy to accumulate distinct values:
 → (47)
DProfile Field and Record Scanning does either dumps or disctinct value processing:
 → (48)
DProfile Hex Dump Class to do raw dump of a record:
 → (46)
DProfile Imports:
 → (41)
DProfile Shell Escape:
 → (42)
DProfile Test 1:
 → (50)
DProfile Test 2:
 → (51)
DProfile Utility Functions:
 → (44)
Demo 1 - complete, detailed examination of a file:
 → (59)
Demo 2 - low-level hex dump of a file:
 → (60)
Demo 3 - detailed, field-by-field occurance dump of a record:
 → (61)
Demo 4 - detailed, field-by-field scan of distinct values of a record:
 → (62)
Demo 5 - detailed, field-by-field occurance dump of a record:
 → (63)
Demo CVS Cruft and pyweb generator warning:
 → (57)
Demo DOC String:
 → (54)
Demo Imports:→ (55)
Demo Main:→ (64)
Demo Shell Escape:
 → (56)
Demo Subclass Definitions:
 → (58)

User Identifiers

DDE:1 3 6 7 8 9 [11] 12 13 14 17 28 29 30 40 43 47 50 57 59 61 62 63 65 66
Dump:3 [15] 31 32 33 35 36 37 38 46 58 59 61
E2A:40 [44] 51
EBCDIC2ASCII:[44]
FieldDump:40 [48] 54 58 63
FieldScan:40 [48] 54 62
FieldValue:40 [47]
FileScan:40 [48] 54 58 62 63
HexDump:40 [46] 58 60
Lexer:3 [16] 31 32 33 34 35 36 37 38 50 58
NonRedefines:3 [10] 28
NumFieldValue:40 [47] 59 62
RecordFactory:3 12 [17] 30 50 59 61 62 63
Redefines:3 [10] 11 14 22
Report:3 [15] 30 44 50 59
Source:3 [15]
SyntaxError:3 [7] 18 26 28
TestDDE:[49]
Test_DProfile_1:
 [50]
Test_DProfile_2:
 [51]
UnsupportedError:
 3 [7] 21 23 24 25
Usage:3 [9] 11 26 65
UsageComp:3 [9] 26
UsageComp3:3 [9] 26
UsageDisplay:3 [9] 26 28
UsageError:3 [7] 14 33
Visitor:3 [8] 15
__version__:5 6 [43] 57
cobol_dde:30 40 41 49 [55] 58 59 61 62 63 66
data_profile:49 [55] 58 59 60 62
decimal:[4] 9 14 18
os:[55]
re:[4] 16
string:3 [4] 11 14 16 44
struct:[4] 9
time:[55] 64

Created by /Users/slott/Documents/Projects/pyWeb-2.1/pyweb/pyweb.py at Sun Mar 14 10:46:52 2010. pyweb.__version__ '$Revision$'. Source DDE.w modified Sun Mar 14 10:46:18 2010.

Working directory '/Users/slott/Documents/Projects/COBOL_DDE-1.2'.