Use a COBOL Record Layout to Process an EBCDIC File
Steven F. Lott
Contents
When dealing with "Flat Files" from legacy COBOL programs, there are several problems that need to be solved.
Generally, COBOL files are defined by a "Data Definition Entry" (DDE) that provides the record layout.
This library helps parse DDEs to determine the offset, size, and encoding of each field. Python programs can use this information to process files that originate from COBOL systems.
There are two common applications for a Python-based analysis of COBOL data.
ETL. Extract, Transform and Load (ETL) processing is a common pipeline between legacy COBOL ("Flat File") applications and relational database (or Object Database) applications. A Python application can be used to Extract data from COBOL flat files and either load a database or create a file in a more usable notation (e.g., XML, JSON or CSV).
The cobol_dde module helps to create an ETL application.
Profiling. Data Warehouse processing requires complete understanding of source application data domains. This complete understanding of a domain is part of analyzing data quality. Data domains, when not formalized by a programming language or database design, tend to grow in sometimes obscure ways, including bad data, special-purpose data and undocumented data.
Bad data are simply invalid values used in application files. These can be tolerated either because of programming bugs or dependencies that permit invalid data under certain circumstances. Typically, the latter case indicates a normalization issue. The domain values are illegal.
Special-purpose data is often a "patch" or "hack" to work around a problem. It may be documented, but it is rarely used and is unexpected by ETL programmers. Sometimes the special-purpose data represents an operational hack to work around a problem without changing the programming. Irrespective of the origin of the hack, the domain values are legal, but unexpected and uncommon.
Undocumented data are ordinary operational values that are widely used but unexpected because they are undocumented. Often this is data that is not part of the essential use cases for an application, but is additional data with an obscure final destination. The domain values are legal, unexpected but common.
Flat files processed by COBOL programs are very common and often suffer from the above data quality problems. The language imposes few rules on data domains. Programs can be easily modified to extend a domain. Processing rules can be quite obscure, making it necessary to analyze actual data rather than program source.
This application produces simple reports on the range of values found for particular fields in a file. This information is used to understand data quality and the actual values in a domain, and to support reverse engineering of software.
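The core of such a report is a count of each distinct value in a field. The following is a minimal sketch in Python 3, not the data_profile implementation: profile_field() is a hypothetical helper, and it assumes each record is a bytes object with the field at a known offset and size.

```python
from collections import Counter


def profile_field(records, offset, size):
    """Count each distinct value of one fixed-position field.

    Hypothetical helper: `offset` is the 0-based byte offset and `size` the
    byte length of the field, as a parsed record layout would supply them.
    """
    counts = Counter()
    for record in records:
        counts[record[offset:offset + size]] += 1
    return counts


records = [b"A001X", b"A002Y", b"B001X", b"A001Z"]
for value, count in sorted(profile_field(records, 1, 3).items()):
    print(value, count)
```

The report is then just the Counter's contents: each observed value with its occurrence count, which exposes rare (special-purpose) and invalid (bad) values alongside the common ones.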
The profiling use case is described below.
The user is given a file and the associated record definition (also known as a "Copy Book").
The user creates an application program based on data_profile to name the fields of interest. The program may also include code (1) to separate occurrences or variant record types when the file is not in first normal form, and (2) to conditionally process fields when the file is not in second or third normal form.
The user runs the profiler with the file, driver and copybook.
The application produces a summary report showing each of the named fields, their complete domain of values, and the occurrence count for each value.
The ETL use case is similar.
The user is given a file and the associated record definition (also known as a "Copy Book").
The user creates an application program based on cobol_dde to extract the fields of interest. The program may also include code (1) to separate occurrences or variant record types when the file is not in first normal form, and (2) to conditionally process fields when the file is not in second or third normal form. The application loads a database or writes an output file in a more usable format (e.g., JSON).
The user installs the application for operational use.
The cobol_dde module has two distinct phases of operation. The first phase parses a COBOL Data Description Entry (DDE) to understand the record layout of the file. The second phase uses the parsed DDE to extract fields from a record.
The data_profile module is a framework for building data profiling applications based on the cobol_dde module. The data profiler uses a COBOL Data Description Entry (DDE) to understand the record layout of the file. The profiler then uses the DDE object to examine selected fields of the file.
The DDE class is a recursive definition of a COBOL group-level DDE. There are two basic species of COBOL DDEs: elementary items, which have a picture clause, and group-level items, which contain lower-level items. There are several optional features of every DDE, including an occurs clause and a redefines clause. In addition to a picture clause, elementary items can also have an optional usage clause and an optional sign clause.
The picture clause specifies how to interpret a sequence of bytes. The picture clause interacts with the optional usage clause, sign clause and synchronized clause to fully define the bytes. The picture clause uses a complex format of code characters to define individual character bytes (when the usage is display), packed decimal digits stored two per byte (when the usage is COMP-3), or binary integers (when the usage is COMP).
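Two details of the picture clause matter when computing field sizes: repeat counts such as 9(5) must be expanded, and for USAGE DISPLAY the V (implied decimal point), P (scaling position) and S (sign, unless SIGN IS SEPARATE is given) symbols occupy no storage. The sketch below illustrates these rules in Python 3; expand_picture() and display_size() are hypothetical names, and only these few symbols are handled.

```python
import re


def expand_picture(pic):
    """Expand repeat counts, e.g. '9(3)V99' -> '999V99'."""
    return re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), pic)


def display_size(pic):
    """Bytes used by a USAGE DISPLAY item with this PICTURE.

    Each remaining symbol takes one byte; V, P and an embedded S take none.
    (SIGN IS SEPARATE, which adds a byte, is not modeled here.)
    """
    return sum(1 for symbol in expand_picture(pic).upper() if symbol not in "VPS")


print(expand_picture("S9(5)V99"), display_size("S9(5)V99"))
```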
The occurs clause specifies an array of elements. If the occurs clause appears on a group level item, the sub-record is repeated. If the occurs clause appears on an elementary item, that item is repeated.
The redefines clause defines an alias for input bytes. When field R redefines a previously defined field F, the storage bytes are used for both R and F. The record structure does not provide a way to disambiguate the interpretation of the bytes. Program logic must be examined to determine which interpretation is valid.
DDE Class. The parent class, DDE, defines the features of a group-level item. It supports the occurs and redefines features. It can contain a number of DDE items. The leaves of the tree, DDEElement, define the features of an elementary item. It adds support for the picture clause, but removes support for lower-level items.
The optional clauses are handled using a variety of design patterns. The usage information, for instance, is used to create a Strategy object that is used to extract a field from a record's bytes.
The redefines information is used to create a Strategy object that computes the offset to a field. There are two variant strategies: locate the basis field and use that field's offset or use the end of the previous element as the offset.
The Visitor pattern is used to traverse a DDE structure to write reports on the structure. The data_profile module uses a Visitor to write a detailed dump of a given record.
The RecordFactory object reads a file of text and either creates a DDE or raises an exception. If the text is a valid COBOL record definition, a DDE is created. If there are syntax errors, an exception is raised.
The RecordFactory depends on a Lexer instance to do lexical scanning of COBOL source. The lexical scanner can be subclassed to pre-process COBOL source. This is necessary because of the variety of source formats that are permitted. Shop standards may include or exclude features like program identification, line numbers, format control and other decoration of the input.
The makeRecord() method of the RecordFactory class does the parsing of the record definition. Each individual DDE statement is parsed. The level number information is used to define the correct grouping of elements. When the structure is parsed, it is decorated with size and offset information for each element.
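Grouping by level number can be done with a stack of open groups: a new entry closes every open group whose level is greater than or equal to its own, then becomes a child of the group left on top. This Python 3 sketch uses hypothetical (level, name) pairs and plain dictionaries rather than the RecordFactory API, and ignores 66- and 88-level entries.

```python
def build_tree(entries):
    """Nest (level, name) pairs into a tree using their level numbers."""
    root = None
    stack = []  # (level, node) pairs for the groups that are still open
    for level, name in entries:
        node = {"level": level, "name": name, "children": []}
        # Close groups at the same or a deeper level than this entry.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            stack[-1][1]["children"].append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("this sketch supports a single 01-level record")
        stack.append((level, node))
    return root


entries = [(1, "CUST-REC"), (5, "CUST-ID"), (5, "CUST-NAME"),
           (10, "FIRST"), (10, "LAST"), (5, "BALANCE")]
tree = build_tree(entries)
print([child["name"] for child in tree["children"]])
```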
There are two broad types of character interpretation:
These require different strategies for decoding the input bytes. Note that the COBOL language, and IBM's extensions, provide a number of usage options. In this application, three basic usage strategies are supported: display, comp and comp-3.
A typical data profiling application program has the following general form.
import cobol_dde
import data_profile

# CleanupLexer is an application-supplied Lexer subclass; aDef names the
# copybook file, aFileName the data file, and end the last record to process.
rf= cobol_dde.RecordFactory()
dde= rf.makeRecord( CleanupLexer(open(aDef,"r").read()) )
# Create a FieldScan for the three fields we care about
fieldList= data_profile.FieldScan( [
    data_profile.NumFieldValue( dde, 'MCUDBI-DATA-ITEM' ),
    data_profile.NumFieldValue( dde, 'MCUDBI-YR' ),
    data_profile.NumFieldValue( dde, 'MCUDBI-VALUE-LENGTH' ) ] )
# Create a FileScan for the file, using the given FieldScan list of fields
fs= data_profile.FileScan( dde, fieldList, aFileName )
# Process through the given ending record
fs.process( end )
Is EBCDIC->ASCII conversion a feature of DDE? May need subclass or strategy for conversion.
Consider combining PIC, USAGE, and SIGN information into a single data type specification.
Add capability to search using a path string instead of individual get() calls in DDE
Create subclass of DDE for non-group-level items that adds PICTURE and USAGE features and removes the container.
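On the EBCDIC->ASCII question: converting a whole record with a character codec would corrupt COMP and COMP-3 bytes, so a per-field strategy seems safer. Below is a minimal Python 3 sketch, assuming code page 037 (Python's built-in cp037 codec; the actual code page of a given file must be confirmed). The helper names are hypothetical. It decodes alphanumeric DISPLAY fields as text, and signed numeric DISPLAY (zoned decimal) fields, whose last byte carries the sign in its high nibble.

```python
from decimal import Decimal


def display_text(raw, codec="cp037"):
    """Decode an alphanumeric USAGE DISPLAY field from EBCDIC."""
    return raw.decode(codec)


def decode_zoned(raw, scale=0):
    """Decode a zoned-decimal (numeric USAGE DISPLAY) field from EBCDIC bytes.

    Each byte's low nibble is a digit; the high nibble of the last byte is
    the sign zone (0xD or 0xB negative; 0xC or 0xF positive/unsigned).
    `scale` is the number of digits after the implied decimal point (V).
    """
    digits = []
    for byte in raw:
        digit = byte & 0x0F
        if digit > 9:
            raise ValueError("invalid zoned digit in %r" % raw)
        digits.append(str(digit))
    value = Decimal("".join(digits)).scaleb(-scale)
    return -value if raw[-1] >> 4 in (0x0B, 0x0D) else value


print(display_text(b"\xC8\x85\x93\x93\x96"))   # EBCDIC for "Hello"
print(decode_zoned(b"\xF1\xF2\xD3"))            # PIC S999 holding -123
```

Keeping the codec choice in a per-field strategy would match the way Usage objects already encapsulate numeric decoding.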
We'll define the cobol_dde module and a test_dde unit test for this module.
We'll rely on the decimal module to do fixed-precision decimal arithmetic.
Note
The legacy implementation was the FixedPoint module. While the FixedPoint module is handy, it is not as robust as the decimal module.
The cobol_dde module provides the DDE record definition and Lexical scanning capability.
Separately, we'll look at the data_profile module. This defines the scanning and analyzing features.
The cobol_dde module has the following structure.
cobol_dde.py (1)
→ DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (2)
→ (3)
→ (4)
→ (5)
→ (6)
→ DDE Exception Definitions (7)

# 1. Basic class definitions
→ DDE Visitor base class - to analyze a complete DDE tree structure (8)
→ DDE Usage Strategy class hierarchy - to extract data from input buffers (9)
→ DDE Redefines Strategy class hierarchy - to define offsets to DDE elements (10)

# 2. DDE class definition
→ DDE Class Hierarchy - defines group and elementary data descriptions elements (11)

# 3. Some utility classes for reporting
→ DDE Common Visitors for reporting on a DDE structure (15)

# 4. The Lexical Scanning and Parsing of an input record layout
→ DDE Lexical Scanner base class provides the default lexical scanner implementation (16)
→ DDE RecordFactory parses a record clause to create a DDE instance (17)
Overheads includes the following: the shell escape, the doc string, imports and any CVS cruft.
The shell escape line allows this module to be run as a stand-alone application.
DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (2)
#!/usr/bin/env python
Used by: cobol_dde.py (1)
The doc string provides documentation embedded within this module.
DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (3)
"""COBOL Data Description Entries (a/k/a Record-Layout Objects)

A COBOL Record is a collection of data description entries.  Each entry
is either a simple field (with a PICTURE) or a group of fields.  Each field
has an optional occurs clause, or redefines clause.  Each field has a usage
(DISPLAY or COMP or COMP-3).  Each field is assigned an offset, size and
data type (numeric or alpha).

This module includes the following class definitions:

DDE
    Defines a COBOL record layout object.  Each record layout object has
    operations to locate individual fields or occurance instances.

Usage
UsageDisplay
UsageComp
UsageComp3
    Various USAGE clauses; these classes provide a valueOf() method which
    decodes record bytes to a proper value.

Redefines
NonRedefines
    Two strategies for computing a field's offset - either it is after the
    previous field in memory, or it redefines another field's location in
    memory.

RecordFactory
    Parses a COBOL copybook to create the DDE structure used to parse a
    character string into record fields.

Lexer
    A COBOL lexical scanner.  If necessary, this can be subclassed to handle
    unusual file formats or other record definition copy book problems.

Visitor
Source
Report
Dump
    A Visitor can traverse the DDE hierarchy.  Each DDE has a visit() method
    that applies the visitor to the parent and each child in order.
    Source displays canonical source from the original input.
    Report displays the fields including size and offset information.
    Dump is used by visitOccurance() to dump each occurance of each field
    of a record.

SyntaxError
    Raised for a COBOL syntax error.

UnsupportedError
    Raised for a COBOL feature that is not supported by this module.

UsageError
    Raised for a DDE that is not used properly; e.g., occurs-clause out of
    range.
"""
Used by: cobol_dde.py (1)
The following imports are used by this module.
DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (4)
import re
import struct
import string
import decimal
Used by: cobol_dde.py (1)
The CVS cruft provides a place for CVS or other version control tool to place the revision number information within this module.
DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (5)
__version__ = """$Revision$"""
Used by: cobol_dde.py (1)
We also place a pyweb warning in the overheads. This reminds anyone reading the .py file that it is generated from a pyweb .w source.
DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft (6)
### DO NOT EDIT THIS FILE!
### It was created by /Users/slott/Documents/Projects/pyWeb-2.1/pyweb/pyweb.py, __version__='$Revision$'.
### From source DDE.w modified Sun Mar 14 10:46:18 2010.
### In working directory '/Users/slott/Documents/Projects/COBOL_DDE-1.2'.
Used by: cobol_dde.py (1)
The SyntaxError exception is raised during parsing for a few egregious COBOL syntax problems. One presumption underlying this program is that all copybooks are from production source programs, and have no syntax errors.
The UnsupportedError exception is raised during parsing for features of COBOL DDEs that are not supported by this program. These include the OCCURS a TO b variation of the occurs clause, the OCCURS DEPENDING ON clause, the RENAMES clause, the SIGN clause and the SYNCHRONIZED clause.
The UsageError exception is raised during analysis of a field when something invalid has happened during field extract.
DDE Exception Definitions (7)
class SyntaxError( Exception ):
    """COBOL syntax error."""
    pass

class UnsupportedError( Exception ):
    """A COBOL DDE has features not supported by this module."""
    pass

class UsageError( Exception ):
    """A COBOL DDE is not used properly, e.g., occurs-clause out of range."""
    pass
Used by: cobol_dde.py (1)
The Base Class definitions can be separated into four high-level subject areas: (1) some basic definitions, (2) the DDE class hierarchy, (3) utility classes for reporting, and (4) the lexical scanning and parsing classes.
The basic definitions include the Visitor base class, the Usage strategy class hierarchy and the Redefines strategy class hierarchy.
The DDE class hierarchy is the DDE class and the DDEElement class.
The utility classes include a number of common Visitor subclasses. These include Source, Report and Dump. Source produces a canonical report on the COBOL source. Report produces an analysis of the fields, their sizes and offsets. Dump can be used to dump all fields of a record.
The lexical scanning and parsing classes are Lexer and RecordFactory.
The Visitor design pattern is used to simplify a recursive, depth-first, pre-order traversal of the parse tree. A subclass of Visitor must provide a dde() method. Each individual element is passed to this method, starting at the top of the DDE structure and proceeding down each branch in depth-first order; a parent is passed to dde() before its children.
A subclass may also override __init__() to initialize any internal data structures, and finish(), which can be called at the end of a traversal to write a summary of the structure.
DDE Visitor base class - to analyze a complete DDE tree structure (8)
class Visitor( object ):
    """Visits each node of a DDE, doing a depth-first traversal of the structure."""
    def __init__( self ):
        self.indent= 0
    def enterSub( self ):
        self.indent += 1
    def exitSub( self ):
        self.indent -= 1
    def dde( self, aDDE ):
        """Given a DDE, perform the requested process."""
        pass
    def finish( self ):
        """Any summary information at the end of the visit."""
        pass
Used by: cobol_dde.py (1)
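The following Python 3 sketch shows the same pattern on a hypothetical Node class standing in for DDE (the names are illustrative, not part of cobol_dde). visit() calls the visitor on the node before its children, and a Visitor subclass overrides dde() to collect the elementary items, i.e., those with a PICTURE.

```python
class Node:
    """Hypothetical stand-in for a DDE: level, name, optional PICTURE, children."""

    def __init__(self, level, name, picture=None, children=()):
        self.level, self.name, self.picture = level, name, picture
        self.children = list(children)

    def visit(self, visitor):
        # Pre-order: the node itself first, then each child subtree.
        visitor.dde(self)
        if self.children:
            visitor.enter_sub()
            for child in self.children:
                child.visit(visitor)
            visitor.exit_sub()


class Visitor:
    def __init__(self):
        self.indent = 0

    def enter_sub(self):
        self.indent += 1

    def exit_sub(self):
        self.indent -= 1

    def dde(self, node):
        pass


class ElementaryFields(Visitor):
    """Collect the names of elementary (PICTURE-bearing) items."""

    def __init__(self):
        super().__init__()
        self.names = []

    def dde(self, node):
        if node.picture:
            self.names.append(node.name)


record = Node("01", "CUSTOMER", children=[
    Node("05", "CUST-ID", "X(6)"),
    Node("05", "CUST-NAME", children=[
        Node("10", "FIRST", "X(10)"),
        Node("10", "LAST", "X(15)"),
    ]),
    Node("05", "BALANCE", "S9(7)V99"),
])
collector = ElementaryFields()
record.visit(collector)
print(collector.names)
```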
Usage is used to combine information in the picture, usage, sign and synchronized clauses.
The Strategy design pattern allows a DDE element to delegate the size() and valueOf() operations to this class.
The size() method returns the number of bytes used by the data element. For usage display, the size can be computed from the picture clause. For usage computational, the size is 2, 4 or 8 bytes, depending on the length of the picture (up to 4 characters, up to 9 characters, or longer). For usage computational-3, the picture clause digits are packed two per byte with an extra half-byte for sign information.
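The size rules can be summarized as a small function. This Python 3 sketch takes a digit count rather than a picture string, and storage_size() is a hypothetical name. The COMP thresholds mirror UsageComp.size() below (halfword, fullword, doubleword), applied to digits where that method uses the picture length; COMP-3 stores one digit per half-byte plus a sign half-byte.

```python
def storage_size(digits, usage):
    """Bytes occupied by a numeric item with `digits` digit positions."""
    usage = usage.upper()
    if usage == "DISPLAY":
        return digits                  # one byte per digit
    if usage == "COMP":
        if digits <= 4:
            return 2                   # halfword
        if digits <= 9:
            return 4                   # fullword
        return 8                       # doubleword
    if usage == "COMP-3":
        return digits // 2 + 1         # two digits per byte, plus a sign nibble
    raise ValueError("unsupported usage %r" % usage)


for usage in ("DISPLAY", "COMP", "COMP-3"):
    print(usage, storage_size(7, usage))
```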
The valueOf() method returns a usable Python value extracted from the record's bytes. The UsageDisplay subclass does numeric conversion for numeric pictures, otherwise the data is left as a string. The UsageComp subclass does numeric conversion for binary coded data. This handles the mainframe endian conversion. The UsageComp3 subclass unpacks the digits into a character string and then does character-to-numeric conversion.
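For comparison, here is a Python 3 sketch of the COMP and COMP-3 decoding described above, operating on bytes objects. It is not the module's code: the function names and the `scale` parameter (digits after an implied V) are assumptions. It treats a sign nibble of 0xD or 0xB as negative, following the common IBM convention.

```python
from decimal import Decimal


def unpack_comp(data, scale=0):
    """Decode big-endian, two's-complement binary (COMP) bytes."""
    return Decimal(int.from_bytes(data, "big", signed=True)).scaleb(-scale)


def unpack_comp3(data, scale=0):
    """Decode packed-decimal (COMP-3) bytes.

    Every byte holds two 4-bit digits, except that the low nibble of the
    last byte holds the sign.
    """
    if not data:
        raise ValueError("empty COMP-3 field")
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()
    if any(n > 9 for n in nibbles):
        raise ValueError("invalid packed digit in %r" % data)
    value = Decimal("".join(str(n) for n in nibbles)).scaleb(-scale)
    return -value if sign in (0x0B, 0x0D) else value


print(unpack_comp(b"\xff\xfe"))                 # -2
print(unpack_comp3(b"\x12\x34\x5d", scale=2))   # -123.45
```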
DDE Usage Strategy class hierarchy - to extract data from input buffers (9)
class Usage( object ):
    """Convert numeric data based on Usage clause."""
    def __init__( self, name_ ):
        self.myName= name_
        self.numeric= None
        self.originalSize= None
        self.scale= None
        self.precision= None
        self.signed= None
        self.decimal= None
    def setTypeInfo( self, **typeInfo ):
        """After parsing a PICTURE clause, provide additional usage information."""
        self.numeric = typeInfo['numeric']
        self.originalSize = typeInfo['length']
        self.scale = typeInfo['scale']
        self.precision = typeInfo['precision']
        self.signed = typeInfo['signed']
        self.decimal = typeInfo['decimal']
    def valueOf( self, buffer ):
        """Convert this data to a decimal number."""
        return None
    def size( self, picture ):
        """Return the actual size of this data, based on PICTURE and SIGN."""
        return len(picture)

class UsageDisplay( Usage ):
    """Convert from ordinary character data to numeric."""
    # NOTE: EBCDIC->ASCII conversion handled by the DDE as a whole.
    def __init__( self ):
        Usage.__init__( self, "DISPLAY" )
    def valueOf( self, buffer ):
        if self.numeric and self.precision != 0:
            if self.decimal == '.':
                return decimal.Decimal( buffer )
            # Insert the implied decimal point.
            return decimal.Decimal( buffer[:-self.precision]+"."+buffer[-self.precision:] )
        elif self.numeric and self.precision == 0:
            return int(buffer)
        return buffer

class UsageComp( Usage ):
    """Convert from COMP data to numeric.
    This may need to be overridden to handle little-endian data."""
    def __init__( self ):
        Usage.__init__( self, "COMP" )
    def valueOf( self, buffer ):
        n= struct.unpack( self.sc, buffer )
        return decimal.Decimal( n[0] )
    def size( self, picture ):
        # Big-endian signed binary: halfword, fullword or doubleword.
        if len(picture) <= 4:
            self.sc= '>h'
            return 2
        elif len(picture) <= 9:
            self.sc= '>i'
            return 4
        else:
            self.sc= '>q'
            return 8

class UsageComp3( Usage ):
    """Convert from COMP-3 data to numeric."""
    def __init__( self ):
        Usage.__init__( self, "COMP-3" )
    def valueOf( self, buffer ):
        display= []
        for c in buffer:
            n= struct.unpack( "B", c )
            display.append( str(n[0]//16) )
            display.append( str(n[0]%16) )
        #print repr(buffer), repr(display)
        #Last position has sign information: 'd' is <0, 'f' is unsigned, and 'c' >=0
        f= decimal.Decimal( "".join(display[:-1]) )
        # display holds strings, so the 0xD sign nibble appears as '13'.
        if display[-1]=='13':
            return -f
        return f
    def size( self, picture ):
        return int((len(picture)+2)/2)
Used by: cobol_dde.py (1)
Redefines is used to set the offset of a specific group or elementary item. There are only two cases, modeled by the Redefines class and its NonRedefines subclass. An element can redefine another element; in this case the two elements have the same offset, which is handled by the Redefines class. Otherwise, the element is independent: it begins after the end of the lexically preceding element, so its offset is computed from the previous element's offset plus its size; this is handled by the NonRedefines class.
The Strategy design pattern allows an element to delegate the offset(), indexedOffset() and size() methods. The Redefines subclass uses the redefines name to look up the offset and size information. The NonRedefines subclass uses the offset and size information currently being computed during the visit loop.
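For a flat list of fields, the two strategies reduce to a simple rule, sketched here in Python 3 with a hypothetical layout() function and (name, size, redefines) tuples: a redefining field reuses the offset of the field it names and adds nothing to the running size; any other field starts at the running offset.

```python
def layout(fields):
    """Assign offsets to (name, size, redefines) tuples; return (offsets, total)."""
    offsets = {}
    position = 0
    for name, size, redefines in fields:
        if redefines:
            # Redefines strategy: same bytes as the named field, no added size.
            offsets[name] = offsets[redefines]
        else:
            # NonRedefines strategy: start where the previous field ended.
            offsets[name] = position
            position += size
    return offsets, position


fields = [
    ("CUST-ID", 6, None),
    ("DATE-TEXT", 8, None),
    ("DATE-PARTS", 8, "DATE-TEXT"),
    ("AMOUNT", 5, None),
]
offsets, total = layout(fields)
print(offsets, total)
```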
DDE Redefines Strategy class hierarchy - to define offsets to DDE elements (10)
class Redefines( object ):
    """Lookup size and offset from the field we rename."""
    def __init__( self, name_=None ):
        self.myName= name_
    def offset( self, offset, aDDE ):
        return aDDE.top.get( self.myName ).offset
    def indexedOffset( self, offset, aDDE ):
        return aDDE.top.get( self.myName ).indexedOffset
    def size( self, aDDE ):
        return 0

class NonRedefines( Redefines ):
    """More typical case is that we have our own size and offset."""
    def offset( self, offset, aDDE ):
        return offset
    def indexedOffset( self, offset, aDDE ):
        return offset + aDDE.occurSize*aDDE.currentIndex
    def size( self, aDDE ):
        return aDDE.size
Used by: cobol_dde.py (1)
The DDE class itself defines a single element (group or elementary) of a record. There are several broad areas of functionality for a DDE: (1) construction, (2) reporting, (3) record scanning.
The class definition includes the attributes determined at parse time, attributes added during decoration time and attributes used during decoration processing.
Note that group-level and item-level entries could be separate subclasses of DDE. An item-level definition has a picture clause; a group-level definition does not. A simple Visitor, then, can accumulate all item-level fields.
DDE Class Hierarchy - defines group and elementary data descriptions elements (11)
class DDE( object ):
    """A Data Description Entry.

    This is either a group-level item, which contains DDE's, or it is a
    lowest-level DDE, defined by a PICTURE clause.

    All higher-level DDE's are effectively string-type data.
    A lowest-level DDE with a numeric PICTURE is numeric-type data.

    Occurs and Redefines can occur at any level.
    Almost anything can be combined with anything else.

    Each entry is defined by the following attributes

    level               COBOL level number 01 to 49, 66 or 88.
    myName              COBOL variable name
    occurs              the number of occurances (default is 1)
    picture             the exploded picture clause, with ()'s expanded
    initValue           any initial value provided
    offset              offset to this field from start of record
    size                overall size of this item, including all occurances
    occurSize           the size of an individual occurance
    sizeScalePrecision  ( numeric, size, scale (# of P's), precision)
    redefines           an instance of Redefines used to compute the offset
    usage               an instance of Usage used to do data conversions
    contains            the list of contained fields
    parent              the immediate parent DDE
    top                 the overall record definition DDE
    currentIndex        the current index values used for locating data
    indexedOffset       the current offset based on current index values

    The primary interface is get(), setIndex(), of() and valOf().

    get('dataname')     returns the DDE for the given dataname
    setIndex(x,...)     sets the current indexes for the various occurs clauses
    of(record)          locates this DDE's bytes within the given record
    valOf(record)       locates this DDE's bytes and interprets them as a number
    """
    def __init__( self, level, name_, usage=None, pic=None, occurs=None,
                  redefines=None, ssp=(None,None,None,None), initValue=None ):
        self.level= level
        self.myName= name_
        self.offset= 0
        self.size= 0
        self.occurs= occurs
        self.occurSize= None
        self.picture= pic
        self.sizeScalePrecision= ssp
        self.redefines= redefines
        self.usage= usage
        self.initValue= initValue
        self.contains= []
        self.parent= None
        self.top= None
        self.currentIndex= 0
        self.indexedOffset= None
    def __repr__( self ):
        return "%s %s %s" % ( self.level, self.myName, map(str,self.contains) )
    def __str__( self ):
        oc= ""
        pc= ""
        rc= ""
        if self.occurs > 1:
            oc= " OCCURS %s" % self.occurs
        if self.picture:
            pc= " PIC %s USAGE %s" % ( self.picture, self.usage.myName )
        if self.redefines.myName:
            rc= " REDEFINES %s" % ( self.redefines.myName )
        return "%-2s %-20s%s%s%s." % ( self.level, self.myName, rc, oc, pc )
    → DDE Class Construction methods (12)
    → DDE Class Reporting methods (13)
    → DDE Class Record Scanning methods (14)
Used by: cobol_dde.py (1)
Construction occurs in three general steps: (1) the DDE is created, (2) source attributes are set, (3) the DDE is decorated with size, offset and other details.
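Step (3) can be sketched in Python 3 for the common case without REDEFINES. The Item class and set_size_and_offset() function are hypothetical simplifications of DDE.setSizeAndOffset(): an elementary item's occurrence size is given directly (rather than derived from a picture), a group's occurrence size is the sum of its children's total sizes, and the total size is the occurrence size times the OCCURS count. Offsets in this sketch are absolute, measured from the start of the record.

```python
class Item:
    """Hypothetical, simplified DDE: elementary items carry a byte size,
    groups carry children, and either may repeat via OCCURS."""

    def __init__(self, name, size=None, occurs=1, children=()):
        self.name = name
        self.elementary_size = size
        self.occurs = occurs
        self.children = list(children)
        self.offset = self.occur_size = self.size = None


def set_size_and_offset(item, offset=0):
    """Decorate `item` and its children with absolute offsets and sizes."""
    item.offset = offset
    if item.children:
        position = offset
        for child in item.children:
            set_size_and_offset(child, position)
            position += child.size          # child's total size, all occurrences
        item.occur_size = position - offset
    else:
        item.occur_size = item.elementary_size
    item.size = item.occur_size * item.occurs
    return item


month_amt = Item("MONTH-AMT", size=5, occurs=12)
year_data = Item("YEAR-DATA", occurs=3, children=[Item("YR", size=4), month_amt])
record = set_size_and_offset(
    Item("CUST-REC", children=[Item("CUST-ID", size=6), year_data]))
print(year_data.occur_size, record.size, month_amt.offset)
```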
DDE Class Construction methods (12)
def append( self, aDDE ):
    """Add a substructure to this DDE.
    This is used by RecordFactory to assemble the DDE."""
    self.contains.append( aDDE )
    aDDE.parent= self

def setTop( self, topDDE ):
    """Set the immediate parentage and top-level record for this DDE.
    Used by RecordFactory to assemble the DDE.
    Required before setSizeAndOffset()."""
    self.top= topDDE
    for f in self.contains:
        f.parent= self
        f.setTop( topDDE )

def setSizeAndOffset( self, offset=0 ):
    """Compute the size and offset for each field of this DDE.
    Used by RecordFactory to assemble the DDE.
    Requires setTop be done first.

    Note: 88-level items inherit attributes from their parent.
    """
    # Wire in a single occurance, it simplifies the math, below.
    if not self.occurs:
        self.occurs= 1
    # If this is a redefines, get a different offset, otherwise use this offset
    self.offset= self.redefines.offset( offset, self )
    # Set the default indexedOffset
    self.indexedOffset= self.offset
    # PICTURE - elementary item; otherwise group-level item
    if self.picture:
        # Get the correct size based on USAGE
        self.occurSize= self.usage.size(self.picture)
        self.size= self.occurSize * self.occurs
        # Any contained items?  These would be 88-level items.
        for f in self.contains:
            assert '88' == f.level, "Unexpected Level {0!r}".format(f.level)
            f.setSizeAndOffset(self.offset)
    elif self.level == '88':
        self.occurSize= self.parent.occurSize
        self.size = self.parent.size
        self.usage= self.parent.usage
    else:
        # Get the correct size based on each element of the group
        s= 0 # Was self.offset???? Wasn't That Funny?
        for f in self.contains:
            # Element size and offset
            f.setSizeAndOffset(s)
            # non-redefines add to the size; redefines add 0 to the size
            s += f.redefines.size( f )
        self.occurSize= s
        # Multiply by the number of occurances to get the total size
        self.size= self.occurSize * self.occurs
Used by: DDE Class Hierarchy - defines group and elementary data descriptions elements (11); cobol_dde.py (1)
The Visitor design pattern requires that each DDE have a method that is used to implement the visitor traversal. The visit() method visits each element. The visitOccurance() method visits each occurrence of each element.
DDE Class Reporting methods (13)
def visit( self, visitor ):
    """Visit this DDE and each element."""
    visitor.dde( self )
    if self.contains:
        visitor.enterSub()
        for f in self.contains:
            f.visit( visitor )
        visitor.exitSub()

def visitOccurance( self, visitor ):
    """Visit each occurance of this DDE and each occurance of each element."""
    if not self.occurs:
        return
    for self.currentIndex in range(0,self.occurs):
        self.top.setIndexedOffset(0) # compute offsets for this new index value
        visitor.dde( self )
        if self.contains:
            visitor.enterSub()
            for f in self.contains:
                f.visitOccurance( visitor )
            visitor.exitSub()
Used by: DDE Class Hierarchy - defines group and elementary data descriptions elements (11); cobol_dde.py (1)
The process of scanning a record involves methods to locate a specific field, set the occurrence index of a field, and pick bytes out of a record's input buffer.
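Locating a subscripted occurrence is offset arithmetic, as in NonRedefines.indexedOffset(): add the occurrence size times the zero-based index at each OCCURS level. The following Python 3 sketch is a hypothetical helper, not part of cobol_dde. Each level is given as (relative_offset, occur_size, occurs), where relative_offset is the item's offset within one occurrence of its enclosing table (or within the record, for the outermost level), and subscripts are 1-based as in COBOL.

```python
def indexed_offset(levels, subscripts):
    """Byte offset of one occurrence inside nested OCCURS tables."""
    if len(levels) != len(subscripts):
        raise ValueError("need one subscript per OCCURS level")
    offset = 0
    for (relative_offset, occur_size, occurs), subscript in zip(levels, subscripts):
        if not 1 <= subscript <= occurs:
            raise IndexError("subscript %d outside 1..%d" % (subscript, occurs))
        offset += relative_offset + occur_size * (subscript - 1)
    return offset


# YEAR-DATA OCCURS 3 (64 bytes each) at offset 6 of the record;
# MONTH-AMT OCCURS 12 (5 bytes each) at offset 4 within each YEAR-DATA.
levels = [(6, 64, 3), (4, 5, 12)]
print(indexed_offset(levels, (2, 3)))   # 6 + 64*1 + 4 + 5*2 = 84
```

The bounds check plays the same role as the UsageError raised by setIndex() for an out-of-range occurs value.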
DDE Class Record Scanning methods (14)
def pathTo( self ):
    """Return the complete path to this DDE."""
    if self.parent:
        return self.parent.pathTo() + "." + self.myName
    return self.myName

def get( self, name_ ):
    """Find the named field, and return the substructure.
    If necessary, search down through levels."""
    for c in self.contains:
        if c.myName == name_:
            return c
    for c in self.contains:
        try:
            f= c.get(name_)
            if f:
                return f
        except UsageError, e:
            pass
    raise UsageError( "Field %s unknown in this record" % name_ )

def setIndex( self, *occurance ):
    """Set the index values for locating specific data bytes."""
    # Handles multi-dimensional short-cut syntax.
    # Work up through parentage to locate occurs clauses and pop off indexes
    if self.occurs > 1:
        if self.occurs < occurance[-1] or occurance[-1] <= 0:
            raise UsageError( "Occurs value %r out of bounds %r" % ( occurance, self ) )
        self.currentIndex= occurance[-1]-1
        #print self.myName, 'occurs', self.occurs, 'index', self.currentIndex+1
        # Recursive call to setIndex for all remaining index values.
        if occurance[:-1]:
            self.parent.setIndex( *occurance[:-1] )
    else:
        #print self.myName, 'search upward',repr(occurance)
        self.parent.setIndex( *occurance )
    # Compute offsets for these new index values
    self.top.setIndexedOffset(0)
    return self

def setIndexedOffset( self, offset=0 ):
    """Given index values, compute the indexed offsets into occurs clauses.
    Used by setIndex to compute indexed offsets."""
    # TODO: may be able to eliminate this if-statement!
    if self.occurSize:
        # Redefines will use an offset from another field, otherwise use the offset provided
        self.indexedOffset= self.redefines.indexedOffset( offset, self )
        s= self.indexedOffset
        for f in self.contains:
            # Update elements within this group
            f.setIndexedOffset( s )
            # Redefines add zero to the size, otherwise increment offset with the size
            s += f.redefines.size( f )

def of( self, aString ):
    """Pick the data bytes out of an input string.
    TODO: May require EBCDIC->ASCII conversion.
    Requires setIndexedOffset() call if indexes were changed without calling setIndex()
    Use valOf to handle packed decimal data (USAGE COMP-3).
    """
    b= self.indexedOffset
    return aString[b:b+self.occurSize]

def valOf( self, aString ):
    """Pick the data bytes out of an input string and interpret as a number."""
    bytes= self.of( aString )
    return self.usage.valueOf( bytes )
Used by: DDE Class Hierarchy - defines group and elementary data descriptions elements (11); cobol_dde.py (1)
Two common visitor needs are: (1) visit all elements, producing a listing that is a canonical version of the original source; (2) visit all elements producing additional details (e.g., size, offset, data type). Additionally, when examining actual data values, it is necessary to visit each element displaying the current value of that element. This traversal needs to visit each occurrence as well, which depends on the visitOccurance() method of a DDE.
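The hex rendering that Dump applies to each field can be sketched in Python 3 as follows. dump_field() is a hypothetical helper that formats a field's name, offset, size and bytes in roughly the same layout as Dump's output; hex is the practical way to inspect COMP and COMP-3 fields, which do not display as text.

```python
def dump_field(name, record, offset, size):
    """Format one field of `record` (a bytes object) as name, offset, size and hex."""
    raw = record[offset:offset + size]
    hex_bytes = " ".join("%02x" % byte for byte in raw)
    return "%-20s %3d %3d '%s'" % (name, offset, size, hex_bytes)


record = b"\xc1\xc2\x00\x12\x34\x5c"
print(dump_field("AMOUNT", record, 2, 4))
```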
DDE Common Visitors for reporting on a DDE structure (15)
class Source( Visitor ):
    """Display canonical source from copybook parsing."""
    def dde( self, aDDE ):
        print self.indent*' ', aDDE

class Report( Visitor ):
    """Report on copybook structure."""
    def dde( self, aDDE ):
        numeric,size,scale,precision= aDDE.sizeScalePrecision
        if numeric:
            nSpec= '%d.%d' % ( size, precision )
        else:
            nSpec= ""
        print "%-65s %3d %3d %5s" % (self.indent*' '+str(aDDE), aDDE.offset, aDDE.size, nSpec)

class Dump( Visitor ):
    """Dump the data values of this structure."""
    def __init__( self, data ):
        Visitor.__init__( self )
        self.data= data
    def dde( self, aDDE ):
        db= aDDE.of(self.data)
        dstr= []
        for c in db:
            dstr.append( "%2s"%hex( ord(c) )[2:] )
        r= " ".join(dstr) # or r=db
        if aDDE.occurs > 1:
            print "%-65s %3d %3d %3d '%s'" % (self.indent*' '+str(aDDE), aDDE.indexedOffset, aDDE.size, aDDE.currentIndex+1, r)
        elif aDDE.picture and aDDE.myName != "FILLER":
            print "%-65s %3d %3d '%s'" % (self.indent*' '+str(aDDE), aDDE.indexedOffset, aDDE.size, r)
        else:
            print "%-65s %3d %3d" % (self.indent*' '+str(aDDE), aDDE.indexedOffset, aDDE.size)
Used by: cobol_dde.py (1)
The lexical scanner can be subclassed to extend its capability. The default lexical scanner provides a lineClean() method that drops comment lines (an asterisk or slash in column 7) and removes the sequence number area (columns 1-6). This may need to be overridden to remove the identification area (columns 73-80), which often holds a module identification or line numbers, and to remove format control directives.
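A replacement cleaner for standard fixed-format source might look like the following Python 3 sketch. It is written as a standalone function rather than a Lexer subclass, and clean_lines() is a hypothetical name. It keeps only the program area (columns 8-72), skips comment and page-eject lines (an asterisk or slash in column 7), and does not handle continuation lines (a hyphen in column 7).

```python
def clean_lines(text):
    """Return the program-area text of fixed-format COBOL source lines."""
    cleaned = []
    for line in text.splitlines():
        if len(line) < 7 or line[6] in "*/":
            continue                       # too short, comment, or page eject
        body = line[7:72].strip()          # drop columns 1-7 and 73-80
        if body:
            cleaned.append(body + " ")     # trailing space keeps tokens apart
    return cleaned


source = "\n".join([
    "000100 " + "01  CUST-REC.".ljust(65) + "CUSTREC1",
    "000200*" + "  A COMMENT LINE",
    "000300 " + "    05  CUST-ID     PIC X(6).".ljust(65) + "CUSTREC1",
])
print(clean_lines(source))
```

In cobol_dde itself, the equivalent change would go in an overridden lineClean() method, since Lexer.__init__() calls it to prepare the lines.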
DDE Lexical Scanner base class provides the default lexical scanner implementation (16)
class Lexer( object ):
    """Lexical scanner for COBOL.
    Given a block of text, this scanner will remove comment lines.
    next() will step through the tokens
    unget(token) will back up a token
    """
    def __init__( self, text ):
        """Initialize the scanner by cleaning the text."""
        self.lines= self.lineClean( text )
        self.backup= []
        # A token ends at whitespace, optionally preceded by a separator.
        self.separator= re.compile( r'[.,;]?\s' )
        self.quote1= re.compile( r"'[^']*'" )
        self.quote2= re.compile( r'"[^"]*"' )
    def lineClean( self, text ):
        """Default cleaner skips comments."""
        return [ l[6:]+' ' for l in text.split('\n')
            if len(l) > 6 and l[6] not in ('*','/') ]
    def next( self ):
        """Locate the next token in the input stream."""
        if self.backup:
            return self.backup.pop()
        #print "self.lines=", self.lines
        if self.lines and not self.lines[0]:
            self.lines.pop(0)
        if not self.lines:
            print "EOF"
            return None
        while self.lines and self.lines[0] and self.lines[0][0] in string.whitespace:
            self.lines[0]= self.lines[0].lstrip()
            if not self.lines[0]:
                self.lines.pop(0)
        if not self.lines:
            return None
        if self.lines[0][0] == "'":
            # quoted string, break on balancing quote
            match= self.quote1.match( self.lines[0] )
            space= match.end()
        elif self.lines[0][0] == '"':
            # quoted string, break on balancing quote
            match= self.quote2.match( self.lines[0] )
            space= match.end()
        else:
            match= self.separator.search( self.lines[0] )
            space= match.start()
            if space == 0: # starts with separator
                space= match.end()-1
        token, self.lines[0] = self.lines[0][:space], self.lines[0][space:]
        #print token
        return token
    def unget( self, token ):
        """Push one token back into the input stream."""
        self.backup.append( token )
Used by: cobol_dde.py (1)
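A sketch of what an overridden cleaner might do with fixed-format source, written as a standalone Python 3 function. The name `clean_fixed_format()` is hypothetical; in a Lexer subclass the same logic would go in lineClean(). It assumes the standard reference format: columns 1-6 sequence numbers, column 7 indicator, columns 8-72 program text, columns 73-80 identification area. Unlike the default cleaner it also drops column 7, and it does not handle continuation lines ('-' in column 7).

```python
def clean_fixed_format(text):
    """Return program-text lines from fixed-format COBOL source.

    Columns 1-6 (sequence numbers) and 73-80 (identification area) are
    discarded; lines with '*' or '/' in column 7 are comments and skipped.
    Continuation lines ('-' in column 7) are NOT handled by this sketch.
    """
    cleaned = []
    for line in text.split("\n"):
        if len(line) <= 6:
            continue                    # nothing past the sequence area
        if line[6] in ("*", "/"):
            continue                    # comment or page-eject line
        # Keep columns 8-72 (0-based slice 7:72); add a trailing space so a
        # token at the end of one line stays separate from the next line.
        cleaned.append(line[7:72] + " ")
    return cleaned


# Usage: an 80-column line with an identification area, a comment, a short line.
line1 = "000100" + " " + "01  REC.".ljust(65) + "IDENT001"
print(clean_fixed_format(line1 + "\n000200*COMMENT LINE\n000300 05 FLD PIC X(3)."))
```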
The RecordFactory class is the parser for record definitions. It has three sets of methods: (1) clause-parsing methods, (2) element-parsing methods, and (3) a method that parses a complete record layout.
Parsing a record layout means parsing a sequence of elements and assembling them, by level number, into a tree. Each element is a level number, an optional name, and a sequence of individual clauses terminated by a period.
DDE RecordFactory parses a record clause to create a DDE instance (17)
class RecordFactory( object ):
    """Parse a copybook, creating a DDE structure."""
    def __init__( self ):
        self.lex= None
        self.token= None
        self.context= []
        self.noisewords= ("WHEN","IS","TIMES")
        self.keywords= ("BLANK","ZERO","ZEROS","ZEROES",
            "DATE","FORMAT","EXTERNAL","GLOBAL",
            "JUST","JUSTIFIED","LEFT","RIGHT",
            "OCCURS",
            "PIC","PICTURE",
            "REDEFINES","RENAMES",
            "SIGN","LEADING","TRAILING","SEPARATE","CHARACTER",
            "SYNCH","SYNCHRONIZED",
            "USAGE","DISPLAY","COMP-3",
            "VALUE",".")
    → DDE Picture Clause Parsing (18)
    → DDE Blank When Zero Clause Parsing (19)
    → DDE Justified Clause Parsing (20)
    → DDE Occurs Clause Parsing (21)
    → DDE Redefines Clause Parsing (22)
    → DDE Renames Clause Parsing (23)
    → DDE Sign Clause Parsing (24)
    → DDE Synchronized Clause Parsing (25)
    → DDE Usage Clause Parsing (26)
    → DDE Value Clause Parsing (27)
    → DDE Element Parsing (28)
    → DDE Record Parsing (29)
Used by: cobol_dde.py (1)
DDE Picture Clause Parsing (18)
def picParse( self, pic ):
    """Rewrite a picture clause to eliminate ()'s, S's, V's, P's, etc.
    Returns expanded, normalized picture and (type,length,scale,precision,signed) information."""
    out= []
    scale, precision, signed, decimal = 0, 0, False, None
    while pic:
        c= pic[:1]
        if c in ('A','B','X','Z','9','0','/',',','+','-','*','$'):
            out.append( c )
            if decimal: precision += 1
            pic= pic[1:]
        elif pic[:2] in ('DB','CR'):
            out.append( pic[:2] )
            pic= pic[2:]
        elif c == '(':
            irpt= 0
            pic= pic[1:]
            # A regular expression may be quicker and simpler!
            while pic and pic[:1].isdigit():
                irpt = 10*irpt+int( pic[:1] )
                pic= pic[1:]
            if irpt < 1 or not out or pic[:1] != ')':
                raise SyntaxError( "picture error in %r"%pic )
            out.append( (irpt-1)*out[-1] )
            # Repeated positions after the decimal point add to the precision.
            if decimal: precision += irpt-1
            pic= pic[1:]
        elif c == 'S':
            # silently drop an "S".
            # Note that 'S' plus a SIGN SEPARATE option increases the size of the picture!
            signed= True
            pic= pic[1:]
        elif c == 'P':
            # silently drop a "P", since it just sets scale and isn't represented.
            scale += 1
            pic= pic[1:]
        elif c == "V":
            decimal= "V"
            pic= pic[1:]
        elif c == ".":
            decimal= "."
            out.append( "." )
            pic= pic[1:]
        else:
            raise SyntaxError( "picture error in %s"%pic )
    final= "".join( out )
    alpha= ('A' in final) or ('X' in final) or ('/' in final)
    #print pic, final, alpha, scale, precision
    # Note: Actual size depends on len(final) and usage!
    return dict( final=final, alpha=alpha, numeric=not alpha, length=len(final),
        scale=scale, precision= precision, signed=signed, decimal=decimal)

def picture( self ):
    """Parse a PICTURE clause."""
    pic= self.lex.next()
    if pic == "IS":
        pic= self.lex.next()
    self.token= self.lex.next()
    return self.picParse(pic)
Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
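The comment in picParse() notes that a regular expression may be quicker and simpler. A minimal Python 3 sketch of that idea follows. `expand_picture()` and `digits_after_point()` are hypothetical helpers: they only expand repeat counts and count the '9' positions after a V or '.', and they do not handle usage-dependent sizes or P scaling.

```python
import re

# One picture symbol, optionally followed by a repeat count such as X(7).
# CR and DB are two-character symbols; everything else is one character.
_PIC_SYMBOL = re.compile(r"(CR|DB|.)(?:\((\d+)\))?")


def expand_picture(pic):
    """Expand repeat counts in a PICTURE string, e.g. 'S9(3)V9(2)' -> 'S999V99'."""
    out = []
    for match in _PIC_SYMBOL.finditer(pic.upper()):
        symbol, count = match.group(1), match.group(2)
        out.append(symbol * (int(count) if count else 1))
    return "".join(out)


def digits_after_point(pic):
    """Count '9' positions after an implied (V) or actual (.) decimal point."""
    expanded = expand_picture(pic)
    for point in ("V", "."):
        if point in expanded:
            return expanded.split(point, 1)[1].count("9")
    return 0


print(expand_picture("9(4)V99"), digits_after_point("999V9(3)"))
```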
DDE Blank When Zero Clause Parsing (19)
def blankWhenZero( self ):
    """Gracefully skip over a BLANK WHEN ZERO clause."""
    self.token= self.lex.next()
    if self.token == "WHEN":
        self.token= self.lex.next()
    if self.token in ("ZERO","ZEROES","ZEROS"):
        self.token= self.lex.next()
Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
DDE Justified Clause Parsing (20)
def justified( self ):
    """Gracefully skip over a JUSTIFIED clause."""
    self.token= self.lex.next()
    if self.token == "RIGHT":
        self.token= self.lex.next()
Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
DDE Occurs Clause Parsing (21)
def occurs( self ):
    """Parse an OCCURS clause."""
    occurs= self.lex.next()
    if occurs == "TO":
        # format 2 - occurs depending on with assumed 1 for the lower limit
        # TODO - parse the Occurs Depending On clause
        raise UnsupportedError( "Occurs depending on" )
    self.token= self.lex.next()
    if self.token == "TO":
        # format 2 - occurs depending on
        # TODO - parse the Occurs Depending On clause
        raise UnsupportedError( "Occurs depending on" )
    else:
        # format 1 - fixed-length
        if self.token == "TIMES":
            self.token= self.lex.next()
        if self.token in ("ASCENDING","DESCENDING"):
            self.token= self.lex.next()
            if self.token == "KEY":
                self.token= self.lex.next()
            if self.token == "IS":
                self.token= self.lex.next()
            # get key data names (stop at a keyword or at end of input)
            while self.token and self.token not in self.keywords:
                self.token= self.lex.next()
        if self.token == "INDEXED":
            self.token= self.lex.next()
            if self.token == "BY":
                self.token= self.lex.next()
            # get indexed data names (stop at a keyword or at end of input)
            while self.token and self.token not in self.keywords:
                self.token= self.lex.next()
    return int(occurs)
Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
DDE Redefines Clause Parsing (22)
def redefines( self ):
    """Parse a REDEFINES clause."""
    redef= self.lex.next()
    self.token= self.lex.next()
    return Redefines(redef)
Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
DDE Renames Clause Parsing (23)
def renames( self ):
    """Raise an exception on a RENAMES clause."""
    ren1= self.lex.next()
    self.token= self.lex.next()
    if self.token in ("THRU","THROUGH"):
        ren2= self.lex.next()
        self.token= self.lex.next()
    raise UnsupportedError( "Renames clause" )
Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
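There are two variations on the SIGN clause syntax: with the SIGN keyword (SIGN IS LEADING SEPARATE CHARACTER) and without it (LEADING SEPARATE). sign1() handles the first form and sign2() the second; both currently raise UnsupportedError.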
DDE Sign Clause Parsing (24)
def sign1( self ):
    """Raise an exception on a SIGN clause."""
    self.token= self.lex.next()
    if self.token == "IS":
        self.token= self.lex.next()
    if self.token in ("LEADING","TRAILING"):
        self.sign2()
    # TODO: this may change the size to add a sign byte
    raise UnsupportedError( "Sign clause" )

def sign2( self ):
    """Raise an exception on a SIGN clause."""
    self.token= self.lex.next()
    if self.token == "SEPARATE":
        self.token= self.lex.next()
    if self.token == "CHARACTER":
        self.token= self.lex.next()
    raise UnsupportedError( "Sign clause" )
Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
DDE Synchronized Clause Parsing (25)
def synchronized( self ):
    """Raise an exception on a SYNCHRONIZED clause."""
    self.token= self.lex.next()
    if self.token == "LEFT":
        self.token= self.lex.next()
    if self.token == "RIGHT":
        self.token= self.lex.next()
    raise UnsupportedError( "Synchronized clause" )
Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
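There are two variations on the USAGE clause syntax: with the USAGE keyword (USAGE IS COMP-3) and without it (COMP-3). usage() handles the first form; makeDDE() calls usage2() directly for the second.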
DDE Usage Clause Parsing (26)
def usage( self ):
    """Parse a USAGE clause."""
    self.token= self.lex.next()
    if self.token == "IS":
        self.token= self.lex.next()
    use= self.token
    self.token= self.lex.next()
    return self.usage2( use )

def usage2( self, use ):
    """Create a correct Usage instance based on the USAGE clause."""
    if use == "DISPLAY":
        return UsageDisplay()
    elif use == "COMPUTATIONAL":
        return UsageComp()
    elif use == "COMP":
        return UsageComp()
    elif use == "COMPUTATIONAL-3":
        return UsageComp3()
    elif use == "COMP-3":
        return UsageComp3()
    else:
        raise SyntaxError( "Unknown usage clause %r" % use )
Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
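The usage classes have to turn a picture into a byte count and decode the stored bytes. A Python 3 sketch of those two jobs follows, assuming IBM conventions: DISPLAY uses one byte per digit with the sign overpunched; COMP-3 (packed decimal) stores two digits per byte plus a sign nibble; COMP (binary) uses 2, 4 or 8 bytes depending on the digit count; a sign nibble of C, A, E or F is positive and B or D negative. The names `storage_size()` and `unpack_comp3()` are hypothetical, not methods of UsageDisplay, UsageComp or UsageComp3.

```python
from decimal import Decimal


def storage_size(digits, usage):
    """Bytes occupied by a numeric item with `digits` digit positions."""
    if usage == "DISPLAY":
        return digits                   # one byte per digit; sign is overpunched
    if usage in ("COMP-3", "COMPUTATIONAL-3"):
        return digits // 2 + 1          # two digits per byte plus a sign nibble
    if usage in ("COMP", "COMPUTATIONAL"):
        if digits <= 4:
            return 2                    # halfword
        if digits <= 9:
            return 4                    # fullword
        return 8                        # doubleword
    raise ValueError("unsupported usage %r" % usage)


def unpack_comp3(data, scale=0):
    """Decode packed-decimal bytes; `scale` is the number of implied decimal places."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()                # the last nibble is the sign
    if any(n > 9 for n in nibbles):
        raise ValueError("invalid packed-decimal digit in %r" % data)
    value = Decimal(int("".join(str(n) for n in nibbles)))
    if sign in (0x0B, 0x0D):
        value = -value
    elif sign not in (0x0A, 0x0C, 0x0E, 0x0F):
        raise ValueError("invalid packed-decimal sign nibble in %r" % data)
    return value.scaleb(-scale)         # apply the implied decimal point


print(storage_size(5, "COMP-3"), unpack_comp3(bytes([0x12, 0x34, 0x5C]), 2))
```

Using Decimal rather than float keeps values such as 123.45 exact, which matters when profiling monetary fields.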
DDE Value Clause Parsing (27)
def value( self ):
    """Parse a VALUE clause."""
    lit= self.lex.next()
    if lit == "IS":
        lit= self.lex.next()
    self.token= self.lex.next()
    return lit
Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
DDE Element Parsing (28)
def makeDDE( self ):
    """Create a single DDE from an entry of clauses."""
    # Pick off the level
    level= self.token
    # Pick off a name, if present
    name_= self.lex.next()
    if name_ in self.keywords:
        self.lex.unget( name_ )
        name_= "FILLER"
    # Accumulate the relevant clauses, dropping noise words and irrelevant clauses.
    usage= UsageDisplay()
    pic, typeInfo= None, None
    occurs= None
    redefines= NonRedefines()
    self.token= self.lex.next()
    while self.token and self.token != '.':
        if self.token == "BLANK":
            self.blankWhenZero()
        elif self.token in ("EXTERNAL","GLOBAL"):
            self.token= self.lex.next()
        elif self.token in ("JUST","JUSTIFIED"):
            self.justified()
        elif self.token == "OCCURS":
            occurs= self.occurs()
        elif self.token in ("PIC","PICTURE"):
            typeInfo= self.picture()
            pic= typeInfo['final']
        elif self.token == "REDEFINES":
            redefines= self.redefines()
        elif self.token == "RENAMES":
            self.renames()
        elif self.token == "SIGN":
            self.sign1()
        elif self.token in ("LEADING","TRAILING"):
            self.sign2()
        elif self.token in ("SYNC","SYNCH","SYNCHRONIZED"):
            self.synchronized()
        elif self.token == "USAGE":
            usage= self.usage()
        elif self.token == "VALUE":
            self.value()
        else:
            try:
                # Keyword USAGE is optional
                usage= self.usage2( self.token )
                self.token= self.lex.next()
            except SyntaxError, e:
                raise SyntaxError( "%s unrecognized" % self.token )
    # Create the DDE and return it
    # TODO: Add a subclass for elementary items different from group-level items
    if pic:
        usage.setTypeInfo(**typeInfo)
        return DDE( level, name_, pic=pic, usage=usage, occurs=occurs, redefines=redefines )
    else:
        return DDE( level, name_, occurs=occurs, redefines=redefines )
Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
DDE Record Parsing (29)
def makeRecord( self, lex ):
    """Parse an entire copybook block of text."""
    self.lex= lex
    self.token= self.lex.next()
    # Parse the first DDE and establish the context stack.
    self.context= [ self.makeDDE() ]
    self.token= self.lex.next()
    while self.token:
        # Parse the next DDE
        dde= self.makeDDE()
        #print dde, ":", self.context[-1]
        # If a lower level # or same level #, pop context
        while dde.level <= self.context[-1].level:
            self.context.pop()
        # Make this DDE part of the parent DDE at the top of the context stack
        self.context[-1].append( dde )
        # Push this DDE onto the context stack
        self.context.append( dde )
        # Get the first token of the next DDE or find the end of the file
        self.token= self.lex.next()
    # Decorate the parse tree with parentage and basic size/offset information
    rec= self.context[0]
    rec.setTop( rec )
    rec.setSizeAndOffset(0)
    return rec
Used by: DDE RecordFactory parses a record clause to create a DDE instance (17); cobol_dde.py (1)
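The heart of makeRecord() is the context stack: an entry pops every open entry whose level number is the same or deeper, then becomes a child of whatever remains on top. Stripped of clause parsing, that logic can be sketched in Python 3 as follows. `nest_by_level()` is hypothetical and works on plain (level, name) tuples and dicts rather than DDE instances; like makeRecord(), it places an 88-level entry under the item just before it, and it does not special-case levels 66 or 77.

```python
def nest_by_level(entries):
    """Build a tree from (level, name) pairs using a context stack.

    Each node is a dict: {"level": int, "name": str, "children": [...]}.
    """
    root = None
    context = []                        # stack of open (ancestor) nodes
    for level, name in entries:
        node = {"level": level, "name": name, "children": []}
        if root is None:
            root = node
        else:
            # Pop every open node at the same or a deeper level: it is complete.
            while context and level <= context[-1]["level"]:
                context.pop()
            if not context:
                raise ValueError("%r is not subordinate to the first entry" % name)
            context[-1]["children"].append(node)
        context.append(node)
    return root


tree = nest_by_level([(1, "TABLE-RECORD"), (5, "EMPLOYEE-TABLE"),
                      (10, "EMPLOYEE-NAME"), (10, "WEEK-RECORD"),
                      (15, "WEEK-NO"), (15, "LATE-ARRIVALS"), (5, "TRAILER")])
print([child["name"] for child in tree["children"]])
```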
The unit tests are not exhaustive, but they do test a number of key features.
test_dde.py (30)
#!/usr/bin/env python
import unittest
from cobol_dde import *

class DDE_Test( unittest.TestCase ):
    def setUp( self ):
        # Create a Report() visitor to write a report on a structure
        self.rpt= Report()
        # Create a RecordFactory() to create DDE record definitions
        self.rf= RecordFactory()

→ DDE Test copybook 1 with basic features (31)
→ DDE Test copybook 2 with 88-level item (32)
→ DDE Test copybook 3 with nested occurs level (33)
→ DDE Test copybook from page 174 with nested occurs level (34)
→ DDE Test copybook from page 195 with simple redefines (35)
→ DDE Test copybook from page 197 with another redefines (36)
→ DDE Test copybook from page 198, example a (37)
→ DDE Test copybook from page 198, example b (38)

if __name__ == "__main__":
    unittest.main()
DDE Test copybook 1 with basic features (31)
copy1= """
      * COPY1.COB
       01  DETAIL-LINE.
           05                  PIC X(7).
           05  QUESTION        PIC ZZ.
           05                  PIC X(6).
           05  PRINT-YES       PIC ZZ.
           05                  PIC X(3).
           05  PRINT-NO        PIC ZZ.
           05                  PIC X(6).
           05  NOT-SURE        PIC ZZ.
           05                  PIC X(7).
"""

class Test_Copybook_1( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_1, self ).setUp()
    def test_should_parse( self ):
        dde1 = self.rf.makeRecord( Lexer(copy1) )
        #dde1.visit( self.rpt )
        self.assertEquals( 7, dde1.get( "QUESTION" ).offset )
        self.assertEquals( 2, dde1.get( "QUESTION" ).size )
        self.assertEquals( "ZZ", dde1.get( "QUESTION" ).picture )
        self.assertEquals( "DISPLAY", dde1.get( "QUESTION" ).usage.myName )
        self.assertEquals( 15, dde1.get( "PRINT-YES" ).offset )
        self.assertEquals( 2, dde1.get( "PRINT-YES" ).size )
        self.assertEquals( "ZZ", dde1.get( "PRINT-YES" ).picture )
        self.assertEquals( 20, dde1.get( "PRINT-NO" ).offset )
        self.assertEquals( 2, dde1.get( "PRINT-NO" ).size )
        self.assertEquals( "ZZ", dde1.get( "PRINT-NO" ).picture )
        self.assertEquals( 28, dde1.get( "NOT-SURE" ).offset )
        self.assertEquals( 2, dde1.get( "NOT-SURE" ).size )
        self.assertEquals( "ZZ", dde1.get( "NOT-SURE" ).picture )
        data= "ABCDEFG01HIJKLM02OPQ03RSTUVW04YZabcde"
        #d= Dump( data )
        #dde1.visitOccurance( d )
        self.assertEquals( "01", dde1.get('QUESTION').of(data) )
        self.assertEquals( "02", dde1.get('PRINT-YES').of(data) )
        self.assertEquals( "03", dde1.get('PRINT-NO').of(data) )
        self.assertEquals( "04", dde1.get('NOT-SURE').of(data) )
Used by: test_dde.py (30)
Future expansion: the VALUE clause of an 88-level item gives the value(s) for which its condition name is true; it should be used to create a boolean function.
DDE Test copybook 2 with 88-level item (32)
copy2= """
      * COPY2.COB
       01  WORK-AREAS.
           05  ARE-THERE-MORE-RECORDS  PIC X(3) VALUE 'YES'.
               88  NO-MORE-RECORDS             VALUE 'NO '.
           05  ANSWER-SUB              PIC 99.
           05  QUESTION-SUB            PIC 99.
"""

class Test_Copybook_2( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_2, self ).setUp()
    def test_should_parse( self ):
        dde2= self.rf.makeRecord( Lexer(copy2) )
        #dde2.visit( self.rpt )
        self.assertEquals( 0, dde2.get("ARE-THERE-MORE-RECORDS").offset )
        self.assertEquals( 3, dde2.get("ARE-THERE-MORE-RECORDS").size )
        self.assertEquals( "XXX", dde2.get("ARE-THERE-MORE-RECORDS").picture )
        self.assertEquals( 0, dde2.get("NO-MORE-RECORDS").offset )
        self.assertEquals( 3, dde2.get("NO-MORE-RECORDS").size )
        self.assertEquals( 3, dde2.get("ANSWER-SUB").offset )
        self.assertEquals( 5, dde2.get("QUESTION-SUB").offset )
        data= "NO 4567"
        d= Dump( data )
        #print dde2.visitOccurance( d )
        self.assertEquals( "NO ", dde2.get("ARE-THERE-MORE-RECORDS").of(data) )
        self.assertEquals( "NO ", dde2.get("NO-MORE-RECORDS").valOf(data) )
Used by: test_dde.py (30)
DDE Test copybook 3 with nested occurs level (33)
copy3= """
      * COPY3.COB
       01  SURVEY-RESPONSES.
           05  QUESTION-NUMBER OCCURS 10 TIMES.
               10  RESPONSE-CATEGORY OCCURS 3 TIMES.
                   15  ANSWER          PIC 99.
"""

class Test_Copybook_3( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_3, self ).setUp()
    def test_should_parse( self ):
        dde3= self.rf.makeRecord( Lexer(copy3) )
        #dde3.visit( self.rpt )
        data = "111213212223313233414243515253616263717273818283919293010203"
        d= Dump(data)
        #dde3.visitOccurance( d )
        self.assertEquals( 12, dde3.get('ANSWER').setIndex(1,2).valOf(data) )
        self.assertEquals( 21, dde3.get('ANSWER').setIndex(2,1).valOf(data) )
        try:
            self.assertEquals( 21, dde3.get('ANSWER').setIndex(1,4).valOf(data) )
            self.fail()
        except UsageError, e:
            pass
Used by: test_dde.py (30)
From IBM COBOL Language Reference Manual, fourth edition: SC26-9046-03.
DDE Test copybook from page 174 with nested occurs level (34)
page174= """
       01  TABLE-RECORD.
           05  EMPLOYEE-TABLE OCCURS 10 TIMES
                   ASCENDING KEY IS WAGE-RATE EMPLOYEE-NO
                   INDEXED BY A, B.
               10  EMPLOYEE-NAME PIC X(20).
               10  EMPLOYEE-NO PIC 9(6).
               10  WAGE-RATE PIC 9999V99.
               10  WEEK-RECORD OCCURS 52 TIMES
                       ASCENDING KEY IS WEEK-NO INDEXED BY C.
                   15  WEEK-NO PIC 99.
                   15  AUTHORIZED-ABSENCES PIC 9.
                   15  UNAUTHORIZED-ABSENCES PIC 9.
                   15  LATE-ARRIVALS PIC 9.
"""

class Test_Copybook_4( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_4, self ).setUp()
    def test_should_parse( self ):
        dde4= self.rf.makeRecord( Lexer(page174) )
        #dde4.visit( self.rpt )
        self.assertEquals( 2920, dde4.size )
        self.assertEquals( 0, dde4.offset )
        self.assertEquals( 10, dde4.get("EMPLOYEE-TABLE" ).occurs )
        self.assertEquals( 52, dde4.get("WEEK-RECORD" ).occurs )
        self.assertEquals( 5, dde4.get("WEEK-RECORD" ).occurSize )
        self.assertEquals( "999999", dde4.get("EMPLOYEE-NO").picture )
        self.assertEquals( 36, dde4.get("LATE-ARRIVALS" ).setIndex(1,1).indexedOffset )
        self.assertEquals( 41, dde4.get("EMPLOYEE-TABLE").setIndex(1).get("LATE-ARRIVALS" ).setIndex(2).indexedOffset )
Used by: test_dde.py (30)
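The expected offsets in this test follow from simple arithmetic. One EMPLOYEE-TABLE occurrence is 20 + 6 + 6 + 52 × 5 = 292 bytes (so the record is 10 × 292 = 2920 bytes), one WEEK-RECORD occurrence is 2 + 1 + 1 + 1 = 5 bytes, and LATE-ARRIVALS sits 32 + 4 = 36 bytes into the first occurrence of both tables. A Python 3 sketch of the calculation follows; `indexed_offset()` is hypothetical, not the DDE API.

```python
def indexed_offset(base_offset, indexes, occur_sizes, counts):
    """Offset of an element inside nested OCCURS tables.

    base_offset -- offset of the element when every subscript is 1
    indexes     -- 1-based subscripts, outermost first
    occur_sizes -- size in bytes of one occurrence at each level, outermost first
    counts      -- the OCCURS count at each level, used for range checking
    """
    if not len(indexes) == len(occur_sizes) == len(counts):
        raise ValueError("need one subscript per OCCURS level")
    offset = base_offset
    for index, size, count in zip(indexes, occur_sizes, counts):
        if not 1 <= index <= count:
            raise IndexError("subscript %d outside 1..%d" % (index, count))
        offset += (index - 1) * size    # skip the earlier occurrences
    return offset


# LATE-ARRIVALS(1, 2): first employee, second week.
print(indexed_offset(36, (1, 2), (292, 5), (10, 52)))
```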
DDE Test copybook from page 195 with simple redefines (35)
page195= """
       01  REDEFINES-RECORD.
           05  A PICTURE X(6).
           05  B REDEFINES A.
               10  B-1 PICTURE X(2).
               10  B-2 PICTURE 9(4).
           05  C PICTURE 99V99.
"""

class Test_Copybook_5( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_5, self ).setUp()
    def test_should_parse( self ):
        dde5= self.rf.makeRecord( Lexer(page195) )
        #dde5.visit( self.rpt )
        self.assertEquals( 10, dde5.size )
        self.assertEquals( 6, dde5.get("A").size )
        self.assertEquals( 0, dde5.get("A").offset )
        self.assertEquals( 6, dde5.get("B").size )
        self.assertEquals( 0, dde5.get("B").offset )
        self.assertEquals( 2, dde5.get("B-1").size )
        self.assertEquals( 0, dde5.get("B-1").offset )
        self.assertEquals( 4, dde5.get("B-2").size )
        self.assertEquals( 2, dde5.get("B-2").offset )
        self.assertEquals( "9999", dde5.get("B-2").picture )
        self.assertEquals( 4, dde5.get("C").size )
        self.assertEquals( 6, dde5.get("C").offset )
        data = "AB12345678"
        d= Dump(data)
        #dde5.visitOccurance( d )
        self.assertEquals( "AB1234", dde5.get("A").of(data) )
        self.assertEquals( "AB1234", dde5.get("B").of(data) )
        self.assertEquals( "AB", dde5.get("B-1").of(data) )
        self.assertEquals( "1234", dde5.get("B-2").of(data) )
        self.assertEquals( "5678", dde5.get("C").of(data) )
Used by: test_dde.py (30)
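The page 195 example shows the rule that makes REDEFINES simple to lay out: a redefining item starts at the offset of the item it redefines and does not advance the position of the next sibling. A flat Python 3 sketch of that rule follows. `assign_offsets()` is hypothetical and works on (name, size, redefines) tuples rather than DDE instances; it assumes a redefinition is never larger than the item it redefines and that the redefined item appears earlier in the list.

```python
def assign_offsets(fields, start=0):
    """Assign offsets to sibling fields, honouring REDEFINES.

    fields -- list of (name, size, redefines) tuples; redefines is None or
              the name of an earlier sibling whose storage is reused.
    Returns (offsets, total_size).
    """
    offsets = {}
    position = start                    # where the next ordinary field begins
    end = start                         # highest byte used so far
    for name, size, redefines in fields:
        if redefines is None:
            offsets[name] = position
            position += size
        else:
            # Reuse the redefined item's storage; do not advance `position`.
            offsets[name] = offsets[redefines]
        end = max(end, offsets[name] + size)
    return offsets, end - start


print(assign_offsets([("A", 6, None), ("B", 6, "A"), ("C", 4, None)]))
```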
DDE Test copybook from page 197 with another redefines (36)
page197= """
       01  REDEFINES-RECORD.
           05  NAME-2.
               10  SALARY PICTURE XXX.
               10  SO-SEC-NO PICTURE X(9).
               10  MONTH PICTURE XX.
           05  NAME-1 REDEFINES NAME-2.
               10  WAGE PICTURE 999V999.
               10  EMP-NO PICTURE X(6).
               10  YEAR PICTURE XX.
"""

class Test_Copybook_6( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_6, self ).setUp()
    def test_should_parse( self ):
        dde6= self.rf.makeRecord( Lexer(page197) )
        #dde6.visit( self.rpt )
        self.assertEquals( 3, dde6.get("SALARY").size )
        self.assertEquals( 0, dde6.get("SALARY").offset )
        self.assertEquals( 9, dde6.get("SO-SEC-NO").size )
        self.assertEquals( 3, dde6.get("SO-SEC-NO").offset )
        self.assertEquals( 2, dde6.get("MONTH").size )
        self.assertEquals( 12, dde6.get("MONTH").offset )
        self.assertEquals( 6, dde6.get("WAGE").size )
        self.assertEquals( 0, dde6.get("WAGE").offset )
        self.assertEquals( "999999", dde6.get("WAGE").picture )
        self.assertEquals( 3, dde6.get("WAGE").usage.precision )
        self.assertEquals( 6, dde6.get("EMP-NO").size )
        self.assertEquals( 6, dde6.get("EMP-NO").offset )
        self.assertEquals( 2, dde6.get("YEAR").size )
        self.assertEquals( 12, dde6.get("YEAR").offset )
        data1= "ABC123456789DE"
        d1= Dump(data1)
        #dde6.visitOccurance( d1 )
        self.assertEquals( "ABC", dde6.get("SALARY").of( data1 ) )
        self.assertEquals( "123456789", dde6.get("SO-SEC-NO").of( data1 ) )
        self.assertEquals( "DE", dde6.get("MONTH").of( data1 ) )
        data2= "123456ABCDEF78"
        d2= Dump(data2)
        #dde6.visitOccurance( d2 )
        self.assertAlmostEquals( 123.456, float(dde6.get("WAGE").valOf( data2 )) )
        self.assertEquals( "ABCDEF", dde6.get("EMP-NO").of( data2 ) )
        self.assertEquals( "78", dde6.get("YEAR").of( data2 ) )
Used by: test_dde.py (30)
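This test decodes WAGE (PICTURE 999V999) from the characters "123456" as 123.456: the V marks an implied decimal point, so no byte is stored for it. In a real EBCDIC file the digits of a DISPLAY field are the bytes 0xF0 to 0xF9, and a signed field carries its sign in the zone (high) nibble of the last byte. A Python 3 sketch under those assumptions follows; `unzone()` is hypothetical, and it treats a C or F zone as positive and D as negative.

```python
from decimal import Decimal


def unzone(data, scale=0):
    """Decode an EBCDIC zoned-decimal (USAGE DISPLAY) number.

    data  -- raw bytes, one digit per byte; the last byte's zone holds the sign
    scale -- digits after the implied decimal point (the digits after V)
    """
    digits = []
    for byte in data:
        digit = byte & 0x0F             # the low nibble holds the digit
        if digit > 9:
            raise ValueError("invalid zoned digit in %r" % data)
        digits.append(str(digit))
    zone = data[-1] >> 4                # the sign lives in the last zone nibble
    value = Decimal(int("".join(digits)))
    if zone == 0x0D:
        value = -value
    elif zone not in (0x0C, 0x0F):
        raise ValueError("invalid sign zone in %r" % data)
    return value.scaleb(-scale)         # apply the implied decimal point


print(unzone("123456".encode("cp037"), 3))
```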
DDE Test copybook from page 198, example a (37)
page198A= """
       01  REDEFINES-RECORD.
           05  REGULAR-EMPLOYEE.
               10  LOCATION PICTURE A(8).
               10  GRADE PICTURE X(4).
               10  SEMI-MONTHLY-PAY PICTURE 9999V99.
               10  WEEKLY-PAY REDEFINES SEMI-MONTHLY-PAY PICTURE 999V999.
           05  TEMPORARY-EMPLOYEE REDEFINES REGULAR-EMPLOYEE.
               10  LOCATION PICTURE A(8).
               10  FILLER PICTURE X(6).
               10  HOURLY-PAY PICTURE 99V99.
"""

class Test_Copybook_7( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_7, self ).setUp()
    def test_should_parse( self ):
        dde7= self.rf.makeRecord( Lexer(page198A) )
        #dde7.visit( self.rpt )
        self.assertEquals( 18, dde7.get("REGULAR-EMPLOYEE").size )
        self.assertEquals( 18, dde7.get("TEMPORARY-EMPLOYEE").size )
        self.assertEquals( 6, dde7.get("SEMI-MONTHLY-PAY").size )
        self.assertEquals( 6, dde7.get("WEEKLY-PAY").size )
        data1= "ABCDEFGHijkl123456"
        d1= Dump(data1)
        #dde7.visitOccurance( d1 )
        self.assertEquals( '1234.56', str(dde7.get("SEMI-MONTHLY-PAY").valOf( data1 )) )
        data2= "ABCDEFGHijklmn1234"
        d2= Dump(data2)
        #dde7.visitOccurance( d2 )
        self.assertEquals( '12.34', str(dde7.get("HOURLY-PAY").valOf( data2 ) ) )
Used by: test_dde.py (30)
DDE Test copybook from page 198, example b (38)
page198B= """
       01  REDEFINES-RECORD.
           05  REGULAR-EMPLOYEE.
               10  LOCATION PICTURE A(8).
               10  GRADE PICTURE X(4).
               10  SEMI-MONTHLY-PAY PICTURE 999V999.
           05  TEMPORARY-EMPLOYEE REDEFINES REGULAR-EMPLOYEE.
               10  LOCATION PICTURE A(8).
               10  FILLER PICTURE X(6).
               10  HOURLY-PAY PICTURE 99V99.
               10  CODE-H REDEFINES HOURLY-PAY PICTURE 9999.
"""

class Test_Copybook_8( DDE_Test ):
    def setUp( self ):
        super( Test_Copybook_8, self ).setUp()
    def test_should_parse( self ):
        dde8= self.rf.makeRecord( Lexer(page198B) )
        #dde8.visit( self.rpt )
        self.assertEquals( 18, dde8.get("REGULAR-EMPLOYEE").size )
        self.assertEquals( 18, dde8.get("TEMPORARY-EMPLOYEE").size )
        self.assertEquals( 6, dde8.get("SEMI-MONTHLY-PAY").size )
        self.assertEquals( 4, dde8.get("HOURLY-PAY").size )
        self.assertEquals( 4, dde8.get("CODE-H").size )
        rec1= "ABCDEFGHijkl123456"
        d1= Dump(rec1)
        #dde8.visitOccurance( d1 )
        self.assertAlmostEquals( 123.456, float( dde8.get('REGULAR-EMPLOYEE')
            .get('SEMI-MONTHLY-PAY').valOf(rec1) ) )
        rec2= "ABCDEFGHijklmn1234"
        d2= Dump(rec2)
        #dde8.visitOccurance( d2 )
        self.assertEquals( 12.34, float( dde8.get('TEMPORARY-EMPLOYEE')
            .get('HOURLY-PAY').valOf(rec2) ) )
        self.assertEquals( 1234, dde8.get('TEMPORARY-EMPLOYEE').get('CODE-H').valOf(rec2) )
        self.assertEquals( "REDEFINES-RECORD.TEMPORARY-EMPLOYEE.HOURLY-PAY",
            dde8.get('HOURLY-PAY').pathTo() )
Used by: test_dde.py (30)
This is an application for simple data profiling: it discovers the range of values (the domain) of particular fields.
It could also be extended to profile and document relationships among data elements; that work is TBD.
The data_profile application has the following structure.
data_profile.py (39)
→ DProfile Shell Escape (42)
→ DProfile DOC String (40)
→ DProfile CVS Cruft and pyweb generator warning (43)
→ DProfile Imports (41)
→ DProfile Utility Functions (44)
→ DProfile Class Definitions (45)
DProfile DOC String (40)
"""data_profile - use a cobol_dde to analyze a file.

Given a DDE instance, and a file, either dump fields of records or
accumulate distinct values of fields of a record.

HexDump
    Display a record similarly to the way TSO users see files
    using the File-Aid screens in TSO.

FieldValue
NumFieldValue
    Support gathering the actual domain for a field in a data file.

FieldDump
FieldScan
    Examine all FieldValue instances for a particular record layout.
    Either dump each FieldValue or scan to gather domain values.

FileScan
    A standardized class for scanning a file to accumulate frequency tables
    for selected fields using a FieldDump or FieldScan instance.

This module includes the following utility functions:

E2A
    Convert an EBCDIC string to Unicode using the cp037 codec.

E2A_str
    Convert an EBCDIC string to an 8-bit string using a static
    EBCDIC-to-ASCII translation table.
"""
Used by: data_profile.py (39)
The data_profile module depends on the cobol_dde module.
DProfile Imports (41)
from cobol_dde import *
Used by: data_profile.py (39)
DProfile Shell Escape (42)
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
Used by: data_profile.py (39)
DProfile CVS Cruft and pyweb generator warning (43)
__version__ = """$Revision$"""

### DO NOT EDIT THIS FILE!
### It was created by /Users/slott/Documents/Projects/pyWeb-2.1/pyweb/pyweb.py, __version__='$Revision$'.
### From source DDE.w modified Sun Mar 14 10:46:18 2010.
### In working directory '/Users/slott/Documents/Projects/COBOL_DDE-1.2'.
Used by: data_profile.py (39)
DProfile Utility Functions (44)
import codecs

# Static sequence of ASCII character codes that should be used for each
# EBCDIC character.
# See http://www.natural-innovations.com/boo/asciiebcdic.html
# for the source of this mapping.  Note that unassigned EBCDIC characters
# are assigned 0xA4 (164; '¤' in Latin-1, '§' in Mac Roman).
# Unicode Technical Report 16 has a reversible mapping, but it doesn't
# seem to handle some EBCDIC characters correctly, notably ¢ and ¬.

EBCDIC2ASCII= map( chr, [
    0x00,0x01,0x02,0x03,0xA3,0x09,0x97,0x7F,0xA4,0xA4,0x01,0x0B,0x0C,0x0D,0x0E,0x0F,
    0x10,0x11,0x12,0x16,0xAE,0x15,0x08,0x2D,0x18,0x19,0xA9,0xA9,0x2D,0x2D,0x2D,0x2D,
    0xD0,0x01,0x21,0xA4,0xA6,0x0A,0x17,0x1B,0xA4,0xA4,0x3B,0xA9,0xA4,0x05,0x06,0x07,
    0xA4,0xA4,0x16,0xA4,0xA3,0xBA,0x1F,0x04,0xA4,0xA4,0xA4,0xA9,0x14,0x15,0xA4,0x1A,
    0x20,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA2,0x2E,0x3C,0x28,0x2B,0x2E,
    0x26,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0x21,0x24,0x2A,0x29,0x3B,0xAC,
    0x2D,0x2F,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0x2C,0x25,0x5F,0x9B,0x3F,
    0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0x3A,0x23,0x40,0x27,0x3D,0x22,
    0xA4,0x61,0x62,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0xA4,0x6A,0x6B,0x6C,0x6D,0x6E,0x6F,0x70,0x71,0x72,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0xA4,0xA4,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7A,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,0x60,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0xA4,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0xA4,0x4A,0x4B,0x4C,0x4D,0x4E,0x4F,0x50,0x51,0x52,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0xA4,0xA4,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5A,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4,
    0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0xA4,0xA4,0xA4,0xA4,0xA4,0xA4
    ] )

def E2A_str( string ):
    """Return the ASCII version of this EBCDIC string."""
    # r= StringIO.StringIO()
    # for c in string:
    #     r.write( EBCDIC2ASCII[ ord(c) ] )
    # s= r.getvalue()
    # r.close()
    chars= [ EBCDIC2ASCII[ ord(c) ] for c in string ]
    return "".join( chars )
def E2A( string ):
    """Return UNICODE version of this EBCDIC string."""
    chars, used= codecs.getdecoder('cp037')(string)
    assert used == len(string)
    return chars
Used by: data_profile.py (39)
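In Python 3, file data read in binary mode arrives as bytes, and the standard cp037 codec does the work of E2A() directly. The sketch below is a Python 3 analogue with hypothetical names `e2a()` and `e2a_table()`; the table variant derives its 256-character mapping from the codec rather than from the hand-built EBCDIC2ASCII list, so the two mappings need not agree for every byte.

```python
# A 256-character translation string derived from the cp037 codec itself.
CP037_TABLE = bytes(range(256)).decode("cp037")


def e2a(data):
    """Return the Unicode text for EBCDIC (code page 037) bytes."""
    return data.decode("cp037")


def e2a_table(data):
    """Same result by table lookup, one character per byte."""
    return "".join(CP037_TABLE[b] for b in data)


sample = bytes([0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x5F, 0x4A])
print(e2a(sample))
```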
DProfile Class Definitions (45)
→ DProfile Hex Dump Class to do raw dump of a record (46)
→ DProfile Field Value Class Hierarchy to accumulate distinct values (47)
→ DProfile Field and Record Scanning does either dumps or distinct value processing (48)
Used by: data_profile.py (39)
DProfile Hex Dump Class to do raw dump of a record (46)
# A handy hex dump printer class
class HexDump:
    """Create a Hex Dump object that can dump records from a file."""
    def __init__( self, aFile=None, rowSize=64 ):
        self.theFile= None
        if aFile:
            self.theFile= file(aFile,"rb")
        self.rows= 0
        self.hex= '0123456789ABCDEF'
        self.rowSize= rowSize
        self.positions= "".join([ ("----+----%d"%(i+1))[:10] for i in range(self.rowSize/10) ]) + "----+-----"[:self.rowSize%10]
    def hexPrint( self, row, data ):
        """Print a row of data in two-line hex format."""
        cha= []
        top= []
        bot= []
        for c in data:
            if c in ('\n','\r','\f','\t','\x00'):
                cha.append('.')
            else:
                cha.append( c )
            top.append( self.hex[ ord(c)/16 ] )
            bot.append( self.hex[ ord(c)%16 ] )
        print '%3d|' % (row*self.rowSize+1), "".join( cha )
        print '   |', "".join(top)
        print '   |', "".join(bot)
    def dump( self, bytes=64 ):
        """Dump a record of a given length."""
        self.rows += 1
        data= self.theFile.read(bytes)
        if not data: return None
        print "record %d (%d bytes)" % (self.rows, len(data))
        print '   |', self.positions
        rows= len(data)/self.rowSize
        for i in range(rows):
            self.hexPrint( i, data[i*self.rowSize:(i+1)*self.rowSize] )
        if len(data) % self.rowSize:
            # print the partial final row, if there is one
            self.hexPrint( rows, data[rows*self.rowSize:] )
        print
        return self
    def dumpAll( self, bytes=64 ):
        """Dump all records in the file."""
        while self.dump(bytes):
            pass
Used by: DProfile Class Definitions (45); data_profile.py (39)
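The two-line hex layout is easiest to see in a small example. This Python 3 sketch, with hypothetical `hex_rows()` and `format_dump()`, returns the lines instead of printing them and shows a '.' for anything outside printable ASCII; it is not the module's HexDump class. For EBCDIC records the character line would first need decoding (for example with the cp037 codec).

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_rows(data, width=64):
    """Yield (position, chars, high nibbles, low nibbles) for each row of bytes."""
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        chars = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        high = "".join(HEX_DIGITS[b >> 4] for b in chunk)
        low = "".join(HEX_DIGITS[b & 0x0F] for b in chunk)
        yield start + 1, chars, high, low


def format_dump(data, width=64):
    """Format bytes as File-Aid style rows: characters over two hex lines."""
    lines = []
    for position, chars, high, low in hex_rows(data, width):
        lines.append("%3d|%s" % (position, chars))
        lines.append("   |%s" % high)
        lines.append("   |%s" % low)
    return lines


print("\n".join(format_dump(b"AB\n")))
```

Reading each column top to bottom gives the byte: 'A' is 4 over 1, that is 0x41.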
DProfile Field Value Class Hierarchy to accumulate distinct values (47)
# Two handy classes for examining individual fields
class FieldValue:
    """Accumulate unique values for a named field of a DDE.
    This will have to be subclassed for indexes of occurs clauses.
    """
    def __init__( self, dde, cobolName ):
        """Given a DDE and a COBOL name, set up a field extractor and frequency mapping."""
        self.cobolName= cobolName
        self.usage = dde.get(cobolName).usage
        self.get_field= dde.get(cobolName)
        self.domain= {}
    def getFrom( self, data ):
        """Get the value from the field, then accumulate in the frequency mapping."""
        v= self.get_field.of( data )
        self.domain[v]= self.domain.setdefault(v,0) + 1
    def fqTable( self ):
        """Return a sequence of tuples with value and frequency count, sorted."""
        val_count= self.domain.items()
        # Sort descending by second field (count), ascending by first field (value)
        val_count.sort( lambda a,b: cmp(b[1],a[1]) or cmp(a[0],b[0]) )
        return val_count

class NumFieldValue( FieldValue ):
    """Accumulate unique values for a named field of a DDE that is numeric.
    This will have to be subclassed for indexes of occurs clauses.
    """
    def fqTable( self ):
        """Return a sequence of tuples with value and frequency count, sorted."""
        val_count= [ (self.usage.valueOf(v),c) for v,c in self.domain.items() ]
        # Sort descending by second field (count), ascending by first field (value)
        val_count.sort( lambda a,b: cmp(b[1],a[1]) or cmp(a[0],b[0]) )
        return val_count
Used by: DProfile Class Definitions (45); data_profile.py (39)
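The cmp-style sort in fqTable() orders by descending count, then ascending value. Python 3 removed both the cmp() built-in and the comparison-function argument to list.sort(), so a Python 3 version would use a key function instead. A minimal sketch with collections.Counter follows; the helper name `frequency_table()` is hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count) pairs: most frequent first, ties broken by value."""
    counts = Counter(values)
    # Negating the count sorts it descending while values still sort ascending.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(frequency_table(["12", "22", "12", "X2", "X2"]))
```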
FieldScan accumulates the distinct values of each field in a list of FieldValue instances. FieldDump prints the value of each field of each record. A FileScan reads the file as fixed-length records of dde.size bytes and passes each record to either a FieldScan or a FieldDump.
DProfile Field and Record Scanning does either dumps or distinct value processing (48)
# Handy classes for examining all fields of all records of a file.
class FieldScan:
    def __init__( self, aFieldList ):
        self.fieldList= aFieldList
    def process( self, recno, data ):
        for f in self.fieldList:
            f.getFrom( data )
    def final( self, records ):
        print "\n%d Records" % ( records )
        for f in self.fieldList:
            print "\n%-10s %7s" % ( f.cobolName, "count" )
            for di,c in f.fqTable():
                print "%-10s %7d" % ( di,c )

class FieldDump( FieldScan ):
    def process( self, recno, data ):
        print "\nRecord %d" % (recno)
        for f in self.fieldList:
            v= f.get_field.of( data )
            print " ", f.cobolName, f.usage.valueOf( v )
    def final( self, records ):
        pass

class FileScan:
    """Basic file scanning operation."""
    def __init__( self, aDDE, aFieldProcess, aFileName ):
        self.dde= aDDE
        self.fieldProcess= aFieldProcess
        self.theFile= file( aFileName, "rb" )
        self.record= 0
    def reclen( self ):
        return self.dde.size
    def process( self, end=-1 ):
        data= self.theFile.read( self.reclen() )
        while data:
            self.record += 1
            self.fieldProcess.process( self.record, data )
            if self.record == end: break
            data= self.theFile.read( self.reclen() )
        self.theFile.close()
        self.fieldProcess.final(self.record)
Used by: DProfile Class Definitions (45); data_profile.py (39)
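FileScan relies on every record being exactly dde.size bytes, with no line terminators between records. As a Python 3 sketch of the same reading loop, here is a hypothetical generator `fixed_records()`; like FileScan.process(), it passes a short final record through unchanged rather than checking that it is complete.

```python
import io


def fixed_records(stream, reclen, limit=None):
    """Yield (record_number, bytes) for each fixed-length record in a binary stream.

    limit -- stop after this many records (None reads to end of file)
    """
    number = 0
    while limit is None or number < limit:
        data = stream.read(reclen)
        if not data:
            break                       # end of file
        number += 1
        yield number, data


# Usage: three 4-byte records, the last one short.
for number, record in fixed_records(io.BytesIO(b"AAAABBBBCC"), 4):
    print(number, record)
```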
test_data_profile.py (49)
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import unittest
from cobol_dde import *
from data_profile import *
import collections

→ DProfile Test 1 (50)
→ DProfile Test 2 (51)

if __name__ == "__main__":
    unittest.main()
DProfile Test 1 (50)
copy1= """
      * COPY1.COB
       01  DETAIL-LINE.
           05              PIC X(7).
           05  QUESTION    PIC ZZ.
           05              PIC X(6).
           05  PRINT-YES   PIC ZZ.
           05              PIC X(3).
           05  PRINT-NO    PIC ZZ.
           05              PIC X(6).
           05  NOT-SURE    PIC ZZ.
           05              PIC X(7).
"""
dataset1= (
"ABCDEFG11HIJKLM12NOP13QRSTUV14WXYZabc\n",
"ABCDEFG22HIJKLM22NOP23QRSTUV24WXYZabc\n",
"ABCDEFG11HIJKLM12NOP33QRSTUV34WXYZabc\n",
"ABCDEFG44HIJKLMX2NOP13QRSTUV44WXYZabc\n",
"ABCDEFG11HIJKLMX2NOP23QRSTUV54WXYZabc\n"
)

class Test_DProfile_1( unittest.TestCase ):
    def setUp( self ):
        # Create a Report() visitor to write a report on a structure
        rpt= Report()
        # Create a RecordFactory() to create DDE record definitions
        rf= RecordFactory()
        # copy1= open("copy1.cob","r").read()
        self.dde1= rf.makeRecord( Lexer(copy1) )
        #self.dde1.visit( rpt )
    def test_should_dump( self ):
        # dataset1= open("dataset.dat","r").readlines()
        question_domain= collections.defaultdict( int )
        yes_domain= collections.defaultdict( set )
        for record in dataset1:
            question= self.dde1.get('QUESTION').valOf(record)
            yes= self.dde1.get('PRINT-YES').of(record)
            question_domain[question] += 1
            yes_domain[yes].add( yes )
            #print record.rstrip()
            #print question,yes,no,notsure
        self.assertEquals( 3, len(question_domain ) )
        self.assertEquals( 3, question_domain[11] )
        self.assertEquals( 1, question_domain[22] )
        self.assertEquals( 1, question_domain[44] )
        self.assertEquals( set(['12', '22', 'X2']), set(yes_domain) )
Used by: test_data_profile.py (49)
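The test relies on the DDE to turn COPY1.COB into field offsets. A hand-rolled Python 3 sketch of that arithmetic for this flat layout: each field starts at the running total of the sizes before it (PIC X(7) is 7 bytes, PIC ZZ is 2). The LAYOUT list, OFFSETS table, and field helper are hypothetical, handle only flat DISPLAY fields, and are not the cobol_dde API; FILLER stands in for the unnamed fields.

```python
from itertools import accumulate

# (name, size) pairs for COPY1.COB; unnamed fields are shown as FILLER.
LAYOUT = [("FILLER", 7), ("QUESTION", 2), ("FILLER", 6), ("PRINT-YES", 2),
          ("FILLER", 3), ("PRINT-NO", 2), ("FILLER", 6), ("NOT-SURE", 2),
          ("FILLER", 7)]

# Start offset of each field = total size of every field before it.
starts = accumulate([0] + [size for _, size in LAYOUT])
OFFSETS = {name: (start, size)
           for (name, size), start in zip(LAYOUT, starts)
           if name != "FILLER"}

def field(record, name):
    """Slice a named field out of a fixed-width record."""
    start, size = OFFSETS[name]
    return record[start:start + size]

rec = "ABCDEFG11HIJKLM12NOP13QRSTUV14WXYZabc\n"
print(field(rec, "QUESTION"), field(rec, "PRINT-YES"), field(rec, "NOT-SURE"))
```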
DProfile Test 2 (51)
class Test_DProfile_2( unittest.TestCase ):
    def setUp( self ):
        self.dataset2="".join( map( chr,
            [ 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x5f, 0x4a ] ) )
    def test_should_convert( self ):
        self.assertEquals( u"abcdefghi\xac\xa2", E2A(self.dataset2) )
        self.assertEquals( "abcdefghi\xac\xa2", E2A_str(self.dataset2) )
Used by: test_data_profile.py (49)
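Python's standard codecs include EBCDIC code pages, so the conversion this test checks can be cross-checked with bytes.decode. This Python 3 sketch assumes code page 037 (US/Canada EBCDIC); the table inside E2A may differ, and other code pages such as cp500 decode 0x4A and 0x5F to different characters.

```python
# The same bytes as dataset2 in Test_DProfile_2.
raw = bytes([0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x5F, 0x4A])

# EBCDIC (code page 037) -> str; 0x5F is the not sign, 0x4A the cent sign.
text = raw.decode("cp037")
print(repr(text))   # 'abcdefghi¬¢'
```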
TODO
Rewrite hexPrint so we can perform the following test:
def test_should_format_dump( self ):
    print "EBCDIC Data"
    HexDump().hexPrint( 0, self.dataset2 )
    print "ASCII Conversion"
    HexDump().hexPrint( 0, E2A(self.dataset2) )
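One possible shape for the rewritten hexPrint is a function that formats bytes already in memory rather than reading them from a file, so a test can pass the EBCDIC buffer or its converted text directly. The name hex_lines, the 16-byte row width, and the row layout are assumptions for illustration, not the existing HexDump interface. Returning lines instead of printing them also makes the output easy to assert on.

```python
def hex_lines(offset, data, width=16):
    """Format data as rows of: offset, hex bytes, printable ASCII characters.

    Accepts bytes, or str (whose characters are shown by code point).
    """
    codes = data if isinstance(data, bytes) else [ord(c) for c in data]
    rows = []
    for start in range(0, len(codes), width):
        chunk = codes[start:start + width]
        hexes = " ".join("%02x" % b for b in chunk)
        # Non-printable (and non-ASCII) code points are shown as '.'.
        chars = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append("%08x  %-*s  %s" % (offset + start, width * 3 - 1, hexes, chars))
    return rows

ebcdic = bytes([0x81, 0x82, 0x83, 0x5F, 0x4A])
for line in hex_lines(0, ebcdic):                   # raw EBCDIC bytes
    print(line)
for line in hex_lines(0, ebcdic.decode("cp037")):  # after conversion
    print(line)
```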
A combined test suite.
test.py (52)
#!/usr/bin/env python
"""Combined tests."""
from __future__ import print_function
import unittest
import test_dde
import test_data_profile
import logging

def suite():
    s= unittest.TestSuite()
    for m in ( test_dde, test_data_profile ):
        s.addTests( unittest.defaultTestLoader.loadTestsFromModule( m ) )
    return s

if __name__ == "__main__":
    import sys
    logging.basicConfig( stream=sys.stdout, level=logging.CRITICAL )
    tr= unittest.TextTestRunner()
    result= tr.run( suite() )
    logging.shutdown()
    sys.exit( len(result.failures) + len(result.errors) )
Five Demos.
demo1.py (53)
→ Demo Shell Escape (56)
→ Demo DOC String (54)
→ Demo CVS Cruft and pyweb generator warning (57)
→ Demo Imports (55)
→ Demo Subclass Definitions (58)
→ Demo 1 - complete, detailed examination of a file (59)
→ Demo 2 - low-level hex dump of a file (60)
→ Demo 3 - detailed, field-by-field occurance dump of a record (61)
→ Demo 4 - detailed, field-by-field scan of distinct values of a record (62)
→ Demo 5 - detailed, field-by-field occurance dump of a record (63)
→ Demo Main (64)
DOC string
Demo DOC String (54)
"""Examine a sample COBOL file.

This requires that files be transferred in strictly BINARY mode from the
mainframe. Any EBCDIC/ASCII character conversion during the transfer is a
bad thing: it corrupts binary and packed-decimal fields.

Performance: 15,000 field values per second.

There are five demos:

    demo1 collects the ranges of data values from a file
    demo2 does low-level hex dumps of records in a file
    demo3 does detailed structure dumps of records in a file
    demo4 shows the FieldScan and FileScan classes to examine a file (similar to demo1)
    demo5 shows the FieldDump and FileScan classes to examine selected records (similar to demo3)
"""
Used by: demo1.py (53)
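The insistence on a strictly binary transfer matters most for packed-decimal (COMP-3) fields, whose bytes hold pairs of digit nibbles rather than characters. A small Python 3 sketch, assuming the common packed layout of two digits per byte with the sign in the final nibble (C or F positive, D negative); unpack_comp3 is a hypothetical helper, not the UsageComp3 implementation. Passing the same bytes through an EBCDIC-to-ASCII translation, as a text-mode transfer would, rewrites them.

```python
def unpack_comp3(raw):
    """Decode packed decimal: two digits per byte, sign in the last nibble."""
    nibbles = []
    for b in raw:
        nibbles.extend((b >> 4, b & 0x0F))
    *digits, sign = nibbles
    value = int("".join(str(d) for d in digits))
    return -value if sign == 0x0D else value

packed = bytes([0x12, 0x34, 0x5C])                  # +12345
print(unpack_comp3(packed))                         # 12345
print(unpack_comp3(bytes([0x12, 0x34, 0x5D])))      # -12345

# A character-set translation treats 0x5C as the EBCDIC '*' and
# rewrites it as ASCII 0x2A, destroying the last digit and the sign.
mangled = packed.decode("cp037").encode("latin-1")
print(mangled.hex())
```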
This demo application uses the cobol_dde module to parse a record layout and the data_profile module to analyze data in a file defined by the record layout.
Demo Imports (55)
import os, time
import cobol_dde, data_profile
Used by: demo1.py (53)
Demo Shell Escape (56)
#!/usr/bin/env python
Used by: demo1.py (53)
Demo CVS Cruft and pyweb generator warning (57)
__version__ = """$Revision$"""

### DO NOT EDIT THIS FILE!
### It was created by /Users/slott/Documents/Projects/pyWeb-2.1/pyweb/pyweb.py, __version__='$Revision$'.
### From source DDE.w modified Sun Mar 14 10:46:18 2010.
### In working directory '/Users/slott/Documents/Projects/COBOL_DDE-1.2'.
Used by: demo1.py (53)
Demo Subclass Definitions (58)
# Extend the cobol_dde module's ``Lexer`` class to override how the lines are cleaned prior to parsing.
class CleanupLexer( cobol_dde.Lexer ):
    """Cleanup as part of Lexing: drop the sequence number area [0:6] and
    the identification area [72:]. Also drop comment lines and "SKIP1" commands."""
    def lineClean( self, text ):
        lines= [ l[6:72].rstrip()+' ' for l in text.split('\n')
            if len(l) > 6 and l[6] not in ('*','/') ]
        lines= [ l for l in lines if l.strip() != 'SKIP1' ]
        return lines

# Extend the FieldDump class to dump OVRMCUDB record instances.
class OVRMCUDBdump( data_profile.FieldDump ):
    """Dump OVRMCUDB record instances."""
    def __init__( self, dde ):
        data_profile.FieldDump.__init__( self, None )
        self.dde= dde
    def process( self, recno, data ):
        """Use the value-length field to decode records of OVRMCUDB file.

        MCUDBI-VALUE-LENGTH (1 to 9) selects which MCUDBI-VALUE-FIELD-n
        layout is used to decode the twelve values.
        """
        value_length= self.dde.get('MCUDBI-VALUE-LENGTH').valOf( data )
        print "\nRecord %d:" % recno
        print '  CUST-NO', self.dde.get('MCUDBI-CUST-NO').valOf( data )
        print '  DATA-ITEM', self.dde.get('MCUDBI-DATA-ITEM').valOf( data )
        print '  YR', self.dde.get('MCUDBI-YR').valOf( data )
        print '  VALUE-LENGTH', value_length
        if 1 <= value_length <= 9:
            # Pick the variant layout whose element width matches value_length.
            field= self.dde.get('MCUDBI-VALUE-FIELD-%d' % int(value_length))
            for i in range(1,13):
                print '  VALUE-FIELD', i, field.setIndex(i).valOf(data)
        else:
            print "Invalid record"
            data_profile.HexDump().hexPrint(recno,data)

# Extend the FileScan class to handle a damaged OVRMCUDB file where record 1 is damaged.
class OVRMCUDBfile( data_profile.FileScan ):
    """Special-purpose FileScan to handle damaged record in OVRMCUDB."""
    def reclen( self ):
        if self.record == 0: return 92
        return 97
Used by: demo1.py (53)
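CleanupLexer.lineClean relies on COBOL's fixed reference format: columns 1-6 are the sequence number area, column 7 is the indicator area (a * or / marks a comment line), and columns 73-80 are the identification area. A standalone Python 3 sketch of the same filtering, useful for checking what the lexer will actually see; clean_copybook is a hypothetical name.

```python
def clean_copybook(text):
    """Keep columns 7-72 of non-comment lines, dropping SKIP1 directives."""
    kept = []
    for line in text.split("\n"):
        if len(line) <= 6 or line[6] in ("*", "/"):
            continue  # short/blank line, or comment line
        body = line[6:72].rstrip()  # columns 7-72; columns 73-80 discarded
        if body.strip() == "SKIP1":
            continue
        kept.append(body + " ")
    return kept

sample = "\n".join([
    "000100 " + "01  DETAIL-LINE.".ljust(65) + "COPY1   ",
    "000200*    THIS IS A COMMENT",
    "000300     SKIP1",
    "000400     05  QUESTION   PIC ZZ.",
])
print(clean_copybook(sample))
```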
The first demo function shows a relatively complex program to dump one record and summarize other records.
Create a list, fieldList, of NumFieldValue instances, one for each field to be profiled.
Get the record size from dde. Open a file, assigning it to theFile.
Dump record number 1. In this file, the first record is damaged and is only 92 bytes long. The first record is read into data, and an instance of the OVRMCUDBdump class, dump1Rec, is created from the record definition in dde.
The dump1Rec process method produces a dump of the record in data.
Scan the remaining records. The scan proceeds as follows:
Read a 97-byte record into data. While there is content in data, process each record: increment the record counter, recno; for each field f in fieldList, call the field's getFrom() method to extract the appropriate bytes from data and count the value; if the requested number of records has been processed, break from the loop; otherwise read the next 97-byte record into data.
Produce a final report from each NumFieldValue instance in fieldList. For each field f, print a heading, then iterate over the frequency table returned by f.fqTable(), printing each value di with its count c.
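The scanning loop described above can be condensed in Python 3 with collections.Counter. This sketch assumes, as the demo does, a 92-byte first record followed by 97-byte records; the byte slices in FIELDS are hypothetical placeholders, since the real offsets come from the MCUDBIW copybook via the DDE, and profile and make_record are illustrative names only.

```python
import io
from collections import Counter

FIRST_LEN, REC_LEN = 92, 97
# Hypothetical (start, stop) byte slices; real offsets come from the copybook.
FIELDS = {"MCUDBI-DATA-ITEM": (10, 14), "MCUDBI-YR": (14, 16)}

def profile(stream, end=10):
    """Read the short first record, then count field values in records 2..end."""
    first = stream.read(FIRST_LEN)  # the damaged record; demo1 dumps it separately
    counts = {name: Counter() for name in FIELDS}
    recno = 1
    data = stream.read(REC_LEN)
    while data:
        recno += 1
        for name, (start, stop) in FIELDS.items():
            counts[name][data[start:stop]] += 1
        if recno == end:
            break
        data = stream.read(REC_LEN)
    return first, counts

def make_record(item, yr):
    """Build a hypothetical 97-byte record with item at [10:14] and yr at [14:16]."""
    return b" " * 10 + item + yr + b" " * (REC_LEN - 16)

sample = (b"X" * FIRST_LEN + make_record(b"0001", b"09")
          + make_record(b"0001", b"10") + make_record(b"0002", b"10"))
first, counts = profile(io.BytesIO(sample))
print(counts["MCUDBI-DATA-ITEM"].most_common())
```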
Demo 1 - complete, detailed examination of a file (59)
def demo1( aDef, aFileName, end=10 ):
    """Complete examination of a file."""
    # Heading
    print "File: %s\nCopybook: %s" % ( aFileName, aDef )
    # Create a RecordFactory() to parse copy books and create DDE record definitions
    rf= cobol_dde.RecordFactory()
    dde= rf.makeRecord( CleanupLexer(file(aDef,"r").read()) )
    # Use the Report() visitor to produce a report on the record structure
    print "\nRecord Layout"
    rpt= cobol_dde.Report()
    dde.visit( rpt )
    # Identify the fields to be examined
    fieldList= [
        data_profile.NumFieldValue( dde, 'MCUDBI-DATA-ITEM' ),
        data_profile.NumFieldValue( dde, 'MCUDBI-YR' ),
        data_profile.NumFieldValue( dde, 'MCUDBI-VALUE-LENGTH' ) ]
    # Get the record size, open the input file, read and dump the records
    reclen= dde.size
    theFile= file(aFileName,"rb")
    # Dump record 1 (92 bytes)
    recno= 1
    data= theFile.read(92)
    dump1Rec= OVRMCUDBdump(dde)
    dump1Rec.process( recno, data )
    # Scan all remaining records (97 bytes each)
    data= theFile.read(97)
    while data:
        recno += 1
        for f in fieldList:
            f.getFrom( data )
        if recno == end: break
        data= theFile.read(97)
    # Final report
    for f in fieldList:
        print "\n%-10s %7s" % ( f.cobolName, "count" )
        for di,c in f.fqTable():
            print "%-10s %7d" % ( di,c )
    theFile.close()
Used by: demo1.py (53)
The second demo function shows a simple program to produce a hex dump of the short first record and the hundred records that follow it.
Create h, a HexDump instance for aFileName. Use the h dump method once to dump the 92-byte first record, then 100 more times (i from 0 to 99) to dump the following 97-byte records.
Demo 2 - low-level hex dump of a file (60)
def demo2( aDef, aFileName ):
    """Low-level hex dump of a file."""
    print "\nDump of 100 records"
    h= data_profile.HexDump( aFileName, 80 )
    h.dump(92)
    for i in range(100):
        h.dump(97)
Used by: demo1.py (53)
The third demo function shows a simple program to produce a detailed field-by-field dump of the first two records.
The dde visitOccurance method walks every occurrence of every field in the record and hands each one to the visitor fd, a Dump (a subclass of Visitor), which prints it. Record number 1, the damaged 92-byte record, is dumped first. To dump record number 2, set data to the next 97 bytes, create a new Dump as fd, and use the dde visitOccurance method again.
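Both visitOccurance and setIndex depend on where each occurrence of an OCCURS item sits. For a simple fixed OCCURS n TIMES table, occurrence i (1-based, as in COBOL subscripts) starts at the table's base offset plus (i - 1) times the element size. A minimal Python 3 sketch of that arithmetic; occurrence_slice is a hypothetical helper that ignores OCCURS DEPENDING ON and nested tables.

```python
def occurrence_slice(base, size, index, occurs):
    """Return the slice for 1-based occurrence `index` of an OCCURS table."""
    if not 1 <= index <= occurs:
        raise IndexError("subscript %d outside 1..%d" % (index, occurs))
    start = base + (index - 1) * size
    return slice(start, start + size)

# For example, twelve 3-byte values starting at byte 20 of a record.
record = b"H" * 20 + b"".join(b"%03d" % m for m in range(1, 13))
print([record[occurrence_slice(20, 3, i, 12)] for i in (1, 2, 12)])
```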
Demo 3 - detailed, field-by-field occurance dump of a record (61)
def demo3( aDef, aFileName ):
    """Detailed field-by-field occurrence dump of a record."""
    # Heading
    print "File: %s\nCopybook: %s" % ( aFileName, aDef )
    # Create a RecordFactory() to parse copy books and create DDE record definitions
    rf= cobol_dde.RecordFactory()
    dde= rf.makeRecord( CleanupLexer(file(aDef,"r").read()) )
    # Get the record size, open the input file, read and dump the first few records
    reclen= dde.size
    theFile= file(aFileName,"rb")
    # Dump record 1 (92 bytes)
    recno= 1
    data= theFile.read(92)
    # Detailed occurrence-by-occurrence dump of the most recent record
    print "\nField Dump"
    fd= cobol_dde.Dump( data )
    dde.visitOccurance( fd )
    # Dump record 2 (97 bytes)
    data= theFile.read(97)
    fd= cobol_dde.Dump( data )
    dde.visitOccurance( fd )
    theFile.close()
Used by: demo1.py (53)
The fourth demo function shows a simple program to scan selected fields of a file.
Print a simple heading. Create a RecordFactory, rf. Create a CleanupLexer to process the definition file, and pass it to rf to create a DDE, called dde. Define fieldList as an instance of FieldScan built from three instances of NumFieldValue. Define fs as an instance of OVRMCUDBfile that uses fieldList to scan the file. Use the fs process method to examine each record through the requested ending record, using fieldList to examine the selected fields.
Demo 4 - detailed, field-by-field scan of distinct values of a record (62)
def demo4( aDef, aFileName, end=10 ):
    """Detailed field-by-field scan of a record."""
    # Heading
    print "File: %s\nCopybook: %s" % ( aFileName, aDef )
    # Create a RecordFactory() to parse copy books and create DDE record definitions
    rf= cobol_dde.RecordFactory()
    dde= rf.makeRecord( CleanupLexer(file(aDef,"r").read()) )
    # Create a FieldScan for the three fields we care about
    fieldList= data_profile.FieldScan( [
        data_profile.NumFieldValue( dde, 'MCUDBI-DATA-ITEM' ),
        data_profile.NumFieldValue( dde, 'MCUDBI-YR' ),
        data_profile.NumFieldValue( dde, 'MCUDBI-VALUE-LENGTH' ) ] )
    # Create a FileScan for the file, using the given FieldScan list of fields
    fs= OVRMCUDBfile( dde, fieldList, aFileName )
    # Process through the given ending record
    fs.process( end )
Used by: demo1.py (53)
The fifth demo function shows a simple program to dump selected records of a file.
Print a simple heading. Create a RecordFactory, rf. Create a CleanupLexer to process the definition file, and pass it to rf to create a DDE, called dde. Define fieldList as an instance of OVRMCUDBdump, based on dde. Define fs as an instance of FileScan that uses fieldList to scan the file. Use the fs process method to examine 4 records, using fieldList to dump all fields.
Demo 5 - detailed, field-by-field occurance dump of a record (63)
def demo5( aDef, aFileName ):
    """Detailed field-by-field occurrence dump of a record."""
    # Heading
    print "File: %s\nCopybook: %s" % ( aFileName, aDef )
    # Create a RecordFactory() to parse copy books and create DDE record definitions
    rf= cobol_dde.RecordFactory()
    dde= rf.makeRecord( CleanupLexer(file(aDef,"r").read()) )
    # Create a special FieldDump that can separate the variant record formats
    # for the OVRMCUDB file.
    fieldList= OVRMCUDBdump(dde)
    # Create a FileScan for the file, using the given FieldDump list of fields.
    # OVRMCUDBfile is the FileScan subclass that handles the short first record.
    fs= OVRMCUDBfile( dde, fieldList, aFileName )
    # Process through the given ending record
    fs.process( 4 )
Used by: demo1.py (53)
The demo main calls a selected function on a selected file and copybook. This can be replaced with a function that uses the getopt module to parse the command-line arguments.
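A sketch of that replacement using argparse, the standard-library successor to getopt and optparse (available in Python 2.7 and 3). The positional arguments, the --demo and --end option names, and the parse_args helper are assumptions for illustration.

```python
import argparse

def parse_args(argv=None):
    """Parse copybook, data file, demo number and ending record from argv."""
    parser = argparse.ArgumentParser(description="Examine a COBOL file using its copybook.")
    parser.add_argument("copybook", help="COBOL copybook (DDE) file")
    parser.add_argument("datafile", help="data file, transferred in binary mode")
    parser.add_argument("--demo", type=int, choices=range(1, 6), default=4,
                        help="which demo function to run (default: 4)")
    parser.add_argument("--end", type=int, default=10,
                        help="last record to process, for demos that accept it")
    return parser.parse_args(argv)

args = parse_args(["MCUDBIW.TXT", "OVRMCUDB_bin.txt", "--demo", "1", "--end", "100"])
print(args.demo, args.end, args.copybook)
```

Dispatching could then call, for example, demo1(args.copybook, args.datafile, args.end) when args.demo is 1; demo2, demo3 and demo5 take no end argument.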
Demo Main (64)
if __name__ == "__main__":
    start= time.clock()
    copyBook= r"J:Appl-DevFinanceMISMCUDBIW.TXT"
    dataFile= r"J:Appl-DevFinanceMISOVRMCUDB_bin.txt"
    #demo1( copyBook, dataFile, 100 )
    demo4( copyBook, dataFile, 100000 )
    print "Run Time: %7.4f" % (time.clock()-start)
Used by: demo1.py (53)
The following additional elements are part of a complete package.
README (65)
##############################################
COBOL DDE (Data Definition Element) Processing
##############################################

This package parses COBOL copybooks (DDEs) to help write ETL and Data Profiling applications.

Installation
------------

Install with the following command::

    python setup.py install

Usage
-----

See demo1.py for demonstration applications built with these tools.

Documentation
-------------

See `dde.html <dde.html>`_ for the detailed documentation of this application.

Build
-----

The source and documentation are built via the pyWeb tool from ``DDE.w``.
For information on pyWeb, see http://sourceforge.net/projects/pywebtool/.
setup.py (66)
#!/usr/bin/env python
from distutils.core import setup
setup(name="DDE",
      version="1.2",
      description="COBOL Data Definition Element Processing",
      author="Steven F. Lott",
      author_email="s_lott@yahoo.com",
      url="https://sourceforge.net/projects/cobol-dde/",
      py_modules=['cobol_dde', 'data_profile'],
      )
MANIFEST.in (67)
include *.w *.html *.css *.py *.tex *.pdf
Files
MANIFEST.in: → (67)
README: → (65)
cobol_dde.py: → (1)
data_profile.py: → (39)
demo1.py: → (53)
setup.py: → (66)
test.py: → (52)
test_data_profile.py: → (49)
test_dde.py: → (30)
Macros
DDE Blank When Zero Clause Parsing: → (19)
DDE Class Construction methods: → (12)
DDE Class Hierarchy - defines group and elementary data descriptions elements: → (11)
DDE Class Record Scanning methods: → (14)
DDE Class Reporting methods: → (13)
DDE Common Visitors for reporting on a DDE structure: → (15)
DDE Element Parsing: → (28)
DDE Exception Definitions: → (7)
DDE Justified Clause Parsing: → (20)
DDE Lexical Scanner base class provides the default lexical scanner implementation: → (16)
DDE Occurs Clause Parsing: → (21)
DDE Overheads - Shell Escape, Doc String, Imports, CVS Cruft: → (2) → (3) → (4) → (5) → (6)
DDE Picture Clause Parsing: → (18)
DDE Record Parsing: → (29)
DDE RecordFactory parses a record clause to create a DDE instance: → (17)
DDE Redefines Clause Parsing: → (22)
DDE Redefines Strategy class hierarchy - to define offsets to DDE elements: → (10)
DDE Renames Clause Parsing: → (23)
DDE Sign Clause Parsing: → (24)
DDE Synchronized Clause Parsing: → (25)
DDE Test copybook 1 with basic features: → (31)
DDE Test copybook 2 with 88-level item: → (32)
DDE Test copybook 3 with nested occurs level: → (33)
DDE Test copybook from page 174 with nested occurs level: → (34)
DDE Test copybook from page 195 with simple redefines: → (35)
DDE Test copybook from page 197 with another redefines: → (36)
DDE Test copybook from page 198, example a: → (37)
DDE Test copybook from page 198, example b: → (38)
DDE Usage Clause Parsing: → (26)
DDE Usage Strategy class hierarchy - to extract data from input buffers: → (9)
DDE Value Clause Parsing: → (27)
DDE Visitor base class - to analyze a complete DDE tree structure: → (8)
DProfile CVS Cruft and pyweb generator warning: → (43)
DProfile Class Definitions: → (45)
DProfile DOC String: → (40)
DProfile Field Value Class Hierarchy to accumulate distinct values: → (47)
DProfile Field and Record Scanning does either dumps or disctinct value processing: → (48)
DProfile Hex Dump Class to do raw dump of a record: → (46)
DProfile Imports: → (41)
DProfile Shell Escape: → (42)
DProfile Test 1: → (50)
DProfile Test 2: → (51)
DProfile Utility Functions: → (44)
Demo 1 - complete, detailed examination of a file: → (59)
Demo 2 - low-level hex dump of a file: → (60)
Demo 3 - detailed, field-by-field occurance dump of a record: → (61)
Demo 4 - detailed, field-by-field scan of distinct values of a record: → (62)
Demo 5 - detailed, field-by-field occurance dump of a record: → (63)
Demo CVS Cruft and pyweb generator warning: → (57)
Demo DOC String: → (54)
Demo Imports: → (55)
Demo Main: → (64)
Demo Shell Escape: → (56)
Demo Subclass Definitions: → (58)
User Identifiers
DDE: 1 3 6 7 8 9 [11] 12 13 14 17 28 29 30 40 43 47 50 57 59 61 62 63 65 66
Dump: 3 [15] 31 32 33 35 36 37 38 46 58 59 61
E2A: 40 [44] 51
EBCDIC2ASCII: [44]
FieldDump: 40 [48] 54 58 63
FieldScan: 40 [48] 54 62
FieldValue: 40 [47]
FileScan: 40 [48] 54 58 62 63
HexDump: 40 [46] 58 60
Lexer: 3 [16] 31 32 33 34 35 36 37 38 50 58
NonRedefines: 3 [10] 28
NumFieldValue: 40 [47] 59 62
RecordFactory: 3 12 [17] 30 50 59 61 62 63
Redefines: 3 [10] 11 14 22
Report: 3 [15] 30 44 50 59
Source: 3 [15]
SyntaxError: 3 [7] 18 26 28
TestDDE: [49]
Test_DProfile_1: [50]
Test_DProfile_2: [51]
UnsupportedError: 3 [7] 21 23 24 25
Usage: 3 [9] 11 26 65
UsageComp: 3 [9] 26
UsageComp3: 3 [9] 26
UsageDisplay: 3 [9] 26 28
UsageError: 3 [7] 14 33
Visitor: 3 [8] 15
__version__: 5 6 [43] 57
cobol_dde: 30 40 41 49 [55] 58 59 61 62 63 66
data_profile: 49 [55] 58 59 60 62
decimal: [4] 9 14 18
os: [55]
re: [4] 16
string: 3 [4] 11 14 16 44
struct: [4] 9
time: [55] 64
Created by /Users/slott/Documents/Projects/pyWeb-2.1/pyweb/pyweb.py at Sun Mar 14 10:46:52 2010. pyweb.__version__ '$Revision$'. Source DDE.w modified Sun Mar 14 10:46:18 2010.
Working directory '/Users/slott/Documents/Projects/COBOL_DDE-1.2'.