Copybook parser
Download
- http://www.theeggeadventure.com/2007/copybookParser.zip
Latest revision as of 10:09, 22 September 2009
Copybooks
Parsers can be used on all sorts of structured data, not just computer languages. When I joined the Cards project, I found that we connect to a mainframe running COBOL. Communication with the mainframe is done by sending [copybook] messages to and from the server. A copybook defines the physical layout of a fixed-length record. Here's an example of the kind of copybook we needed to use.
000100*N NAG0106 PRCITXDA
000200******************************************************************PRCITXDA
000300* YADA YADA YADA *PRCITXDA
000400* SOME COPYBOOK RECORD *PRCITXDA
000500******************************************************************PRCITXDA
000600 01 AU-CR-INT-INTERFACE-REC. PRCITXDA
000700 05 AU-REC-TYPE-CD PIC X(01). PRCITXDA
000800 88 AU-TAX-DETAIL VALUE 'D'. PRCITXDA
000900 88 AU-TAX-TRAILER VALUE 'Z'. PRCITXDA
001000 05 FILLER PIC X(98). PRCITXDA
001100* PRCITXDA
001200 01 AU-CR-INT-DETAIL-REC REDEFINES PRCITXDA
001300 AU-CR-INT-INTERFACE-REC. PRCITXDA
001400 05 AU-TAX-DETAIL-REC-CD PIC X(01). PRCITXDA
001500 88 AU-TAX-DETAIL-REC VALUE 'D'. PRCITXDA
001600 05 AU-ACCOUNT-NO PIC S9(17) COMP-3. PRCITXDA
001700 05 AU-DETAIL-PROD-ID PIC X(04). PRCITXDA
001800 88 AU-DETAIL-PROD VALUE 'CP '. NAG0106
001900 05 AU-NAB-COMP-EXT-CD PIC 9(02). PRCITXDA
002000 05 AU-TYPE-IND PIC X(02). PRCITXDA
002100 05 AU-FROM-EFF-DATE-CYMD PIC S9(09) COMP-3. PRCITXDA
002200 05 AU-CUST-RESIDENT-CD PIC X(01). PRCITXDA
002300 05 AU-REASON-CD PIC X(01). PRCITXDA
002400 05 AU-CR-INT-EARNED-AMT PIC S9(15)V99 COMP-3. PRCITXDA
002500 05 AU-CR-INT-TAX-AMT PIC S9(11)V99 COMP-3. PRCITXDA
002600 05 AU-CARDHOLDER-NBR PIC S9(17) COMP-3. PRCITXDA
002700 05 AU-INPUT-SOURCE-ID PIC X(04). PRCITXDA
002800 05 AU-PGM-ACTION-IND PIC X(01). PRCITXDA
002900 05 AU-PRINC-AMT PIC S9(13)V99 COMP-3. PRCITXDA
003000 05 AU-WH-PRIN-TAX-EXMPT-AMT PIC S9(13)V99 COMP-3. PRCITXDA
003100 05 AU-TAX-EXMPT-CERT-NO PIC X(07). PRCITXDA
003200 05 AU-WH-PRINC-TAX-AMT PIC S9(11)V99 COMP-3. PRCITXDA
003300 05 AU-DETAIL-FILLER PIC X(14). PRCITXDA
003400* PRCITXDA
003500 01 AU-CR-INT-TRAILER-REC REDEFINES PRCITXDA
003600 AU-CR-INT-INTERFACE-REC. PRCITXDA
003700 05 AU-TAX-TRAILER-REC-CD PIC X(01). PRCITXDA
003800 88 AU-TAX-TRAILER-REC VALUE 'Z'. PRCITXDA
003900 05 AU-TRAILER-PROD-ID PIC X(04). PRCITXDA
004000 88 AU-TRAILER-PROD VALUE 'CP '. NAG0106
004100 05 AU-TRAILER-DATE-CYMD PIC 9(08). PRCITXDA
004200 05 AU-TRAILER-TIME PIC 9(08). PRCITXDA
004300 05 AU-TRAILER-REC-CNT PIC S9(13) COMP-3. PRCITXDA
004400 05 AU-TRAILER-FILLER PIC X(71). PRCITXDA
004500* PRCITXDA
004600**** END OF PRCITXDA ******************************************** PRCITXDA
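To make the length rules concrete, here is a minimal sketch of how a parser might compute the storage size of a PIC clause. It handles only the symbols this copybook uses (X, 9, S, V) and only two usages, DISPLAY and COMP-3. SIGN SEPARATE, binary COMP usage and editing symbols are deliberately left out. The class and method names are hypothetical. Summing the detail layout gives 99 bytes, the same as the 1 + 98 bytes of AU-CR-INT-INTERFACE-REC that it redefines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PicLength {
    // One PIC symbol with an optional repeat count, e.g. "X(04)", "9(11)", "V", "S".
    private static final Pattern SYMBOL = Pattern.compile("([SXV9])(?:\\((\\d+)\\))?");

    /** Storage size in bytes of a PIC clause in DISPLAY (comp3 == false) or COMP-3 usage. */
    static int byteLength(String pic, boolean comp3) {
        String p = pic.toUpperCase();
        int chars = 0;   // X positions: one byte each
        int digits = 0;  // 9 positions
        int pos = 0;
        Matcher m = SYMBOL.matcher(p);
        while (m.find()) {
            if (m.start() != pos) {
                throw new IllegalArgumentException("unsupported PIC: " + pic);
            }
            int count = m.group(2) == null ? 1 : Integer.parseInt(m.group(2));
            switch (m.group(1)) {
                case "X": chars += count; break;
                case "9": digits += count; break;
                // S: in DISPLAY (without SIGN SEPARATE) the sign shares a byte with a digit,
                // in COMP-3 it is the final nibble. V is an implied decimal point: no storage.
                default: break;
            }
            pos = m.end();
        }
        if (pos != p.length()) {
            throw new IllegalArgumentException("unsupported PIC: " + pic);
        }
        // COMP-3 packs two digits per byte plus a sign nibble: ceil((digits + 1) / 2).
        return comp3 ? digits / 2 + 1 : chars + digits;
    }

    /** Fields of AU-CR-INT-DETAIL-REC from the copybook above: {PIC, usage}. */
    static final String[][] DETAIL = {
        {"X(01)", ""}, {"S9(17)", "COMP-3"}, {"X(04)", ""}, {"9(02)", ""},
        {"X(02)", ""}, {"S9(09)", "COMP-3"}, {"X(01)", ""}, {"X(01)", ""},
        {"S9(15)V99", "COMP-3"}, {"S9(11)V99", "COMP-3"}, {"S9(17)", "COMP-3"},
        {"X(04)", ""}, {"X(01)", ""}, {"S9(13)V99", "COMP-3"},
        {"S9(13)V99", "COMP-3"}, {"X(07)", ""}, {"S9(11)V99", "COMP-3"},
        {"X(14)", ""},
    };

    /** Total bytes of a record, summing each field's PIC length. */
    static int recordLength(String[][] fields) {
        int total = 0;
        for (String[] f : fields) {
            total += byteLength(f[0], "COMP-3".equals(f[1]));
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("9(11)V99 DISPLAY: " + byteLength("9(11)V99", false) + " bytes");
        System.out.println("AU-CR-INT-DETAIL-REC: " + recordLength(DETAIL) + " bytes");
    }
}
```

Running it should report 13 bytes for PIC 9(11)V99 and 99 bytes for the detail record. The same function applied to the trailer layout also gives 99.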
Initial approach
Our initial approach to reading and writing the messages was an iterator that consumed or wrote fields in sequence. For each message, a Java class was written by hand listing the fields, their sizes, and their order. The iterator walked that list and either read the field values into a hash map or wrote them out from it. This worked, but every additional copybook required the same manual effort to translate the field mapping. It was also error-prone: the translation was done by hand, and not every developer understood the nuances of the copybook format.
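A minimal sketch of that iterator style, using hypothetical names (FieldReader, readTrailer). The developer types in each field's name and byte length by hand, and the reader slices them off the record in order into a map. It keeps raw bytes and does not decode character or COMP-3 data. It also uses a LinkedHashMap rather than a plain hash map so the copybook order is preserved.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldReader {
    private final byte[] bytes;
    private int offset = 0;  // position of the next unread byte
    private final Map<String, byte[]> values = new LinkedHashMap<>();  // keeps copybook order

    FieldReader(byte[] bytes) {
        this.bytes = bytes;
    }

    /** Consumes the next {@code length} bytes and stores them under {@code name}. */
    FieldReader read(String name, int length) {
        if (offset + length > bytes.length) {
            throw new IllegalStateException(name + " runs past the end of the record at offset " + offset);
        }
        values.put(name, Arrays.copyOfRange(bytes, offset, offset + length));
        offset += length;
        return this;
    }

    Map<String, byte[]> values() {
        return values;
    }

    /** Hand-maintained layout of AU-CR-INT-TRAILER-REC: every name and length is typed in manually. */
    static Map<String, byte[]> readTrailer(byte[] bytes) {
        return new FieldReader(bytes)
                .read("AU-TAX-TRAILER-REC-CD", 1)
                .read("AU-TRAILER-PROD-ID", 4)
                .read("AU-TRAILER-DATE-CYMD", 8)
                .read("AU-TRAILER-TIME", 8)
                .read("AU-TRAILER-REC-CNT", 7)  // PIC S9(13) COMP-3: 7 bytes, easy to get wrong by hand
                .read("AU-TRAILER-FILLER", 71)
                .values();
    }

    public static void main(String[] args) {
        byte[] trailer = new byte[99];
        trailer[0] = 'Z';  // record type code; assumes text was already converted from EBCDIC
        System.out.println(readTrailer(trailer).keySet());
    }
}
```

Every new copybook means another readTrailer-style method, and a wrong length silently shifts every field after it.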
Parser approach
Looking at the copybook data, I figured it wouldn't be too difficult to parse it and generate Java code. A parser could normalize the format into a usable structure, and that structure could then be used to read and write the records the copybook describes. The next choice is whether to parse the copybook at compile time or at runtime. Parsing at runtime gives you the flexibility to handle new and changing formats without recompiling your program. Parsing at compile time (generating code) gives you code completion and compile-time checks. For example, with a runtime parse you might have code that looks like this:
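void setCard(String card) {
    // Field metadata is looked up by name at runtime, so a misspelled
    // name returns null and only fails when this line executes.
    if (card.length() != fields.get("AU-CARDHOLDER-NBR").fieldSize()) {
        return;
    }
    fields.get("AU-CARDHOLDER-NBR").setValue(card);
}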
With generated code, the same method might look like this:
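void setCard(String card) {
    // AU_CARDHOLDER_NBR_LENGTH and setAU_CARDHOLDER_NBR are generated from
    // the copybook, so a misspelled name is a compile-time error.
    if (card.length() != AU_CARDHOLDER_NBR_LENGTH) {
        return;
    }
    setAU_CARDHOLDER_NBR(card);
}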
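For comparison, here is a minimal sketch of the generation step itself. It assumes the copybook has already been parsed into a list of elementary fields with byte lengths (the hypothetical Field class). From that list it emits Java source with an offset and a length constant per field, plus a setter that copies raw bytes into a backing array. The naming (AU_CARDHOLDER_NBR_LENGTH, setAU_CARDHOLDER_NBR) follows the setCard example above. The exact shape of the generated class is an assumption, not the project's actual output.

```java
import java.util.List;

public class CopybookGenerator {
    /** One elementary field, already reduced from its PIC clause to a byte length. */
    static final class Field {
        final String name;
        final int length;

        Field(String name, int length) {
            this.name = name;
            this.length = length;
        }
    }

    /** Emits Java source with OFFSET/LENGTH constants and a byte[] setter per field. */
    static String generate(String className, List<Field> fields) {
        StringBuilder constants = new StringBuilder();
        StringBuilder setters = new StringBuilder();
        int offset = 0;
        for (Field f : fields) {
            String id = f.name.replace('-', '_');  // AU-CARDHOLDER-NBR -> AU_CARDHOLDER_NBR
            constants.append("    public static final int ").append(id)
                     .append("_OFFSET = ").append(offset).append(";\n")
                     .append("    public static final int ").append(id)
                     .append("_LENGTH = ").append(f.length).append(";\n");
            setters.append("    public void set").append(id).append("(byte[] value) {\n")
                   .append("        if (value.length != ").append(id)
                   .append("_LENGTH) throw new IllegalArgumentException(\"").append(f.name).append("\");\n")
                   .append("        System.arraycopy(value, 0, data, ").append(id)
                   .append("_OFFSET, ").append(id).append("_LENGTH);\n")
                   .append("    }\n");
            offset += f.length;  // offsets come from the running total, never typed by hand
        }
        return "public final class " + className + " {\n"
                + constants
                + "    private final byte[] data = new byte[" + offset + "];\n"
                + setters
                + "}\n";
    }

    public static void main(String[] args) {
        // The first three fields of AU-CR-INT-DETAIL-REC; lengths follow from their PIC clauses.
        List<Field> fields = List.of(
                new Field("AU-TAX-DETAIL-REC-CD", 1),
                new Field("AU-ACCOUNT-NO", 9),
                new Field("AU-DETAIL-PROD-ID", 4));
        System.out.print(generate("AuCrIntDetailRec", fields));
    }
}
```

The setter in this sketch takes raw bytes. Converting a card number string into a packed COMP-3 value would be a separate step.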
Parser advantages
I decided to go with the code generation approach for these reasons:
- Copybook formats don't change very often. If we do get a new format that is not completely compatible with the previous version, we get compile-time errors instead of runtime errors. Since we were using a statically typed, compiled language (Java), we could use the compiler to help find any problems with the new spec.
- Code completion. With an IDE that offers code completion, the copybook field names pop up when we press the dot key. This is quicker than thumbing through the copybook spec to find the fields you're interested in.
- Speed. This wasn't really a concern on our project, but the generated code will usually be faster than a runtime equivalent, since it avoids looking up fields by name.
Once the parser was in place and reliably generating code, it was an easy replacement for the hand-coded classes. Using a parser to read the copybooks and generate Java code provided these benefits:
- Kept the code [DRY]. The definition of the copybook format was kept in one place - the file which the vendor gave us.
- Made the common case easy and the difficult case possible. One of the record formats contained over 2,000 fields; calculating field lengths and offsets by hand would have been nearly impossible.
- Kept the knowledge in the code. Copybooks have some quirks, such as redefines. Another is the set of rules for calculating field lengths. For example, a developer might think a PIC 9(11)V99 is 14 bytes long, but it's only 13 bytes, because the V marks an implied decimal point and takes up no storage. These rules lived in the code, in one place, so a mistake could also be fixed in one place.
- Was fun. Writing code is generally more stimulating than performing a repetitive task. Computers are quite happy to do repetitive tasks, and sparing programmers from them keeps them happy.
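Redefines are one of the quirks a parser can handle once, in code. In the copybook above, AU-CR-INT-DETAIL-REC and AU-CR-INT-TRAILER-REC both redefine the same 99-byte AU-CR-INT-INTERFACE-REC. The 88-level values on its first byte ('D' and 'Z') say which view applies. Below is a minimal sketch of that dispatch. It assumes the record has already been translated from EBCDIC, so the type code can be compared as an ASCII character. RecordView and viewOf are hypothetical names.

```java
public class RecordDispatch {
    /** The two REDEFINES views of the 99-byte AU-CR-INT-INTERFACE-REC. */
    enum RecordView { DETAIL, TRAILER }

    static final int RECORD_LENGTH = 99;

    /**
     * Chooses a view from AU-REC-TYPE-CD (the first byte), mirroring the 88-levels
     * AU-TAX-DETAIL VALUE 'D' and AU-TAX-TRAILER VALUE 'Z'.
     * Assumes the record has already been translated from EBCDIC to ASCII.
     */
    static RecordView viewOf(byte[] bytes) {
        if (bytes.length != RECORD_LENGTH) {
            throw new IllegalArgumentException("expected " + RECORD_LENGTH + " bytes, got " + bytes.length);
        }
        char typeCode = (char) (bytes[0] & 0xFF);
        switch (typeCode) {
            case 'D': return RecordView.DETAIL;
            case 'Z': return RecordView.TRAILER;
            default: throw new IllegalArgumentException("unknown AU-REC-TYPE-CD: " + typeCode);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[RECORD_LENGTH];
        bytes[0] = 'Z';
        // In the trailer view, bytes 21-27 hold AU-TRAILER-REC-CNT; in the detail view the
        // same bytes span the tail of AU-FROM-EFF-DATE-CYMD through the start of
        // AU-CR-INT-EARNED-AMT, so choosing the wrong view silently misreads the data.
        System.out.println(viewOf(bytes));
    }
}
```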