Parsers Presentation

From EggeWiki

Parser Presentation for ThoughtWorks Oz Team Hug June 2007

Official Title: Using Parser Generators For Fun & Profit

Presentation Type: Technical

Material used: Keynote

Length: 55 minutes

Summary: Parser generators (compiler compilers) are usually taught in a computer science compiler design class, and then quickly forgotten. However, parser generators can be used for a whole range of tasks including:

  • Code generation
  • External DSLs
  • Code quality analysis
  • Data transformation

Generally, few people other than language designers know the ins and outs of lex & yacc. These days you don't need to be a C programmer to make use of parser generators. Java has quite a few, as do most major programming languages. These modern tools are easier to learn than their C ancestors, and have active community support for creating grammars and troubleshooting problems.

In this presentation, I'll show some real-life examples where parser generators have saved time and increased simplicity. Here are some of the examples I'll be going into:

  • copybook parser - On this project we needed to interface with a COBOL mainframe using its copybook format. Instead of crafting our message code by hand, I wrote a parser which took in the copybook format and generated Java code which could compose the messages.
  • code quality - PMD/Checkstyle can detect all sorts of patterns in your code. I've been collaborating on a project which [detects code duplication using Abstract Syntax Trees].
  • analytics DSL - Having a trading system's analytics all written in a DSL provided a number of advantages over implementing them directly in a general-purpose programming language. One of the interesting abilities in this system was being able to do a topological sort of the dependencies. This is similar to how Excel knows which cells it needs to recompute when you make a change, and in what order to do the computations.
  • regex reuse - Normalizing bond quotes that arrived in hundreds of different formats soon created more regular expressions than anyone could figure out. Migrating to a system built using a parser generator improved the maintainability of the code.
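
The dependency sort mentioned in the analytics DSL example can be sketched roughly as follows. This is only an illustration in Python, not the system described in the talk: the analytic names (`bid`, `ask`, `mid`, `spread`, `spread_pct`) and the `recompute_order` helper are hypothetical, and in a real DSL the dependency map would be derived by walking the parsed formulas rather than written by hand. It relies on the standard library's `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical analytics: each name maps to the set of names its formula reads.
# In a real DSL these edges would come from the parse tree of each formula.
DEPENDS_ON = {
    "bid": set(),
    "ask": set(),
    "mid": {"bid", "ask"},
    "spread": {"bid", "ask"},
    "spread_pct": {"spread", "mid"},
}


def recompute_order(depends_on, changed):
    """Return the analytics affected by `changed`, in a safe evaluation order."""
    # Invert the edges so we can walk from an input to everything that reads it.
    dependents = {}
    for name, inputs in depends_on.items():
        for inp in inputs:
            dependents.setdefault(inp, set()).add(name)

    # Collect everything downstream of the changed value.
    dirty = set()
    stack = [changed]
    while stack:
        node = stack.pop()
        for dep in dependents.get(node, ()):
            if dep not in dirty:
                dirty.add(dep)
                stack.append(dep)

    # Topologically sort the whole graph (inputs before the formulas that
    # read them), then keep only the values that actually need recomputing.
    order = TopologicalSorter(depends_on).static_order()
    return [name for name in order if name in dirty]
```

Under these assumptions, `recompute_order(DEPENDS_ON, "bid")` returns `mid` and `spread` (in either order) followed by `spread_pct`, which is the same guarantee a spreadsheet needs: no cell is recomputed before the cells it reads.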

Download [full presentation (pdf)]