RegexFieldValidation

From EggeWiki

Knowing how to use regular expressions will make you a better programmer. Almost every language has a good regex library, and you can use regexes for search and replace in most IDEs. Here, I present an example of translating a list of validation rules into regular expressions. This, IMHO, keeps the code neat and tidy, and laying the expressions out well makes it clear what each one does.

The business has a list of rules to validate a driver's license number:

  • Maximum length of 9 characters.
  • Alphanumeric characters only.
  • Must have at least 4 numeric characters.
  • Must have no more than 2 alphabetic characters.
  • The third and fourth characters must be numeric.

When I saw these rules, my first thought was whether I could take these definitions and write a grammar or interpreter for them. I decided that, although it would be fun, it probably wouldn't be time-effective in this case. Next, I considered whether each rule could be expressed as a regular expression, or whether I would have to write Java code for each rule. I decided to try the regex route first, and to fall back to hand-coded rule logic if that failed.

To start my class, I used the TinyType abstract class. You can read more about that in my Curiously recurring template pattern post.
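For readers without that post at hand, here is a minimal sketch of what such a base class might look like. It is an assumption rather than the actual TinyType from the other post: the only things taken from this article are the single type parameter, the one-argument constructor, and the getValue() accessor used later on. The demo class and its name are hypothetical.

```java
// Hypothetical sketch of a TinyType base class; the real one may differ.
abstract class TinyType<T> {
    private final T value;

    protected TinyType(T value) {
        if (value == null) {
            throw new IllegalArgumentException("value must not be null");
        }
        this.value = value;
    }

    public T getValue() {
        return value;
    }

    // Two tiny types are equal only if they are the same concrete type
    // and wrap equal values.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return value.equals(((TinyType<?>) o).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return String.valueOf(value);
    }
}

class TinyTypeSketchDemo {
    public static void main(String[] args) {
        // An anonymous subclass stands in for a concrete tiny type.
        TinyType<String> license = new TinyType<String>("N012345Z") {};
        System.out.println(license.getValue()); // prints N012345Z
    }
}
```

The point of the wrapper is that a license number gets its own type instead of travelling around as a bare String.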

public class DriversLicense extends TinyType<String> {

    public DriversLicense(String s) {
        super(s.replaceAll(" ", ""));
    }
}

This is all that's needed for a TinyType. Now, I want to add the validation logic. I could put this in the constructor, but I'd prefer to add it as a separate method. This allows the class to hold dirty data and lets me check later whether it's valid. For my purposes, the separate method is going to be more flexible.

Next, I created some unit tests for the expected behavior:

    public void testMaxLength() {
        assertTrue(new DriversLicense("012345678").isValid());
        assertFalse(new DriversLicense("0123456789").isValid());
    }

    public void testAlphaNumeric() {
        assertTrue(new DriversLicense("Az0123456").isValid());
        assertFalse(new DriversLicense("01234567_").isValid());
        assertFalse(new DriversLicense("01234567@").isValid());
    }

    public void testFourOrMoreNumericCharacters() {
        assertTrue(new DriversLicense("0123").isValid());
        assertTrue(new DriversLicense("0A12Z3").isValid());
        assertFalse(new DriversLicense("012").isValid());
        assertFalse(new DriversLicense("AU012").isValid());
    }

    public void testTwoOrFewerAlphaCharacters() {
        assertTrue(new DriversLicense("N012345Z").isValid());
        assertTrue(new DriversLicense("0A12Z3").isValid());
        assertFalse(new DriversLicense("A0B1C2345").isValid());
        assertFalse(new DriversLicense("012345ABC").isValid());
    }

    public void testThirdAndFourthAreNumeric() {
        DriversLicense license = new DriversLicense("N012345Z");
        assertTrue(license.getReason(), license.isValid());
        DriversLicense license1 = new DriversLicense("123456");
        assertTrue(license1.getReason(), license1.isValid());
        assertFalse(new DriversLicense("A0B1C2345").isValid());
        assertFalse(new DriversLicense("012A345").isValid());
    }

    public void testGetReason() {
        assertEquals("Maximum length of 9 characters.", new DriversLicense("0123456789").getReason());
    }

I decided that the class should have two methods: isValid, which returns a boolean, and getReason, which returns the text of the first failed rule.

With my failing unit tests in place, I was ready to start writing the rules, re-testing as I went.

Unfortunately, Java does not have a pair or tuple class, but I can get the same effect using a collection: an insertion-ordered map from each pattern to its rule text.

    private static final Map<Pattern, String> RULES = new LinkedHashMap<Pattern, String>();

    static {
        RULES.put(Pattern.compile("^.{0,9}$"), "Maximum length of 9 characters.");
        RULES.put(Pattern.compile("^[A-Za-z0-9]*$"), "Alphanumeric characters only.");
        RULES.put(Pattern.compile("^([A-Za-z]*[0-9][A-Za-z]*){4,}$"), "Must have at least 4 numeric characters.");
        RULES.put(Pattern.compile("^[0-9]*([0-9]*[A-Za-z][0-9]*){0,2}[0-9]*$"),
                "Must have no more than 2 alphabetic characters.");
        RULES.put(Pattern.compile("^.{2}[0-9]{2}.*$"), "The third and fourth characters must be numeric.");
    }
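As a quick check of how the patterns above behave in isolation, the sketch below runs one input against two of them. The two expressions are copied from the rules; the class name RuleIsolationDemo and the helper passes are made up for this illustration.

```java
import java.util.regex.Pattern;

class RuleIsolationDemo {
    // Same expressions as the first two rules, copied so the sketch is self-contained.
    static final Pattern MAX_LENGTH = Pattern.compile("^.{0,9}$");
    static final Pattern ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]*$");

    // Hypothetical helper: true when the input satisfies the given rule.
    static boolean passes(Pattern rule, String input) {
        return rule.matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "01234567@"; // nine characters, the last one illegal

        System.out.println(passes(MAX_LENGTH, input));   // prints true
        System.out.println(passes(ALPHANUMERIC, input)); // prints false
    }
}
```

The length pattern uses a dot, so it accepts the '@' and only counts characters; rejecting the illegal character is left to the alphanumeric pattern.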

Having the business rule stored as a string alongside each pattern makes the code self-documenting and allows me to write the getReason method. I tried to have each pattern check only the rule text paired with it. For example, the max length pattern matches any characters, so an input with illegal characters but an acceptable length won't fail this rule. Lastly, here's the code which iterates over the rules. If Java supported multiple return values, I'd probably combine these into a single function. Alternatively, I could define a pair type and return that.

    boolean isValid() {
        for (Pattern rule : RULES.keySet()) {
            if (!rule.matcher(getValue()).find()) {
                return false;
            }
        }
        return true;
    }

    String getReason() {
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            if (!rule.getKey().matcher(getValue()).find()) {
                return rule.getValue();
            }
        }
        return "valid";
    }
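The pair-type alternative mentioned above could be sketched roughly as follows. This is not part of the actual class: the names LicenseCheckSketch, Result and check are hypothetical, and only two of the five rules are repeated to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class LicenseCheckSketch {

    /** Hypothetical pair-like result: a validity flag plus the first failed rule. */
    static final class Result {
        final boolean valid;
        final String reason;

        Result(boolean valid, String reason) {
            this.valid = valid;
            this.reason = reason;
        }
    }

    private static final Map<Pattern, String> RULES = new LinkedHashMap<Pattern, String>();

    static {
        // Two of the rules from the article; the others would be added the same way.
        RULES.put(Pattern.compile("^.{0,9}$"), "Maximum length of 9 characters.");
        RULES.put(Pattern.compile("^[A-Za-z0-9]*$"), "Alphanumeric characters only.");
    }

    // One pass over the rules yields both the flag and the reason.
    static Result check(String value) {
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            if (!rule.getKey().matcher(value).find()) {
                return new Result(false, rule.getValue());
            }
        }
        return new Result(true, "valid");
    }

    public static void main(String[] args) {
        Result r = check("0123456789"); // ten characters, one too many
        System.out.println(r.valid + ": " + r.reason); // prints false: Maximum length of 9 characters.
    }
}
```

With a result object like this, isValid and getReason could both delegate to check, so the rule loop would exist in only one place.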

Including JavaDoc, the class is 57 LOC. It was easy to get 100% code coverage, and it would be easy to add or remove rules.