Software Ungineering

From EggeWiki
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

I've been thinking about creating an actual blog, instead of just adding pages to my Wiki. As I've been writing more rants and less tech tips, maybe the blog format would be better.

In my previous job I was kept so busy getting things done that I didn't have much time to read blogs, much less write them. Currently, the project which [ThoughtWorks] has assigned me to is a little less [helter skelter] than supporting a trading system. This has many advantages, like lower stress and fewer heart attacks, and feels more sustainable. The difference is kind of like that between playing Half Life 2 and playing checkers.

We at ThoughtWorks pride ourselves in writing quality, well tested software. I like writing quality software which is stable and that I can be proud of, however, there are times I feel ThoughtWorkers go overboard in some quest for writing 'perfect' software. In fact, I often feel this way to some extent or another.

Today, I'll share a story in 'ungineering'. Our app uses Sybase to store some data in it. We have a single database, with a single table. There are no joins, no foreign keys, no replication, or stored procs, triggers, etc.. We use a database so that we can write records to it throughout the day, and at night have a batch job export the transactions to a flat file. The database has ACID transactions, so we don't have to worry about the app and the batch run interfering with each other. It doesn't get much more basic than this.

The table in the database contains an identity column, which is the table's primary key. An identity column is simply a column which the database populates with a unique incrementing integer value. Identity columns make for a good surrogate key, and nearly every database has some support for them.

This is all well and good until Sr Dev notices this column and decides that identity columns are bad because they are outside of the programmers control. Sr Dev tells Jr Dev to replace the identity column with a GUID. No, we're not doing multi-master replication, or dealing with billions of rows, we simply have a single table which queues up data for a nightly batch run. In the strictest since the table doesn't even need a primary key. My preferences for creating primary keys are such:

  1. Use an identity column if your database supports it, and you don't have other constraints which prohibit such.
  2. If your database can't auto generate sequential integers, then create a control table in the database where you can generate sequential values yourself.
  3. If you have multi-master replication, and God help you if you do, have each site generate unique sequences of integers. Generally either the most significant digit or the least significant digit should indicate the site which initially created the record.
  4. If you have no other choice, then use GUIDs. Maybe you have a mobile sales force which needs to do two way synchronization with a master database. GUIDs will make your life difficult, so use it sparingly and only when you really have to.

Sr Dev & I clearly have different views on databases. Never mind my database experience, certification, awards, or even that I've built a relational database before. These things are discounted partially because I have a 'get things done' background, and not a 'cleaner code is always better' background. So, Jr Dev spends the next week 'ungineering' the database. This takes him a while because he doesn't have a great deal of database experience and I refuse to help him complicate the database. When I see single lines of code being replaces with hundreds of lines of 'better code', I have to look the other direction.

So today Jr Dev announces he's completed the database 'refactoring' work. He checks in his code and a little while later we hear reports that our UAT system is broke. Having [Cruise Control] running can be a useful tool, but it can also let developers feel overly confident in their code. (Especially the ones who aim for 100% code coverage.) Having a green light can, for some, be a substitute for doing a quick reality check. I.e,. checking the current weather online is no substitute for looking out the window before you leave the house.

I ask Jr Dev "Did you make the schema changes in the UAT system which you made locally?" The answer was obviously no. The question which followed of course was can we just drop the table. This wasn't practical, because we have testers inputing data into the system, and they need the nightly batch to reconcile. So, now Jr. Dev is faced with the task of converting the primary key from an integer to a GUID. I figure it should only take him one more week, simply because no other tables depend on the primary key.

Oh, and did I mention we're billing a client for this refactoring?