No more .bak files

One of the earliest lessons I learned in computing was to always make a backup. In the days of MS-DOS, it was common to modify your autoexec.bat file, and one learned it was smart to copy it to autoexec.bak before making any change. Later, scripts that installed software would automatically make backup copies of the files they modified during the install. In general, you would have one backup copy, but you might create additional ones with slightly different extensions.

Later, when desktop computer systems started to allow longer file names, one might include the date in the file name, or use the extension .backup. The problem with the date was deciding which one to use: the modification time of the original file, or the date of the current modification.

In recent years, the need to create backup files has been reduced. Software is often installed from a version repository, so the complete file history is contained in version control. Another thing that has helped is the prevalence of snapshotting file systems. NetApp file system appliances have offered snapshots for a long time now, and they work on both Windows and Unix. More recently, Solaris introduced the ZFS file system, which includes snapshot capability. Lastly, backup tools like Apple's Time Machine make it easy to go back and retrieve a previous version of a file.

I often advocate putting everything in version control and, for some projects, releasing directly from version control. For example, this wiki runs directly on an svn checkout of MediaWiki. Any local modifications I've made can easily be seen with an svn status command. I've even gone so far as to put my home directory under version control.

However, not everything makes sense to put under version control, and some machines, due to firewall restrictions, can't access a version control repository. Quite often, though, the file system contains backups, so there is little risk in locally modifying a file.

Still, there are occasions where I'm not on a file system with snapshots, and I'm not working on a project checked out from svn. An example of this is the files in my /etc/apache2 directory. These files are important, but I don't want to create a remote repository just for them. If I want to make a change to my httpd.conf, I'm still tempted to make a backup of the file and then make the change.

My 2010 New Year's resolution is never to create a '.bak' file again. These files litter one's directories, carry little meaning, and five minutes after being created are no longer trusted to be a stable version one could revert to. What's the solution instead of creating .bak files? The answer is simply to create a local repository with a distributed version control system.

For example, here is my habit pre-2010:

<geshi>
cd /etc/apache2
cp -p httpd.conf httpd.conf.20091231
vi httpd.conf
apache2 restart
</geshi>

Here's my new way of doing things:

<geshi>
cd /etc/apache2
hg status

abort: There is no Mercurial repository here (.hg not found)!

hg init
hg addremove
hg commit -m "initial state"
vi httpd.conf
apache2 restart
hg commit -m "increased the number of worker processes"
</geshi>

The above offers the following advantages over the .bak file:

Cleaner directories, with no clutter of backup files
Space savings once more than one backup file has been created
An easy way to go back to a prior, stable version
A comment describing each change
An easy way to diff and revert if the change doesn't work out

In this example, I've used Mercurial, but one could do the same thing using git.
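
For comparison, a rough Git version of the same workflow might look like the following. As before, this is a sketch under assumptions: it uses a temporary directory and a stand-in httpd.conf instead of the real /etc/apache2, and the name and email passed to `git config` are placeholders.

```shell
#!/bin/sh
# Sketch: the Mercurial workflow above, redone with Git.
# The directory, file contents, and identity are hypothetical stand-ins.
set -e
workdir=$(mktemp -d)
cd "$workdir"
echo "StartServers 5" > httpd.conf

git init -q
git config user.name "admin"                  # identity stored in this repo only
git config user.email "admin@example.invalid"

git add -A                                    # roughly what `hg addremove` does
git commit -q -m "initial state"

# Edit the file (with vi in real life), restart the server, then record the change.
echo "StartServers 10" > httpd.conf
git commit -q -a -m "increased the number of worker processes"

git log --oneline                             # one line per recorded change
```

Either tool keeps its entire history in a single hidden directory (.hg or .git) next to the files, so nothing needs to be set up on a remote server.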