Version control for Excel workbooks, Part 1
10 September 2008 16 Comments
(I expect this will end up being renamed “part 1”. Oh heck, I’ll change it now)
For whatever sins I may have committed, I have spent much of the last (pauses to count fingers, pauses longer to add toes) 18 years building Excel spreadsheets. I’ve been using the application, one of Microsoft’s rare gems, for all its documented frailties, since version 3.0 and I’ve spent a good part of my life in VBA since it first appeared in version 5.0, a little over a decade ago. I’m very fond of Excel and it’s been my number one meal ticket for most of the period.
By now I’ve built up a number of useful utilities that I carry with me from job to job. I have an Excel/VBA unit-testing framework, a VBA profiling tool, a simple code analyser, a (thin) abstraction of ADO for database access and several more, rather less well-defined bits and pieces.
At least in part to demonstrate how clever I am (or think I am, there could very well be a gap between perception and reality) I’m looking at making these available – before they lose all relevance – under some sort of open-source license, hosted somewhere suitable – CodePlex seems kind of appropriate.
But a lot of the benefit of source code control, once we’re past the obvious security-blanket stuff, derives from being able easily to examine different versions and see the changes that were made. Which is a bit tricky with an XLS file. OK, we could go full-bore 2007 and work exclusively with the XML formats, but an awful lot of people are going to be excluded on that basis and to be honest I’m not convinced that trying to read an XML Diff on a VBA project would be a tremendous amount of fun.
It seems that what should be checked in to the version control system ought, as far as possible, to be text. So all the VBA modules, classes, forms and whatnot should be checked in independently of the XLS file. Which points to some sort of Excel “disassembler” and “assembler” functions that can pull a workbook apart and put it back together. Without breaking anything.
Resources would include code modules and some text representation (a sort of “manifest”, if that word’s not already too overloaded) of references. I kind of like the idea of taking worksheets apart and storing them separately, although none of the things I’m dealing with are really “front-end” apps, so that’s probably something for the future.
While the processing involved could be built in anything that can interact with Excel as a COM server, I rather like the idea of doing it in Excel. I think I’ll explore that first, using xlUnit, my testing framework. In fact, in the interests of increased self-reference I think xlUnit could be one of the test cases.
I’d better go and start writing tests.