The Probability of Culpability

(stackoverflow rep: 10,976, Project Euler 98/288 complete)

(in which we are reminded that finding solutions to most of the issues we face requires looking no further than our own immediate surroundings)

“I think there’s a problem with the web server – one of my pages isn’t displaying”.

“Strange – everything else seems to be fine – have you made any changes recently?”

“No – and it was fine yesterday. It must be the server; I’m going to escalate it to support.”

A few minutes later, I discovered that a directory had, as a result of a deviant drag-and-drop, been relocated inside another directory, with the result that the failing page’s Javascripts were no longer loading. Panic over, issue de-escalated. It was our team’s collective fault after all.

I have, over the years, found myself working in a support capacity. Needs must, from time to time. I doubt very much that I’m the only support person who ever had a user who claimed that they’d found a bug in Windows/a compiler/Excel/some other third-party application. I also suspect that the number of occasions where the actual error really did reside where the user asserted it did is small. Vanishingly small.

Ask not at what the finger pointeth...

As a general rule of thumb, when faced with a perplexing error, we may consider the “stack” of software within which our problem has arisen. With that in mind, we should consider how much effort has been invested into assuring the correctness of each layer. Take an Excel/VBA application running on, say, Windows 7. I’d guess that the test effort invested in Windows would be an order of magnitude greater – at least – than for Excel. And the step should be as great when we come down to the app we built. That’s reasonable – it reflects the impact of a detected problem in each layer, an impact we can measure in time, money, reputation and the like.

The web environment above supports hundreds of internal sites and is supported by a dedicated team of engineers. The environment changes slowly and only after extensive testing. It has to – the cost of messing up is high.

Sometimes the problem really is upstream, don’t get me wrong. At least one version of Excel, when switching from German to English, doesn’t translate BRTEILJAHRE() to YEARFRAC(), for example. IronRuby has reached version 1.0 but I did discover a little bug (well, in one of the standard libraries, at least) and years ago I had the excitement of working on a PC with a P60 CPU, FDIV bug and all.

The very fact that I remember these examples shows how rare they are, compared to the number of bugs I identify and fix in my own code every day. I can barely remember what I fixed this morning, for goodness’ sake. Orders of magnitude.

The UK National Lottery (or “Lotto” as I think it’s now called) used to have a tagline: “it could be you”. In that particular case it was generally a safe bet to append “but it probably won’t be”. When we’re looking for the source of software errors, we can change that to “and it probably is”.

Estimated distribution of causes of problems

Shippity-doo-dah

(stackoverflow rep: 7576, Project Euler 83/257 complete)
In my band days we called it "Gaffer"

In my band days we called it "Gaffer"

Reading Joel’s1 Duct-Tape Programmer article this morning (in the interests of full disclosure I should admit without additional prevarication that I have a large roll of “Duck” tape in the second drawer of my desk as I type) one sentence smacked me metaphorically between the eyes:

“Shipping is a feature”

I was transported back a couple of decades to the time when the bank for whom I was then working discovered that it was building not one but two settlement systems (the things that ensure that what traders agree should happen actually does) in two locations: London and Zurich. In London we were targeting our DEC VAX/Oracle platform, while the Swiss were designing with their local Tandem Non-Stop installation. And we’d both have gotten away with it if it hadn’t been for that meddling CEO…

It was decreed that The Wise Men (external auditors) be appointed to review the two projects and pronounce which should live and which should consign its members to the dole queue.

The Wise Ones duly decamped to Zurich to spend a few weeks working through the cabinets of meticulously-detailed standards-compliant design documentation that had been lovingly crafted over the past several months, with coding about to start. Then they came to see us. It didn’t look so good.

dried-up and crusty now...

dried-up and crusty now...

What documentation we had was months old (from a previous, aborted start of the waterfall) and coated in Tipp-Ex. Remember the white error-correction fluid we used all the time back in the 20th Century? When we still wrote “memos”? After a week of vagueness and frustration a set of presentations were scheduled for the Friday, at which we proposed to try to fill in the gaps.

england2switz1

Ing-er-land!

London won.

Yay us, but how? On most objective measurements we were deficient when compared with our continental rivals, even we agreed on that. But on that Friday afternoon, I got to stand up to summarise the differences, positive and negative between the two projects, as seen by the London team. I think what may have swung it was the part where I got to say “our system has been settling trades since 3 o’clock this morning”.

In about nine months, one team had done everything by the Book (don’t know the title, but I bet it had “Structured” in it) and had reached the point where they had, well, a book. Lots of books, in fact – they’d worked really hard. In the same time, we built a system and even better, shipped it. I don’t think anyone had written any Agile books by then – even if they had, we hadn’t read them.

Our team hadn’t done an awful job by any means, you understand: there’d been a few weeks of up-front requirement-gathering/scoping.  We had a massive data model that we Tipp-Exed down to the minimum needed. We had an outline architecture that, through luck or judgement, proved to be appropriate. Probably best of all, though, we sat with our users while we built their system. Better, as we built different features we moved around so we were always within speaking distance of our domain expert (I don’t think we’d done the whole “domain” thing then – we just called them “users”). So  we seldom got very far off track while stuff got built, and we were, with hindsight, feature-driven and relatively lowly-coupled/highly cohesive at the component level, all Good Things. Mostly written in COBOL, too.

Looking back, we were lucky: we didn’t manage to repeat the magic and fell back into time and cost overruns with the next couple of large projects. At least we were still being paid, unlike our erstwhile colleagues in Switzerland.


1 I call him by his first name because we share so much; we’re only a few slots apart on page 13 of StackOverflow as I write this. Page-mates, don’t you know.

Fifty Candles (each)

(stackoverflow rep: 4675, Project Euler 72/240 complete)

The first programming language I learned was a peculiar version of BASIC, running on the ICL mainframe installed during my abbreviated university career. I seem to recall it used a magnetic drum as its primary storage device. My second programming language, and the first I was ever paid real money for working with, was COBOL. It was the lingua franca of business computing, had been around forever and I started to learn it early in 1979. It turns out that “ancient” old COBOL had at the time only been around for about 20 years. As indeed had I, and this year we both turn 50.

Ah, the fun we had!

Ah, the fun we had!

When the only tool you have is a hammer, everything looks like a nail. (I remember vividly a school “handyman”, inexplicably nicknamed “Sausage”, who would repair desks by applying a large hammer to drive screws). I used COBOL for things to which it really wasn’t suited. For a couple of years, that didn’t just mean writing peculiar code, it meant punching cards on an IBM 029 punch machine.

Programming for work was done on paper coding sheets, converted into machine-readable format (80-column punched cards) two evenings a week by Hazel the Punch Girl. Times change.

Once upon a time all code was written on these

Once upon a time all code was written on these

It wasn’t as bad as you might think: we only got to compile or run our code twice a day anyway, the rest of the time involved pencils and paper. Lots of paper. Some things change less than others.

Anyway, designing, coding, compiling and testing/debugging a program was a mammoth task: it took months. Lots of months. I think in the three years I was a programmer in my first job I wrote about eight complete programs.

Virgin input to the 029...

Virgin input to the 029...

In all, I was primarily a COBOL programmer for about 12 years, although there were secondary activities in PL/1, Fortran and C in the same period. I haven’t written a line since some time in early 1990. Can’t say that I miss it, not even now that it has object-orientation, a scary concept for a language that didn’t even use to have data scoping below “global”.

So happy birthday to COBOL, whenever it falls during the year. I can’t say I miss you but you paid the bills for over a decade, and for that I’ll always be grateful.