Shippity-doo-dah

(stackoverflow rep: 7576, Project Euler 83/257 complete)
In my band days we called it "Gaffer"

In my band days we called it "Gaffer"

Reading Joel’s1 Duct-Tape Programmer article this morning (in the interests of full disclosure I should admit without additional prevarication that I have a large roll of “Duck” tape in the second drawer of my desk as I type) one sentence smacked me metaphorically between the eyes:

“Shipping is a feature”

I was transported back a couple of decades to the time when the bank for whom I was then working discovered that it was building not one but two settlement systems (the things that ensure that what traders agree should happen actually does) in two locations: London and Zurich. In London we were targeting our DEC VAX/Oracle platform, while the Swiss were designing with their local Tandem Non-Stop installation. And we’d both have gotten away with it if it hadn’t been for that meddling CEO…

It was decreed that The Wise Men (external auditors) be appointed to review the two projects and pronounce which should live and which should consign its members to the dole queue.

The Wise Ones duly decamped to Zurich to spend a few weeks working through the cabinets of meticulously-detailed standards-compliant design documentation that had been lovingly crafted over the past several months, with coding about to start. Then they came to see us. It didn’t look so good.

dried-up and crusty now...

dried-up and crusty now...

What documentation we had was months old (from a previous, aborted start of the waterfall) and coated in Tipp-Ex. Remember the white error-correction fluid we used all the time back in the 20th Century? When we still wrote “memos”? After a week of vagueness and frustration a set of presentations were scheduled for the Friday, at which we proposed to try to fill in the gaps.

england2switz1

Ing-er-land!

London won.

Yay us, but how? On most objective measurements we were deficient when compared with our continental rivals, even we agreed on that. But on that Friday afternoon, I got to stand up to summarise the differences, positive and negative between the two projects, as seen by the London team. I think what may have swung it was the part where I got to say “our system has been settling trades since 3 o’clock this morning”.

In about nine months, one team had done everything by the Book (don’t know the title, but I bet it had “Structured” in it) and had reached the point where they had, well, a book. Lots of books, in fact – they’d worked really hard. In the same time, we built a system and even better, shipped it. I don’t think anyone had written any Agile books by then – even if they had, we hadn’t read them.

Our team hadn’t done an awful job by any means, you understand: there’d been a few weeks of up-front requirement-gathering/scoping.  We had a massive data model that we Tipp-Exed down to the minimum needed. We had an outline architecture that, through luck or judgement, proved to be appropriate. Probably best of all, though, we sat with our users while we built their system. Better, as we built different features we moved around so we were always within speaking distance of our domain expert (I don’t think we’d done the whole “domain” thing then – we just called them “users”). So  we seldom got very far off track while stuff got built, and we were, with hindsight, feature-driven and relatively lowly-coupled/highly cohesive at the component level, all Good Things. Mostly written in COBOL, too.

Looking back, we were lucky: we didn’t manage to repeat the magic and fell back into time and cost overruns with the next couple of large projects. At least we were still being paid, unlike our erstwhile colleagues in Switzerland.


1 I call him by his first name because we share so much; we’re only a few slots apart on page 13 of StackOverflow as I write this. Page-mates, don’t you know.

Round-tripper

(stackoverflow rep: 3856, Project Euler 63/235 complete)

Good grief. I wrote the first draft of this about a month ago, planning on completing and posting it when the code was done. I expected that to take a few more days. A little more work required on the estimating front, then.

I’m starting to go off Oracle.

Let me put that into context a little. I first encountered Oracle some time in 1998, when version 5 was all the rage. I’d actually taught data analysis, third and fifth normal form, stuff like that for a few years previously but actual hands-on table creation had to wait. Strange but true. Anyway, over the next two or three years, some of which I spent as “Technical Architect” for the investment bank where I worked, I got to be something of a whiz with both version 5 and the swanky new version 6. Heck, I know the query optimiser’s rules off by heart. I’m not just blowing my own trumpet, mind: when I was untimately (fortuitous typo retained) laid off, I was offered a job by Oracle, which I rejected because I didn’t want to take a pay cut.

I spent five more years in Oracle-land with another bank before drifting into the realms of Sybase in its early MS SQL Server guise, and then Sybase itself across three jobs and four years (it seems like longer). Now, fourteen years after we parted company, Oracle and I are back together.

But we’ve both changed. I no longer code in COBOL and have acquired a pathological dislike of business logic in the database. Oracle has a cost-based optimiser, loves to grab all your business rules (more processors = more revenue) and has become a fat bloated porcine creation. Even the free “personal” 10g Express Edition for Windows is a 165MB download.  (OK, SQL Server Express 2008 is even larger, I checked). When running, the thing takes out a 642MB virtual machine. OK, it’s almost entirely swapped out, but still.

How we did parallel processing in the old days

How we did parallel processing in the old days

But Oracle is still a helluva fast platform. Unoptimised I was seeing about 8K inserts a minute on my development PC, three times that on a real server. Unfortunately our db server currently lives abroad for tax reasons (or something) and the network latency is fierce. About 900 inserts a minute fierce. So I needed to batch up my inserts or enter the living hell that is SQL Loader.

In order to get multiple insert processes working within my Ruby On Rails-based code, I split each file into several fragments, then run a separate process on each fragment. This takes a bit of doing, generating lots of CMD files that run ruby scripts with “START [/WAIT] CMD /C whatever_I_want_goes_here“.

My file-splitting code, I thought, was rather spiffy – it needs to put the headings from the original to each fragment (because they’re used to figure out what’s in the file) then it starts dealing out the records:

def create_fragment_files(paths)
  File.open(file_path, 'r') do |fin|
    hdgs = fin.readline.chomp
      files = paths.map { |path| File.open(path, 'w+') }
      files.each { |fout| fout.puts hdgs }
      fin.each_line do |line|
        files.first.puts line
        files.push files.shift # the first shall be last...
      end
    files.each { |fout| fout.close }
  end
end

There are faster ways, I’m sure – I could calculate the “ideal” file size and dump records into a file until it’s reached, but this is fast enough (well under a minute for an 85MB file) and it pleases me.

There’s a handy little library, ar-extensions, that makes batching of inserts possible within ActiveRecord (which is the default data mapping library within Rails). It works nicely with MySQL, but turned out to have the Oracle code stubbed and invalid. It only took me a day or two to find a solution to that problem, although I still haven’t figured out how to push an update through a proxy server to github. Finally a chance to do something open sourceful, and I’m thwarted at every turn.

So all in all, it’s taken a month. OK, a month in which a lot of other stuff got done, but still.On the plus side, I just fired it up and I’m watching about 36,000 inserts a minute go through. It’ll be faster when the lookup tables are fully populated. (Another day on, and I’m looking at it: 46,000 – and I still have a few tricks up my sleeve)

While the nearly-two years’ of data is backfilling I now get to rewrite the front end.

And the point of this post? In no small part, to remind me of what I actually spent the lion’s share of the last month doing. Also, to record my first-ever open-source contribution, even if I still haven’t worked out how to get my source out into the open.

If you have been, thanks for your forebearance.