My (Im)perfect Cousin?

in which we start to worry about the source of our inspiration
Mona Lisa Vito: So what’s your problem?
Vinny Gambini: My problem is, I wanted to win my first case without any help from anybody.
Lisa: Well, I guess that plan’s moot.
Vinny: Yeah.
Lisa: You know, this could be a sign of things to come. You win all your cases, but with somebody else’s help. Right? You win case, after case, – and then afterwards, you have to go up somebody and you have to say- “thank you“! Oh my God, what a fuckin’ nightmare!


It is one of the all-time great movies, and netted Marisa Tomei an Oscar in the process. Yes it is. It really is1.

Not only that, but My Cousin Vinny2 throws up parallels in real life all the time. Yes it does. It really does3.

Why only recently, I was puzzling over the best (or least worst) way to implement a particularly nonsensical requirement for an intransigent client. After summarising the various unpalatable options in an email, a reply arrived from a generally unproductive source. The message content made it obvious that he’d somewhat missed the point but the conclusion he drew from that misunderstanding triggered a new thought process that gave us a new, even less, er, worser solution to our problem.

Sadly, my unwitting muse has moved on now, but he left his mark for all time4 on our latest product. I suppose he should also take partial credit for the creation of a hitherto unknown development methodology: Powerpoint-Driven Development, but that’s a story for another day.


1 All right, IMHO
2 See also My Cousin Vinny At Work, application of quotes therefrom
3 YMMV.
4 Or at least until we have a better idea and change the whole damn thing

This Wheel Goes To Eleven

(in which we make an unexpected connection regarding the D in SOLID and get all hot under the collar about it)

Let’s not beat about the bush: I think I may have reinvented Dependency Injection. While it looks rather casual, stated like that, I’ve actually spent much of the last six months doing it. (Were you wondering? Well, that.)

I’ve been designing/building/testing/ripping apart/putting back together again a library/app/framework/tool thing that allows us to assemble an asset allocation algorithm for each of our ten or so products1, each of which may have been modified at various times since inception. It’s been interesting and not a little fun, plus I’ve been climbing the C# learning curve (through the three versions shipped since my last serious exposure) like Chris Bonnington on amphetamines.

Our products are broadly similar but differ in detail in some places. So there’s lots of potential for reuse, but no real hierarchy (if you can see a hierarchy in the little chart here, trust me, it’s not real).

So Product A need features 1, 2 & 3, in that order. B needs 1 & 4, C 1, 3 & 5, etc. What I came up with was to encapsulate each feature in a class, each class inheriting from a common interface. Call it IFeature or some such. At run-time, I can feed my program an XML file (or something less ghastly perhaps) that says which classes I need (and potentially the assemblies in which they may be found), applying the wonder that is System.Reflection to load the specified assembles and create instances of the classes I need, storing them in, for example, a List<IFeature>. To run my algorithm, all I need to do is call the method defined in my interface on each object in turn. A different product, or a new version of an existing one has a different specification and it Should Just Work.

It’s all very exciting.

So changing a single feature of an existing product means writing one new class that implements the standard interface and pointing the product definition at the library that contains the new class (which may – should – be different from those already in use).

The discerning reader may, er, discern that there are elements of Strategy and Command patterns in here as well. Aren’t we modern?

While all this is very exciting (to me at least – a profound and disturbing symptom of work-life imbalance) it’s still not the end of the line. I’ve built functions and then chosen to access them serially, relying on carefully (or tricky & tedious) XML definitions to dictate sequence. I’m thinking that I can go a long way further into declarative/functional territory, possibly gaining quite a bit. And there’s a whole world of Dynamic to be accessed plus Excel and C++ interfaces of varying degrees of sexiness to be devised .

More on much of that when I understand it well enough to say something.


1 There are billions at stake, here, billions I tell you.

Do Wednesday mornings get much better?

Do I give Lotus Notes enough love? Does it deserve any? To be honest, it doesn’t get any at all from me. In fact, most of the time I hate it with a passion bordering on clinical insanity.

Occasionally, though, there’s a little ray of metaphorical sunshine.

Holy Crepuscularity, Batman!

I’ve posted before on the joy of the tooltip that is perfectly informative and yet utterly useless at the same time.

It’s seemed as if Lotus (or IBM) have a standard that requires icons to have a tooltip, but the designers leave it to the developers to figure out what text should be displayed in the tooltips themselves. (Actually, they probably don’t call them “tooltips” at all, what with the snazzy comic-book voice-bubble shape and everything). If the developers are nine-to-five cubicle drones then they’re going to exercise minimum levels of creative thought and frankly, the feeble result is less than surprising.

The other day I spotted two more closely-related gems, again offering a beautifully terse description of what they are, when I was rather hoping to discover what they mean. Cue vocal expressions of Joy.

Here’s the first:

It’s clearly a “Collapsed Twistie Icon” and I’m glad they were able to make it so clear. What it signifies is less transparent, neither does my mouse pointer change shape to give me any clue as to whether or not something might happen should I be brave enough to click on it. Some icons, such as the “Message replied to Icon”, for example, do nothing, mutely displaying their feebly-composed tooltip (how about “you replied to this message” as a more useful alternative, perhaps wirh a click taking you to the reply?)

I clicked. Turns out this indicates that there’s a threaded “conversation” to be displayed and my action causes the thread to be expanded. What do you think the icon changes to?

Genius.

My Mother Would Be So Proud

(stackoverflow rep: 10,038, Project Euler 96/283 complete)

(in which it transpires that I’ve been something creative all along)

Perhaps because I don’t speak Swedish very well (for values of “very well” that are exactly equal to “at all”) I didn’t notice this blog post until today, when I came across it in a link from a comment on post in a blog that I do read.

Earlier this week I was struggling to impress upon a colleague that I thought that he should write some code instead of spending additional time “getting a complete understanding of all the issues” and “acquiring a good knowledge of the technical framework”. I wondered if it was some deficiency in my communication “skills” – and that could very well be the case – but I’m starting to think that we have very different views on the purpose of code, perhaps because he’s from an academic mathematical background and I’m just an old hacker. I think he sees the code-writing process as the final expression of his understanding of a problem, where I use code to develop that understanding. Not surprisingly, within our problem domain I think I have the more appropriate model. Anyway, I outrank him…

Maybe it’s an Internet phenomenon (or more likely just human nature) that the same idea (meme?) is rediscovered on a regular basis. I remember reading – and being influenced by – a similar article several years ago. A quick search threw up an even older article making a similar point. Yes, even almost 20 years ago, the transition from source to executable was relatively cheap.

If we view the act of writing source code as part of the design of the program and not part of the build, then we move further away from the commonly-applied physical engineering metaphor and this is no bad thing. It doesn’t take much mental effort: any substantial change in the design of a building is likely to be immensely expensive after the building is constructed, whereas rebuilding a whole program or application is relatively cheap once the design (source) has been updated. Once you get to web apps, where even deployment is almost free, then you’re a long way from the building site.

Further, if we see coding as design, then the notion of starting small and enhancing iteratively, tidying (refactoring) as we go becomes a much more obviously good thing. To my mind, at least.

So – look Ma! I’m a designer!

Buddy, Can You Paradigm?

(stackoverflow rep: 8998, Project Euler 89/273 complete

(or: “If Paradigm was half as nice”)

I started reading Coders At Work recently. In the style of a few predecessors, the book comprises transcribed (and presumably edited) interviews with fifteen programmers who have made significant contributions of various kinds. It’s a weighty tome, possibly a little overweight if I’m honest, but interesting nonetheless; it makes a change from the kind of technical tome I usually cart around on the daily commute.

Circling a little closer to, and nodding vaguely in the general direction of the point, interviewee number seven, reached just this morning, is Simon Peyton Jones, who was one of the progenitors of Haskell, one of, if not the first name that springs to mind when one hears “functional programming“. For the last decade, Jones has been a researcher at Microsoft Research in Cambridge, which suggests that he would have had at least one finger in the F# pie… There’s a DotNetRocks show number 310 features an interview with the man, by the way. I’m enviously disturbed that despite our very similar ages, he appears considerably less follicly-challenged than me, although there could be some comb-over action going on there.

In the taxonomy of programming paradigms (does that make sense?) we have Declarative and Imperative. They are opposites. In the Declarative fold, we get stuff like SQL, XSLT, all the functional languages and, I submit Excel worksheets. Actually it’s not just me: that Wikipedia link places spreadsheet cell-based programming within the Dataflow subcategory of Functional. Imperative programming, on the other hand, includes the panoply of more “traditional” languages, including good old Visual Basic and its slightly less functional sibling VBA.

Thinking about Excel and VBA developement, I’d say there are several fairly distinct uses for VBA code (usually, but not exclusively: we can write add-ins or perform automation in plenty of other ways) which include:

  • New worksheet functions;
  • Interface extensions (menus, toolbars, ribbons etc);
  • Helpers (or wizards);
  • Control and simulation

…of which I write a lot of the latter two.

(You know how some comedians talk about a string of seemingly unrelated things and then cleverly tie them all together at the end? Well, fingers crossed, stay tuned)

A while back, I was handed a workbook, developed by an external consultancy, that modelled the 20-year evolution of a financial product. We wanted to run it across a significant number of stochastic market data simulations to get a picture of the distribution of possible outcomes. About 30,000 such simulations (and about 16GB of simulated market data) would be sufficient, we thought. The workbook took about 20 minutes to perform one such calculation. We didn’t have a year.

Fortunately, there were any number of classic optimisation failures to rectify: massive VBA/worksheet interaction at the cell level (changed to use large arrays), screen update enabled at all times, lots of less painful VBA-specific stuff, Application.Calculation = xlCalculationAutomatic, you know the drill. Easy stuff. Within a day it was taking about a minute and I could look at having the whole analysis done in under a week (I have a four-CPU machine).

If it all seems straightforward, it wasn’t. I’d wrapped the workbook in a modified (hacked-up from xlUnit) test jacket to check that my changes weren’t affecting the results, pretty much a necessity for any refactoring work, and one change kept breaking. Every time I tried to switch off automatic calculation, triggering it only when I thought it was needed, the final results came out different. I finally realised that the VBA and worksheet were extremely highly-coupled – calculations were occurring as the code pushed values through: change a value, auto-calc, change another value, auto-calc again and so on. Eek. It took almost a week to unravel. Of course, since I actually enjoy doing this kind of work for the most part, it wasn’t a complete disaster, but I could have lived without the time pressure…

As a finance worker (perversely, I’m increasingly inclined to describe myself as a Banker1 these days) I have cause, from time to time, to access The Bloomberg from Excel. At least one of the functions in the (extensive and powerful, let’s be fair) API is a real functional-imperative mess. Here’s one with which I’m not at all happy:

=BDH("MXWO", "PX_LAST", "01/06/2007", "28/12/2009", "Dir=V", "Dts=S", "Sort=A", "Quote=C", "QtTyp=Y", "Days=T", "Per=cd", "DtFmt=D", "cols=2;rows=672", "FX=EUR")

Yes, there are a lot of arguments2, but most are at least not positional, which shows they’re trying. This particular example asks for the last daily quote of the MSCI World Index, in Euro, between the dates given. Even better, I guess, there’s a wizard to generate it all for you. But here’s the nasty bit: the call fills as many rows – without warning about overwrites – as it needs. Oh, and that "cols=2;rows=672" parameter? It’s rewritten if you make any change that would affect the number of cells output. This is not easy to accomplish. Try duplicating the effect in VBA, go on, I double-dare you. I think there’s a separate server process at work here, able to make asynchronous (but timely) modifications to user worksheets.

Yikes. And ugh.

To me, there’s something broken here. They’re using am apparently functional programming model but allowing the code to affect the state of the worksheet in (from the user’s perspective at least) a rather arbitrary (or at least, unpredictable) way. We’re not even getting an array formula here – the function just causes chunks of worksheet to be filled with data. In point of fact, there looks to have been some serious programming work here – I don’t think in the normal course of events that calling a function is allowed to cause areas outside the calling range to be modified. I rather suspect an external process is being asked to pump values into the sheet through some devious means like (but let’s hope it isn’t) DDE.

(This is where I try to pull it all together. A little encouragement wouldn’t go amiss here…)

The fact that we have both paradigms represented as first-class components of a single development platform may be a large part of the reason for Excel’s dominance in the marketplace. But the interface between declarative and imperative code needs to be carefully managed to avoid the kind of unnecessary side-effects that I spent so long untangling.

The messed-up simulation I whined about above failed to recognise that the worksheets should have been, in effect, a great big, super-complex function, which the code used by setting up the input conditions, firing off a Calculate and then reading the output values. Instead, it tried to implement a tightly-coupled declarative-imperative abomination, with unfortunate results.

Let’s try not to repeat that mistake, for the sake of the children.

I’ve been (and remain) grumpy – thank you and good night.

You've been a lovely audience


1 Wanna make something of it?

2 Many Reuters functions do something similar, but push all the key-value pairs into a long string.

What Time Is It? Bah.

It’s a long time since I last wrote about Lotus Notes and the unlimited joy that is its, er, idiosyncratic interface, not least since IBM’s decision1 to host the client in Eclipse.

Too long, really – it’s such a rich source of oddness. For example, today I recevied an invitation to join some colleagues (located in Frankfurt) in a video conference. The heading in the message informed me that:

NotesMeetingTime1

Which seemed a little odd, since a quick phone call earlier had seen some time on Wednesday morning identified as the preferred time. No matter, I opened the message to accept the invitation (it’s not clear why I couldn’t do that from the inbox/preview, but I can’t). Clicking “Accept” put the meeting into my calendar:

BookedMeeting

Whoops! Well, it was more in keeping with what I expected. To settle myself, I went for a coffee (you can tell Christmas is coming, btw: Starbucks are using the Red Cups). When I got back, Notes had been busy – the Inbox message now had this:

NotesMeetingTime2

OK, it’s now accurate, but I’m not sure how I feel about a message being modified in any way after I’ve opened and read it.

Of course, we’re 0.5 of a release behind the current version, so maybe stuff like this has been fixed by now.

 


celebration

 

The BBC reports that, in an Australian Science magazine article, an Australian psychology expert “who has been studying emotions has found being grumpy makes us think more clearly”.

 

To which I can only say “hmph”.

 


1 I’d love to have been a fly on the wall at that meeting.

 

The Hard Way

(stackoverflow rep: 7284, Project Euler 83/252 complete)

My main work PC was upgraded to IE7 yesterday. That’s one less IE6-infected machine to worry about. Unrelated to that (I suppose) is that the Aventail VPN product that I have to use each day decided it wanted to upgrade. I’m still trying to figure out how to make that work on IE7 but fortunately I also have an older machine that seems to have been immune to the upgrade, so I switched to that.

After some back-and-forth, I saw the happy news that this was happening:

All going acording to plan?

All going according to plan?

While this was cogitating, a message popped up, partially obscured by the progress dialog. So I moved it. The dialog, that is, not the message. And I saw this:

Whoops!

Dude, where's my progress bar?

How confused must the developer of this part of the installer have been to have built the progress bar as an entirely separate window? And how much more difficult must it have been to do it that way? I amused myself dragging the main dialog all over my desktop while the progress bar stayed resolutely where it was until the install completed.

Johnny 99

Sometimes I write legacy code. There, I’ve said it. The secret’s out, the dirty laundry’s aired and the cat’s out of Pandora’s box.

I don’t think it makes me a bad person. I understand the value of tests, especially in the highly incremental development world I currently inhabit, and I strive to use tests to drive my code. Sometimes it doesn’t happen, for many reasons, none of which I’m proud of, such as my Excel test framework being a little clumsy, there not being anyone around me to nag, the “quick” change that turns into just a little more uncovered code than I expected (but it works and the user needs it Right Now).

I understand that all that code sitting in the wild in an uncovered state is Legacy and represents an accumulation of Technical Debt that will have to be repaid. By me.

It’s not that code without tests is necessarily bad. I mean, heck, even Kent Beck sometimes flies without a safety net. If it’s write-once-and-throw-away code, then there’s an argument for just getting it done. But honestly, how often does that code get resurrected months, maybe years later? It’s when code needs to be changed, often requiring some refactoring in the process, that the absence of that warm, covered-by-tests feeling starts to be felt.

Still there, still doing a job, but just try changing it...

Still there, still doing a job, but just try changing it...

The trouble with beginning to repay technical debt is that interest tends to accrue, compounded, at some arbitrary (but almost always positive) rate. So no matter how trivial the original untest-covered change seems to have been, the longer you leave it, the more unpleasantness tends to have accreted around it by the time you come back to it. I suppose there’s always the possibility that you’ll come back to discover a glistening pearl, but in my programming life I’ve never returned to anything other than a thick coating of rust.

Worse than that, getting the encrusted nodule of code under test usually turns out to be painful: it’s seldom structured as it would have been had tests been used in the first place, so unwanted and hard-to-separate interdependencies are rife and the whole thing becomes, well, a bit tricky.

I’d been meaning to buy Michael Feathers‘ “Working Effectively with Legacy Code” for years and recently got around to buying it. Scott Hanselman’s interview last week with Mr Feathers was a serendipitous bonus. The book, even for non-Java or C++ programmers, is excellent, full of practical advice on how to break things up in a way that can let you get tests around the locus of change. In fact, when you see how just plain nasty working with C++ code can be, working with code in any other language starts to look like a breeze.

I wish I’d read the book earlier.

Round-tripper

(stackoverflow rep: 3856, Project Euler 63/235 complete)

Good grief. I wrote the first draft of this about a month ago, planning on completing and posting it when the code was done. I expected that to take a few more days. A little more work required on the estimating front, then.

I’m starting to go off Oracle.

Let me put that into context a little. I first encountered Oracle some time in 1998, when version 5 was all the rage. I’d actually taught data analysis, third and fifth normal form, stuff like that for a few years previously but actual hands-on table creation had to wait. Strange but true. Anyway, over the next two or three years, some of which I spent as “Technical Architect” for the investment bank where I worked, I got to be something of a whiz with both version 5 and the swanky new version 6. Heck, I know the query optimiser’s rules off by heart. I’m not just blowing my own trumpet, mind: when I was untimately (fortuitous typo retained) laid off, I was offered a job by Oracle, which I rejected because I didn’t want to take a pay cut.

I spent five more years in Oracle-land with another bank before drifting into the realms of Sybase in its early MS SQL Server guise, and then Sybase itself across three jobs and four years (it seems like longer). Now, fourteen years after we parted company, Oracle and I are back together.

But we’ve both changed. I no longer code in COBOL and have acquired a pathological dislike of business logic in the database. Oracle has a cost-based optimiser, loves to grab all your business rules (more processors = more revenue) and has become a fat bloated porcine creation. Even the free “personal” 10g Express Edition for Windows is a 165MB download.  (OK, SQL Server Express 2008 is even larger, I checked). When running, the thing takes out a 642MB virtual machine. OK, it’s almost entirely swapped out, but still.

How we did parallel processing in the old days

How we did parallel processing in the old days

But Oracle is still a helluva fast platform. Unoptimised I was seeing about 8K inserts a minute on my development PC, three times that on a real server. Unfortunately our db server currently lives abroad for tax reasons (or something) and the network latency is fierce. About 900 inserts a minute fierce. So I needed to batch up my inserts or enter the living hell that is SQL Loader.

In order to get multiple insert processes working within my Ruby On Rails-based code, I split each file into several fragments, then run a separate process on each fragment. This takes a bit of doing, generating lots of CMD files that run ruby scripts with “START [/WAIT] CMD /C whatever_I_want_goes_here“.

My file-splitting code, I thought, was rather spiffy – it needs to put the headings from the original to each fragment (because they’re used to figure out what’s in the file) then it starts dealing out the records:

def create_fragment_files(paths)
  File.open(file_path, 'r') do |fin|
    hdgs = fin.readline.chomp
      files = paths.map { |path| File.open(path, 'w+') }
      files.each { |fout| fout.puts hdgs }
      fin.each_line do |line|
        files.first.puts line
        files.push files.shift # the first shall be last...
      end
    files.each { |fout| fout.close }
  end
end

There are faster ways, I’m sure – I could calculate the “ideal” file size and dump records into a file until it’s reached, but this is fast enough (well under a minute for an 85MB file) and it pleases me.

There’s a handy little library, ar-extensions, that makes batching of inserts possible within ActiveRecord (which is the default data mapping library within Rails). It works nicely with MySQL, but turned out to have the Oracle code stubbed and invalid. It only took me a day or two to find a solution to that problem, although I still haven’t figured out how to push an update through a proxy server to github. Finally a chance to do something open sourceful, and I’m thwarted at every turn.

So all in all, it’s taken a month. OK, a month in which a lot of other stuff got done, but still.On the plus side, I just fired it up and I’m watching about 36,000 inserts a minute go through. It’ll be faster when the lookup tables are fully populated. (Another day on, and I’m looking at it: 46,000 – and I still have a few tricks up my sleeve)

While the nearly-two years’ of data is backfilling I now get to rewrite the front end.

And the point of this post? In no small part, to remind me of what I actually spent the lion’s share of the last month doing. Also, to record my first-ever open-source contribution, even if I still haven’t worked out how to get my source out into the open.

If you have been, thanks for your forebearance.

Lazing On A Sunny Afternoon

(stackoverflow rep: 2906, Project Euler 59/230 complete)

Below (lightly edited) is a recent answer to a question on StackOverflow. The question is pretty much a waste of time, but what an answer!

Laziness is indeed the first of the Three Programming Virtues, but it is misunderstood. Programming Perl defines it well:

* LAZINESS: The quality that makes you go to great effort to reduce overall energy expenditure.

Good programming calls for laziness, but laziness requires hard work. Good programmers must constantly think of and implement new ways to be lazy. The first compiler had to be written in assembly, and the first assembler had to be written in machine language. Wonderfully lazy, but hard. You don’t get to call it a day after an hour just because what used to take a day takes an hour.

That’s about as close to an encapsulation of my personal programming philosophy as I’ve ever seen. The virtue of Laziness is closely related, if not identical to the DRY (Don’t Repeat Yourself) Principle as espoused by the Pragmatic Programmers in The Pragmatic Programmer.

(I just found myself wondering if there’s a way to refactor that last sentence to remove some of the duplication)

Laziness - all I have to do is this...

Laziness - all I have to do is this...

The idea behind xlUnit (which has been languishing somewhat since being codeplexed in, good grief, September) was to minimise the amount of manual repetition involved in testing Excel/VBA code as it is developed. As a useful side-effect, it also made TDD (Test-Driven Design, or Development, depending on your choice of definition) possible. A fair amount of the code complexity of the framework is involved in eliminating repetitive manual tasks such as test class creation.

There are only three significant UI-level entry points in xlUnit as it stands: application creation (used rarely, only about once per app) class creation, used rather more often, and test execution, which is used all the time. Oh, and there’s a little “Options” dialog, but it only has one setting and that’s not really very interesting so we’ll gloss over that.

Since the first two things are used relatively infrequently, I tucked them away on a menu, whereas the test execution, which is an all-the-time thing, gets a toolbar button. I could have been fancier with that, but I never got past the simple text “Run Tests”, which had the extra benefit of giving a larger target area for the mouse.

...and I get this for free.

...and I get this for free.

There is no interface within the VBE, which is most definitely a shortcoming. To be honest, I only even tried briefly (failing, obviously) to get it to work: with a dual screen setup, which my working environment has had for the best part of a decade, both workbook and VBA are usually visible, so I can get to the “Run” button easily enough. For a similar reason, there is no built-in keyboard shortcut at present. Of course, in Excel 2007, it’s all a bit of a mess. I have the (massive) RibbonX book, by the way, I just need the intestinal fortitude to sit down and read it.

The “Create Testable Application” and “Create Testable Class” routines both live in the add-in itself and are located within UserAccessibleEntryPoints.bas. I’ll come back to “Run Tests” another time – not only was it a bit tricky but I have some new ideas that would make it even trickier.

Choosing to create an application causes a throwaway instance of the xlUnitCodeBuilder class to create a workbook and its tester and create references from the tester to both the application (so it can call classes there) and the framework (so it can use the framework). The same class contains the code to create new classes. I think I put the functions together because they’re about building code but I don’t think I’ve achieved very good separation of concerns here: there’s workbook creation and test class creation and they would probably be better off being separated. Since there’s work to do with Excel 2007 anyway, I think I’ll refactor that bit next time it’s open.

I mentioned “throwaway instance” above without explanation: it’s a way I describe this VB pattern:

    With New UsefulClassThatDoesntReallyNeedToBeDimmed
        .DoSomethingProfound
        .DoSomethingSimilarlyDeep "AndMeaningful"
    End With

Is there a more “standard” way to describe this? Something using “anonymous” perhaps? Anyway, I use it a lot (and I’m now fervently hoping that no-one points out some awful risk that I’m running as a consequence) as it’s a way to get part of the economy of class (“static” methods in Java or C#, where the word means something rather different than in VBA) method.