This Wheel Goes To Eleven

(in which we make an unexpected connection regarding the D in SOLID and get all hot under the collar about it)

Let’s not beat about the bush: I think I may have reinvented Dependency Injection. While it looks rather casual, stated like that, I’ve actually spent much of the last six months doing it. (Were you wondering? Well, that.)

I’ve been designing/building/testing/ripping apart/putting back together again a library/app/framework/tool thing that allows us to assemble an asset allocation algorithm for each of our ten or so products1, each of which may have been modified at various times since inception. It’s been interesting and not a little fun, plus I’ve been climbing the C# learning curve (through the three versions shipped since my last serious exposure) like Chris Bonnington on amphetamines.

Our products are broadly similar but differ in detail in some places. So there’s lots of potential for reuse, but no real hierarchy (if you can see a hierarchy in the little chart here, trust me, it’s not real).

So Product A need features 1, 2 & 3, in that order. B needs 1 & 4, C 1, 3 & 5, etc. What I came up with was to encapsulate each feature in a class, each class inheriting from a common interface. Call it IFeature or some such. At run-time, I can feed my program an XML file (or something less ghastly perhaps) that says which classes I need (and potentially the assemblies in which they may be found), applying the wonder that is System.Reflection to load the specified assembles and create instances of the classes I need, storing them in, for example, a List<IFeature>. To run my algorithm, all I need to do is call the method defined in my interface on each object in turn. A different product, or a new version of an existing one has a different specification and it Should Just Work.

It’s all very exciting.

So changing a single feature of an existing product means writing one new class that implements the standard interface and pointing the product definition at the library that contains the new class (which may – should – be different from those already in use).

The discerning reader may, er, discern that there are elements of Strategy and Command patterns in here as well. Aren’t we modern?

While all this is very exciting (to me at least – a profound and disturbing symptom of work-life imbalance) it’s still not the end of the line. I’ve built functions and then chosen to access them serially, relying on carefully (or tricky & tedious) XML definitions to dictate sequence. I’m thinking that I can go a long way further into declarative/functional territory, possibly gaining quite a bit. And there’s a whole world of Dynamic to be accessed plus Excel and C++ interfaces of varying degrees of sexiness to be devised .

More on much of that when I understand it well enough to say something.


1 There are billions at stake, here, billions I tell you.

Advertisements

All You Wanna Do Is Jaw-Jaw

(in which we apply our face, at speed, to the brick wall of Enterprise IT Development Methodologies)

I am, for my sins, embroiled in a Project with our IT people. I assume that some Holy Standards Manual1 somewhere must decree that in order to transition from stage 0.7.4.1.a to stage 0.7.4.1.b2 there is a requirement to develop and publish a “Communication Plan”, such a document having also assaulted my inbox this week. That’s an eight-slide Powerpoint presentation, adorned with pithy quotes and charmingly apposite images. More Powerpointedness than I’ve personally produced in my life, for content that could adequately be summarised in a four-sentence email. Three, if you worked at it. Something like this:

  1. There will be a weekly meetings: on Thursdays at 1400 to review status and issues. (There are a couple more, but they don’t have general applicability so let’s not bother everybody with those).
  2. Everybody should talk to each other as often as necesary to ensure that lack of information does not hold up progress (I’m paraphrasing wildly here, attempting to extract some sense from two ludicrously complicated diagrams)
  3. Documents will be stored on the Sharepoint server that everybody already uses.
  4. Communication will be in English except where everyone involved speaks German and no non-German speaker will need to read/listen.

There. I did it in four without even trying that hard. Took about five minutes. No pictures though. Sorry about that.


1 With no apparent irony, a document received this week declares that the Project is “following a strict Waterfall approach.”
2 The stages may be more fine-grained than that – I don’t know and I don’t want to know.

Ghosts in the Machines

in which we consider the death of Social Networking? How Seasonal…

The world is Socially Networked these days, or at least a seriously chunky proportion of the interwebular world is. I’ll admit to being present (to a greater, or more likely lesser extent) on Facebook, Friends Reunited, Twitter, LiveJournal and to having at least registered on a good few others.

Beyond those, there are places like WordPress, Stackoverflow (and sundry other StackExchange sites), Blogger, Opera, Google, Yahoo! etc etc ad nauseam. You’re reading this so it’s likely that you’re aware of most, if not all of this stuff.

As well as grumpy, I’m also old. Well, by the dictionary I appear to be in the first blush of middle-age, but when you reach the stage in life when all your relatives from the previous generation have died and your peers are starting to shuffle off this mortal coil then you’re at least beginning to be aware – uncomfortably so – of mortality.

So right now I’m all over the Internet. But in fifty years it’s statistically almost certain that I won’t be actively contributing any more.

People die.

More pertinently to my musings here, people on Facebook die. Of the ten countries with the highest estimated Facebook populations, assuming their age distributions are similar to the general population (obviously not, but stay with me) and that mortality rates as per the 2009 CIA World Factbook can be applied, then we get this:

Country FB Pop (m) Deaths/K FB deaths
USA 145.3 8.38 1,217,614
Indonesia 31.4 6.26 196,564
UK 28.8 10.02 288,576
Turkey 23.8 6.10 145,180
France 20.3 8.56 173,768
Philippines 18.8 5.10 95,880
Mexico 17.8 4.80 85,440
Italy 17.6 10.72 188,672
Canada 17.4 7.74 134,676
India 16.5 6.23 102,795
TOTAL 2,629,165

..or about 2.6 million Facebook users dying each year. Obviously the estimate may be a little low, there are only ten countries represented for one thing. What are you going to do when all your friends are dead?

Add in the all the other sites and we’re looking at a small-to-medium sized country-worth of “ghost” accounts, persisting – and inviting interaction, even.

I’m not offering any solutions – heck, I’m not even suggesting that this is even a Bad Thing – perhaps it’s as close to life after death as we’ll ever get. I just keep extrapolating to some inflection point where half the content on the Internet was written by people who are now dead.

So if I stop posting here, is it because I’m bored, out of things to write about or something more … terminal? And if the latter, how will you know?

Tiny VBA Tooltippery Tip

Project Euler 100/304

(in which we discover What Went On In 1997)

This morning’s iteration of the daily blog/news trawl for useful information threw up “Five tips for debugging a routine in the Visual Basic Editor“, all of which are sensible, although unlikely to be news to anyone reading this, if we’re honest.

Tip #3, “View variables using data tips”, however, reminded me of something that I don’t believe is widely known. Since the site seems to require a full-blown account creation that I can’t see as appropriate for a simple comment, I’m going to mention it here.

Hovering the mouse pointer over a variable while in VBA’s Break mode will show the variable’s value in a tool tip:

The smart VBA programmer

That’s fine: almost all the time we get to see exactly what we want. Above about (or maybe exactly) 60 characters, however, we get the leading part and three little dots:

Still no problem if we only want the start of the string...

What if we want to see what’s at the end of the string, though? Well, back in (I think) 1997, I managed to get my then employer to send me to VBA DevCon (no easy task, given that the location was EuroDisney), at which I happened to meet the Microsoft guy who actually wrote the hover/tooltip thing (it was in the VB4 editor first, I believe) and he told me that viewing the last 60-ish characters of the string could be achieved by holding down the Control key before moving the pointer over the variable name:

Presto!

I don’t think I’ve ever seen this recorded. Of course, I haven’t exactly gone looking for it, so if you came all the way to the end only to discover that I was just repeating something that everyone knows, then I can only apologise. We’ll get over it.

Lacking Anything Worthwhile To Say, The Third Wise Monkey Remained Mostly Mute

Project Euler 100/304 complete (on the permanent leaderboard at last!)

(in which we discover What’s Been Going On lately)

 

Don't talk ... code

 

I’ve been coding like crazy in C# of late, a language I’ve barely touched in the last few years. Put it this way, generics were new and sexy the last time I wrote anything serious in .NET… (There should be some ExcelDNA fun to be had later.)

I’d forgotten how flat-out fast compiled languages, even the bytecode/IL kind could be. It’s not a completely fair comparison, to be sure, but a Ruby script to extract 300,000 accounts from my Oracle database and write them as XML takes a couple of hours, mostly in the output part. A C# program handled the whole thing in 5 minutes. Then processes the accounts in about 30 seconds, of which 10 are spent deserializing the XML into objects, 10 are serializing the results to XML and 10 are performing the moderately heavy-duty mathematical tranformations in between.

 


Click to test is value for money

 

Lacking at present a paid-for version of Visual Studio 2010 (the Express Edition, while brilliantly capable, won’t do plugins, which precludes integration Subversion and NUnit, to name but two essentials), I have been enjoying greatly my experience with SharpDevelop, which spots my installs of TortoiseSVN and NUnit and allows both to be used inside the IDE. It’s not perfect: there are areas, particularly in its Intellisense analogue, where exceptions get thrown, but they’re all caught and I have yet to lose any work. While the polish is, unsurprisingly, at a lower level than Microsoft’s, it’s entirely adequate (and I mean that in a good way) and the price is right. I particularly liked being able to open an IronRuby session in the IDE and use it to interact with the classes in the DLL on which I was working.

While I expect VS2010 to become available as the budgeting process grinds through, I’m not at all sure that it’ll be necessary to switch. An extended set of automated refactoring tools could be attractive, although Rename and Extract Method are probably the two most useful, productivity-wise, and they’re already present. I would rather like to have Extract Class, which isn’t needed often but would be a big time (and error) saver when called for.

On another topic entirely, should you be looking for entertaining reading in the vaguely technical, erudite and borderline insane category, may I recommend To Umm Is Human to you? Any blog that has “orang utan” amongst its tags is worth a look, I’d say. If you like it, you’ll like it a lot. I was once made redundant by a Doubleday, but I don’t think they’re related.

There’s an interesting new programmer-oriented podcast on the block, too: This Developer’s Life has slightly higher production values that may ultimately limit its life – the time to produce an episode must be substantial. I found myself wanting to join in the conversation with stories of my own, a sure sign that I was engaged in the content.

That Do Impress Me Much

Over at stackoverflow, now that they have a pile of money to spend invest, the rate of change is picking up. There’s the re-worked stack exchange model, which has changed dramatically – and quite likely for the better. They’ve moved away from the original paid-for hosted service to a community-driven process, whereby a community needs to form and commit to the idea of a new site. The objective is to improve the prospects of achieving critical mass for a new site, thus increasing its chances of success. I imagine a revenue model is mooted, although it may be little more than “if we build it, they will come” at present. Sponsored tags and ads spring to mind.

This week we’ve seen the covers removed (perhaps for a limited time initially) on a “third place“, to go with the existing main Q&A and “meta” (questions about the Q&A site). It’s a chat room. Well, lots of chat rooms of varying degrees of focus, to be more specific. Quite nicely done, too.

What has really impressed me has been that during this “limited sneak beta preview”, bugs, issues, feature requests and the like have been flowing through the interface at a fair rate of knots and many have been addressed and released within hours. Minutes, sometimes.

Think about it. User detects a bug, reports it and gets a fix, to an application with global reach, in a couple of hours or less. That’s agile.

A crucial part of Lean movement in manufacturing (and its younger counterpart in software development) is eliminating waste. “Waste” is broadly defined, very broadly defined, in fact, but one easily identifiable component is Work In Progress (WIP). In software terms, this often represents effort that has been invested (and money that’s been tied up) without having been included in a release. The more we invest effort without release the more we’re wasting, since we have no possibility of obtaining a return on that investment.

Here’s a particularly quick find/fix from earlier today:

Yes, it was probably a trivial bug, but the problem was notified, found, fixed and released in eight frickin’ minutes. How many of us can turn anything around that fast?

I’m looking forward to seeing where this goes.

Sorted for Excel and Whee!

If you happened upon the VBA Lamp and by rubbing it were able to produce the Excel VBA Genie and were granted a VBA Wish, would you ask for a built-in Sort() function?

If you build the kind of Excel apps that I do, you’ll run up against the need for a Sort all too frequently. Sorting arrays with sizes in the tens of thousands is not unusual and I was reminded of this when reading a post in this entry at Andrew’s Excel Tips the other day.

While it’s not crucial in the (useful) idea presented, the code has a quick and and dirty sort that is about the worst sort algorithm1 one could intelligently come up with. It’s also an prettty intuitive solution, which just goes to show that sorting may not be as simple as we may think. I’m not attacking Andrew’s skillz here, btw: the code as presented is certainly fit for purpose; it’s not presented as a general-purpose utility (at least I hope it isn’t).

I’ve accumulated a few algorithms over the years and I’ve coded up a couple more while “researching” this piece. On the one hand, we have the oft-derided BubbleSort and its close relation, CocktailSort. In the same group I’d include InsertionSort and SelectionSort. I’m going to be harsh and categorise those, with the naive sort above, as “Slow”. Well, “mostly slow”, as we’ll see.

In the “Fast” group, we have the much-touted QuickSort, and somewhere in between, we have HeapSort,and my current algorithm of choice, CombSort. As I was researching this, I also coded up ShellSort, which is about as old as I am and which was claimed to be faster than QuickSort under some conditions.

I ran some comparisons, not meant in any way to be perfectly scientific2. I ran each algorithm on arrays of 50 to 50,000 values with six different characteristics:

  • already sorted
  • exactly reversed
  • mostly sorted (values typically within one or two places of target)
  • ordered blocks of 100 random values (the first 100 values are 0 + RAND(), then 100 + RAND() and so on)
  • completely random
  • random 10-character strings

First off, the 50-record results:


50 recs (ms) sorted near sorted blocks random strings reversed
Shell 0.06 0.06 0.11 0.10 0.24 0.09
Quick 0.13 0.13 0.16 0.13 0.29 0.14
Comb 0.09 0.10 0.17 0.17 0.33 0.13
Heap 0.34 0.33 0.32 0.32 0.52 0.28
Insertion 0.02 0.02 0.20 0.17 0.47 0.37
Selection 0.25 0.25 0.25 0.25 0.54 0.25
Cocktail 0.01 0.02 0.44 0.39 1.02 0.77
Bubble 0.01 0.02 0.50 0.45 1.12 0.78
Naive 0.22 0.23 0.50 0.46 1.06 0.77

I’d say it’s pretty clear that it doesn’t matter much what you use to sort a small array, just about anything will be fast enough (unless you’re going to perform that sort tens of thousands of times in a run). It’s also apparent that the “slow” algorithms are actually pretty good if our data is already reasonably well-ordered.

So far, so “so what?”

Let’s look at the opposite end of the spectrum: 50,000 values? Here, the Fast/Slow divide is apparent. First the “Slows” (two tests only, for reasons that should become apparent):


50K (ms) near sorted random
Bubble 31 522,216
Cocktail 30 449,696
Insertion 19 179,127
Naive 219,338 510,010
Selection 220,735 220,743

Yes, that’s hundreds of thousands of milliseconds. “Three or four minutes” to you and me. The “Fasts”, meanwhile:


50K (ms) sorted near sorted blocks random strings reversed
Shell 162 164 219 377 929 250
Quick 296 298 327 365 790 306
Comb 390 396 477 622 1,348 452
Heap 899 903 885 874 1,548 844

(I only ran two tests on the “Slows”, for fear of dozing off completely.)

Again, for data where values are near their final sorted positions there’s clear evidence that something like an Insertion Sort is much faster than any of the “sexier” options. Provided you know your data will actually meet that criterion, of course.

All that considered, I’m switching from CombSort to ShellSort as my default algorithm. While it loses out a little to QuickSort in the “random” test (probably most representative of my normal use case) it doesn’t carry with it the fear of stack overflow through extreme recursion, something that’s bitten me with QS in the past. Anyway, us old’uns have got to stick together.

As already mentioned, if you have small quantities of data or infrequent sort calls in your application, it really doesn’t make much difference which algorithm you use, although I’d still suggest being aware of the properties of several options and having a basic implementation to hand. Once you reach a point where sorting contributes materially to your application run time then you owe it to yourself and your users to make an intelligent selection.

Here’s my ShellSort implemenation in VB, transcoded fairly literally from the Wikipedia pseudo-code (optimisations welcome):


Public Sub ShellSort(inp)
' sorts supplied array in place in ascending order
Dim inc As Long, i As Long, j As Long
Dim temp ' don't know what's in the input array...
  If Not IsArray(inp) Then Exit Sub ' ...but it had better be an array
  inc = (UBound(inp) - LBound(inp) + 1) / 2 ' use Shell's originally-proposed gap sequence
  Do While inc > 0
    For i = inc To UBound(inp)
      temp = inp(i)
      j = i
      Do
        If j < inc Then Exit Do ' check these conditions separately, as VBA Ors don't short-circuit
        If inp(j - inc) <= temp Then Exit Do ' ... and this will fail when j < inc
        inp(j) = inp(j - inc)
        j = j - inc
      Loop
      inp(j) = temp
    Next
    inc = Round(CDbl(inc) / 2.2)
  Loop

End Sub


1 Not the absolute worst: there’s the catastrophic genius of BogoSort, to name but one, but let’s not go anywhere nearer to there.
2 Just so we understand each other, these are my figures for my code on one of my machines. YMMV. A lot.