Another notch in the utility belt

Having been largely an all-thumbs C# programmer for the last year or so, I’ve had little opportunity to add to the world of Excel know-how (and I don’t really feel qualified to say much about the .Net world yet, even if I did once get a Tweet from Jon Skeet). I do keep an eye out for Excel questions on StackOverflow, though, and one caught my eye this morning.

The questioner wanted to add a zero to a chart range. Not sure why he didn’t just add an extra constant value at the start or end of the range, but perhaps he couldn’t (maybe the data was in another, read-only workbook?). Anyway, I came up with something that extended an existing trick a little and I thought it was mildly interesting.

Using a named formula in a chart series is1 a fairly well-known idea these days. It lets us define a dynamic series (usually using OFFSET()) and have our charts update automagically as our series grow and shrink. What I hadn’t previously thought of2 was using a VBA function call as the Name’s “Refers to” – could the result then be used in a chart Series definition?

Turns out it could.

Public Function PrependZero(rng As Range)
Dim val As Variant, res As Variant
Dim idx As Long
    ' assumes rng spans at least two cells in its first column -
    ' for a single cell, Transpose returns a scalar and LBound would fail
    val = Application.Transpose(rng.Columns(1).Value) ' nifty!
    ReDim res(LBound(val) To UBound(val) + 1)
    res(LBound(res)) = 0 ' this is where the function lives up to its name
    For idx = LBound(val) To UBound(val)
        res(idx + 1) = val(idx) ' copy the remainder
    Next
    PrependZero = res
End Function

(That Transpose thing is handy – I hadn’t known about it as a way of getting a 1-D range value as a vector. Two transposes needed for data along a row.)
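For anyone wanting to wire it up, it’s the usual dynamic-range recipe: a name whose “Refers to” calls the function, and a series formula that points at the name. A minimal sketch, in which the sheet, range and name are all invented:

' define a workbook-level name whose "Refers to" is a call to the UDF
ThisWorkbook.Names.Add Name:="ZeroSeries", _
    RefersTo:="=PrependZero(Sheet1!$B$2:$B$11)"
' point an existing chart's first series at the name
' (a workbook or sheet qualifier is required in a series formula)
Worksheets("Sheet1").ChartObjects(1).Chart.SeriesCollection(1).Values = _
    "='" & ThisWorkbook.Name & "'!ZeroSeries"

The same thing can be done by hand, of course: define the name in the Name Manager (or Insert > Name > Define in older versions) and type the qualified name into the series formula.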

You could use this approach for lots of things. Like, well, lots of things. Honestly, you don’t need me to spell them out.


1 This makes me a little sad – it was a source of Excel-fu that impressed people to a quite disproportionate degree. Array formulae and VBA classes still have this power.
2 I’m aware that I might be the last kid in the class here. Be gentle.

My (Im)perfect Cousin?

(in which we start to worry about the source of our inspiration)
Mona Lisa Vito: So what’s your problem?
Vinny Gambini: My problem is, I wanted to win my first case without any help from anybody.
Lisa: Well, I guess that plan’s moot.
Vinny: Yeah.
Lisa: You know, this could be a sign of things to come. You win all your cases, but with somebody else’s help. Right? You win case after case, and then afterwards, you have to go up to somebody and you have to say “thank you”! Oh my God, what a fuckin’ nightmare!


It is one of the all-time great movies, and netted Marisa Tomei an Oscar in the process. Yes it is. It really is1.

Not only that, but My Cousin Vinny2 throws up parallels in real life all the time. Yes it does. It really does3.

Why, only recently, I was puzzling over the best (or least worst) way to implement a particularly nonsensical requirement for an intransigent client. After I’d summarised the various unpalatable options in an email, a reply arrived from a generally unproductive source. The message content made it obvious that he’d somewhat missed the point, but the conclusion he drew from that misunderstanding triggered a new thought process that gave us a new, even less, er, worser solution to our problem.

Sadly, my unwitting muse has moved on now, but he left his mark for all time4 on our latest product. I suppose he should also take partial credit for the creation of a hitherto unknown development methodology: Powerpoint-Driven Development, but that’s a story for another day.


1 All right, IMHO
2 See also My Cousin Vinny At Work, application of quotes therefrom
3 YMMV.
4 Or at least until we have a better idea and change the whole damn thing

This Wheel Goes To Eleven

(in which we make an unexpected connection regarding the D in SOLID and get all hot under the collar about it)

Let’s not beat about the bush: I think I may have reinvented Dependency Injection. While it looks rather casual, stated like that, I’ve actually spent much of the last six months doing it. (Were you wondering? Well, that.)

I’ve been designing/building/testing/ripping apart/putting back together again a library/app/framework/tool thing that allows us to assemble an asset allocation algorithm for each of our ten or so products1, each of which may have been modified at various times since inception. It’s been interesting and not a little fun, plus I’ve been climbing the C# learning curve (through the three versions shipped since my last serious exposure) like Chris Bonnington on amphetamines.

Our products are broadly similar but differ in detail in some places. So there’s lots of potential for reuse, but no real hierarchy (if you can see a hierarchy in the little chart here, trust me, it’s not real).

So Product A needs features 1, 2 & 3, in that order. B needs 1 & 4, C 1, 3 & 5, etc. What I came up with was to encapsulate each feature in a class, each class implementing a common interface. Call it IFeature or some such. At run-time, I can feed my program an XML file (or something less ghastly perhaps) that says which classes I need (and potentially the assemblies in which they may be found), applying the wonder that is System.Reflection to load the specified assemblies and create instances of the classes I need, storing them in, for example, a List<IFeature>. To run my algorithm, all I need to do is call the method defined in my interface on each object in turn. A different product, or a new version of an existing one, has a different specification and it Should Just Work.

It’s all very exciting.

So changing a single feature of an existing product means writing one new class that implements the standard interface and pointing the product definition at the library that contains the new class (which may – should – be different from those already in use).

The discerning reader may, er, discern that there are elements of Strategy and Command patterns in here as well. Aren’t we modern?

While all this is very exciting (to me at least – a profound and disturbing symptom of work-life imbalance), it’s still not the end of the line. I’ve built functions and then chosen to access them serially, relying on careful (or tricky & tedious) XML definitions to dictate sequence. I’m thinking that I can go a long way further into declarative/functional territory, possibly gaining quite a bit. And there’s a whole world of Dynamic to be accessed, plus Excel and C++ interfaces of varying degrees of sexiness to be devised.

More on much of that when I understand it well enough to say something.


1 There are billions at stake, here, billions I tell you.

Lacking Anything Worthwhile To Say, The Third Wise Monkey Remained Mostly Mute

Project Euler 100/304 complete (on the permanent leaderboard at last!)

(in which we discover What’s Been Going On lately)

 

[image: Don't talk ... code]

 

I’ve been coding like crazy in C# of late, a language I’ve barely touched in the last few years. Put it this way, generics were new and sexy the last time I wrote anything serious in .NET… (There should be some ExcelDNA fun to be had later.)

I’d forgotten how flat-out fast compiled languages, even the bytecode/IL kind, could be. It’s not a completely fair comparison, to be sure, but a Ruby script to extract 300,000 accounts from my Oracle database and write them as XML takes a couple of hours, mostly in the output part. A C# program handled the whole thing in 5 minutes. It then processes the accounts in about 30 seconds, of which 10 are spent deserializing the XML into objects, 10 serializing the results back to XML and 10 performing the moderately heavy-duty mathematical transformations in between.

 


[image: Click to test its value for money]

 

Lacking at present a paid-for version of Visual Studio 2010 (the Express Edition, while brilliantly capable, won’t do plugins, which precludes integration with Subversion and NUnit, to name but two essentials), I have been greatly enjoying my experience with SharpDevelop, which spots my installs of TortoiseSVN and NUnit and allows both to be used inside the IDE. It’s not perfect: there are areas, particularly in its Intellisense analogue, where exceptions get thrown, but they’re all caught and I have yet to lose any work. While the polish is, unsurprisingly, at a lower level than Microsoft’s, it’s entirely adequate (and I mean that in a good way) and the price is right. I particularly liked being able to open an IronRuby session in the IDE and use it to interact with the classes in the DLL on which I was working.

While I expect VS2010 to become available as the budgeting process grinds through, I’m not at all sure that it’ll be necessary to switch. An extended set of automated refactoring tools could be attractive, although Rename and Extract Method are probably the two most useful, productivity-wise, and they’re already present. I would rather like to have Extract Class, which isn’t needed often but would be a big time (and error) saver when called for.

On another topic entirely, should you be looking for entertaining reading in the vaguely technical, erudite and borderline insane category, may I recommend To Umm Is Human to you? Any blog that has “orang utan” amongst its tags is worth a look, I’d say. If you like it, you’ll like it a lot. I was once made redundant by a Doubleday, but I don’t think they’re related.

There’s an interesting new programmer-oriented podcast on the block, too: This Developer’s Life has higher production values than most, which may ultimately limit its life – the time to produce an episode must be substantial. I found myself wanting to join in the conversation with stories of my own, a sure sign that I was engaged in the content.

That Do Impress Me Much

Over at stackoverflow, now that they have a pile of money to spend, sorry, invest, the rate of change is picking up. There’s the re-worked stack exchange model, which has changed dramatically – and quite likely for the better. They’ve moved away from the original paid-for hosted service to a community-driven process, whereby a community needs to form and commit to the idea of a new site. The objective is to improve the prospects of achieving critical mass for a new site, thus increasing its chances of success. I imagine a revenue model is mooted, although it may be little more than “if we build it, they will come” at present. Sponsored tags and ads spring to mind.

This week we’ve seen the covers removed (perhaps for a limited time initially) on a “third place”, to go with the existing main Q&A and “meta” (questions about the Q&A site). It’s a chat room. Well, lots of chat rooms of varying degrees of focus, to be more specific. Quite nicely done, too.

What has really impressed me has been that during this “limited sneak beta preview”, bugs, issues, feature requests and the like have been flowing through the interface at a fair rate of knots and many have been addressed and released within hours. Minutes, sometimes.

Think about it. User detects a bug, reports it and gets a fix, to an application with global reach, in a couple of hours or less. That’s agile.

A crucial part of the Lean movement in manufacturing (and its younger counterpart in software development) is eliminating waste. “Waste” is broadly defined, very broadly defined in fact, but one easily identifiable component is Work In Progress (WIP). In software terms, this often represents effort that has been invested (and money that’s been tied up) without having been included in a release. The more effort we invest without releasing, the more we’re wasting, since we have no possibility of obtaining a return on that investment.

Here’s a particularly quick find/fix from earlier today:

Yes, it was probably a trivial bug, but the problem was notified, found, fixed and released in eight frickin’ minutes. How many of us can turn anything around that fast?

I’m looking forward to seeing where this goes.

Sorted for Excel and Whee!

If you happened upon the VBA Lamp and by rubbing it were able to produce the Excel VBA Genie and were granted a VBA Wish, would you ask for a built-in Sort() function?

If you build the kind of Excel apps that I do, you’ll run up against the need for a Sort all too frequently. Sorting arrays with sizes in the tens of thousands is not unusual and I was reminded of this when reading this entry at Andrew’s Excel Tips the other day.

While it’s not crucial to the (useful) idea presented, the code has a quick and dirty sort that is about the worst sort algorithm1 one could intelligently come up with. It’s also a pretty intuitive solution, which just goes to show that sorting may not be as simple as we may think. I’m not attacking Andrew’s skillz here, btw: the code as presented is certainly fit for purpose; it’s not presented as a general-purpose utility (at least I hope it isn’t).

I’ve accumulated a few algorithms over the years and I’ve coded up a couple more while “researching” this piece. On the one hand, we have the oft-derided BubbleSort and its close relation, CocktailSort. In the same group I’d include InsertionSort and SelectionSort. I’m going to be harsh and categorise those, with the naive sort above, as “Slow”. Well, “mostly slow”, as we’ll see.

In the “Fast” group, we have the much-touted QuickSort, and somewhere in between, we have HeapSort, and my current algorithm of choice, CombSort. As I was researching this, I also coded up ShellSort, which is about as old as I am and which was claimed to be faster than QuickSort under some conditions.

I ran some comparisons, not meant in any way to be perfectly scientific2. I ran each algorithm on arrays of 50 to 50,000 values with six different characteristics:

  • already sorted
  • exactly reversed
  • mostly sorted (values typically within one or two places of target)
  • ordered blocks of 100 random values (the first 100 values are 0 + RAND(), then 100 + RAND() and so on – see the sketch after this list)
  • completely random
  • random 10-character strings
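
By way of illustration, the “blocks” data can be generated along these lines – a sketch with made-up names, not necessarily the exact harness I used:

Public Function MakeBlockData(n As Long) As Variant
' builds n values in ordered blocks of 100: 0 + Rnd(), then 100 + Rnd(), and so on
Dim vals() As Double, i As Long
    ReDim vals(1 To n)
    For i = 1 To n
        vals(i) = 100 * ((i - 1) \ 100) + Rnd()
    Next
    MakeBlockData = vals
End Function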

First off, the 50-record results:


50 recs (ms) | sorted | near sorted | blocks | random | strings | reversed
Shell        | 0.06   | 0.06        | 0.11   | 0.10   | 0.24    | 0.09
Quick        | 0.13   | 0.13        | 0.16   | 0.13   | 0.29    | 0.14
Comb         | 0.09   | 0.10        | 0.17   | 0.17   | 0.33    | 0.13
Heap         | 0.34   | 0.33        | 0.32   | 0.32   | 0.52    | 0.28
Insertion    | 0.02   | 0.02        | 0.20   | 0.17   | 0.47    | 0.37
Selection    | 0.25   | 0.25        | 0.25   | 0.25   | 0.54    | 0.25
Cocktail     | 0.01   | 0.02        | 0.44   | 0.39   | 1.02    | 0.77
Bubble       | 0.01   | 0.02        | 0.50   | 0.45   | 1.12    | 0.78
Naive        | 0.22   | 0.23        | 0.50   | 0.46   | 1.06    | 0.77

I’d say it’s pretty clear that it doesn’t matter much what you use to sort a small array, just about anything will be fast enough (unless you’re going to perform that sort tens of thousands of times in a run). It’s also apparent that the “slow” algorithms are actually pretty good if our data is already reasonably well-ordered.

So far, so “so what?”

Let’s look at the opposite end of the spectrum: 50,000 values. Here, the Fast/Slow divide is apparent. First, the “Slows” (two tests only, for reasons that should become apparent):


50K (ms)  | near sorted | random
Bubble    | 31          | 522,216
Cocktail  | 30          | 449,696
Insertion | 19          | 179,127
Naive     | 219,338     | 510,010
Selection | 220,735     | 220,743

Yes, that’s hundreds of thousands of milliseconds. “Anything from three to nearly nine minutes” to you and me. The “Fasts”, meanwhile:


50K (ms) | sorted | near sorted | blocks | random | strings | reversed
Shell    | 162    | 164         | 219    | 377    | 929     | 250
Quick    | 296    | 298         | 327    | 365    | 790     | 306
Comb     | 390    | 396         | 477    | 622    | 1,348   | 452
Heap     | 899    | 903         | 885    | 874    | 1,548   | 844

(I only ran two tests on the “Slows”, for fear of dozing off completely.)

Again, for data where values are near their final sorted positions there’s clear evidence that something like an Insertion Sort is much faster than any of the “sexier” options. Provided you know your data will actually meet that criterion, of course.
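For reference, an insertion sort is only a handful of lines of VBA. This is a sketch along the lines of what was timed above, not the exact benchmark code:

Public Sub InsertionSort(inp)
' sorts the supplied array in place in ascending order
Dim i As Long, j As Long
Dim temp ' as with ShellSort below, we take whatever type the array holds
    If Not IsArray(inp) Then Exit Sub
    For i = LBound(inp) + 1 To UBound(inp)
        temp = inp(i)
        j = i
        Do While j > LBound(inp)
            If inp(j - 1) <= temp Then Exit Do ' checked separately: VBA doesn't short-circuit
            inp(j) = inp(j - 1) ' shift larger values up one place
            j = j - 1
        Loop
        inp(j) = temp
    Next
End Sub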

All that considered, I’m switching from CombSort to ShellSort as my default algorithm. While it loses out a little to QuickSort in the “random” test (probably most representative of my normal use case) it doesn’t carry with it the fear of stack overflow through extreme recursion, something that’s bitten me with QS in the past. Anyway, us old’uns have got to stick together.

As already mentioned, if you have small quantities of data or infrequent sort calls in your application, it really doesn’t make much difference which algorithm you use, although I’d still suggest being aware of the properties of several options and having a basic implementation to hand. Once you reach a point where sorting contributes materially to your application run time then you owe it to yourself and your users to make an intelligent selection.

Here’s my ShellSort implementation in VB, transcoded fairly literally from the Wikipedia pseudo-code (optimisations welcome):


Public Sub ShellSort(inp)
' sorts supplied array in place in ascending order
Dim base As Long, inc As Long, i As Long, j As Long
Dim temp ' don't know what's in the input array...
  If Not IsArray(inp) Then Exit Sub ' ...but it had better be an array
  base = LBound(inp) ' cope with any array base, not just zero
  inc = (UBound(inp) - base + 1) \ 2 ' start with half the span, per Shell's original gap sequence
  Do While inc > 0
    For i = base + inc To UBound(inp)
      temp = inp(i)
      j = i
      Do
        If j < base + inc Then Exit Do ' check these conditions separately, as VBA Ors don't short-circuit
        If inp(j - inc) <= temp Then Exit Do ' ... and this would fail when j < base + inc
        inp(j) = inp(j - inc)
        j = j - inc
      Loop
      inp(j) = temp
    Next
    inc = Round(CDbl(inc) / 2.2) ' shrink the gap; a final pass with inc = 1 always occurs
  Loop
End Sub


1 Not the absolute worst: there’s the catastrophic genius of BogoSort, to name but one, but let’s not go anywhere near there.
2 Just so we understand each other, these are my figures for my code on one of my machines. YMMV. A lot.

Going Metric

(stackoverflow rep: 10,307, Project Euler 97/288 complete)

(in which a single comment substitutes for a butterfly’s wing, metaphorically speaking)

A week or so back, I made a passing reference to my home-grown code metrics as part of the unfocused rant1 that stemmed from exposure to a particularly ghastly Excel travesty.

Since it struck a chord with at least one person2 and I’d been meaning for months to write something more about it, I’m going to ramble about my feeble (but still surprisingly useful) approach to VBA code metrics.

The original idea came from three contemporaneous events several years ago: a large and nasty workbook to be heavily modified, some reading about McCabe’s cyclomatic complexity metric and coming across a little VB utility named Oh No!

Oh No!

This utility scanned a VB project and displayed a chart that attempted to show you where possible problem areas might lie, using the size and complexity of routines (Function, Sub, Property etc.) on its X and Y axes. Size was the number of non-blank, non-comment lines, while complexity was (from the Help file):

The complexity is measured as the total number of structure statements in a procedure. These statements are: Do, For, If, ElseIf, Select, While and With.

It looked like this (pointed in this instance at its own source code):

The third dimension, “blob size”, was related to the number of parameters in the routine. All in all, it’s a nicely-done, well thought-out little app.

For my uses it had its drawbacks, of course. To start with, it wanted a .vbp (VB project) file for input, something I didn’t have in my Excel/VBA project. For another, I didn’t seem to have the sort of brain that could easily extract useful information out of the graphical display. The code didn’t like continued lines, something I use a lot, and I felt that size and parameter count were contributors to complexity: we didn’t really need three dimensions.

Most important of all, writing my own app looked like a fun part-time project and I’m a complete sucker for those.

I should confess now that a cyclomatic complexity calculation proved to be beyond me when I developed this utility in 2002. It’s probably still beyond me now, but back then the reference I had was the original paper and gaining the necessary understanding from that would in all likelihood have taken all the fun out of the whole thing. So I fudged it. Sue me.

Approaching the problem

The VBComponents of the target workbook are scanned and a collection of modules is constructed, each of which contains a separate object for each procedure3 it contains. As procedures are constructed from lines, they attempt to classify what they’re being presented with and tally up the incidences of things worth counting:

Public Enum eCounterType
    ecComments = 1
    ecDeclares = 2
    ecElseIfs = 3
    ecGotos = 4
    ecHighestNestLevel = 5
    ecIifs = 6
    ecIfs = 7
    ecLines = 8
    ecLongLines = 9
    ecLoops = 10
    ecMultiDimCount = 11
    ecOneCharVars = 12
    ecParameters = 13
    ecSelects = 14
    ecWiths = 15
End Enum

You may have spotted that the deepest nesting is also recorded, although it’s not exactly a “counter” – perhaps I should have used “scoreCard” or something more expressive.

Each procedure gets a complexity “score”, which ended up being calculated, after much tweaking, using a tariff something like this:

Pts     | Awarded for
1.0     | every “nesting” command (If, Do, For, Select, ElseIf, etc)
0.1     | each line of executable code after the first, rounded down
0.1     | each declared variable after the first, rounded down
1.0     | each line that contains multiple Dims
0.5     | each parameter after the first
2.0     | each If (more than loops because of the arbitrarily complex branching)
0.2     | each long line (defined as 60 chars or more, after concatenating continued lines)
1.0     | each Go To
special | the square of the deepest nesting level encountered
It’s not supposed to be exact and it doesn’t need to be: the purpose is to give a quick sanity check and, particularly when encountering new code for the first time, an idea of where problem areas may lie.

There are deficiencies: I don’t think the code handles one-line If/End Ifs correctly – I believe it incorrectly increments the nesting level, which leads to a higher score. Since I really don’t like one-line Ifs, that’s just going to increase the chance that I see them, so I was happy to leave it like that.

If you look at the output, there are other metrics: information on the whole project, module-level summaries (if you see 300 global variables you might consider that worth investigating, for example).

There’s no penalty for “Static” declarations, probably because I hadn’t stumbled across any code that thought it was a good idea. I’d put a fairly hefty tariff on that: it’s a major obstacle to the unravelling process.

Another “drawback”: there’s no real parser – a better approach might have been to write one and utilise an AST (the very useful roodi library for Ruby uses one and can report a “proper” cyclomatic complexity metric as a result). Also, I didn’t use regular expressions – I probably hadn’t encountered them (or was still frightened) at that time: it would have made some of the line-parsing a good deal more terse.

I called it “LOCutus” – “LOC” for “lines of code”, the balance presumably indicating that the Borg were on my mind at the time. For better or worse, I’ve added it to the xlVBADevTools download (in a ZIP file with xlUnit). It’s an Excel XP file – I don’t think I’ve ever tested on XL2007 or higher, since I’ve yet to work at a site where it was used and prefer not to look at large crappy Excel apps without being paid for it…

If you download it, you’ll see that the report is showing the results of a scan on the app’s own code (shown as a HTML table below, so formatting may vary slightly):

Module | Globals | Members | Procs
VBProc | 0       | 7       | 21

Procedure           | Lines | Max nest | Comments | Declares | Ifs | Loops | Long lines | Go Tos | OCV | X
Load                | 8     | 2        | 0        | 3        | 1   | 1     | 0          | 0      | 0   | 9
ProcessLine         | 20    | 1        | 0        | 1        | 5   | 0     | 0          | 0      | 0   | 12
CheckForOneCharVars | 7     | 2        | 0        | 1        | 1   | 1     | 0          | 0      | 0   | 8
CountParameters     | 14    | 3        | 0        | 4        | 3   | 1     | 0          | 0      | 0   | 17

The “worst” proc scores 17, largely due to its 3-level nesting. If I were to have a need to change it, I’d consider breaking out the inner loop (or taking the outermost If and refactoring into a guard clause). Seventeen isn’t generally high enough to make me concerned though.

That’s the most “formal” part of the diagnostic “toolkit” that I apply to strange new Excel spreadsheets – anyone out there have something else we could look at?

xlVBADevTools download


1 Aren’t they all? You should hear me around the office. Probably not, actually.
2 OK, exactly one.
3 These days I favour “routine” as a generic term for Function/Sub/Property or whatever. Times change.
