Another notch in the utility belt

Having been largely an all-thumbs c# programmer for the last year or so I’ve had little opportunity to add to the world of Excel know-how (and I don’t really feel qualified to say much about the .Net world yet, even if I did once get a Tweet from Jon Skeet). I do keep an eye out for Excel questions on StackOverflow, though, and one caught my eye this morning.

The questioner wanted to add a zero to a chart range. Not sure why he didn’t just add an extra constant value at the end (start or beginning) of the range, but perhaps he couldn’t (maybe the data was in another, read-only workbook?). Anyway, I came up with something that extended an existing trick a little and I thought it was mildly interesting.
Using a named formula in a chart series is1 a fairly well-known idea theses days. It lets us define a dynamic series (usually using OFFSET()) and have our charts update automagically as our series grow and shrink. What I hadn’t previously thought of2 was that I could use a VBA function call as the Name’s “Refers to” with the result being useable in a chart Series definition.

Turns out it could.

Public Function PrependZero(rng As range)
Dim val As Variant, res As Variant
Dim idx As Long
    val = Application.Transpose(rng.Columns(1).Value) ' nifty!
    ReDim res(LBound(val) To UBound(val) + 1)
    res(LBound(res)) = 0 ' this is where the function lives up to its name
    For idx = LBound(val) To UBound(val)
        res(idx + 1) = val(idx) ' copy the remainder
    Next
    PrependZero = res
End Function

(That Transpose thing is handy – I hadn’t known about it as a way of getting a 1-D range value as a vector. Two transposes needed for data along a row.)

You could use this approach for lots of things. Like, well, lots of things. Honestly, you don’t need me to spell them out.


1 This makes me a little sad – it was a source of Excel-fu that impressed people to a quite disproportionate degree. Array formulae and VBA classes still have this power.
2 I’m aware that I might be the last kid in the class here. Be gentle.

My (Im)perfect Cousin?

in which we start to worry about the source of our inspiration
Mona Lisa Vito: So what’s your problem?
Vinny Gambini: My problem is, I wanted to win my first case without any help from anybody.
Lisa: Well, I guess that plan’s moot.
Vinny: Yeah.
Lisa: You know, this could be a sign of things to come. You win all your cases, but with somebody else’s help. Right? You win case, after case, – and then afterwards, you have to go up somebody and you have to say- “thank you“! Oh my God, what a fuckin’ nightmare!


It is one of the all-time great movies, and netted Marisa Tomei an Oscar in the process. Yes it is. It really is1.

Not only that, but My Cousin Vinny2 throws up parallels in real life all the time. Yes it does. It really does3.

Why only recently, I was puzzling over the best (or least worst) way to implement a particularly nonsensical requirement for an intransigent client. After summarising the various unpalatable options in an email, a reply arrived from a generally unproductive source. The message content made it obvious that he’d somewhat missed the point but the conclusion he drew from that misunderstanding triggered a new thought process that gave us a new, even less, er, worser solution to our problem.

Sadly, my unwitting muse has moved on now, but he left his mark for all time4 on our latest product. I suppose he should also take partial credit for the creation of a hitherto unknown development methodology: Powerpoint-Driven Development, but that’s a story for another day.


1 All right, IMHO
2 See also My Cousin Vinny At Work, application of quotes therefrom
3 YMMV.
4 Or at least until we have a better idea and change the whole damn thing

This Wheel Goes To Eleven

(in which we make an unexpected connection regarding the D in SOLID and get all hot under the collar about it)

Let’s not beat about the bush: I think I may have reinvented Dependency Injection. While it looks rather casual, stated like that, I’ve actually spent much of the last six months doing it. (Were you wondering? Well, that.)

I’ve been designing/building/testing/ripping apart/putting back together again a library/app/framework/tool thing that allows us to assemble an asset allocation algorithm for each of our ten or so products1, each of which may have been modified at various times since inception. It’s been interesting and not a little fun, plus I’ve been climbing the C# learning curve (through the three versions shipped since my last serious exposure) like Chris Bonnington on amphetamines.

Our products are broadly similar but differ in detail in some places. So there’s lots of potential for reuse, but no real hierarchy (if you can see a hierarchy in the little chart here, trust me, it’s not real).

So Product A need features 1, 2 & 3, in that order. B needs 1 & 4, C 1, 3 & 5, etc. What I came up with was to encapsulate each feature in a class, each class inheriting from a common interface. Call it IFeature or some such. At run-time, I can feed my program an XML file (or something less ghastly perhaps) that says which classes I need (and potentially the assemblies in which they may be found), applying the wonder that is System.Reflection to load the specified assembles and create instances of the classes I need, storing them in, for example, a List<IFeature>. To run my algorithm, all I need to do is call the method defined in my interface on each object in turn. A different product, or a new version of an existing one has a different specification and it Should Just Work.

It’s all very exciting.

So changing a single feature of an existing product means writing one new class that implements the standard interface and pointing the product definition at the library that contains the new class (which may – should – be different from those already in use).

The discerning reader may, er, discern that there are elements of Strategy and Command patterns in here as well. Aren’t we modern?

While all this is very exciting (to me at least – a profound and disturbing symptom of work-life imbalance) it’s still not the end of the line. I’ve built functions and then chosen to access them serially, relying on carefully (or tricky & tedious) XML definitions to dictate sequence. I’m thinking that I can go a long way further into declarative/functional territory, possibly gaining quite a bit. And there’s a whole world of Dynamic to be accessed plus Excel and C++ interfaces of varying degrees of sexiness to be devised .

More on much of that when I understand it well enough to say something.


1 There are billions at stake, here, billions I tell you.

Lacking Anything Worthwhile To Say, The Third Wise Monkey Remained Mostly Mute

Project Euler 100/304 complete (on the permanent leaderboard at last!)

(in which we discover What’s Been Going On lately)

 

Don't talk ... code

 

I’ve been coding like crazy in C# of late, a language I’ve barely touched in the last few years. Put it this way, generics were new and sexy the last time I wrote anything serious in .NET… (There should be some ExcelDNA fun to be had later.)

I’d forgotten how flat-out fast compiled languages, even the bytecode/IL kind could be. It’s not a completely fair comparison, to be sure, but a Ruby script to extract 300,000 accounts from my Oracle database and write them as XML takes a couple of hours, mostly in the output part. A C# program handled the whole thing in 5 minutes. Then processes the accounts in about 30 seconds, of which 10 are spent deserializing the XML into objects, 10 are serializing the results to XML and 10 are performing the moderately heavy-duty mathematical tranformations in between.

 


Click to test is value for money

 

Lacking at present a paid-for version of Visual Studio 2010 (the Express Edition, while brilliantly capable, won’t do plugins, which precludes integration Subversion and NUnit, to name but two essentials), I have been enjoying greatly my experience with SharpDevelop, which spots my installs of TortoiseSVN and NUnit and allows both to be used inside the IDE. It’s not perfect: there are areas, particularly in its Intellisense analogue, where exceptions get thrown, but they’re all caught and I have yet to lose any work. While the polish is, unsurprisingly, at a lower level than Microsoft’s, it’s entirely adequate (and I mean that in a good way) and the price is right. I particularly liked being able to open an IronRuby session in the IDE and use it to interact with the classes in the DLL on which I was working.

While I expect VS2010 to become available as the budgeting process grinds through, I’m not at all sure that it’ll be necessary to switch. An extended set of automated refactoring tools could be attractive, although Rename and Extract Method are probably the two most useful, productivity-wise, and they’re already present. I would rather like to have Extract Class, which isn’t needed often but would be a big time (and error) saver when called for.

On another topic entirely, should you be looking for entertaining reading in the vaguely technical, erudite and borderline insane category, may I recommend To Umm Is Human to you? Any blog that has “orang utan” amongst its tags is worth a look, I’d say. If you like it, you’ll like it a lot. I was once made redundant by a Doubleday, but I don’t think they’re related.

There’s an interesting new programmer-oriented podcast on the block, too: This Developer’s Life has slightly higher production values that may ultimately limit its life – the time to produce an episode must be substantial. I found myself wanting to join in the conversation with stories of my own, a sure sign that I was engaged in the content.

That Do Impress Me Much

Over at stackoverflow, now that they have a pile of money to spend invest, the rate of change is picking up. There’s the re-worked stack exchange model, which has changed dramatically – and quite likely for the better. They’ve moved away from the original paid-for hosted service to a community-driven process, whereby a community needs to form and commit to the idea of a new site. The objective is to improve the prospects of achieving critical mass for a new site, thus increasing its chances of success. I imagine a revenue model is mooted, although it may be little more than “if we build it, they will come” at present. Sponsored tags and ads spring to mind.

This week we’ve seen the covers removed (perhaps for a limited time initially) on a “third place“, to go with the existing main Q&A and “meta” (questions about the Q&A site). It’s a chat room. Well, lots of chat rooms of varying degrees of focus, to be more specific. Quite nicely done, too.

What has really impressed me has been that during this “limited sneak beta preview”, bugs, issues, feature requests and the like have been flowing through the interface at a fair rate of knots and many have been addressed and released within hours. Minutes, sometimes.

Think about it. User detects a bug, reports it and gets a fix, to an application with global reach, in a couple of hours or less. That’s agile.

A crucial part of Lean movement in manufacturing (and its younger counterpart in software development) is eliminating waste. “Waste” is broadly defined, very broadly defined, in fact, but one easily identifiable component is Work In Progress (WIP). In software terms, this often represents effort that has been invested (and money that’s been tied up) without having been included in a release. The more we invest effort without release the more we’re wasting, since we have no possibility of obtaining a return on that investment.

Here’s a particularly quick find/fix from earlier today:

Yes, it was probably a trivial bug, but the problem was notified, found, fixed and released in eight frickin’ minutes. How many of us can turn anything around that fast?

I’m looking forward to seeing where this goes.

Sorted for Excel and Whee!

If you happened upon the VBA Lamp and by rubbing it were able to produce the Excel VBA Genie and were granted a VBA Wish, would you ask for a built-in Sort() function?

If you build the kind of Excel apps that I do, you’ll run up against the need for a Sort all too frequently. Sorting arrays with sizes in the tens of thousands is not unusual and I was reminded of this when reading a post in this entry at Andrew’s Excel Tips the other day.

While it’s not crucial in the (useful) idea presented, the code has a quick and and dirty sort that is about the worst sort algorithm1 one could intelligently come up with. It’s also an prettty intuitive solution, which just goes to show that sorting may not be as simple as we may think. I’m not attacking Andrew’s skillz here, btw: the code as presented is certainly fit for purpose; it’s not presented as a general-purpose utility (at least I hope it isn’t).

I’ve accumulated a few algorithms over the years and I’ve coded up a couple more while “researching” this piece. On the one hand, we have the oft-derided BubbleSort and its close relation, CocktailSort. In the same group I’d include InsertionSort and SelectionSort. I’m going to be harsh and categorise those, with the naive sort above, as “Slow”. Well, “mostly slow”, as we’ll see.

In the “Fast” group, we have the much-touted QuickSort, and somewhere in between, we have HeapSort,and my current algorithm of choice, CombSort. As I was researching this, I also coded up ShellSort, which is about as old as I am and which was claimed to be faster than QuickSort under some conditions.

I ran some comparisons, not meant in any way to be perfectly scientific2. I ran each algorithm on arrays of 50 to 50,000 values with six different characteristics:

  • already sorted
  • exactly reversed
  • mostly sorted (values typically within one or two places of target)
  • ordered blocks of 100 random values (the first 100 values are 0 + RAND(), then 100 + RAND() and so on)
  • completely random
  • random 10-character strings

First off, the 50-record results:


50 recs (ms) sorted near sorted blocks random strings reversed
Shell 0.06 0.06 0.11 0.10 0.24 0.09
Quick 0.13 0.13 0.16 0.13 0.29 0.14
Comb 0.09 0.10 0.17 0.17 0.33 0.13
Heap 0.34 0.33 0.32 0.32 0.52 0.28
Insertion 0.02 0.02 0.20 0.17 0.47 0.37
Selection 0.25 0.25 0.25 0.25 0.54 0.25
Cocktail 0.01 0.02 0.44 0.39 1.02 0.77
Bubble 0.01 0.02 0.50 0.45 1.12 0.78
Naive 0.22 0.23 0.50 0.46 1.06 0.77

I’d say it’s pretty clear that it doesn’t matter much what you use to sort a small array, just about anything will be fast enough (unless you’re going to perform that sort tens of thousands of times in a run). It’s also apparent that the “slow” algorithms are actually pretty good if our data is already reasonably well-ordered.

So far, so “so what?”

Let’s look at the opposite end of the spectrum: 50,000 values? Here, the Fast/Slow divide is apparent. First the “Slows” (two tests only, for reasons that should become apparent):


50K (ms) near sorted random
Bubble 31 522,216
Cocktail 30 449,696
Insertion 19 179,127
Naive 219,338 510,010
Selection 220,735 220,743

Yes, that’s hundreds of thousands of milliseconds. “Three or four minutes” to you and me. The “Fasts”, meanwhile:


50K (ms) sorted near sorted blocks random strings reversed
Shell 162 164 219 377 929 250
Quick 296 298 327 365 790 306
Comb 390 396 477 622 1,348 452
Heap 899 903 885 874 1,548 844

(I only ran two tests on the “Slows”, for fear of dozing off completely.)

Again, for data where values are near their final sorted positions there’s clear evidence that something like an Insertion Sort is much faster than any of the “sexier” options. Provided you know your data will actually meet that criterion, of course.

All that considered, I’m switching from CombSort to ShellSort as my default algorithm. While it loses out a little to QuickSort in the “random” test (probably most representative of my normal use case) it doesn’t carry with it the fear of stack overflow through extreme recursion, something that’s bitten me with QS in the past. Anyway, us old’uns have got to stick together.

As already mentioned, if you have small quantities of data or infrequent sort calls in your application, it really doesn’t make much difference which algorithm you use, although I’d still suggest being aware of the properties of several options and having a basic implementation to hand. Once you reach a point where sorting contributes materially to your application run time then you owe it to yourself and your users to make an intelligent selection.

Here’s my ShellSort implemenation in VB, transcoded fairly literally from the Wikipedia pseudo-code (optimisations welcome):


Public Sub ShellSort(inp)
' sorts supplied array in place in ascending order
Dim inc As Long, i As Long, j As Long
Dim temp ' don't know what's in the input array...
  If Not IsArray(inp) Then Exit Sub ' ...but it had better be an array
  inc = (UBound(inp) - LBound(inp) + 1) / 2 ' use Shell's originally-proposed gap sequence
  Do While inc > 0
    For i = inc To UBound(inp)
      temp = inp(i)
      j = i
      Do
        If j < inc Then Exit Do ' check these conditions separately, as VBA Ors don't short-circuit
        If inp(j - inc) <= temp Then Exit Do ' ... and this will fail when j < inc
        inp(j) = inp(j - inc)
        j = j - inc
      Loop
      inp(j) = temp
    Next
    inc = Round(CDbl(inc) / 2.2)
  Loop

End Sub


1 Not the absolute worst: there’s the catastrophic genius of BogoSort, to name but one, but let’s not go anywhere nearer to there.
2 Just so we understand each other, these are my figures for my code on one of my machines. YMMV. A lot.

Going Metric

(stackoverflow rep: 10,307, Project Euler 97/288 complete)

(in which a single comment substitutes for a butterfly’s wing, metaphorically speaking)

A week or so back , as part of the unfocused rant1 that stemmed from exposure to a particularly ghastly Excel travesty.

Since it struck a chord with at least one person2 and I’d been meaning for months to write something more about it, I’m going to ramble about my feeble (but still surprisingly useful) approach to VBA code metrics.

The original idea came from three contemporaneous events several years: a large and nasty workbook to be heavily modified, some reading about McCabe’s cyclomatic complexity metric and coming across a little VB utility named Oh No!

Oh No!

This utility scanned a VB project and displayed a chart that attempted to show you where possible problem areas might lie by using size and complexity of routines on its X and Y axes (Function, Sub, Property etc). Size was number on non-blank, non-comment lines, while complexity was (from the Help file):

The complexity is measured as the total number of structure statements in a procedure. These statements are: Do, For, If, ElseIf, Select, While and With.

It looked like this (pointed in this instance at its own source code):

The third dimension, “blob size”, was related to the number of parameters in the routine. All in all, it’s a nicely-done, well thought-out little app.

For my uses it had its drawbacks, of course. To start with, it wanted a .vbp (VB project) file for input, something I didn’t have in my Excel/VBA project. For another, I didn’t seem to have the sort of brain that could easily extract useful information out of the graphical display. The code didn’t like continued lines, something I use a lot, and I felt that size and parameter count were contributors to complexity: we didn’t really need three dimensions.

Most important of all, writing my own app looked like a fun part-time project and I’m a complete sucker for those.

I should confess now that a cyclomatic complexity calculation proved to be beyond me when I developed this utility in 2002. It’s probably still beyond me now, but back then the reference I had was the original paper and gaining the necessary understanding from that would in all likelihood have taken all the fun out of the whole thing. So I fudged it. Sue me.

Approaching the problem

The VBComponents of the target workbook are scanned and a collection of modules is constructed, each of which contains a separate object for each procedure2 it contains. As procedures are constructed from lines, they attempt to classify what they’re being presented and tally up the incidences of things worth counting:

Public Enum eCounterType
    ecComments = 1
    ecDeclares = 2
    ecElseIfs = 3
    ecGotos = 4
    ecHighestNestLevel = 5
    ecIifs = 6
    ecIfs = 7
    ecLines = 8
    ecLongLines = 9
    ecLoops = 10
    ecMultiDimCount = 11
    ecOneCharVars = 12
    ecParameters = 13
    ecSelects = 14
    ecWiths = 15
End Enum

You may have spotted that the deepest nesting is also recorded, although it’s not exactly a “counter” – perhaps I should have used “scoreCard” or something more expressive.

Each procedure gets a complexity “score”, which ended up being calculated, after much tweaking, using a tariff something like this:

Pts Awarded for
1.0 every “nesting” command (If, Do, For, Select, ElseIf, etc)
0.1 each line of executable code after the first, rounded down
0.1 each declared variableafter the first, rounded down
1.0 each line that contains multiple Dims
0.5 each parameter after the first
2.0 each If (more than loops because of the arbitrarily complex branching)
0.2 each long line (defined as 60 chars or more, after concatenating continued lines)
1.0 each Go To
special the square of the deepest nesting level encountered

.
It’s not supposed to be exact and it doesn’t need to be: the purpose is to give a quick sanity check and, particularly when encountering new code for the first time, an idea of where problem areas may lie.

There are deficiencies: I don’t think the code handles one-line If/End Ifs correctly – I believe it incorrectly increments the nesting level, which leads to a higher score. Since I really don’t like one-line Ifs, that’s just going to increase the chance that I see them, so I was happy to leave it like that.

If you look at the output, there are other metrics: information on the whole project, module-level summaries (if you see 300 global variables you might consider that worth investigating, for example).

There’s no penalty for “Static” declarations, probably because I hadn’t stumbled across any code that thought it was a good idea. I’d put a fairly hefty tariff on that: it’s a major obstacle to the unravelling process.

Another “drawback”: there’s no real parser – a better approach might have been to write one and utilise an AST (the very useful roodi library for Ruby uses one and can report a “proper” cyclomatic complexity metric as a result). Also, I didn’t use regular expressions – I probably hadn’t encountered them (or was still frightened) at that time: it would have made some of the line-parsing a good deal more terse.

I called it “LOCutus” – “LOC” for “lines of code”, the balance presumably indicating that the Borg were on my mind at the time. For better or worse, I’ve added it to the xlVBADevTools download (in a ZIP file with xlUnit). It’s an Excel XP file – I don’t think I’ve ever tested on XL2007 or higher, since I’ve yet to work at a site where it was used and prefer not to look at large crappy Excel apps without being paid for it…

If you download it, you’ll see that the report is showing the results of a scan on the app’s own code (shown as a HTML table below, so formatting may vary slightly):

VBProc (Module) Globals Members Procs
0 7 21
Procedure Lines Max nest Comments Declares Ifs Loops Long lines Go Tos OCV X
Load 8 2 0 3 1 1 0 0 0 9
ProcessLine 20 1 0 1 5 0 0 0 0 12
CheckForOneCharVars 7 2 0 1 1 1 0 0 0 8
CountParameters 14 3 0 4 3 1 0 0 0 17

.
The “worst” proc scores 17, largely due to its 3-level nesting. If I were to have a need to change it, I’d consider breaking out the inner loop (or taking the outermost If and refactoring into a guard clause). Seventeen isn’t generally high enough to make me concerned though.

That’s the most “formal” part of the diagnostic “toolkit” that I apply to strange new Excel spreadsheets – anyone out there have something else we could look at?

xlVBADevTool download


1 Aren’t they all? You should hear me around the office. Probably not, actually.
2 OK, exactly one.
3 These days I favour “routine” as a generic term for Function/Sub/Property or whatever. Times change.

It’s An Ill Wind

I was pontificating recently about the horror of tightly-coupled worksheet calculations and macro code in Excel and the mess that results from carelessly mashed-together programming paradigms.

By a horrible coincidence, a few smug weeks later I suddenly found myself deep in the horror again. This time, I had to contend with two versions, needing to set up and run several tens of thousands of real-world cases through “before” and “after” scenarios.

I soon started to turn up nuggets of wonderment. Here are a couple of gems from the worksheets:

=IF(CPPI=TRUE,TRUE,FALSE)
=MAX(1,DATE(YEAR(C4),MONTH(C4),DAY(C4))-DATE(YEAR(C6),MONTH(C6),DAY(C6)))

“Why”, I asked myself, “don’t these monkeys think?” (When someone asked me which monkeys I was referring to, I realised that my inner monologue was malfunctioning, but that’s not important right now.)

Obviously they were thinking, apart from anything else the whole thing does seem to work, inasmuch as a set of inputs produce a set of plausible-looking outputs. I’d assert, however, that an experienced, competent developer has a continuous, higher-order review process running, asking questions like “does that look right?”, “could that be written better?”, “is that clear?”, “could someone else understand that?” It would appear that the individuals concerned in this case had no such process running.

I have a little tool that I bring out when faced with an unknown Excel VBA application. It scans the code and delives a report that gives me an idea of the degree of caution I should exercise, calculating for each routine an “X” number, being some arcane function of code lines, mesting level, numbers of ifs, loops, one-character variables (seldom straightforward to rename, even if they’re called “q” or “z”). A score of 10 or above indicates a possible problem area. Over 50 and we’re definitely in bandit country. Here’s the report for the top-scoring routine in the horror workbook (names changed to conceal the guilty):

422 is by a factor of about 3 the highest-scoring routine I’ve ever encountered. (By contrast, and perhaps it’s an unfair comparison, the top score in xlunit is 24.) That 1044 lines of code doesn’t include comments, blank lines, variable declarations or similar. Even better, it’s declared Static, something so appallingly dangerous (to me, at least – I get nauseous when I see even a Static variable) that I had to change the analyser because it had never seen it before. “Option Explicit” is honoured more in the breach than the observance and there are a mere (ha!) 156 global variables.

I was flicking through my cherished (and slightly stained) copy of The Pragmatic Programmer last night and I’ll take as my text the section “What Make a Pragmatic Programmer?” on pages xiii and xix of the Preface. Summarising the characteristics and comparing them with the authors of the unholy mess described, I get:

Early adopter/fast adapter: given that the abomination I’m working with is the culmination of more than two years’ work and is the fourth or fifth (and probably “best”) version, FAIL.

Inquisitive: I see no sign of any curiosity, not even “why do we suck?” FAIL

Realistic: I’m (generously) prepared to specify that this one may not really apply here. (Big of me.)

Jack of all trades: and masters of none? These guys didn’t make it to Jack of one trade. FAIL.

Critical Thinker: referring to the inner monologue again, there’s no evidence to suggest that anything like the questions listed above were ever applied here. And plenty of evidence to the contrary. BIG RED FAIL.

The tragedy here is that the monkeys described are unlikely to realise exactly how snivellingly far from acceptable (let’s not waste “best” here) practice they are, as the Dunning-Kruger effect tells us is likely, so they’re not going to change in any useful way. Failing to see that the twin demons of high coupling and low (non-existent, if we’re honest) cohesion are rampaging through their code leaves me wondering where we’d start.

Depressingly, these guys aren’t unusual, they’re not even the worst. The type is perhaps a little more common in, say, the Excel world, where the tools to develop staggering incompetence are so widely available, but they’re everywhere else – the invisible 90% of the iceberg.

On the plus side, even moderately competent technicians who are prepared to don the elbow-length protective gloves and dive in to such messes will be able to look forward with some confidence to a lifetime in which there will be a paycheck for as long as they can stand to pick it up.

My Mother Would Be So Proud

(stackoverflow rep: 10,038, Project Euler 96/283 complete)

(in which it transpires that I’ve been something creative all along)

Perhaps because I don’t speak Swedish very well (for values of “very well” that are exactly equal to “at all”) I didn’t notice this blog post until today, when I came across it in a link from a comment on post in a blog that I do read.

Earlier this week I was struggling to impress upon a colleague that I thought that he should write some code instead of spending additional time “getting a complete understanding of all the issues” and “acquiring a good knowledge of the technical framework”. I wondered if it was some deficiency in my communication “skills” – and that could very well be the case – but I’m starting to think that we have very different views on the purpose of code, perhaps because he’s from an academic mathematical background and I’m just an old hacker. I think he sees the code-writing process as the final expression of his understanding of a problem, where I use code to develop that understanding. Not surprisingly, within our problem domain I think I have the more appropriate model. Anyway, I outrank him…

Maybe it’s an Internet phenomenon (or more likely just human nature) that the same idea (meme?) is rediscovered on a regular basis. I remember reading – and being influenced by – a similar article several years ago. A quick search threw up an even older article making a similar point. Yes, even almost 20 years ago, the transition from source to executable was relatively cheap.

If we view the act of writing source code as part of the design of the program and not part of the build, then we move further away from the commonly-applied physical engineering metaphor and this is no bad thing. It doesn’t take much mental effort: any substantial change in the design of a building is likely to be immensely expensive after the building is constructed, whereas rebuilding a whole program or application is relatively cheap once the design (source) has been updated. Once you get to web apps, where even deployment is almost free, then you’re a long way from the building site.

Further, if we see coding as design, then the notion of starting small and enhancing iteratively, tidying (refactoring) as we go becomes a much more obviously good thing. To my mind, at least.

So – look Ma! I’m a designer!

A Third Way: DNA?

(stackoverflow rep: 8417, Project Euler 87/261 complete

When we want Excel to talk to a compiled library, the list of options available to us is not long.

On the one hand, we can write a compiled library that references the Excel SDK (pre- or post-2007), setting its prefix to XLL for convenience, typically we’d grit our collective teeth and accomplish this with C or (cough, spit) C++. One big plus here is speed – we’re talking to Excel in its own language. Or something awfully close to it. Another plus, of course, is that this approach probably involves the minimum set of dependencies for non-enterprise distribution convenience. If you haven’t spotted the big minus then we should probably agree to disagree on the relative merits of programming languages.

On the other hand, we can write a library that exposes its functions through good ol’ COM, referencing it though the Tool…Add-ins dialog. That’s probably still the best-known way to integrate functionality developed in .NET, although it does involve paying a performance price as the COM interface is crossed and recrossed. This would be the place in the world most usually occupied by VSTO. Of course, you could also write your COM library in non-managed code: VB6 is easy (if you can find it) but slow, the choice from others depends on your capacity (or desire) for pain.

On the gripping hand, we have ExcelDNA, which for my money is one of the all-round cunningest things you’re likely to come across in quite a long time. (Unless you’ve already encountered its cunningness, of course, in which case you’re probably already nodding along in sage agreement). How so? How about being able to write your spiffy new functions in C# but without having to incur the needless ins and outs of the wasteful and superfluous COM middleman?

ExcelDNA provides a small XLL that can talk to managed code. There’s one attribute to set in order to make a function visible to Excel and one libary to import (in order to be able to add the attribute). Oh, and a little config file to tell the XLL what to load. That’s it. Actually, that’s the complicated version. In case you didn’t know, the CLR includes a compiler, allowing code to be created and compiled at run-time. The simplest way to talk to Excel from .NET via ExcelDNA is just to put your code directly into that little config file and let the XLL compile it at load time. OK, there’s a second or so’s overhead, but how simple – anything simpler would probably have to be involve 21st-Century language integration where VBA lives today.

By now you’re probably muttering something like “write code, fat bloke”, which is a little cruel, but a sentiment otherwise understandable. Now only the other day, Dick Kusleika posted his take on the everybody-has-one timeless RangeJoin() UDF topic. Here’s a rather simpler implementation:


<DnaLibrary Language="CS">
<![CDATA[
using ExcelDna.Integration;
using System.Collections.Generic;

    public class MyFunctions
    {
        [ExcelFunction(Description="Joins cell values", Category="My ExcelDNA functions")]
        public static object RangeJoin(object[,] cells)
        {
            List<string> list = new List<string>();

            foreach (object o in cells)
                list.Add(o.ToString());

            return string.Join(",", list.ToArray());
        }
    }
]]>
</DnaLibrary>

To get it running, I saved it as “RangeJoin.dna”, then copied the ExcelDNA.xll into the same folder and renamed that to “RangeJoin.xll”. Because it isn’t currently anywhere on my path, I also put ExcelDna.Integration.dll in the folder. Then I double-clicked my new XLL file to start Excel and load the library. (Pictures below are from Excel 2002, but the code is tested up to 2007).

Not surprising...

There’s a tiny extra delay before the usual macro warning appears – the C# code’s being compiled, which is where compile-time failures are displayed. Then you’re up and running. The function shows up in the Insert Function dialog…

… and we can try calling it:

… with the following result:

I did notice the the =NA() output and the Euro amount came out different to what I’d have hoped. It rather looks like I’d have to dig a little deeper into the ExcelDna.Integration.dll and the ExcelReference object in particular in order to be able to access the equivalent of VBA’s Text property.

Still, not bad for a quick exercise, I’d say.

There are some alternative approaches to the dot-net-and-excel-without-going-anywhere-near-COM-add-ins topic:

For Free

Excel4Net is now free – it seems to work by implementing a single worksheet function that calls out to managed code;

The last time I looked at XLW it was focused on C++. That’s changed in the interim – C# and VB.NET are now options.

Possibly Paid-for Propositions

Len Holgate appears to be at beta test stage with JetXLL, which looks like it’ll be a commercial offering;

I also just discovered ExHale. In beta, appears current – can’t see what the terms are likely to be;

Definitely Demanding Dollars

Possibly the first implementation, ManagedXLL is (was?) a paid-for product that looks to have been similar to, but possibly broader in scope than ExcelDNA. It’s not clear whether the company is still active or the product is still available – a curious colleague recently tried to make enquiries and failed.

Follow

Get every new post delivered to your Inbox.