Clean, Tidy, Fast. Pick Three

Ten Simple Tricks to Speed up Your Excel VBA Code is a fine post that offers some highly practical insights into Excel/VBA performance – the sort of insights insiders have long been using to their benefit. This is the kind of post that would quickly result in one’s expulsion from an Excel Magic Circle. If there were one. Which there isn’t. Unless you know otherwise.

The first five tips all deal with asking Excel to suspend various time-consuming activities during macro execution. Of the five, #1 is both the most important and the most dangerous:

Application.Calculation = xlCalculationManual
'Place your macro code here
Application.Calculation = xlCalculationAutomatic

It has the potential to save a lot of time, but crucially, it may also affect the way your workbook works. A naive (or just plain bad) workbook construction may have macro code interacting with worksheet calculations, so the state of the sheets may change during the execution of the macro depending on when calculations happen. It’s a situation that often arises when macros grow out of code created with the Macro Recorder. Be warned.

Of course, we should also be wary of assuming that the calculation mode started out as automatic: the user may have switched to manual for very good reasons. Anyway, it’s a bit rude.

Tips #2 to #5 add further performance boosters, managing the enabling and disabling of screen and status bar updating, page break display and event processing behaviour (which may also need consideration similar to calculation mode, as described above). By now, we’ve got quite a lot of boiler-platey code to wrap around the code that’s actually supposed to do something useful. I don’t like that.
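
Spelled out inline, the full set of tips ends up looking something like this (just a sketch: exactly which settings you poke, and whether you restore them or simply switch them back on, is up to you):

Dim calcMode As XlCalculation
calcMode = Application.Calculation            ' remember where we started
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.EnableEvents = False
ActiveSheet.DisplayPageBreaks = False         ' page break display is a per-sheet setting

'Place your macro code here

Application.Calculation = calcMode            ' politely restore what we found
Application.ScreenUpdating = True
Application.DisplayStatusBar = True
Application.EnableEvents = True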

How much cleaner would it be if we could write something like this:

With New ExcelInteractionSuspender ' please feel free to think of a better name...
    'Place your macro code here
End With

Quite a lot cleaner, I’d say.

Such a class might look something like this:

Option Explicit

Private storedCalcMode As XlCalculation
Private storedScreenUpdateMode As Boolean

Private Sub Class_Initialize()
    ' store current settings
    storedCalcMode = Application.Calculation
    storedScreenUpdateMode = Application.ScreenUpdating
    ' now switch 'em off
    Application.Calculation = xlCalculationManual
    Application.ScreenUpdating = False
End Sub

Private Sub Class_Terminate()
    Application.Calculation = storedCalcMode
    Application.ScreenUpdating = storedScreenUpdateMode
End Sub

For the sake of brevity I’ve only included the first two options (which are the ones I usually find provide the most benefit in any case). The others are left as an exercise for the student…

Another notch in the utility belt

Having been largely an all-thumbs C# programmer for the last year or so, I’ve had little opportunity to add to the world of Excel know-how (and I don’t really feel qualified to say much about the .Net world yet, even if I did once get a Tweet from Jon Skeet). I do keep an eye out for Excel questions on StackOverflow, though, and one caught my eye this morning.

The questioner wanted to add a zero to a chart range. Not sure why he didn’t just add an extra constant value at one end (start or finish) of the range, but perhaps he couldn’t (maybe the data was in another, read-only workbook?). Anyway, I came up with something that extended an existing trick a little and I thought it was mildly interesting.

Using a named formula in a chart series is1 a fairly well-known idea these days. It lets us define a dynamic series (usually using OFFSET()) and have our charts update automagically as our series grow and shrink. What I hadn’t previously thought of2 was that I could use a VBA function call as the Name’s “Refers to”, with the result being usable in a chart Series definition.

Turns out I could.

Public Function PrependZero(rng As Range) As Variant
Dim val As Variant, res As Variant
Dim idx As Long
    val = Application.Transpose(rng.Columns(1).Value) ' nifty!
    ReDim res(LBound(val) To UBound(val) + 1)
    res(LBound(res)) = 0 ' this is where the function lives up to its name
    For idx = LBound(val) To UBound(val)
        res(idx + 1) = val(idx) ' copy the remainder
    Next
    PrependZero = res
End Function

(That Transpose thing is handy – I hadn’t known about it as a way of getting a 1-D range value as a vector. Two transposes needed for data along a row.)
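
For completeness, the hook-up goes something like this (sheet, range and name all invented for the illustration):

Sub HookUpZeroSeries()
    ' define a Name whose "Refers to" is a call to the UDF above
    ThisWorkbook.Names.Add Name:="ZeroSeries", _
                           RefersTo:="=PrependZero(Sheet1!$B$2:$B$13)"
    ' then point a chart series at that Name - typing =Book1.xlsm!ZeroSeries
    ' into the Select Data dialog does the same job by hand
    ActiveSheet.ChartObjects(1).Chart.SeriesCollection(1).Values = _
        "='" & ThisWorkbook.Name & "'!ZeroSeries"
End Sub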

You could use this approach for lots of things. Like, well, lots of things. Honestly, you don’t need me to spell them out.


1 This makes me a little sad – it was a source of Excel-fu that impressed people to a quite disproportionate degree. Array formulae and VBA classes still have this power.
2 I’m aware that I might be the last kid in the class here. Be gentle.

My (Im)perfect Cousin?

(in which we start to worry about the source of our inspiration)
Mona Lisa Vito: So what’s your problem?
Vinny Gambini: My problem is, I wanted to win my first case without any help from anybody.
Lisa: Well, I guess that plan’s moot.
Vinny: Yeah.
Lisa: You know, this could be a sign of things to come. You win all your cases, but with somebody else’s help. Right? You win case after case, and then afterwards, you have to go up to somebody and you have to say “thank you”! Oh my God, what a fuckin’ nightmare!


It is one of the all-time great movies, and netted Marisa Tomei an Oscar in the process. Yes it is. It really is1.

Not only that, but My Cousin Vinny2 throws up parallels in real life all the time. Yes it does. It really does3.

Why, only recently, I was puzzling over the best (or least worst) way to implement a particularly nonsensical requirement for an intransigent client. After I’d summarised the various unpalatable options in an email, a reply arrived from a generally unproductive source. The message content made it obvious that he’d somewhat missed the point, but the conclusion he drew from that misunderstanding triggered a train of thought that gave us a new, even less, er, worser solution to our problem.

Sadly, my unwitting muse has moved on now, but he left his mark for all time4 on our latest product. I suppose he should also take partial credit for the creation of a hitherto unknown development methodology: Powerpoint-Driven Development, but that’s a story for another day.


1 All right, IMHO
2 See also My Cousin Vinny At Work, application of quotes therefrom
3 YMMV.
4 Or at least until we have a better idea and change the whole damn thing

This Wheel Goes To Eleven

(in which we make an unexpected connection regarding the D in SOLID and get all hot under the collar about it)

Let’s not beat about the bush: I think I may have reinvented Dependency Injection. While it looks rather casual, stated like that, I’ve actually spent much of the last six months doing it. (Were you wondering? Well, that.)

I’ve been designing/building/testing/ripping apart/putting back together again a library/app/framework/tool thing that allows us to assemble an asset allocation algorithm for each of our ten or so products1, each of which may have been modified at various times since inception. It’s been interesting and not a little fun, plus I’ve been climbing the C# learning curve (through the three versions shipped since my last serious exposure) like Chris Bonnington on amphetamines.

Our products are broadly similar but differ in detail in some places. So there’s lots of potential for reuse, but no real hierarchy (if you can see a hierarchy in the little chart here, trust me, it’s not real).

So Product A needs features 1, 2 & 3, in that order. B needs 1 & 4, C 1, 3 & 5, and so on. What I came up with was to encapsulate each feature in a class, each class implementing a common interface. Call it IFeature or some such. At run-time, I can feed my program an XML file (or something less ghastly, perhaps) that says which classes I need (and potentially the assemblies in which they may be found), applying the wonder that is System.Reflection to load the specified assemblies and create instances of the classes I need, storing them in, for example, a List<IFeature>. To run my algorithm, all I need to do is call the method defined in my interface on each object in turn. A different product, or a new version of an existing one, has a different specification and it Should Just Work.

It’s all very exciting.

So changing a single feature of an existing product means writing one new class that implements the standard interface and pointing the product definition at the library that contains the new class (which may – should – be different from those already in use).

The discerning reader may, er, discern that there are elements of Strategy and Command patterns in here as well. Aren’t we modern?

While all this is very exciting (to me at least – a profound and disturbing symptom of work-life imbalance) it’s still not the end of the line. I’ve built functions and then chosen to access them serially, relying on carefully crafted (or just tricky & tedious) XML definitions to dictate sequence. I’m thinking that I can go a long way further into declarative/functional territory, possibly gaining quite a bit. And there’s a whole world of Dynamic to be accessed, plus Excel and C++ interfaces of varying degrees of sexiness to be devised.

More on much of that when I understand it well enough to say something.


1 There are billions at stake, here, billions I tell you.

Lacking Anything Worthwhile To Say, The Third Wise Monkey Remained Mostly Mute

Project Euler 100/304 complete (on the permanent leaderboard at last!)

(in which we discover What’s Been Going On lately)

[Image: Don't talk ... code]

I’ve been coding like crazy in C# of late, a language I’ve barely touched in the last few years. Put it this way, generics were new and sexy the last time I wrote anything serious in .NET… (There should be some ExcelDNA fun to be had later.)

I’d forgotten how flat-out fast compiled languages, even the bytecode/IL kind, can be. It’s not a completely fair comparison, to be sure, but a Ruby script to extract 300,000 accounts from my Oracle database and write them as XML takes a couple of hours, mostly in the output part. A C# program handled the whole thing in 5 minutes, then processed the accounts in about 30 seconds: 10 spent deserializing the XML into objects, 10 serializing the results back to XML and 10 performing the moderately heavy-duty mathematical transformations in between.

[Image: Click to test its value for money]

Lacking at present a paid-for version of Visual Studio 2010 (the Express Edition, while brilliantly capable, won’t do plugins, which precludes integration with Subversion and NUnit, to name but two essentials), I have been enjoying greatly my experience with SharpDevelop, which spots my installs of TortoiseSVN and NUnit and allows both to be used inside the IDE. It’s not perfect: there are areas, particularly in its Intellisense analogue, where exceptions get thrown, but they’re all caught and I have yet to lose any work. While the polish is, unsurprisingly, at a lower level than Microsoft’s, it’s entirely adequate (and I mean that in a good way) and the price is right. I particularly liked being able to open an IronRuby session in the IDE and use it to interact with the classes in the DLL on which I was working.

While I expect VS2010 to become available as the budgeting process grinds through, I’m not at all sure that it’ll be necessary to switch. An extended set of automated refactoring tools could be attractive, although Rename and Extract Method are probably the two most useful, productivity-wise, and they’re already present. I would rather like to have Extract Class, which isn’t needed often but would be a big time (and error) saver when called for.

On another topic entirely, should you be looking for entertaining reading in the vaguely technical, erudite and borderline insane category, may I recommend To Umm Is Human to you? Any blog that has “orang utan” amongst its tags is worth a look, I’d say. If you like it, you’ll like it a lot. I was once made redundant by a Doubleday, but I don’t think they’re related.

There’s an interesting new programmer-oriented podcast on the block, too: This Developer’s Life has slightly higher production values that may ultimately limit its life – the time to produce an episode must be substantial. I found myself wanting to join in the conversation with stories of my own, a sure sign that I was engaged in the content.

That Do Impress Me Much

Over at stackoverflow, now that they have a pile of money to spend (sorry, invest), the rate of change is picking up. There’s the re-worked Stack Exchange model, which has changed dramatically – and quite likely for the better. They’ve moved away from the original paid-for hosted service to a community-driven process, whereby a community needs to form and commit to the idea of a new site. The objective is to improve the prospects of achieving critical mass for a new site, thus increasing its chances of success. I imagine a revenue model is mooted, although it may be little more than “if we build it, they will come” at present. Sponsored tags and ads spring to mind.

This week we’ve seen the covers removed (perhaps for a limited time initially) on a “third place”, to go with the existing main Q&A and “meta” (questions about the Q&A site). It’s a chat room. Well, lots of chat rooms of varying degrees of focus, to be more specific. Quite nicely done, too.

What has really impressed me has been that during this “limited sneak beta preview”, bugs, issues, feature requests and the like have been flowing through the interface at a fair rate of knots and many have been addressed and released within hours. Minutes, sometimes.

Think about it. User detects a bug, reports it and gets a fix, to an application with global reach, in a couple of hours or less. That’s agile.

A crucial part of the Lean movement in manufacturing (and its younger counterpart in software development) is eliminating waste. “Waste” is broadly defined – very broadly defined, in fact – but one easily identifiable component is Work In Progress (WIP). In software terms, this often represents effort that has been invested (and money that’s been tied up) without having been included in a release. The more effort we invest without releasing, the more we waste, since we have no possibility of obtaining a return on that investment.

Here’s a particularly quick find/fix from earlier today:

Yes, it was probably a trivial bug, but the problem was notified, found, fixed and released in eight frickin’ minutes. How many of us can turn anything around that fast?

I’m looking forward to seeing where this goes.

Sorted for Excel and Whee!

If you happened upon the VBA Lamp and by rubbing it were able to produce the Excel VBA Genie and were granted a VBA Wish, would you ask for a built-in Sort() function?

If you build the kind of Excel apps that I do, you’ll run up against the need for a Sort all too frequently. Sorting arrays with sizes in the tens of thousands is not unusual and I was reminded of this when reading this entry at Andrew’s Excel Tips the other day.

While it’s not crucial to the (useful) idea presented, the code has a quick and dirty sort that is about the worst sort algorithm1 one could intelligently come up with. It’s also a pretty intuitive solution, which just goes to show that sorting may not be as simple as we think. I’m not attacking Andrew’s skillz here, btw: the code as presented is certainly fit for purpose; it’s not presented as a general-purpose utility (at least I hope it isn’t).

I’ve accumulated a few algorithms over the years and I’ve coded up a couple more while “researching” this piece. On the one hand, we have the oft-derided BubbleSort and its close relation, CocktailSort. In the same group I’d include InsertionSort and SelectionSort. I’m going to be harsh and categorise those, with the naive sort above, as “Slow”. Well, “mostly slow”, as we’ll see.

In the “Fast” group, we have the much-touted QuickSort and, somewhere in between, HeapSort and my current algorithm of choice, CombSort. As I was researching this, I also coded up ShellSort, which is about as old as I am and which has been claimed to be faster than QuickSort under some conditions.

I ran some comparisons, not meant in any way to be perfectly scientific2. I ran each algorithm on arrays of 50 to 50,000 values with six different characteristics:

  • already sorted
  • exactly reversed
  • mostly sorted (values typically within one or two places of target)
  • ordered blocks of 100 random values (the first 100 values are 0 + RAND(), then 100 + RAND() and so on)
  • completely random
  • random 10-character strings

First off, the 50-record results:


50 recs (ms)   sorted  near sorted  blocks  random  strings  reversed
Shell            0.06         0.06    0.11    0.10     0.24      0.09
Quick            0.13         0.13    0.16    0.13     0.29      0.14
Comb             0.09         0.10    0.17    0.17     0.33      0.13
Heap             0.34         0.33    0.32    0.32     0.52      0.28
Insertion        0.02         0.02    0.20    0.17     0.47      0.37
Selection        0.25         0.25    0.25    0.25     0.54      0.25
Cocktail         0.01         0.02    0.44    0.39     1.02      0.77
Bubble           0.01         0.02    0.50    0.45     1.12      0.78
Naive            0.22         0.23    0.50    0.46     1.06      0.77

I’d say it’s pretty clear that it doesn’t matter much what you use to sort a small array, just about anything will be fast enough (unless you’re going to perform that sort tens of thousands of times in a run). It’s also apparent that the “slow” algorithms are actually pretty good if our data is already reasonably well-ordered.

So far, so “so what?”

Let’s look at the opposite end of the spectrum: 50,000 values. Here, the Fast/Slow divide is apparent. First the “Slows” (two tests only, for reasons that should become apparent):


50K (ms)    near sorted     random
Bubble               31    522,216
Cocktail             30    449,696
Insertion            19    179,127
Naive           219,338    510,010
Selection       220,735    220,743

Yes, that’s hundreds of thousands of milliseconds. “Three or four minutes” to you and me. The “Fasts”, meanwhile:


50K (ms)   sorted  near sorted  blocks  random  strings  reversed
Shell         162          164     219     377      929       250
Quick         296          298     327     365      790       306
Comb          390          396     477     622    1,348       452
Heap          899          903     885     874    1,548       844

(I only ran two tests on the “Slows”, for fear of dozing off completely.)

Again, for data where values are near their final sorted positions there’s clear evidence that something like an Insertion Sort is much faster than any of the “sexier” options. Provided you know your data will actually meet that criterion, of course.
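
For reference, an InsertionSort is only a handful of lines; a basic version (ascending, in place, same calling convention as the ShellSort below) might look like this:

Public Sub InsertionSort(inp)
' sorts supplied array in place in ascending order
Dim i As Long, j As Long
Dim temp ' don't know what's in the input array...
  If Not IsArray(inp) Then Exit Sub ' ...but it had better be an array
  For i = LBound(inp) + 1 To UBound(inp)
    temp = inp(i)
    j = i
    Do While j > LBound(inp)
      If inp(j - 1) <= temp Then Exit Do ' element is home: stop shifting
      inp(j) = inp(j - 1)
      j = j - 1
    Loop
    inp(j) = temp
  Next
End Sub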

All that considered, I’m switching from CombSort to ShellSort as my default algorithm. While it loses out a little to QuickSort in the “random” test (probably most representative of my normal use case) it doesn’t carry with it the fear of stack overflow through extreme recursion, something that’s bitten me with QS in the past. Anyway, us old’uns have got to stick together.

As already mentioned, if you have small quantities of data or infrequent sort calls in your application, it really doesn’t make much difference which algorithm you use, although I’d still suggest being aware of the properties of several options and having a basic implementation to hand. Once you reach a point where sorting contributes materially to your application run time then you owe it to yourself and your users to make an intelligent selection.

Here’s my ShellSort implementation in VB, transcoded fairly literally from the Wikipedia pseudo-code (optimisations welcome):


Public Sub ShellSort(inp)
' sorts supplied array in place in ascending order
Dim inc As Long, i As Long, j As Long
Dim temp ' don't know what's in the input array...
  If Not IsArray(inp) Then Exit Sub ' ...but it had better be an array
  inc = (UBound(inp) - LBound(inp) + 1) / 2 ' use Shell's originally-proposed gap sequence
  Do While inc > 0
    For i = LBound(inp) + inc To UBound(inp)
      temp = inp(i)
      j = i
      Do
        If j < LBound(inp) + inc Then Exit Do ' check these conditions separately, as VBA Ors don't short-circuit
        If inp(j - inc) <= temp Then Exit Do ' ... and this would fail once j - inc drops below LBound(inp)
        inp(j) = inp(j - inc)
        j = j - inc
      Loop
      inp(j) = temp
    Next
    inc = Round(CDbl(inc) / 2.2) ' shrink the gap; a 2.2 divisor reportedly does a little better than plain halving
  Loop

End Sub
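
In use, it’s just a question of getting hold of a 1-D array, sorting it and (if need be) writing it back – something like this, with the range addresses made up for the example:

Sub SortColumnAIntoColumnB()
    Dim vals As Variant
    ' Transpose turns the 50,000 x 1 .Value array into a 1-D vector
    vals = Application.Transpose(Sheet1.Range("A1:A50000").Value)
    ShellSort vals
    ' ... and a second Transpose turns it back into a column for writing out
    Sheet1.Range("B1:B50000").Value = Application.Transpose(vals)
End Sub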


1 Not the absolute worst: there’s the catastrophic genius of BogoSort, to name but one, but let’s not go anywhere nearer to there.
2 Just so we understand each other, these are my figures for my code on one of my machines. YMMV. A lot.
