It’s An Ill Wind

I was pontificating recently about the horror of tightly-coupled worksheet calculations and macro code in Excel and the mess that results from carelessly mashed-together programming paradigms.

By a horrible coincidence, a few smug weeks later I suddenly found myself deep in the horror again. This time, I had to contend with two versions, needing to set up and run several tens of thousands of real-world cases through “before” and “after” scenarios.

I soon started to turn up nuggets of wonderment. Here are a couple of gems from the worksheets:

=IF(CPPI=TRUE,TRUE,FALSE)
=MAX(1,DATE(YEAR(C4),MONTH(C4),DAY(C4))-DATE(YEAR(C6),MONTH(C6),DAY(C6)))

“Why”, I asked myself, “don’t these monkeys think?” (When someone asked me which monkeys I was referring to, I realised that my inner monologue was malfunctioning, but that’s not important right now.)

Obviously they were thinking, apart from anything else the whole thing does seem to work, inasmuch as a set of inputs produce a set of plausible-looking outputs. I’d assert, however, that an experienced, competent developer has a continuous, higher-order review process running, asking questions like “does that look right?”, “could that be written better?”, “is that clear?”, “could someone else understand that?” It would appear that the individuals concerned in this case had no such process running.

I have a little tool that I bring out when faced with an unknown Excel VBA application. It scans the code and delives a report that gives me an idea of the degree of caution I should exercise, calculating for each routine an “X” number, being some arcane function of code lines, mesting level, numbers of ifs, loops, one-character variables (seldom straightforward to rename, even if they’re called “q” or “z”). A score of 10 or above indicates a possible problem area. Over 50 and we’re definitely in bandit country. Here’s the report for the top-scoring routine in the horror workbook (names changed to conceal the guilty):

422 is by a factor of about 3 the highest-scoring routine I’ve ever encountered. (By contrast, and perhaps it’s an unfair comparison, the top score in xlunit is 24.) That 1044 lines of code doesn’t include comments, blank lines, variable declarations or similar. Even better, it’s declared Static, something so appallingly dangerous (to me, at least – I get nauseous when I see even a Static variable) that I had to change the analyser because it had never seen it before. “Option Explicit” is honoured more in the breach than the observance and there are a mere (ha!) 156 global variables.

I was flicking through my cherished (and slightly stained) copy of The Pragmatic Programmer last night and I’ll take as my text the section “What Make a Pragmatic Programmer?” on pages xiii and xix of the Preface. Summarising the characteristics and comparing them with the authors of the unholy mess described, I get:

Early adopter/fast adapter: given that the abomination I’m working with is the culmination of more than two years’ work and is the fourth or fifth (and probably “best”) version, FAIL.

Inquisitive: I see no sign of any curiosity, not even “why do we suck?” FAIL

Realistic: I’m (generously) prepared to specify that this one may not really apply here. (Big of me.)

Jack of all trades: and masters of none? These guys didn’t make it to Jack of one trade. FAIL.

Critical Thinker: referring to the inner monologue again, there’s no evidence to suggest that anything like the questions listed above were ever applied here. And plenty of evidence to the contrary. BIG RED FAIL.

The tragedy here is that the monkeys described are unlikely to realise exactly how snivellingly far from acceptable (let’s not waste “best” here) practice they are, as the Dunning-Kruger effect tells us is likely, so they’re not going to change in any useful way. Failing to see that the twin demons of high coupling and low (non-existent, if we’re honest) cohesion are rampaging through their code leaves me wondering where we’d start.

Depressingly, these guys aren’t unusual, they’re not even the worst. The type is perhaps a little more common in, say, the Excel world, where the tools to develop staggering incompetence are so widely available, but they’re everywhere else – the invisible 90% of the iceberg.

On the plus side, even moderately competent technicians who are prepared to don the elbow-length protective gloves and dive in to such messes will be able to look forward with some confidence to a lifetime in which there will be a paycheck for as long as they can stand to pick it up.

Buddy, Can You Paradigm?

(stackoverflow rep: 8998, Project Euler 89/273 complete

(or: “If Paradigm was half as nice”)

I started reading Coders At Work recently. In the style of a few predecessors, the book comprises transcribed (and presumably edited) interviews with fifteen programmers who have made significant contributions of various kinds. It’s a weighty tome, possibly a little overweight if I’m honest, but interesting nonetheless; it makes a change from the kind of technical tome I usually cart around on the daily commute.

Circling a little closer to, and nodding vaguely in the general direction of the point, interviewee number seven, reached just this morning, is Simon Peyton Jones, who was one of the progenitors of Haskell, one of, if not the first name that springs to mind when one hears “functional programming“. For the last decade, Jones has been a researcher at Microsoft Research in Cambridge, which suggests that he would have had at least one finger in the F# pie… There’s a DotNetRocks show number 310 features an interview with the man, by the way. I’m enviously disturbed that despite our very similar ages, he appears considerably less follicly-challenged than me, although there could be some comb-over action going on there.

In the taxonomy of programming paradigms (does that make sense?) we have Declarative and Imperative. They are opposites. In the Declarative fold, we get stuff like SQL, XSLT, all the functional languages and, I submit Excel worksheets. Actually it’s not just me: that Wikipedia link places spreadsheet cell-based programming within the Dataflow subcategory of Functional. Imperative programming, on the other hand, includes the panoply of more “traditional” languages, including good old Visual Basic and its slightly less functional sibling VBA.

Thinking about Excel and VBA developement, I’d say there are several fairly distinct uses for VBA code (usually, but not exclusively: we can write add-ins or perform automation in plenty of other ways) which include:

  • New worksheet functions;
  • Interface extensions (menus, toolbars, ribbons etc);
  • Helpers (or wizards);
  • Control and simulation

…of which I write a lot of the latter two.

(You know how some comedians talk about a string of seemingly unrelated things and then cleverly tie them all together at the end? Well, fingers crossed, stay tuned)

A while back, I was handed a workbook, developed by an external consultancy, that modelled the 20-year evolution of a financial product. We wanted to run it across a significant number of stochastic market data simulations to get a picture of the distribution of possible outcomes. About 30,000 such simulations (and about 16GB of simulated market data) would be sufficient, we thought. The workbook took about 20 minutes to perform one such calculation. We didn’t have a year.

Fortunately, there were any number of classic optimisation failures to rectify: massive VBA/worksheet interaction at the cell level (changed to use large arrays), screen update enabled at all times, lots of less painful VBA-specific stuff, Application.Calculation = xlCalculationAutomatic, you know the drill. Easy stuff. Within a day it was taking about a minute and I could look at having the whole analysis done in under a week (I have a four-CPU machine).

If it all seems straightforward, it wasn’t. I’d wrapped the workbook in a modified (hacked-up from xlUnit) test jacket to check that my changes weren’t affecting the results, pretty much a necessity for any refactoring work, and one change kept breaking. Every time I tried to switch off automatic calculation, triggering it only when I thought it was needed, the final results came out different. I finally realised that the VBA and worksheet were extremely highly-coupled – calculations were occurring as the code pushed values through: change a value, auto-calc, change another value, auto-calc again and so on. Eek. It took almost a week to unravel. Of course, since I actually enjoy doing this kind of work for the most part, it wasn’t a complete disaster, but I could have lived without the time pressure…

As a finance worker (perversely, I’m increasingly inclined to describe myself as a Banker1 these days) I have cause, from time to time, to access The Bloomberg from Excel. At least one of the functions in the (extensive and powerful, let’s be fair) API is a real functional-imperative mess. Here’s one with which I’m not at all happy:

=BDH("MXWO", "PX_LAST", "01/06/2007", "28/12/2009", "Dir=V", "Dts=S", "Sort=A", "Quote=C", "QtTyp=Y", "Days=T", "Per=cd", "DtFmt=D", "cols=2;rows=672", "FX=EUR")

Yes, there are a lot of arguments2, but most are at least not positional, which shows they’re trying. This particular example asks for the last daily quote of the MSCI World Index, in Euro, between the dates given. Even better, I guess, there’s a wizard to generate it all for you. But here’s the nasty bit: the call fills as many rows – without warning about overwrites – as it needs. Oh, and that "cols=2;rows=672" parameter? It’s rewritten if you make any change that would affect the number of cells output. This is not easy to accomplish. Try duplicating the effect in VBA, go on, I double-dare you. I think there’s a separate server process at work here, able to make asynchronous (but timely) modifications to user worksheets.

Yikes. And ugh.

To me, there’s something broken here. They’re using am apparently functional programming model but allowing the code to affect the state of the worksheet in (from the user’s perspective at least) a rather arbitrary (or at least, unpredictable) way. We’re not even getting an array formula here – the function just causes chunks of worksheet to be filled with data. In point of fact, there looks to have been some serious programming work here – I don’t think in the normal course of events that calling a function is allowed to cause areas outside the calling range to be modified. I rather suspect an external process is being asked to pump values into the sheet through some devious means like (but let’s hope it isn’t) DDE.

(This is where I try to pull it all together. A little encouragement wouldn’t go amiss here…)

The fact that we have both paradigms represented as first-class components of a single development platform may be a large part of the reason for Excel’s dominance in the marketplace. But the interface between declarative and imperative code needs to be carefully managed to avoid the kind of unnecessary side-effects that I spent so long untangling.

The messed-up simulation I whined about above failed to recognise that the worksheets should have been, in effect, a great big, super-complex function, which the code used by setting up the input conditions, firing off a Calculate and then reading the output values. Instead, it tried to implement a tightly-coupled declarative-imperative abomination, with unfortunate results.

Let’s try not to repeat that mistake, for the sake of the children.

I’ve been (and remain) grumpy – thank you and good night.

You've been a lovely audience


1 Wanna make something of it?

2 Many Reuters functions do something similar, but push all the key-value pairs into a long string.

A Third Way: DNA?

(stackoverflow rep: 8417, Project Euler 87/261 complete

When we want Excel to talk to a compiled library, the list of options available to us is not long.

On the one hand, we can write a compiled library that references the Excel SDK (pre– or post-2007), setting its prefix to XLL for convenience, typically we’d grit our collective teeth and accomplish this with C or (cough, spit) C++. One big plus here is speed – we’re talking to Excel in its own language. Or something awfully close to it. Another plus, of course, is that this approach probably involves the minimum set of dependencies for non-enterprise distribution convenience. If you haven’t spotted the big minus then we should probably agree to disagree on the relative merits of programming languages.

On the other hand, we can write a library that exposes its functions through good ol’ COM, referencing it though the Tool…Add-ins dialog. That’s probably still the best-known way to integrate functionality developed in .NET, although it does involve paying a performance price as the COM interface is crossed and recrossed. This would be the place in the world most usually occupied by VSTO. Of course, you could also write your COM library in non-managed code: VB6 is easy (if you can find it) but slow, the choice from others depends on your capacity (or desire) for pain.

On the gripping hand, we have ExcelDNA, which for my money is one of the all-round cunningest things you’re likely to come across in quite a long time. (Unless you’ve already encountered its cunningness, of course, in which case you’re probably already nodding along in sage agreement). How so? How about being able to write your spiffy new functions in C# but without having to incur the needless ins and outs of the wasteful and superfluous COM middleman?

ExcelDNA provides a small XLL that can talk to managed code. There’s one attribute to set in order to make a function visible to Excel and one libary to import (in order to be able to add the attribute). Oh, and a little config file to tell the XLL what to load. That’s it. Actually, that’s the complicated version. In case you didn’t know, the CLR includes a compiler, allowing code to be created and compiled at run-time. The simplest way to talk to Excel from .NET via ExcelDNA is just to put your code directly into that little config file and let the XLL compile it at load time. OK, there’s a second or so’s overhead, but how simple – anything simpler would probably have to be involve 21st-Century language integration where VBA lives today.

By now you’re probably muttering something like “write code, fat bloke”, which is a little cruel, but a sentiment otherwise understandable. Now only the other day, Dick Kusleika posted his take on the everybody-has-one timeless RangeJoin() UDF topic. Here’s a rather simpler implementation:


<DnaLibrary Language="CS">
<![CDATA[
using ExcelDna.Integration;
using System.Collections.Generic;

    public class MyFunctions
    {
        [ExcelFunction(Description="Joins cell values", Category="My ExcelDNA functions")]
        public static object RangeJoin(object[,] cells)
        {
            List<string> list = new List<string>();

            foreach (object o in cells)
                list.Add(o.ToString());

            return string.Join(",", list.ToArray());
        }
    }
]]>
</DnaLibrary>

To get it running, I saved it as “RangeJoin.dna”, then copied the ExcelDNA.xll into the same folder and renamed that to “RangeJoin.xll”. Because it isn’t currently anywhere on my path, I also put ExcelDna.Integration.dll in the folder. Then I double-clicked my new XLL file to start Excel and load the library. (Pictures below are from Excel 2002, but the code is tested up to 2007).

Not surprising...

There’s a tiny extra delay before the usual macro warning appears – the C# code’s being compiled, which is where compile-time failures are displayed. Then you’re up and running. The function shows up in the Insert Function dialog…

… and we can try calling it:

… with the following result:

I did notice the the =NA() output and the Euro amount came out different to what I’d have hoped. It rather looks like I’d have to dig a little deeper into the ExcelDna.Integration.dll and the ExcelReference object in particular in order to be able to access the equivalent of VBA’s Text property.

Still, not bad for a quick exercise, I’d say.

There are some alternative approaches to the dot-net-and-excel-without-going-anywhere-near-COM-add-ins topic:

For Free

Excel4Net is now free – it seems to work by implementing a single worksheet function that calls out to managed code;

The last time I looked at XLW it was focused on C++. That’s changed in the interim – C# and VB.NET are now options.

Possibly Paid-for Propositions

Len Holgate appears to be at beta test stage with JetXLL, which looks like it’ll be a commercial offering;

I also just discovered ExHale. In beta, appears current – can’t see what the terms are likely to be;

Definitely Demanding Dollars

Possibly the first implementation, ManagedXLL is (was?) a paid-for product that looks to have been similar to, but possibly broader in scope than ExcelDNA. It’s not clear whether the company is still active or the product is still available – a curious colleague recently tried to make enquiries and failed.

Taming The Beast

(stackoverflow rep: 7963, Project Euler 83/261 complete – slacker!)

It’s a little while back now (I’m a slow thinker) but Simon Murphy, developing on a topic found in a post by Dick Moffat, discussed some observations of Excel usage within organisations. He concluded that general user skill levels are largely unchanged but that the uses to which Excel is being put are becoming more complex and at the same time training quality is declining. As a result, it’s increasingly commonplace that such organisations are seeking to reduce or limit the degree to which user-written spreadsheets are used.

The way the future was

The way the future was

It’s a tricky one. We have this enabling technology that makes it possible for end-users to manipulate information in ways unheard-of before 1979 (the year Visicalc was launched). Since the introduction of the computer spreadsheet, the power and complexity (yes, ease of use too, but see below) of the products available has increased steadily, necessarily, if you think about it: if Multiplan, Lotus 1-2-3, Quattro Pro, Wingz, Excel et al didn’t offer something more then we’d still be using Visicalc.

Putting a spreadsheet program on every desk-top computer, as is commonplace nowadays, due to Microsoft’s bundling policy: most will need Word and Outlook, they’ll probably expect Powerpoint, God help us, so Excel is almost certainly going to come along for the ride. Once it’s there, all but the most dedicated non-fiddlers will find it, even if it’s only because they’re drawn in by their peers.

I’d say a not-untypical progression starts with using the electronic squared paper to make lists or tables, moving on to the occasional diagram and then one day there’s the discovery of arithmetic, at which point Pandora (which wouldn’t be a bad name for a spreadsheet program) has opened the box and we’re heading downhill on a slippery slope into the land of overused metaphors, not to mention end-user spreadsheets.

There, I’ve said it: “end-user spreadsheets”. Three words (or two, depending on how you count hyphens) to strike fear into the giblets of any spreadsheet professional. Why? Here’s a selection of risks to which an organisation may be exposed:

  • the spreadsheet uses the wrong function(s): anyone know the difference between STDEV() and STDEVP(), for example? Which should you use and when? What happens when the wrong one is still not accurate enough? What other areas might be dangerous? How about the entire “Financial” category?
  • dynamically-growing data ranges and fixed-size references: not all data is included in calculations
  • the user spends more time playing with his spreadsheet creation than he1 should – losing focus on the job for which s/he is actually being paid
  • the spreadsheet becomes so useful that users other than the creator start to use it: the creator is now part of application support. Was that what you had in mind?
  • because protecting spreadsheets can be tricky, and changing cell contents to perform “what-if” calculations is easy, formulae may be overwritten with new un-noticed and un-wanted constants.
  • the spreadsheet acquires a noxious sludge of user-written VBA (shudder). You know, pages of global variables, no classes, 1500-line Subs, that sort of joy.

That’s the 5-minute, off-the-top-of-my-head list. This is the sort of stuff that has IT people waking up screaming in the middle of the night and contract support people (been there, done that) reassured of continuing employment for the remainder of their working lives.

How to address the problem without flushing both baby and bathwater?plughole

First of all, if no-one is prepared to accept that there may be some real risk here, then then whole question is moot and you’ve already irrevocably lost a few minutes of your life getting this far. Sorry about that.

A review process needs to be put in place: what XLS files are being stored on the network? Can you see who is referencing them? Reading or writing? Are there patterns that might indicate that any of the problems above are being experienced?

Can you write a “spider” that looks around the user spreadsheet “corpus”  for frequent updates, evidence of VBA, all-round horribleness? Remember you’re not spying, you’re auditing.

As candidate files are identified, review them with the relevant user representatives to determine if the workbook is an signal for action to be taken. Such actions may include

  • doing nothing – proactively chossing to do nothing is a perfectly reasonable option;
  • adding the workbook to the portfolio of maintained and supported spreadsheet applications;
  • setting up a project to address the business need in a more rigorous way;
  • a bit of business process re-engineering to address any procedural deficiency that may have driven the need a an ad hoc spreadsheet.

Again, that’s the five-minute list. What other courses of action might one take to mitigate the risks of the enthusiastic amateur running wild with a spreadsheet program in one’s business?

 


1 Let’s face it, folks, it’s usually a “he”

 

Nothing Like A Name

(stackoverflow rep: 6622, Project Euler 82/252 complete)

I was suddenly plunged head-first back into the world of Excel/VBA this week. After a year or so of mostly Ruby coding, I was struck by how, well, wordy VBA can be by comparison. There’s a definite lower level of expressiveness, too, although that can be mitigated to some extent with extra (mostly mental) effort. While I was rapidly back up to speed and having a good time refactoring to make room for some new functions, I couldn’t help feeling that I’d moved on and was practising what had become a legacy skill. That has happened before: I left COBOL behind (without a second glance, it must be said) in 1990 and have not written a line in it since.

Anyway, I was thinking about named ranges in excel, and how they aren’t. Named ranges, I mean. Well they are, but they’re not. A name can refer to a range, but it’s really referring to a formula that references a range.

new_name

And by formula, I mean anything (pretty much) that can be input into a cell after an equals sign. The result need not be a range. Want to define a named constant, Faraday’s Constant (96485.3399) say? Just define a name with “=96485.3399” as the “Refers to” value.

In older Excels, you’d get this:

Named constant - in a Name

Named constant - in a Name

I’m a huge fan of using names to define OFFSET range for charting:

SwimDateCount is also a formula...

SwimDateCount, in turn, is =COUNT(SwimStats!$A$2:$A$500)

The excellent Professional Excel Development (I have the older edition; I’m sure the new one will be just as good when I finally start working with 2007/2010 in anger) covers this, I’m sure. I’d check, but it’s been borrowed by a colleague. Because it’s that good.

The only slightly less excellent Excel Hacks definitely does cover it. (Also borrowed, but I managed to find it). Hack #52 in my copy, if you’re interested.

A Long Time Dead

(stackoverflow rep: 6602, Project Euler 78/252 complete)

I realised just now that Visual Basic1 went from 1.0 to 6.02 in about 7 years and stopped progressing 11 years ago. Even so, if I needed to do some COM automation work in an EXE, it’s still the tool I’d reach for, assuming I could locate my old Visual Studio 98 CD-ROMS, that is.

So by now VB’s pretty much been end-of-life longer than it’s been, er, life. Golly. It really is time Microsoft came up with a viable alternative. What I can figure out about VSTA doesn’t fill me with excitement, that’s for sure. VSTO, for the power-user at least (emphasis on “user”) still looks even worse. Neither offers, as far as I can see, any route for legacy VBA code, which is, I suppose, why VBA is still expected to be around for the foreseeable future.

Still, for the little it’s probably worth, we now have Make Office Better, within which we can rant about such importances. The buzz following its appearance seems to suggest that Microsoft still have some way to go. I am looking forward to sparklines in Excel 2010 though…

Mmmmm, sparklines...

Mmmmm, sparklines...

1 (the “classic” one, not the impossibly confusing – to a still-active VBA user)
2 did we ever get a point release? I can’t remember any, although some of the ActiveX stuff did get service-packed.

The Glove Puppet Programmer

While I prevaricate over the four (!) drafts sitting at various stages of incompletion in my “Posts” view, I’ll take lunchtime out to reminisce about a non-contemporaneous* colleague and his unique status as probably the most expensive programmer I’ve ever had to deal with.

This particular individual, who was either French or French Canadian, had the singular identifying characteristic of a name that was a literal translation of a stuffed children’s TV character from my childhood. This in itself should have been a warning.basil_brush He had created the Credit Risk reporting system that a team of ten expensive contractors, myself including, were labouring feverishly to replace. On his own. As a series of Excel spreadsheets. With extensive VBA macros, coming from no discernible programming background. If you’re wincing, I’m not surprised.

Why am I asserting he was peerless in the cost stakes? Because the application he’d built actually worked. Badly, slowly, requiring regular manual intervention, occasionally obviously inaccurate and more frequently rather less so, but it worked. And we were trying to reverse-engineer it**, distinguish between the right and wrong parts and deliver something that was everything the old “system” was not. Well, almost: we did need our system to work.

This is one of the gems I discovered in the VBA:

    br_OutputWB.Worksheets(bv_sOutPutWSName).Range("A1").offset(br_lCount + 1, 0).value = bv_sSTRATEGY
    br_OutputWB.Worksheets(bv_sOutPutWSName).Range("A1").offset(br_lCount + 1, 1).value = bv_sTRADENAME
    br_OutputWB.Worksheets(bv_sOutPutWSName).Range("A1").offset(br_lCount + 1, 2).value = bv_sTRADESTATUS
    br_OutputWB.Worksheets(bv_sOutPutWSName).Range("A1").offset(br_lCount + 1, 3).value = bv_sStructType
    br_OutputWB.Worksheets(bv_sOutPutWSName).Range("A1").offset(br_lCount + 1, 4).value = bv_sSTRUCTMODEL
    br_OutputWB.Worksheets(bv_sOutPutWSName).Range("A1").offset(br_lCount + 1, 5).value = bv_sTRANCHENAME
    br_OutputWB.Worksheets(bv_sOutPutWSName).Range("A1").offset(br_lCount + 1, 6).value = bv_iTRANCHENUMBER
    br_OutputWB.Worksheets(bv_sOutPutWSName).Range("A1").offset(br_lCount + 1, 7).value = bv_lTRANCHENOTIONAL

    ... are you getting the picture? We continue incrementing the offset for a while, until:

    br_OutputWB.Worksheets(bv_sOutPutWSName).Range("A1").offset(br_lCount + 1, 29).value = bv_sCOMMENTDEFAULT
    br_OutputWB.Worksheets(bv_sOutPutWSName).Range("A1").offset(br_lCount + 1, 30).value = bv_sNEWTRANSID
    br_OutputWB.Worksheets(bv_sOutPutWSName).Range("A1").offset(br_lCount + 1, 31).value = bv_sCOMMENT

In case you were wondering, the “bv_” prefix signifies “By Value.” And yes, that does mean that each of those values appeared as parameter to the routine. I counted 32 in total. That’s thirty-two input parameters. You really should be wincing by now.

gogglesSeveral other routines also had the same (or 99% the same) code blocks. Our friend the cut-and-paste programmer must have really earned his money that day…

Oh yes, you should see an example of how our chum called his parameter-heavy subroutines. Consider acquiring welding goggles before looking at this, by the way.

Call SS_BuildMTMByTrade(br_OutputWB:=OutputWB, br_lCount:=lMMcount_SS, bv_dbFXrate:=GetFXRate(vMM_FXRates, RgCursor.offset(lOffset, Range("TRANCHECCY").Column - 1).value), _
bv_sSourceFile:=m_sSS_MM_Source & IIf(bv_sTradeNamePrefix <> "", bv_sTradeNamePrefix & RgCursor.offset(lOffset, Range("TRADENAME").Column - 1).value, RgCursor.offset(lOffset, Range("TRADENAME").Column - 1).value) & "_" & "MM" & "_" & RgCursor.offset(lOffset, Range("MM_CSB").Column - 1).value & "_" & RgCursor.offset(lOffset, Range("MM_R").Column - 1).value & "_" & IIf(RgCursor.offset(lOffset, Range("MM_PW").Column - 1).value, "T", "F") & "_" & IIf(RgCursor.offset(lOffset, Range("MM_SCS").Column - 1).value, "T", "F") & "_" & Format(m_dSS_MM_Asof, "yyyymmdd") & ".xls", bv_dbParticipationFactor:=RgCursor.offset(lOffset, Range("DEALNOTIONAL").Column - 1).value / RgCursor.offset(lOffset, Range("TRANCHENOTIONAL").Column - 1).value, bv_iShift:=RgCursor.offset(lOffset, Range("MM_R").Column - 1).value, bv_bIgnoreFirstCprty:=CBool(RgCursor.offset(lOffset, Range("IGNORECPTY").Column - 1).value), _
bv_dbSpread:=RgCursor.offset(lOffset, Range("DEALSPREADBP").Column - 1).value, _
bv_bUsePVFormula:=RgCursor.offset(lOffset, Range("USEPVFORMULA").Column - 1).value, _
bv_sSTRATEGY:=RgCursor.offset(lOffset, Range("STRATEGY").Column - 1).value, _
bv_sTRADENAME:=IIf(bv_sTradeNamePrefix <> "", bv_sTradeNamePrefix & RgCursor.offset(lOffset, Range("TRADENAME").Column - 1).value, RgCursor.offset(lOffset, Range("TRADENAME").Column - 1).value), _
bv_sTRADESTATUS:=RgCursor.offset(lOffset, Range("TRADESTATUS").Column - 1).value, _
bv_sStructType:=RgCursor.offset(lOffset, Range("STRUCTTYPE").Column - 1).value, bv_sSTRUCTMODEL:=RgCursor.offset(lOffset, Range("STRUCTMODEL").Column - 1).value, _
bv_sTRANCHENAME:=RgCursor.offset(lOffset, Range("TRANCHENAME").Column - 1).value, bv_iTRANCHENUMBER:=RgCursor.offset(lOffset, Range("TRANCHENUMBER").Column - 1).value, bv_lTRANCHENOTIONAL:=RgCursor.offset(lOffset, Range("TRANCHENOTIONAL").Column - 1).value, bv_sTRANCHESUBORDINATION:=RgCursor.offset(lOffset, Range("TRANCHESUBORDINATION").Column - 1).value, bv_sTRANCHECCY:=RgCursor.offset(lOffset, Range("TRANCHECCY").Column - 1).value, bv_dtTRANCHEMATURITY:=RgCursor.offset(lOffset, Range("TRANCHEMATURITY").Column - 1).value, _
bv_sBUYSELL:=RgCursor.offset(lOffset, Range("BUYSELL").Column - 1).value, _
bv_lDEALNOTIONAL:=RgCursor.offset(lOffset, Range("DEALNOTIONAL").Column - 1).value, _
bv_iDEALSPREADBP:=RgCursor.offset(lOffset, Range("DEALSPREADBP").Column - 1).value, _
bv_sUPFRONTFEE:=RgCursor.offset(lOffset, Range("UPFRONTFEE").Column - 1).value, _
bv_sCOUNTERPARTY:=RgCursor.offset(lOffset, Range("COUNTERPARTY").Column - 1).value, _
bv_dtTRADEDATE:=RgCursor.offset(lOffset, Range("TRADEDATE").Column - 1).value, _
bv_dtSETTLMTDATE:=RgCursor.offset(lOffset, Range("SETTLMTDATE").Column - 1).value, _
bv_sREPACKVEHICLE:=RgCursor.offset(lOffset, Range("REPACKVEHICLE").Column - 1).value, _
bv_sCOMMENT:=RgCursor.offset(lOffset, Range("COMMENT").Column - 1).value, _
bv_lADRNOTIONAL:=RgCursor.offset(lOffset, Range("ADRN").Column - 1).value, _
bv_sCOMMENTDEFAULT:=RgCursor.offset(lOffset, Range("COMMENTDEFAULT").Column - 1).value, _
bv_sNEWTRANSID:=RgCursor.offset(lOffset, Range("NEWTRANSID").Column - 1).value, _
bv_sOutPutWSName:=MMTEMPSHEETNAME, _
bv_BookName:=RgCursor.offset(lOffset, Range("BOOKNAME").Column - 1).value, bv_UniqueID:=RgCursor.offset(lOffset, Range("UNIQUEID").Column - 1).value, bv_KMVCorr:=RgCursor.offset(lOffset, Range("KMVCORR").Column - 1).value, bv_MarketCorr:=RgCursor.offset(lOffset, Range("MARKETCORR").Column - 1).value, bv_RunFlag:=RgCursor.offset(lOffset, Range("RUNLISTFLAG").Column - 1).value, bv_DataSource:=RgCursor.offset(lOffset, Range("SOURCETYPE").Column - 1).value)

The second line is about 900 characters long. Nine. Hundred. If you’re starting to want to make it go away, perhaps by clawing out your eyeballs, I don’t blame you.

Finally, for dessert, if you will, or maybe like one of those hot steamy towels presented at the end of some Indian meals, here’s our friend’s take on copying an array:

Public Sub CopyArray(ByVal bv_FromArray As Variant, ByRef br_ToArray() As Variant)
    Dim l As Long
    Dim i As Integer

    For l = LBound(bv_FromArray, 2) To UBound(bv_FromArray, 2)
        ReDim Preserve br_ToArray(1, l)
        For i = LBound(bv_FromArray, 1) To UBound(bv_FromArray, 1)
        Debug.Print bv_FromArray(i, l)

            br_ToArray(i, l) = bv_FromArray(i, l)
        Next
    Next

End Sub

From the blank line following the “Debug.Print” line I surmise that some operation may, in the distant past, have been occurring that made the routine more than a criminal waste of electricity. I hope the use of lower-case “L” is a soothing balm to your eyes, as is the continuous “ReDim Preserve” of the outer loop…

I don’t know the final cost of the project that replaced this, I elected not to renew my contract after nine months. I’d guess it didn’t exceed $10 million, not counting the possible losses incurred from trading on inaccurate information in the several years the “system” was live.

I’d call that dangerous.


*we worked at the same place but not at the same time, if that makes any sense

** specs? Don’t be ridiculous.

Version control for Excel workbooks, Part 2

(stackoverflow rep: 4824, Project Euler 72/241 complete)

Not a month ago (about seven , actually) I started to gibber about source code version control and Excel. Then Stuff got In The Way and I was derailed. Well, Ruby-on-Railed, really – I’ve been almost totally immersed in intranet development and related matters and mostly having lots of fun.

exportThen along came Internal Audit (shudder). Some degree of consternation appeared to exist about the potential (albeit slight) vulnerability of our Source Control solutions (SourceSafe for Visual Studio and Excel stuff, subversion for the webby bits). Combined with the newly-available – and rather sexy – new enterprise SourceForge that had materialised, we bit the bullet and set out to put everything into the Company-Approved Solution. Which dragged me seven months into the past to ask again – how does one diff an XLS?

That’s not a rhetorical question, I do have a sort of answer. I’m thinking along the following lines:

  • Some VBA component-extraction code in an add-in
  • Perform the export on a Save or Close
  • Save VBA references as a separate text file

I’m not going to get into dumping worksheet content, although there are arguments for doing so.

Excel exposes the handy WorkbookBeforeClose and WorkbookBeforeSave events on the Application object. We’ll need an Application object and it’ll have to be declared WithEvents, so we need a class. I’m going to call it WorkbookDecomposer:

Option Explicit

Public wkbk As Workbook
Public WithEvents xlapp As Application

Private Sub xlapp_WorkbookBeforeSave(ByVal Wb As Workbook, ByVal SaveAsUI As Boolean, Cancel As Boolean)
    Set wkbk = Wb
    Decompose
End Sub

(We’ll have the same code for the Close event). Each component can be exported, and the references can be dealt with something like this:

Public Sub ExportReferences()

Dim ref As Reference

    With New FileSystemObject
        With .OpenTextFile(.BuildPath(wkbk.path, wkbk.Name & ".references"), ForWriting, True)
            For Each ref In wkbk.VBProject.References
                .WriteLine RefToString(ref)
            Next
        End With
    End With

End Sub

To start the thing, we can have a module with a little Sub such as:

Option Explicit
Option Private Module

Public decomposer As WorkbookDecomposer

Public Sub StartDecomposer()
        Set decomposer = New WorkbookDecomposer
        Set decomposer.xlapp = Application
End Sub

Now, as long as we don’t lose state, running this code once will give us an object that will automatically export copies of our code for storing separately in our code control system. So we can diff. The decomposer can be extended to shell out useful source control instructions such as commit, perhaps automatically, perhaps not.

I don’t like that “as long as we don’t lose state” thing. When we’re coding in VBA we lose state all the time. The best I can come up with is to use an OnTime macro to keep trying to restart the decomposer object:

    If decomposer Is Nothing Then
        Set decomposer = New WorkbookDecomposer
        Set decomposer.xlapp = Application
        Debug.Print "Decomposer (re)started @ " & Now
    End If
    Application.OnTime Now + TimeSerial(0, 0, 10), "StartDecomposer"

It sort of works, but it’s potentially intrusive when we have incomplete/invalid code (hardly unlikely in the scenario under consideration) and the OnTime macro may fail to fire for many reasons.

What are the alternatives? Custom external scripts? An external process that monitors changes on XLS files and auto-exports?

What Would Proudhon Do?

(stackoverflow rep: 3958, Project Euler 66/236 complete)

Dick Kusleika wrote about some of his personal code construction methods in Daily Dose of Excel recently. Something that caught my eye was this:

I generally end up with a lot of Property statements that do nothing but read and write to a module-level variable.

Probably fairly familiar, goodness knows I’ve written enough of those, and I never used MZ-Tools to automate the process, either. So I was nodding away as I read, muttering “amen brother” and whatnot, when I had a vicious attack of the Whys.

Why do we write a Property Set/Let/Get just to set, let or get a member variable? What’s a Property for, anyway?

I think a Property should probably only be used to assign a value when some processing needs to occur other than simply setting the member variable. Code such as:

Private mName As String
Public Property Let Name(value As String)
    mName = value
End Property
Serendipitously, I just completed Project Euler problem 84...

Serendipitously, I just completed Project Euler problem 84...

…just doesn’t seem to be adding any value. Rename the member variable, make it Public and move on, secure in the knowledge that you just saved some keystrokes and reduced your carbon footprint (well, maybe) by some infinitesimal fraction of a gram. The world thanks you.

Now if you need to validate the input in some way, a Property Let may be the ideal place to do so. It may not, as well, but that’s not important right now.

What else? What if you need something to happen, other than validation, when you set a property? Here’s where I’d suggest that you’re doing something wrong: I’m not a big fan of hidden side-effects (British understatement warning) – I have too many scars from self-inflicted wounds from doing that. If setting a property causes some activity or state-change, then the code should probably be refactored into a Sub with a suitably informative name.

I suppose write-only properties need a Let/Set (so that we can ensure no Get is written) but how often do we need that?

Otherwise, let’s just access the member variable directly without the indirection (think of the planet).

Getting.

Honestly, how different is a Property Get from a Function?

Given a class defined thus:


Public ws As Worksheet

Public Property Get Wksh() As Worksheet
    Set Wksh = ws
End Property

Public Function WkshF() As Worksheet
    Set Wksh = ws
End Function

… what are the differences between the three Sets here:


Public Sub GetWksh()
Dim wsh As Worksheet
    With New Class1
        Set wsh = .ws
        Set wsh = .Wksh
        Set wsh = .WkshF
    End With
End Sub

I submit, Your Honours, that the only time a Property Get makes much sense over a Function is when there’s no pre- or (ugh) post-processing around the accessing of the member variable. In which case, why bother?

Anyway, Proudhon wouldn’t have liked Properties.

Lazing On A Sunny Afternoon

(stackoverflow rep: 2906, Project Euler 59/230 complete)

Below (lightly edited) is a recent answer to a question on StackOverflow. The question is pretty much a waste of time, but what an answer!

Laziness is indeed the first of the Three Programming Virtues, but it is misunderstood. Programming Perl defines it well:

* LAZINESS: The quality that makes you go to great effort to reduce overall energy expenditure.

Good programming calls for laziness, but laziness requires hard work. Good programmers must constantly think of and implement new ways to be lazy. The first compiler had to be written in assembly, and the first assembler had to be written in machine language. Wonderfully lazy, but hard. You don’t get to call it a day after an hour just because what used to take a day takes an hour.

That’s about as close to an encapsulation of my personal programming philosophy as I’ve ever seen. The virtue of Laziness is closely related, if not identical to the DRY (Don’t Repeat Yourself) Principle as espoused by the Pragmatic Programmers in The Pragmatic Programmer.

(I just found myself wondering if there’s a way to refactor that last sentence to remove some of the duplication)

Laziness - all I have to do is this...

Laziness - all I have to do is this...

The idea behind xlUnit (which has been languishing somewhat since being codeplexed in, good grief, September) was to minimise the amount of manual repetition involved in testing Excel/VBA code as it is developed. As a useful side-effect, it also made TDD (Test-Driven Design, or Development, depending on your choice of definition) possible. A fair amount of the code complexity of the framework is involved in eliminating repetitive manual tasks such as test class creation.

There are only three significant UI-level entry points in xlUnit as it stands: application creation (used rarely, only about once per app) class creation, used rather more often, and test execution, which is used all the time. Oh, and there’s a little “Options” dialog, but it only has one setting and that’s not really very interesting so we’ll gloss over that.

Since the first two things are used relatively infrequently, I tucked them away on a menu, whereas the test execution, which is an all-the-time thing, gets a toolbar button. I could have been fancier with that, but I never got past the simple text “Run Tests”, which had the extra benefit of giving a larger target area for the mouse.

...and I get this for free.

...and I get this for free.

There is no interface within the VBE, which is most definitely a shortcoming. To be honest, I only even tried briefly (failing, obviously) to get it to work: with a dual screen setup, which my working environment has had for the best part of a decade, both workbook and VBA are usually visible, so I can get to the “Run” button easily enough. For a similar reason, there is no built-in keyboard shortcut at present. Of course, in Excel 2007, it’s all a bit of a mess. I have the (massive) RibbonX book, by the way, I just need the intestinal fortitude to sit down and read it.

The “Create Testable Application” and “Create Testable Class” routines both live in the add-in itself and are located within UserAccessibleEntryPoints.bas. I’ll come back to “Run Tests” another time – not only was it a bit tricky but I have some new ideas that would make it even trickier.

Choosing to create an application causes a throwaway instance of the xlUnitCodeBuilder class to create a workbook and its tester and create references from the tester to both the application (so it can call classes there) and the framework (so it can use the framework). The same class contains the code to create new classes. I think I put the functions together because they’re about building code but I don’t think I’ve achieved very good separation of concerns here: there’s workbook creation and test class creation and they would probably be better off being separated. Since there’s work to do with Excel 2007 anyway, I think I’ll refactor that bit next time it’s open.

I mentioned “throwaway instance” above without explanation: it’s a way I describe this VB pattern:

    With New UsefulClassThatDoesntReallyNeedToBeDimmed
        .DoSomethingProfound
        .DoSomethingSimilarlyDeep "AndMeaningful"
    End With

Is there a more “standard” way to describe this? Something using “anonymous” perhaps? Anyway, I use it a lot (and I’m now fervently hoping that no-one points out some awful risk that I’m running as a consequence) as it’s a way to get part of the economy of class (“static” methods in Java or C#, where the word means something rather different than in VBA) method.