## Getting Bigger All The Time

24 July 2012 Leave a comment

*in which we try to get tricky with Excel and find that Excel has adequate built-in trickiness already, thank you very much indeed*

Over the years I’ve had a number of roles that involved fixing, tuning, cleaning and otherwise future- and bullet-proofing a fair number of Excel workbooks. There’s a whole catalog of potential issues to entertain and amuse but very common is the Extending Input Data Set. (I capitalised that in the hope that I might have christened some kind of Excel anti-pattern).

Your average Excel user is smart enough to know that some input data may grow over time: they may be copy-pasting from some other source, grabbing a database query result or even just typing in more values over time. It’s very common to see a row-by-row enrichment of such data for analysis and the formulae need to be extended to match the rows of input data. In the worst case, users have pre-copied more rows than they expect ever to need and don’t notice when they’ve started to miss data. More happily, they copy down as they see the data extending. *If* they see the data extending, that is^{1}.

Helping a colleague to avoid such an unpleasantness recently led to a couple of interesting (to me) solutions. Firstly, we looked at extending the calculations applied to a list of data that we knew would extend every time we ran the spreadsheet model.

We assume we have a defined name, **input_data** that does what it says. Further, we have **output_formulae**, which should correspond one-for-one with the input_data rows. When **input_data** changes, we want to automatically extend **output_formulae** to match. Pretty straightforward, as it turns out:

Private Sub Worksheet_Change(ByVal Target As Range) If Intersect(Target, Range("input_data")) Is Nothing Then Exit Sub End If ExtendOutputFormulaeAsRequired End Sub Sub ExtendOutputFormulaeAsRequired() Dim inputRowCount As Long, calcRowCount As Long inputRowCount = Range("input_data").Rows.Count calcRowCount = Range("output_formulae").Rows.Count If inputRowCount <= calcRowCount Then Exit Sub ' enough formula rows already - could reduce if needed, but not done here End If With Range("output_formulae") ' Assumes just formulae are needed: .Offset(calcRowCount, 0).Resize(inputRowCount - calcRowCount, .Columns.Count).Formula = _ .Rows(calcRowCount).Formula End With End Sub

That works pretty well. At least, it does if **input_data** is defined, something we can usually do fairly easily, using a named formula, something like **=OFFSET(B2,,,COUNT(B:B),6)** (assuming we have 6 rows and there’s a non-numeric column heading in **B1**, which in this case there is).

Here’s the wrinkle that gives rise to the second interesting (to me) question, that necessitated a question on Stack Overflow. What if I can’t be sure that all my columns will be the same length? I want the length of the longest column (there won’t be gaps vertically, which is something at least). So in the sample below, there are 7 rows of data in columns B to G, with a heading on row 1, so 7 is the longest column and defines the number of rows in **input_data**.

Input data – extend me! | |||||

1 | 1 | 1 | 1 | 1 | 1 |

1 | 1 | 2 | 1 | 1 | 1 |

1 | 1 | 1 | 1 | 1 | 1 |

1 | 1 | 1 | 1 | 1 | 1 |

1 | 1 | 1 | |||

1 | |||||

1 |

I don’t want to use **.CurrentRegion** because I can’t be certain that adjacent cells won’t be populated at some time in the future. We can’t do something like **MAX(COUNT(B:G))** because** COUNT()** is built to handle two-dimensional ranges. I tried to be tricky, **{=MAX(COUNT(OFFSET(B1:G1),,,1000000,1))}**, hoping that Excel would evaluate the **COUNT()** for a set of **OFFSETs** from B1 to G1. Excel wasn’t fooled. Stack Overflow time then, and enter **SUBTOTAL()**, a function I confess I’ve never used in over 20 years of Excellence.

I define a new name, **input_data_columns** as **=Sheet1!$B:$G**, which I can then use in the definition of **input_data_rowcount**:

**=MAX(SUBTOTAL(2,OFFSET(input_data_columns,,COLUMN(input_data_columns)-MIN(COLUMN(input_data_columns)),,1)))**

In a worksheet this needs to be Control-Shift-Entered as an array formula, but there’s no obvious way to do this when defining a Named Formula. Fortunately, although I don’t understand why, it works OK anyway, leading in turn to **input_data** becoming

**=OFFSET(Sheet1!$B$2,0,0,input_data_rowcount,6)**

That works! Is it future-proof? Probably not^{2} but it might very well be an improvement on what came before.

Briefly checking what happens, if we remove the **MAX()** and put the remainder into a worksheet as an array formula, we get {4, 4, 7, 4, 5, 5}, which is indeed the result of **COUNT()** on each column.

^{1} From a selfish perspective, this is a Good Thing – it requires “professional” expertise to deal with and has put bread on my table many a time.

^{2} See ^{1}