Getting Bigger All The Time

in which we try to get tricky with Excel and find that Excel has adequate built-in trickiness already, thank you very much indeed

Over the years I’ve had a number of roles that involved fixing, tuning, cleaning and otherwise future- and bullet-proofing a fair number of Excel workbooks. There’s a whole catalog of potential issues to entertain and amuse but very common is the Extending Input Data Set. (I capitalised that in the hope that I might have christened some kind of Excel anti-pattern).

Your average Excel user is smart enough to know that some input data may grow over time: they may be copy-pasting from some other source, grabbing a database query result or even just typing in more values over time. It’s very common to see a row-by-row enrichment of such data for analysis and the formulae need to be extended to match the rows of input data. In the worst case, users have pre-copied more rows than they expect ever to need and don’t notice when they’ve started to miss data. More happily, they copy down as they see the data extending. If they see the data extending, that is1.

Helping a colleague to avoid such an unpleasantness recently led to a couple of interesting (to me) solutions. Firstly, we looked at extending the calculations applied to a list of data that we knew would extend every time we ran the spreadsheet model.

We assume we have a defined name, input_data that does what it says. Further, we have output_formulae, which should correspond one-for-one with the input_data rows. When input_data  changes, we want to automatically extend output_formulae to match. Pretty straightforward, as it turns out:

Private Sub Worksheet_Change(ByVal Target As Range)
  If Intersect(Target, Range("input_data")) Is Nothing Then
    Exit Sub
  End If
  ExtendOutputFormulaeAsRequired
End Sub

Sub ExtendOutputFormulaeAsRequired()

  Dim inputRowCount As Long, calcRowCount As Long
  inputRowCount = Range("input_data").Rows.Count
  calcRowCount = Range("output_formulae").Rows.Count

  If inputRowCount <= calcRowCount Then
    Exit Sub ' enough formula rows already - could reduce if needed, but not done here
  End If

  With Range("output_formulae")
    ' Assumes just formulae are needed:
    .Offset(calcRowCount, 0).Resize(inputRowCount - calcRowCount, .Columns.Count).Formula = _
        .Rows(calcRowCount).Formula
  End With
End Sub

That works pretty well. At least, it does if input_data is defined, something we can usually do fairly easily, using a named formula, something like =OFFSET(B2,,,COUNT(B:B),6) (assuming we have 6 rows and there’s a non-numeric column heading in B1, which in this case there is).

Here’s the wrinkle that gives rise to the second interesting (to me) question, that necessitated a question on Stack Overflow. What if I can’t be sure that all my columns will be the same length? I want the length of the longest column (there won’t be gaps vertically, which is something at least). So in the sample below, there are 7 rows of data in columns B to G, with a heading on row 1, so 7 is the longest column and defines the number of rows in input_data.

Input data – extend me!
1 1 1 1 1 1
1 1 2 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1
1
1

I don’t want to use .CurrentRegion because I can’t be certain that adjacent cells won’t be populated at some time in the future. We can’t do something like MAX(COUNT(B:G)) because COUNT() is built to handle two-dimensional ranges. I tried to be tricky, {=MAX(COUNT(OFFSET(B1:G1),,,1000000,1))}, hoping that Excel would evaluate the COUNT() for a set of OFFSETs from B1 to G1. Excel wasn’t fooled. Stack Overflow time then, and enter SUBTOTAL(), a function I confess I’ve never used in over 20 years of Excellence.

I define a new name, input_data_columns as =Sheet1!$B:$G, which I can then use in the definition of input_data_rowcount:

=MAX(SUBTOTAL(2,OFFSET(input_data_columns,,COLUMN(input_data_columns)-MIN(COLUMN(input_data_columns)),,1)))

In a worksheet this needs to be Control-Shift-Entered as an array formula, but there’s no obvious way to do this when defining a Named Formula. Fortunately, although I don’t understand why, it works OK anyway, leading in turn to input_data becoming

=OFFSET(Sheet1!$B$2,0,0,input_data_rowcount,6)

That works! Is it future-proof? Probably not2 but it might very well be an improvement on what came before.

Briefly checking what happens, if we remove the MAX() and put the remainder into a worksheet as an array formula, we get {4, 4, 7, 4, 5, 5}, which is indeed the result of COUNT() on each column.


1 From a selfish perspective, this is a Good Thing – it requires “professional” expertise to deal with and has put bread on my table many a time.

2 See 1