How to discover new functions in Excel

Firstly, believe that Excel can do anything.

It’s true. Excel is a functional programming language. Not with the same power as some programming languages, maybe. But power is just a way of making a little go a long way (power = succinctness, according to Paul Graham). And Fred Brooks, in No Silver Bullet, argues:

I believe the single most powerful software-productivity strategy for many organizations today is to equip the computer-naive intellectual workers who are on the firing line with personal computers and good generalized writing, drawing, file, and spreadsheet programs and then to turn them loose.

Next, believe that Excel probably already has the function you’re looking for. Excel 2003 has over 300 functions. Presumably these are the most popular functions people use. Fair chance your function is one of them. Excellent chance that you don’t know about it.

So first, search through Excel’s help. I’ll admit, it’s not the best way to do it. I’ve learnt a trick to help me out. I search for a function that does similar stuff, and see the “See Also” section. Let me give you an example.

Once, we were modelling the revenues of a leasing company. Their finance manager had prepared a model to calculate the interest accruing from a lease. We needed the interest across several leases. With his model, we’d have to create 1 sheet for each lease. We were going to model thousands of leases. Clearly impossible.

Since I knew PMT could calculate the EMI, I checked the help on PMT, clicked the “See Also” link, and found a bunch of related functions. This, among others, lists the IPMT function, which can be used to calculate the interest at a single stroke, and a bunch of other useful functions. (That’s how I first learnt about IPMT.

Related functions in Excel

But the really useful link is the “Financial functions” one, which lists every single financial function in Excel. That’s worth going through in detail. In fact, there are many such categories that are useful: database functions, information functions, lookup and reference functions and text functions have some unexplored gems. Check out the List of worksheet functions on Excel.

Creating variwide charts in Excel

I mentioned that it’s possible to create variwides using X-Y scatter plots. The video below shows how.

User-defined functions to get cell formatting

Sometimes you want to check the colour of a cell, or whether a cell is bold. This can be easily done with user-defined functions (UDFs). (To create a UDF, press Alt-F11, Alt-I-M, and type the code below.)

User defined functions to get the background colour and bold value of a cell

You can use ISBOLD(cell) to check if a cell is bold, and BGCOLOR(cell) to get the background colour of the cell. This lets you selectively process bold or shaded cells. The two examples below show how you can add only the cells in bold, or only the shaded cells.

Example to selectively add shaded cells

Example to selectively add bold cells

Rather than use an additional column for ISBOLD or BGCOLOR, you can use an array formula, like below. (Remember to press Ctrl-Shift-Enter instead of Enter after typing this formula)

Example to selectively add bold cells using a single array formula

But first, you need to change the UDF to return an array rather than a single value. So IsBold will have to be modified as shown.

User defined function isBold modified to return an array

User-defined functions that process arrays can be very powerful. It can bring the full power of functional programming into Excel. I’ll describe some next week.

PS: In case you’re wondering, Application.Volatile tells Excel to recalculate the function every time the worksheet is recalculated. When a cell is made bold, or shaded, the value of the cell doesn’t change. So Excel doesn’t recalculate any formulas. You’ll have to manually press F9 every time to recalculate these cells. And Application.Volatile ensures that when you press F9, these functions are recalculated.

User-defined functions in Excel

Excel lets you create your own functions. If you wanted to create a function that returned the distance between two points (x1,y1) and (x2,y2), you can create a function DIST that takes these 4 parameters, and use it as shown below.

Example of a user-defined function in Excel

To create such a function,

  1. press Alt-F11 to open the Visual Basic Editor
  2. insert a new module (Alt-I-M)
  3. type the following code:

    Visual Basic code for the DIST user-defined Excel function

Anything you declare as a “Function” in Excel’s Visual Basic automatically becomes visible in the Insert-Function dialog box under the category “User Defined” (see examples). The function is normally saved with the file. This is a good idea if you’re going to distribute the file. You can also save your functions in your personal.xls file and load it on startup.

There are 3 places where I suggest using UDFs.

  1. You need to repeat a formula or use an additional cell to get the job done (e.g. replace Excel errors with empty strings)
  2. You can’t get the information from a formula (e.g. a cell’s background colour)
  3. It’s very cumbersome to get the information using formulas (e.g. regular expressions)

Let’s take the first one: replace Excel errors with empty strings. Normally, you’d store the results in a cell (say A2), and have another cell with the formula =IF(ISERROR(A2),””,A2). Instead, create this function NOERROR:

Function NOERROR in Excel Visual Basic

Now you can enclose any Excel function inside a NOERROR() and it’ll filter out the errors.

How the NOERROR function is used

Notice that cell E2 would’ve had a #N/A error if you’d just used the VLOOKUP. This function also filters out #REF, #DIV/0!, #NAME? and all other errors.

BTW, you see column F displaying the formula in column E. I didn’t type it out. That’s another UDF.

FormulaString function returns the formula of a cell

I will talk about the other two places where you use UDFs tomorrow.

In-cell Excel charts

Juice analytics has some Excel graphing tips. You can make charts like below without using charts, using just text.

These are useful because the charts are aligned with the data.

Excel Gantt charts using just text

Excel bar charts using just text

I once used a similar technique to display people’s staffing position. The sheet below lists people, projects they’re on and how long they’ll be on. The coloured cells to the right are a calendar display of the same stuff. Makes it easy to read.

Excel Staffing Plan without using charts

The trick is to place each week for each person as a thin cell, like below. Then the cell is populated with a formula that makes it 0 or 1 depending on whether the person is available that week or not. (The blue row #2 stores the start date of the week, and I compare this with the end date of each person’s project.)

Excel Staffing Plan - formula

And then, you can turn on conditional formatting.

Excel Staffing Plan - conditional formatting

Excel – Make your model visually obvious

Rule #4: Make your model visually obvious.

After years of creating Excel models with lots of inputs and lots of outputs, I’ve learnt two things.

Usually, only ONE input parameter matters. Think of this as being the constraint in the Theory of Constraints, or the principal component in factor analysis. You want your model to communicate the impact of the ONE parameter, and get a decision based on that. Keep the rest at their best default value.

Usually, only ONE output function matters. This is either a single number, or at most, a visually obvious function.

For example, I was working on creating an offshore test centre. The question was: what test activities should we outsource? I made a huge model evaluating 1,200 activities. A very elegant model. And totally incomprehensible. The issue really was simple. We could not recruit too fast. But at the same time, the more we offshore, the more the cost savings. So I built a summary sheet that showed the impact of one parameter (speed of ramp-up) on one function (offshore profile).

Offshore ramp-up profile

The fixed blue line shows the number of people required. The slider on the right ranges from “conservative ramp-up” to “aggressive ramp-up”. The green line shows how many resources will be offshore. In the conservative ramp-up, the recruitment rate is very manageable, but the saving is negligible, since hardly anyone is offshore. The aggressive ramp-up calls for an unmanageable ramp-up rate.

We showed the management this tradeoff. They said, “We can recruit 30 people next year”. That implied the moderate ramp-up scenario, and a cost saving of 27%. Total time spent in making decision: 2 minutes.

Creating a slider-based model is quite useful. You can download a US Treasury yields example to see how this is done. As you move the slider, the yield curve moves over time, showing how it has fluctuated. The trick is to:

  1. build the entire model based on a single cell. Cell H1 in this example acts as the index to the dates.
  2. create a slider. Go to View – Toolbars – Control Toolbox and add a slider.
  3. and link the slider to the cell. Right-click on the cell in design mode, select View Code, and type Range(“H1”).Value = ScrollBar1.Value in the Scrollbar change event.

Excel – Avoid manual labour 5

A few tips, without getting into the details. Array formulas let you perform any calculation on an entire list, and get the result in a single formula. Database functions like DSUM can manipulate tables of data very flexibly. Using these and Pivot tables, you can do anything with any tabular data.

Excel – Avoid manual labour 4

Debugging in Excel is another time consuming task. 80% of the trouble is identifying the problem (Error? What error? Where?) as opposed to fixing it (Why’s THAT cell showing THAT?). Most of my time is spent chasing three kinds of errors: wrong reference (leading to a #N/A or #REF!), wrong data input, or wrong formula.

Wrong references are easy to spot. You’ll see a #N/A or a #REF! sticking out. But on large sheets, even that’s tough to spot. I always have a SUM (or some kind of total) function that covers EVERY cell in EVERY table, even if I don’t need that information. If ther SUM shows a #N/A or #REF!, I can use Trace Error (Alt-T-U-E) to see where the problem is coming from.

If you know there’s a wrong reference in a cell (say A1), and want to ignore it, use a new cell with the formula =IF(ISERROR(A1),0,A1). You can substitute an entire row, column or table this way.

Wrong data input is best avoided upfront. Before I hand my Excel sheets over for large scale data entry, I do three things:

  1. Let them enter data only in input areas. Unprotect the cells the user can enter data in (Ctrl-1, Protection, remove the tick against ‘Locked’), and protect the whole sheet (Alt-T-P-P).
  2. Validate the data. Turn on data validation (Alt-D-L) on all editable cells, and specify the validation criteria.
  3. Make it easy to spot errors. If there are percentages that should add to 100%, show the total in a cell that turns green if the total is 100%, and red otherwise. (Use conditional formatting – Alt-O-D). If data about 20 people must be entered, show the number of people about whom data is entered, and mark it red until it’s 20. Make sure all criteria are captured. When the spreadsheet is filled, it should be impossible to make errors.

Wrong formulae (like + instead of -) are tough to spot. The best way to check for this is to do the same calculation in different ways, and compare the results. Whenever I create complex tables, I always have an error row at the bottom. I compare the totals on the table with the totals calculated in a simpler way and check the difference. This warns me when I miss out some elements, or double count something, in the table.

Excel – Avoid manual labour 3

A corollary of Rule 3: Never type the same formula twice. Design the formula so that if you cut and paste it elsewhere, it works correctly. The $ symbol and the F4 key for cell references help in 90% of the cases. For complex requirements and large data, 5 functions come in handy: INDIRECT, OFFSET, ADDRESS, ROW and COLUMN.

I once did a survey, and had data spread across 300 sheets (same format on all sheets). I needed cell D3 across all sheets in a column, to summarise the results. The image explains what I did.

Excel snapshot

INDIRECT returns the value of a cell. INDIRECT(“Sheet2!D3″) is the value of cell D3 in Sheet2. And INDIRECT(CONCATENATE(A2,”!D3″)) will give you the value of cell D3 in whatever sheet A2 specifies! I created a list of sheet names in column A, and column B had “D3” in each of those sheets. In effect, INDIRECT can transpose sheets into columns.

Getting a list of sheet names on to a column is tough, however. If your sheets are sequentially numbered, this image shows a trick that may help.

Excel snapshot

If you need multiple cells from a sheet, say D3:Z3, use the ADDRESS, ROW and COLUMN functions. ROW(D3) returns 3, and COLUMN(D3) returns 4 — the respective row and column. If you copy ROW(D3) to multiple rows, you will see ROW(D3), ROW(D4), ROW(D5), … which are 3, 4, 5 respectively. Similarly for COLUMN. It’s a useful way of linking values to position.

Excel snapshot

ADDRESS does the opposite. ADDRESS(ROW(D3),COLUMN(D3)) = ADDRESS(3,4) = “$D$3”. ADDRESS(3,4,1,1,”Sheet2″) returns “Sheet2!$D$3”. (See help for the ,1,1 in the middle, and just put it in always.) To cells D3:Z3 from all the sheets, copy the formula INDIRECT(ADDRESS(3,COLUMN(D3),1,1,$A2)) to the entire range. The INDIRECT, ADDRESS, ROW/COLUMN combination can slice contiguous data across sheets in any way you want.

Another useful function is OFFSET. OFFSET(D3,2,1) returns the value in cell E5. It shifts the reference D3 down by 2 rows and right by 1 column. OFFSET can be used instead of the INDIRECT and ADDRESS when multiple sheets are not involved. OFFSET can also return a range. OFFSET(D3,0,0,2,2) returns the range D3:E4, which is the 2×2 range starting from 0,0. So SUM(OFFSET(D3,0,0,2,2)) is the same as SUM(D3:E4). With OFFSET, you can specify a range with variable position and variable size (which you can’t with $ references).

Once, we were modelling a leasing company’s accounts. (Warning: this is a complex example.) We knew the volume of loans they would disburse over the next 3 years. The monthly interest rate is, say, 1%. What would be their interest income every month? Well, it’s not just 1% of what they’ve lent out. Customers pay back in equal monthly installments (EMIs). The EMI includes the principal and the interest. Initially, the EMI has a large interest component and very little principal, because there’s a lot left to repay. Towards the end, the balance dies down and so does the interest; it’s mainly the balance principal that’s being repaid. The interest income is not the same every month even for a single lease.

The calculation is conceptually simple. The IPMT function tells you the monthly interest each month. Let’s say all leases are for 36 months. So, to calculate the March interest income, take the January disbursals and multiply it by the third month interest component: IPMT(1%,3,36,-1). Take the Feb disbursals and multiply it by IPMT(1%,2,36,-1). Take the March disbursals and multiply it by IPMT(1%,1,36,-1). And add them up. For April, you’d add 4 terms. And so on. Mathematicians call this a convolution. It’s like a SUMPRODUCT of a series with another series in reverse.

Excel snapshot

Cell E4 on the image alongside does exactly that for month 3 (March). There are 5 columns:
A: Month
B: Amount disbursed that month
C: Months in reverse
D: Interest component for month in reverse
E: Interest income for month
E4 is the sumproduct of B2 to B4 (the Jan, Feb and March disbursals) with something else — an OFFSET. The offset says, from D1, move down C4 (34) rows and select A4 (3) cells further down. This has the interest components for the first, second and third months in reverse. So, the disbursal for Jan is multiplied with the 3rd month’s interest, Feb with the 2nd month’s interest, and Mar with the 1st month’s interest. That’s exactly what we wanted.

It may look complex. But remember: you have to type this complex formula only once, not 36 times. (And in my case, I had 18 product types.) Also, you’re less likely to make typing errors when cutting and pasting. So this saves you debugging time as well.

Excel – Avoid manual labour 2

Rule #3: Avoid manual labour (continued)

Reconciling data is where I spend most of my time on Excel. Say you have a list of branches by city from 2 banks. You want to know where both banks have branches. Excel doesn’t know that Kolkata is Calcutta. There are 500 cities, and you have 30 minutes.

Excel snapshot

Use VLOOKUP for a start. If Bank A’s cities are in column A (say 2-500) and Bank B’s cities in column B (say 2-400), in C2 type VLOOKUP(A2, B$2:B$400, 1, 0) (read Excel help — all I’ll say is, don’t miss out the 0 at the end: otherwise you get approximate match, and that’s not good). Copy the formula to down to C500. Similarly, in D2 type VLOOKUP(B2, A$2:A$500, 1, 0). Copy the formula down to D400.

Excel snapshot

You’ll see the #N/A where there’s no match. #N/A in Column C means Bank A has a branch here, but Bank B does not. Column D has the converse. But we’re not done yet. There could be spelling mistakes. Using two VLOOKUPs simplifies that problem considerably. We just need to match the cities having #N/A on both lists to check for alternate spellings of cities — which is a lot less work! So prepare a separate list: unmatched cities from Bank A, and unmatched cities from Bank B. (See the section on removing unwanted rows to simplify this.)

Excel snapshot

Sort both the lists while remembering the original order. You’ll want to remember the original order often — so just add a column, number it sequentially (1,2,3… use Alt-E-I-S), and sort the city names along with the numbers. When you want to get back the original order, just sort by the numbers again. To avoid distraction, you can move or hide these numbers. Now, you have a sorted list of unmatched cities. Notice that you can visually match many of these cities. There’s nothing easier to search (visually) than a sorted list.

Finally, when you’ve mapped all your columns, the ones that are remaining are the ones where there is no overlap.