Excel tips

Web lookup using Excel

Take a look at the Excel screenshot below.

Amazon Web Prices

Yes, that’s right. I have a user-defined function called AMAZONPRICE. And it returns these cameras’ prices directly from Amazon.com. (Given the category and some keywords, it returns the price of the bestselling item on Amazon.com.)

Here’s the code behind the function.

Function AmazonPrice(index As String, keywords As String) As String
  Dim xDoc As MSXML2.DOMDocument30
  Set xDoc = New MSXML2.DOMDocument30
  xDoc.async = False
  If xDoc.Load("http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService" _
    "&Version=2005-03-23&Operation=ItemSearch&SubscriptionId=03SDGQDFEB455W53SB82" _
    "&AssociateTag=sanand-20&MinimumPrice=10000&ResponseGroup=OfferSummary,Small" _
    "&Sort=salesrank&SearchIndex=" & index & "&Keywords=" & keywords) Then
    xDoc.setProperty "SelectionLanguage", "XPath"
    xDoc.setProperty "SelectionNamespaces", _
      "xmlns:a=""http://webservices.amazon.com/AWSECommerceService/2005-03-23"""
    AmazonPrice = xDoc.selectSingleNode("/a:ItemSearchResponse//a:Amount").Text
  End If
End Function

This is how it all started…

Flickr has a camera finder that shows the most popular cameras in the Flickr community.

Flickr Camera Finder

I love comparing gadgets, I’d been doing some research around these cameras, and the Fuji series (because I own a Fuji Finepix S5600). I’d normally make a spreadsheet that compares these cameras on various parameters, including price.

Since I believe in never typing in data, wondered if there was a way to get the prices in automatically…

Two things made this possible.

  1. Amazon offers web services which let you get price (and almost any other) data for their products
  2. Visual Basic lets you use pretty much any ActiveX object as a control. Microsoft offers MSXML which you can use to load (or download) any XML file, and parse it.

The first task is to get the XML feed for a product you want. Amazon lets you do that through by typing in a URL. The best way to construct the URL is through AWSZone. I picked the US ItemSearch method, which searches for a title or keyword within a category, and returns the matches. The feed for the Canon EOS Digital Rebel XT, based on this, would be at:

http://ecs.amazonaws.com/onca/xml?
    Service=AWSECommerceService&
    Version=2005-03-23&
    Operation=ItemSearch&
    SubscriptionId=0525E2PQ81DD7ZTWTK82&
    SearchIndex=Electronics&
    Keywords=Canon EOS Digital Rebel XT&
    Sort=salesrank&
    ResponseGroup=Offers,Small

(You really need to replace the Subscription ID with your own.)

If you retrieved this URL, you have an XML file containing the details of all Canon EOS Digital Rebel XTs, sorted by sales rank.

To load this in Excel, you need to create a UDF in Visual Basic. First, go to Tools – References and enable Microsoft XML, v3.0 or v4.0. Now, to load an XML document, do this:

Dim xDoc As MSXML2.DOMDocument30
Set xDoc = New MSXML2.DOMDocument30
xDoc.async = False
xDoc.Load(url)

If the load succeeds, then you can extract the information fairly easily, using XPath.

xDoc.setProperty "SelectionLanguage", "XPath"
xDoc.setProperty "SelectionNamespaces", _
        "xmlns:a=""http://webservices.amazon.com/AWSECommerceService/2005-03-23"""

The first line says we’ll be searching using XPath. The second line is a workaround to support default namespaces. (Don’t worry about it. I don’t quite get it either.)

Finally, you get the price from the XML tree. In this case, it’s under ItemSearchResponse/Items/Item/OfferSummary/LowestNewPrice/Amount, and it’s in cents.

        AmazonPrice = xDoc.selectSingleNode("/a:ItemSearchResponse//a:Amount").Text

That’s it! Now that this function is defined, just pass it the category and keywords, and you have the price of the first matching product. You can retrieve any other information about products as well — like sales rank, weight, ratings, reviews, etc.


Here’s the spreadsheet for you to play around with.


Excel 2003 Power Programming with VBA (Excel Power Programming With Vba) Writing Excel Macros with VBA, 2nd Edition

User defined array functions in Excel

Many languages have functions to process lists (array). These functions usually return a list, so you can pass that to another list function. This chaining of functions is really powerful.

UNIX provides this sort of chaining capability. If I had a cities (with some repetitions) and I wanted to find out how many started with the letter ‘A’, I’d just type:

cat cities | sort | uniq | grep "^A" | wc
cat: types the cities.
sort: sorts the cities alphabetically.
uniq: finds unique cities (works only if sorted - that's why we had to sort the list).
grep: filters the cities. Only allows cities beginning with A.
wc: word count

To do this on Excel, the only way is to

  1. get the unique values. Data – Filter – Advanced Filter, and select “Unique records only”, “Copy the list to another location”, and select a location
  2. get the first letter. =LEFT(cell,1) returns the first letter of the cell.
  3. count the number of “A”s. =COUNT(range, “A”) counts the number of “A”s.

But ideally, I’d like a 1-line formula like

=LENGTH(UNIQUE(GREP("^A", range)))

Excel doesn’t provide these functions by default, but you can add them as user defined functions. Doing this lets you condense several cells into one. Instead of having to copy all your data into a set of unique values, and then adding a column for the first cell, the entire operation can be condensed into one formula.

I consider the following functions the a basic set for list processing.

  • LENGTH(list) counts the number of elements in a list
  • INDEX(list, n) returns the nth element of the list
  • GREP(string, list) returns elements of the list that have the string
  • UNIQUE(list) filters unique values
  • UNION(list, list) returns elements in at least one of the lists
  • INTERSECTION(list, list) returns elements in both lists
  • DIFFERENCE(list, list) returns the elements in the first list but not the second
  • REVERSE(list) reverses the order of the list
  • STRJOIN(separator, list) joins the elements of the list into a string, separated by a separator
  • STRSPLIT(separator, string) splits the string into a list, using a separator
  • MVLOOKUP(value, lookup, result) looks up value in “lookup”, and returns the corresponding MULTIPLE values from “result”

I created these UDFs. You can download the functions and play with them. Below are some tasks that you can do with them, that are difficult otherwise.

  • Get the file name from a path.
    =INDEX(REVERSE(STRSPLIT("\", filename)), 1)
  • Count the number of unique elements in a range.
    =LENGTH(UNIQUE(range))
  • How many common elements are there in range 1 and range 2?
    =LENGTH(INTERSECTION(range1, range2))
  • How many words are there in a string?
    =LENGTH(STRSPLIT(" ", string))
  • Get the smallest unique numbers in a range
    =SMALL(UNIQUE(range), 5)
  • Count the number of mismatches between two lists.
    =COUNT(Range1)+COUNT(Range2) - COUNT(INTERSECTION(Range1,Range2))
  • Get a list of mismatches between two lists.
    =STRJOIN(",",UNION(DIFFERENCE(Range1,Range2), DIFFERENCE(Range2,Range1)))
  • Count duplicate entries in a range.
    =LENGTH(Range)-LENGTH(UNIQUE(Range))
  • VLOOKUP multiple values
    =MVLOOKUP(A1,Lookup_Range,Return_Range)
  • Count the unique matches in a VLOOKUP
    =COUNT(UNIQUE(MVLOOKUP(A1,Lookup_Range,Return_Range)

This is a small sample. The power of list processing is phenomenal, especially when combined with array formulas. Download these macros and play with them!

How to discover new functions in Excel

Firstly, believe that Excel can do anything.

It’s true. Excel is a functional programming language. Not with the same power as some programming languages, maybe. But power is just a way of making a little go a long way (power = succinctness, according to Paul Graham). And Fred Brooks, in No Silver Bullet, argues:

I believe the single most powerful software-productivity strategy for many organizations today is to equip the computer-naive intellectual workers who are on the firing line with personal computers and good generalized writing, drawing, file, and spreadsheet programs and then to turn them loose.

Next, believe that Excel probably already has the function you’re looking for. Excel 2003 has over 300 functions. Presumably these are the most popular functions people use. Fair chance your function is one of them. Excellent chance that you don’t know about it.

So first, search through Excel’s help. I’ll admit, it’s not the best way to do it. I’ve learnt a trick to help me out. I search for a function that does similar stuff, and see the “See Also” section. Let me give you an example.

Once, we were modelling the revenues of a leasing company. Their finance manager had prepared a model to calculate the interest accruing from a lease. We needed the interest across several leases. With his model, we’d have to create 1 sheet for each lease. We were going to model thousands of leases. Clearly impossible.

Since I knew PMT could calculate the EMI, I checked the help on PMT, clicked the “See Also” link, and found a bunch of related functions. This, among others, lists the IPMT function, which can be used to calculate the interest at a single stroke, and a bunch of other useful functions. (That’s how I first learnt about IPMT.

Related functions in Excel

But the really useful link is the “Financial functions” one, which lists every single financial function in Excel. That’s worth going through in detail. In fact, there are many such categories that are useful: database functions, information functions, lookup and reference functions and text functions have some unexplored gems. Check out the List of worksheet functions on Excel.

User-defined functions to get cell formatting

Sometimes you want to check the colour of a cell, or whether a cell is bold. This can be easily done with user-defined functions (UDFs). (To create a UDF, press Alt-F11, Alt-I-M, and type the code below.)

User defined functions to get the background colour and bold value of a cell

You can use ISBOLD(cell) to check if a cell is bold, and BGCOLOR(cell) to get the background colour of the cell. This lets you selectively process bold or shaded cells. The two examples below show how you can add only the cells in bold, or only the shaded cells.

Example to selectively add shaded cells

Example to selectively add bold cells

Rather than use an additional column for ISBOLD or BGCOLOR, you can use an array formula, like below. (Remember to press Ctrl-Shift-Enter instead of Enter after typing this formula)

Example to selectively add bold cells using a single array formula

But first, you need to change the UDF to return an array rather than a single value. So IsBold will have to be modified as shown.

User defined function isBold modified to return an array

User-defined functions that process arrays can be very powerful. It can bring the full power of functional programming into Excel. I’ll describe some next week.

PS: In case you’re wondering, Application.Volatile tells Excel to recalculate the function every time the worksheet is recalculated. When a cell is made bold, or shaded, the value of the cell doesn’t change. So Excel doesn’t recalculate any formulas. You’ll have to manually press F9 every time to recalculate these cells. And Application.Volatile ensures that when you press F9, these functions are recalculated.

User-defined functions in Excel

Excel lets you create your own functions. If you wanted to create a function that returned the distance between two points (x1,y1) and (x2,y2), you can create a function DIST that takes these 4 parameters, and use it as shown below.

Example of a user-defined function in Excel

To create such a function,

  1. press Alt-F11 to open the Visual Basic Editor
  2. insert a new module (Alt-I-M)
  3. type the following code:

    Visual Basic code for the DIST user-defined Excel function

Anything you declare as a “Function” in Excel’s Visual Basic automatically becomes visible in the Insert-Function dialog box under the category “User Defined” (see examples). The function is normally saved with the file. This is a good idea if you’re going to distribute the file. You can also save your functions in your personal.xls file and load it on startup.

There are 3 places where I suggest using UDFs.

  1. You need to repeat a formula or use an additional cell to get the job done (e.g. replace Excel errors with empty strings)
  2. You can’t get the information from a formula (e.g. a cell’s background colour)
  3. It’s very cumbersome to get the information using formulas (e.g. regular expressions)

Let’s take the first one: replace Excel errors with empty strings. Normally, you’d store the results in a cell (say A2), and have another cell with the formula =IF(ISERROR(A2),””,A2). Instead, create this function NOERROR:

Function NOERROR in Excel Visual Basic

Now you can enclose any Excel function inside a NOERROR() and it’ll filter out the errors.

How the NOERROR function is used

Notice that cell E2 would’ve had a #N/A error if you’d just used the VLOOKUP. This function also filters out #REF, #DIV/0!, #NAME? and all other errors.

BTW, you see column F displaying the formula in column E. I didn’t type it out. That’s another UDF.

FormulaString function returns the formula of a cell

I will talk about the other two places where you use UDFs tomorrow.

In-cell Excel charts

Juice analytics has some Excel graphing tips. You can make charts like below without using charts, using just text.

These are useful because the charts are aligned with the data.

Excel Gantt charts using just text

Excel bar charts using just text

I once used a similar technique to display people’s staffing position. The sheet below lists people, projects they’re on and how long they’ll be on. The coloured cells to the right are a calendar display of the same stuff. Makes it easy to read.

Excel Staffing Plan without using charts

The trick is to place each week for each person as a thin cell, like below. Then the cell is populated with a formula that makes it 0 or 1 depending on whether the person is available that week or not. (The blue row #2 stores the start date of the week, and I compare this with the end date of each person’s project.)

Excel Staffing Plan - formula

And then, you can turn on conditional formatting.

Excel Staffing Plan - conditional formatting

Excel – Make your model visually obvious

Rule #4: Make your model visually obvious.

After years of creating Excel models with lots of inputs and lots of outputs, I’ve learnt two things.

Usually, only ONE input parameter matters. Think of this as being the constraint in the Theory of Constraints, or the principal component in factor analysis. You want your model to communicate the impact of the ONE parameter, and get a decision based on that. Keep the rest at their best default value.

Usually, only ONE output function matters. This is either a single number, or at most, a visually obvious function.

For example, I was working on creating an offshore test centre. The question was: what test activities should we outsource? I made a huge model evaluating 1,200 activities. A very elegant model. And totally incomprehensible. The issue really was simple. We could not recruit too fast. But at the same time, the more we offshore, the more the cost savings. So I built a summary sheet that showed the impact of one parameter (speed of ramp-up) on one function (offshore profile).

Offshore ramp-up profile

The fixed blue line shows the number of people required. The slider on the right ranges from “conservative ramp-up” to “aggressive ramp-up”. The green line shows how many resources will be offshore. In the conservative ramp-up, the recruitment rate is very manageable, but the saving is negligible, since hardly anyone is offshore. The aggressive ramp-up calls for an unmanageable ramp-up rate.

We showed the management this tradeoff. They said, “We can recruit 30 people next year”. That implied the moderate ramp-up scenario, and a cost saving of 27%. Total time spent in making decision: 2 minutes.

Creating a slider-based model is quite useful. You can download a US Treasury yields example to see how this is done. As you move the slider, the yield curve moves over time, showing how it has fluctuated. The trick is to:

  1. build the entire model based on a single cell. Cell H1 in this example acts as the index to the dates.
  2. create a slider. Go to View – Toolbars – Control Toolbox and add a slider.
  3. and link the slider to the cell. Right-click on the cell in design mode, select View Code, and type Range(“H1”).Value = ScrollBar1.Value in the Scrollbar change event.

Excel – Avoid manual labour 4

Debugging in Excel is another time consuming task. 80% of the trouble is identifying the problem (Error? What error? Where?) as opposed to fixing it (Why’s THAT cell showing THAT?). Most of my time is spent chasing three kinds of errors: wrong reference (leading to a #N/A or #REF!), wrong data input, or wrong formula.

Wrong references are easy to spot. You’ll see a #N/A or a #REF! sticking out. But on large sheets, even that’s tough to spot. I always have a SUM (or some kind of total) function that covers EVERY cell in EVERY table, even if I don’t need that information. If ther SUM shows a #N/A or #REF!, I can use Trace Error (Alt-T-U-E) to see where the problem is coming from.

If you know there’s a wrong reference in a cell (say A1), and want to ignore it, use a new cell with the formula =IF(ISERROR(A1),0,A1). You can substitute an entire row, column or table this way.

Wrong data input is best avoided upfront. Before I hand my Excel sheets over for large scale data entry, I do three things:

  1. Let them enter data only in input areas. Unprotect the cells the user can enter data in (Ctrl-1, Protection, remove the tick against ‘Locked’), and protect the whole sheet (Alt-T-P-P).
  2. Validate the data. Turn on data validation (Alt-D-L) on all editable cells, and specify the validation criteria.
  3. Make it easy to spot errors. If there are percentages that should add to 100%, show the total in a cell that turns green if the total is 100%, and red otherwise. (Use conditional formatting – Alt-O-D). If data about 20 people must be entered, show the number of people about whom data is entered, and mark it red until it’s 20. Make sure all criteria are captured. When the spreadsheet is filled, it should be impossible to make errors.

Wrong formulae (like + instead of -) are tough to spot. The best way to check for this is to do the same calculation in different ways, and compare the results. Whenever I create complex tables, I always have an error row at the bottom. I compare the totals on the table with the totals calculated in a simpler way and check the difference. This warns me when I miss out some elements, or double count something, in the table.