Excel tips

Web lookup using Google Spreadsheets

31 Dec 2007 | comments

I'd written earlier about Web lookup in Excel. I showed an example how you could create a movie wishlist that showed the links to the torrents from Mininova.

You can do that even easier on Google Spreadsheets. It has 4 functions that let you import external data:

Firstly, you can see straight off why it's easy to view RSS feeds in Google Spreadsheets. Just use the importFeed function straight away. So, for example, if I wanted to track all 8GB iPods on Google Base, I can import its feed in Google Spreadsheets.

Google Spreadsheets ImportFeed

This automatically creates a list of the latest 8GB iPods.

Incidentally, the "Price" column doesn't appear automatically. It's a part of the description. But it's quite easy to get the price using the standard Excel functions. Let's say the description is in cell C1. =MID(C1, FIND("Price", C1), 20) gets you the 20 characters starting from "Price". Then you can sort and play around as usual.

The other powerful thing about Google Spreadsheets is the CONTINUE function. The importFeed function creates a multi-dimensional array. You can extract any cell from the array (for example, row 3, column 2 from cell C1) using CONTINUE(C1, 3, 2). So you can just pick up the title and description, or only alternate rows, or put all rows and columns in a single column -- whatever.

Google Spreadsheets CONTINUE

The most versatile of the import functions is the importXML function. It lets you import any URL (including an RSS feed), filtering only the XPath you need. As I mentioned earlier, you can scrape any site using XPath.

For example, =importXML("http://www.imdb.com/chart/top", "//table//table//table//a") imports the top 250 movies from the IMDb Top 250. the second parameter says, get all links (a) inside a table inside a table inside a table. This populates a list with the entire Top 250.

Google Spreadsheets - ImportXML

Now, against each of these, we could get a feed of Mininova's torrents. Mininova's RSS URL is http://www.mininova.org/rss/search_string. So, in cell B1, I can get a torrent for the cell A1 (The Godfather) using the importFeed function. (Note: you need to replace spaces with a + symbol. These functions don't like invalid URLs.).

Google Spreadsheets - Import Mininova Feed

Just copy this formula down to get a torrent against each of the IMDb Top 250 movies!

Check out the sheet I've created. (You need a Google account to see the sheet. If you don't want have one, you can view the sheet.)


Now, that's still not the best of it. You can extract this file as an RSS feed! Google lets you publish your sheets as HTML, PDF, Text, XLS, etc. and RSS and Atom are included as well. Here's the RSS feed for my sheet above.

Think about it. We now have an application that sucks in data from a web page, does a web-based vlookup on another service, and returns the results as an RSS feed!

There are only two catches to this. The first is that Google has restricted us to 50 import functions per sheet. So you can't really have the IMDb Top 250 populated here -- only the top 49. The second is that the spreadsheet updates only when you open it again. So it's not really a dynamically updating feed. You need to open the spreadsheet to refresh it.

But if you really wanted these things, there's always Yahoo! Pipes.

Web lookup using Excel

03 May 2007 | comments

Take a look at the Excel screenshot below.

Amazon Web Prices

Yes, that's right. I have a user-defined function called AMAZONPRICE. And it returns these cameras' prices directly from Amazon.com. (Given the category and some keywords, it returns the price of the bestselling item on Amazon.com.)

Here's the code behind the function.

Function AmazonPrice(index As String, keywords As String) As String
Dim xDoc As MSXML2.DOMDocument30
Set xDoc = New MSXML2.DOMDocument30
xDoc.async = False
If xDoc.Load("http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService" _
"&Version=2005-03-23&Operation=ItemSearch&SubscriptionId=03SDGQDFEB455W53SB82" _
"&AssociateTag=sanand-20&MinimumPrice=10000&ResponseGroup=OfferSummary,Small" _
"&Sort=salesrank&SearchIndex=" & index & "&Keywords=" & keywords) Then
xDoc.setProperty "SelectionLanguage", "XPath"
xDoc.setProperty "SelectionNamespaces", _
"xmlns:a=""http://webservices.amazon.com/AWSECommerceService/2005-03-23"""
AmazonPrice = xDoc.selectSingleNode("/a:ItemSearchResponse//a:Amount").Text
End If
End Function

This is how it all started...

Flickr has a camera finder that shows the most popular cameras in the Flickr community.

Flickr Camera Finder

I love comparing gadgets, I'd been doing some research around these cameras, and the Fuji series (because I own a Fuji Finepix S5600). I'd normally make a spreadsheet that compares these cameras on various parameters, including price.

Since I believe in never typing in data, wondered if there was a way to get the prices in automatically...

Two things made this possible.

  1. Amazon offers web services which let you get price (and almost any other) data for their products
  2. Visual Basic lets you use pretty much any ActiveX object as a control. Microsoft offers MSXML which you can use to load (or download) any XML file, and parse it.

The first task is to get the XML feed for a product you want. Amazon lets you do that through by typing in a URL. The best way to construct the URL is through AWSZone. I picked the US ItemSearch method, which searches for a title or keyword within a category, and returns the matches. The feed for the Canon EOS Digital Rebel XT, based on this, would be at:

http://ecs.amazonaws.com/onca/xml?
Service=AWSECommerceService&
Version=2005-03-23&
Operation=ItemSearch&
SubscriptionId=0525E2PQ81DD7ZTWTK82&
SearchIndex=Electronics&
Keywords=Canon EOS Digital Rebel XT&
Sort=salesrank&
ResponseGroup=Offers,Small

(You really need to replace the Subscription ID with your own.)

If you retrieved this URL, you have an XML file containing the details of all Canon EOS Digital Rebel XTs, sorted by sales rank.

To load this in Excel, you need to create a UDF in Visual Basic. First, go to Tools - References and enable Microsoft XML, v3.0 or v4.0. Now, to load an XML document, do this:

Dim xDoc As MSXML2.DOMDocument30
Set xDoc = New MSXML2.DOMDocument30
xDoc.async = False
xDoc.Load(url)

If the load succeeds, then you can extract the information fairly easily, using XPath.

xDoc.setProperty "SelectionLanguage", "XPath"
xDoc.setProperty "SelectionNamespaces", _
"xmlns:a=""http://webservices.amazon.com/AWSECommerceService/2005-03-23"""

The first line says we'll be searching using XPath. The second line is a workaround to support default namespaces. (Don't worry about it. I don't quite get it either.)

Finally, you get the price from the XML tree. In this case, it's under ItemSearchResponse/Items/Item/OfferSummary/LowestNewPrice/Amount, and it's in cents.

AmazonPrice = xDoc.selectSingleNode("/a:ItemSearchResponse//a:Amount").Text

That's it! Now that this function is defined, just pass it the category and keywords, and you have the price of the first matching product. You can retrieve any other information about products as well -- like sales rank, weight, ratings, reviews, etc.


Here's the spreadsheet for you to play around with.


Excel 2003 Power Programming with VBA (Excel Power Programming With Vba) Writing Excel Macros with VBA, 2nd Edition

| alternate titles: Excel Web

RSS feeds in Excel

03 May 2007 | comments

The technique of Web lookups in Excel I described yesterday is very versatile. I will be running through some of the practical uses it can be put to over the next few days

TO generalise things beyond just getting the Amazon price, I created a user-defined function called XPATH. It takes two parameters:

URL of the XML feed to read
Search XPath list string (separated by spaces)

This function can be used to extract information out of any XML file on the Web and get it out as a table. For example, if you wanted to watch the Top 10 movies on the IMDb Top 250, and were looking for torrents, an RSS feed is available from mininova. The URL http://www.mininova.org/rss/movie_name/4 gives you an RSS file matching all movies with "movie_name". From this, we need to extract the <item><title> and <item><link> elements. That's represented by "//item title link" on my search string.

Mininova RSS feed in Excel

The formula becomes XPath2( "http://www.mininova.org/rss/"&A2&"/4", "//item title link"). The result is a 2-dimensional array returning individual items in rows, and the columns are title and link. Pulling it all together, you can get the sheet above.

All of this could be done using a command-line program. Excel has one huge advantage though. It's one of the most powerful user-interfaces. Increasingly, I'm beginning to rely on just two user interfaces for almost any task. One is the browser, and the other is Excel. With Excel, I could have a sheet that has my movie wishlist (which changes often), and add check to see if the torrent exists. Every time I add a bunch of movies to the wishlist, it's just a matter of copying the formula down. No need to visit a torrent search engine and typing each movie in, one by one.

Another example. Someone suggests 10 movies to watch. I'd rather watch the ones with a higher IMDb rating. Again, enter the Web lookup. Type in the movie names. Use a function like this to look up the rating on IMDb, and sort by the rating.

The possibilities are endless.

User defined array functions in Excel

23 Feb 2007 | comments

Many languages have functions to process lists (array). These functions usually return a list, so you can pass that to another list function. This chaining of functions is really powerful.

UNIX provides this sort of chaining capability. If I had a cities (with some repetitions) and I wanted to find out how many started with the letter 'A', I'd just type:

cat cities | sort | uniq | grep "^A" | wc

cat: types the cities.
sort: sorts the cities alphabetically.
uniq: finds unique cities (works only if sorted - that's why we had to sort the list).
grep: filters the cities. Only allows cities beginning with A.
wc: word count

To do this on Excel, the only way is to

  1. get the unique values. Data - Filter - Advanced Filter, and select "Unique records only", "Copy the list to another location", and select a location
  2. get the first letter. =LEFT(cell,1) returns the first letter of the cell.
  3. count the number of "A"s. =COUNT(range, "A") counts the number of "A"s.

But ideally, I'd like a 1-line formula like

=LENGTH(UNIQUE(GREP("^A", range)))

Excel doesn't provide these functions by default, but you can add them as user defined functions. Doing this lets you condense several cells into one. Instead of having to copy all your data into a set of unique values, and then adding a column for the first cell, the entire operation can be condensed into one formula.

I consider the following functions the a basic set for list processing.

I created these UDFs. You can download the functions and play with them. Below are some tasks that you can do with them, that are difficult otherwise.

This is a small sample. The power of list processing is phenomenal, especially when combined with array formulas. Download these macros and play with them!


Excel 2003 Power Programming with VBA (Excel Power Programming With Vba) Professional Excel Development: The Definitive Guide to Developing Applications Using Microsoft(R) Excel and VBA(R) (The Addison-Wesley Microsoft Technology Series) VBA and Macros for Microsoft Excel (Business Solutions) Advanced modelling in finance using Excel and VBA

How to discover new functions in Excel

13 Jan 2007 | comments

Firstly, believe that Excel can do anything.

It's true. Excel is a functional programming language. Not with the same power as some programming languages, maybe. But power is just a way of making a little go a long way (power = succinctness, according to Paul Graham). And Fred Brooks, in No Silver Bullet, argues:

I believe the single most powerful software-productivity strategy for many organizations today is to equip the computer-naive intellectual workers who are on the firing line with personal computers and good generalized writing, drawing, file, and spreadsheet programs and then to turn them loose.

Next, believe that Excel probably already has the function you're looking for. Excel 2003 has over 300 functions. Presumably these are the most popular functions people use. Fair chance your function is one of them. Excellent chance that you don't know about it.

So first, search through Excel's help. I'll admit, it's not the best way to do it. I've learnt a trick to help me out. I search for a function that does similar stuff, and see the "See Also" section. Let me give you an example.

Once, we were modelling the revenues of a leasing company. Their finance manager had prepared a model to calculate the interest accruing from a lease. We needed the interest across several leases. With his model, we'd have to create 1 sheet for each lease. We were going to model thousands of leases. Clearly impossible.

Since I knew PMT could calculate the EMI, I checked the help on PMT, clicked the "See Also" link, and found a bunch of related functions. This, among others, lists the IPMT function, which can be used to calculate the interest at a single stroke, and a bunch of other useful functions. (That's how I first learnt about IPMT.

Related functions in Excel

But the really useful link is the "Financial functions" one, which lists every single financial function in Excel. That's worth going through in detail. In fact, there are many such categories that are useful: database functions, information functions, lookup and reference functions and text functions have some unexplored gems. Check out the List of worksheet functions on Excel.


Excel 2003 Formulas Excel 2003 for Dummies Excel 2003 Bible Excel 2003 Personal Trainer (Personal Trainer (O'Reilly))


The other option, of course, is to search Google for what you're looking for. I've pulled together a custom search engine for Excel sites. Try it out below.

Creating variwide charts in Excel

22 Sep 2006 | comments

I mentioned that it's possible to create variwides using X-Y scatter plots. The video below shows how.

If you're not able to access the video (I remember Google Video was not accessible from India), please let me know in the comments box below.

User-defined functions to get cell formatting

02 Sep 2006 | comments

Sometimes you want to check the colour of a cell, or whether a cell is bold. This can be easily done with user-defined functions (UDFs). (To create a UDF, press Alt-F11, Alt-I-M, and type the code below.)

User defined functions to get the background colour and bold value of a cell

You can use ISBOLD(cell) to check if a cell is bold, and BGCOLOR(cell) to get the background colour of the cell. This lets you selectively process bold or shaded cells. The two examples below show how you can add only the cells in bold, or only the shaded cells.

Example to selectively add shaded cells

Example to selectively add bold cells

Rather than use an additional column for ISBOLD or BGCOLOR, you can use an array formula, like below. (Remember to press Ctrl-Shift-Enter instead of Enter after typing this formula)

Example to selectively add bold cells using a single array formula

But first, you need to change the UDF to return an array rather than a single value. So IsBold will have to be modified as shown.

User defined function isBold modified to return an array

User-defined functions that process arrays can be very powerful. It can bring the full power of functional programming into Excel. I'll describe some next week.

PS: In case you're wondering, Application.Volatile tells Excel to recalculate the function every time the worksheet is recalculated. When a cell is made bold, or shaded, the value of the cell doesn't change. So Excel doesn't recalculate any formulas. You'll have to manually press F9 every time to recalculate these cells. And Application.Volatile ensures that when you press F9, these functions are recalculated.


Excel 2003 Power Programming with VBA (Excel Power Programming With Vba) Professional Excel Development: The Definitive Guide to Developing Applications Using Microsoft(R) Excel and VBA(R) (The Addison-Wesley Microsoft Technology Series) VBA and Macros for Microsoft Excel (Business Solutions) Advanced modelling in finance using Excel and VBA

User-defined functions in Excel

01 Sep 2006 | comments

Excel lets you create your own functions. If you wanted to create a function that returned the distance between two points (x1,y1) and (x2,y2), you can create a function DIST that takes these 4 parameters, and use it as shown below.

Example of a user-defined function in Excel

To create such a function,

  1. press Alt-F11 to open the Visual Basic Editor
  2. insert a new module (Alt-I-M)
  3. type the following code:

    Visual Basic code for the DIST user-defined Excel function

Anything you declare as a "Function" in Excel's Visual Basic automatically becomes visible in the Insert-Function dialog box under the category "User Defined" (see examples). The function is normally saved with the file. This is a good idea if you're going to distribute the file. You can also save your functions in your personal.xls file and load it on startup.

There are 3 places where I suggest using UDFs.

  1. You need to repeat a formula or use an additional cell to get the job done (e.g. replace Excel errors with empty strings)
  2. You can't get the information from a formula (e.g. a cell's background colour)
  3. It's very cumbersome to get the information using formulas (e.g. regular expressions)

Let's take the first one: replace Excel errors with empty strings. Normally, you'd store the results in a cell (say A2), and have another cell with the formula =IF(ISERROR(A2),"",A2). Instead, create this function NOERROR:

Function NOERROR in Excel Visual Basic

Now you can enclose any Excel function inside a NOERROR() and it'll filter out the errors.

How the NOERROR function is used

Notice that cell E2 would've had a #N/A error if you'd just used the VLOOKUP. This function also filters out #REF, #DIV/0!, #NAME? and all other errors.

BTW, you see column F displaying the formula in column E. I didn't type it out. That's another UDF.

FormulaString function returns the formula of a cell

I will talk about the other two places where you use UDFs tomorrow.


Excel VBA Programming For Dummies (For Dummies (Computer/Tech)) Professional Excel Development: The Definitive Guide to Developing Applications Using Microsoft(R) Excel and VBA(R) (The Addison-Wesley Microsoft Technology Series) VBA and Macros for Microsoft Excel (Business Solutions) Financial Modeling Using Excel and VBA (Wiley Finance)

In-cell Excel charts

17 Aug 2006 | comments

Juice analytics has some Excel graphing tips. You can make charts like below without using charts, using just text.

These are useful because the charts are aligned with the data.

Excel Gantt charts using just text

Excel bar charts using just text

I once used a similar technique to display people's staffing position. The sheet below lists people, projects they're on and how long they'll be on. The coloured cells to the right are a calendar display of the same stuff. Makes it easy to read.

Excel Staffing Plan without using charts

The trick is to place each week for each person as a thin cell, like below. Then the cell is populated with a formula that makes it 0 or 1 depending on whether the person is available that week or not. (The blue row #2 stores the start date of the week, and I compare this with the end date of each person's project.)

Excel Staffing Plan - formula

And then, you can turn on conditional formatting.

Excel Staffing Plan - conditional formatting

Excel - Make your model visually obvious

24 Dec 2005 | comments
Rule #4: Make your model visually obvious.

After years of creating Excel models with lots of inputs and lots of outputs, I've learnt two things.

Usually, only ONE input parameter matters. Think of this as being the constraint in the Theory of Constraints, or the principal component in factor analysis. You want your model to communicate the impact of the ONE parameter, and get a decision based on that. Keep the rest at their best default value.

Usually, only ONE output function matters. This is either a single number, or at most, a visually obvious function.

For example, I was working on creating an offshore test centre. The question was: what test activities should we outsource? I made a huge model evaluating 1,200 activities. A very elegant model. And totally incomprehensible. The issue really was simple. We could not recruit too fast. But at the same time, the more we offshore, the more the cost savings. So I built a summary sheet that showed the impact of one parameter (speed of ramp-up) on one function (offshore profile).
Offshore ramp-up profile
The fixed blue line shows the number of people required. The slider on the right ranges from "conservative ramp-up" to "aggressive ramp-up". The green line shows how many resources will be offshore. In the conservative ramp-up, the recruitment rate is very manageable, but the saving is negligible, since hardly anyone is offshore. The aggressive ramp-up calls for an unmanageable ramp-up rate.

We showed the management this tradeoff. They said, "We can recruit 30 people next year". That implied the moderate ramp-up scenario, and a cost saving of 27%. Total time spent in making decision: 2 minutes.

Creating a slider-based model is quite useful. You can download a US Treasury yields example to see how this is done. As you move the slider, the yield curve moves over time, showing how it has fluctuated. The trick is to:
  1. build the entire model based on a single cell. Cell H1 in this example acts as the index to the dates.
  2. create a slider. Go to View - Toolbars - Control Toolbox and add a slider.
  3. and link the slider to the cell. Right-click on the cell in design mode, select View Code, and type Range("H1").Value = ScrollBar1.Value in the Scrollbar change event.

Excel - Avoid manual labour 5

21 Dec 2005 | comments

A few tips, without getting into the details. Array formulas let you perform any calculation on an entire list, and get the result in a single formula. Database functions like DSUM can manipulate tables of data very flexibly. Using these and Pivot tables, you can do anything with any tabular data.

Excel - Avoid manual labour 4

21 Dec 2005 | comments
Debugging in Excel is another time consuming task. 80% of the trouble is identifying the problem (Error? What error? Where?) as opposed to fixing it (Why's THAT cell showing THAT?). Most of my time is spent chasing three kinds of errors: wrong reference (leading to a #N/A or #REF!), wrong data input, or wrong formula.

Wrong references are easy to spot. You'll see a #N/A or a #REF! sticking out. But on large sheets, even that's tough to spot. I always have a SUM (or some kind of total) function that covers EVERY cell in EVERY table, even if I don't need that information. If ther SUM shows a #N/A or #REF!, I can use Trace Error (Alt-T-U-E) to see where the problem is coming from.

If you know there's a wrong reference in a cell (say A1), and want to ignore it, use a new cell with the formula =IF(ISERROR(A1),0,A1). You can substitute an entire row, column or table this way.

Wrong data input is best avoided upfront. Before I hand my Excel sheets over for large scale data entry, I do three things:
  1. Let them enter data only in input areas. Unprotect the cells the user can enter data in (Ctrl-1, Protection, remove the tick against 'Locked'), and protect the whole sheet (Alt-T-P-P).
  2. Validate the data. Turn on data validation (Alt-D-L) on all editable cells, and specify the validation criteria.
  3. Make it easy to spot errors. If there are percentages that should add to 100%, show the total in a cell that turns green if the total is 100%, and red otherwise. (Use conditional formatting - Alt-O-D). If data about 20 people must be entered, show the number of people about whom data is entered, and mark it red until it's 20. Make sure all criteria are captured. When the spreadsheet is filled, it should be impossible to make errors.
Wrong formulae (like + instead of -) are tough to spot. The best way to check for this is to do the same calculation in different ways, and compare the results. Whenever I create complex tables, I always have an error row at the bottom. I compare the totals on the table with the totals calculated in a simpler way and check the difference. This warns me when I miss out some elements, or double count something, in the table.

Excel - Avoid manual labour 3

16 Dec 2005 | comments
A corollary of Rule 3: Never type the same formula twice. Design the formula so that if you cut and paste it elsewhere, it works correctly. The $ symbol and the F4 key for cell references help in 90% of the cases. For complex requirements and large data, 5 functions come in handy: INDIRECT, OFFSET, ADDRESS, ROW and COLUMN.

I once did a survey, and had data spread across 300 sheets (same format on all sheets). I needed cell D3 across all sheets in a column, to summarise the results. The image explains what I did.
Excel snapshot

INDIRECT returns the value of a cell. INDIRECT("Sheet2!D3") is the value of cell D3 in Sheet2. And INDIRECT(CONCATENATE(A2,"!D3")) will give you the value of cell D3 in whatever sheet A2 specifies! I created a list of sheet names in column A, and column B had "D3" in each of those sheets. In effect, INDIRECT can transpose sheets into columns.

Getting a list of sheet names on to a column is tough, however. If your sheets are sequentially numbered, this image shows a trick that may help.
Excel snapshot

If you need multiple cells from a sheet, say D3:Z3, use the ADDRESS, ROW and COLUMN functions. ROW(D3) returns 3, and COLUMN(D3) returns 4 -- the respective row and column. If you copy ROW(D3) to multiple rows, you will see ROW(D3), ROW(D4), ROW(D5), ... which are 3, 4, 5 respectively. Similarly for COLUMN. It's a useful way of linking values to position.
Excel snapshot
ADDRESS does the opposite. ADDRESS(ROW(D3),COLUMN(D3)) = ADDRESS(3,4) = "$D$3". ADDRESS(3,4,1,1,"Sheet2") returns "Sheet2!$D$3". (See help for the ,1,1 in the middle, and just put it in always.) To cells D3:Z3 from all the sheets, copy the formula INDIRECT(ADDRESS(3,COLUMN(D3),1,1,$A2)) to the entire range. The INDIRECT, ADDRESS, ROW/COLUMN combination can slice contiguous data across sheets in any way you want.

Another useful function is OFFSET. OFFSET(D3,2,1) returns the value in cell E5. It shifts the reference D3 down by 2 rows and right by 1 column. OFFSET can be used instead of the INDIRECT and ADDRESS when multiple sheets are not involved. OFFSET can also return a range. OFFSET(D3,0,0,2,2) returns the range D3:E4, which is the 2x2 range starting from 0,0. So SUM(OFFSET(D3,0,0,2,2)) is the same as SUM(D3:E4). With OFFSET, you can specify a range with variable position and variable size (which you can't with $ references).

Once, we were modelling a leasing company's accounts. (Warning: this is a complex example.) We knew the volume of loans they would disburse over the next 3 years. The monthly interest rate is, say, 1%. What would be their interest income every month? Well, it's not just 1% of what they've lent out. Customers pay back in equal monthly installments (EMIs). The EMI includes the principal and the interest. Initially, the EMI has a large interest component and very little principal, because there's a lot left to repay. Towards the end, the balance dies down and so does the interest; it's mainly the balance principal that's being repaid. The interest income is not the same every month even for a single lease.

The calculation is conceptually simple. The IPMT function tells you the monthly interest each month. Let's say all leases are for 36 months. So, to calculate the March interest income, take the January disbursals and multiply it by the third month interest component: IPMT(1%,3,36,-1). Take the Feb disbursals and multiply it by IPMT(1%,2,36,-1). Take the March disbursals and multiply it by IPMT(1%,1,36,-1). And add them up. For April, you'd add 4 terms. And so on. Mathematicians call this a convolution. It's like a SUMPRODUCT of a series with another series in reverse.

Excel snapshot
Cell E4 on the image alongside does exactly that for month 3 (March). There are 5 columns:
A: Month
B: Amount disbursed that month
C: Months in reverse
D: Interest component for month in reverse
E: Interest income for month
E4 is the sumproduct of B2 to B4 (the Jan, Feb and March disbursals) with something else -- an OFFSET. The offset says, from D1, move down C4 (34) rows and select A4 (3) cells further down. This has the interest components for the first, second and third months in reverse. So, the disbursal for Jan is multiplied with the 3rd month's interest, Feb with the 2nd month's interest, and Mar with the 1st month's interest. That's exactly what we wanted.

It may look complex. But remember: you have to type this complex formula only once, not 36 times. (And in my case, I had 18 product types.) Also, you're less likely to make typing errors when cutting and pasting. So this saves you debugging time as well.

Excel - Avoid manual labour 2

09 Dec 2005 | comments
Rule #3: Avoid manual labour (continued)

Reconciling data is where I spend most of my time on Excel. Say you have a list of branches by city from 2 banks. You want to know where both banks have branches. Excel doesn't know that Kolkata is Calcutta. There are 500 cities, and you have 30 minutes.
Excel snapshot

Use VLOOKUP for a start. If Bank A's cities are in column A (say 2-500) and Bank B's cities in column B (say 2-400), in C2 type VLOOKUP(A2, B$2:B$400, 1, 0) (read Excel help -- all I'll say is, don't miss out the 0 at the end: otherwise you get approximate match, and that's not good). Copy the formula to down to C500. Similarly, in D2 type VLOOKUP(B2, A$2:A$500, 1, 0). Copy the formula down to D400.
Excel snapshot

You'll see the #N/A where there's no match. #N/A in Column C means Bank A has a branch here, but Bank B does not. Column D has the converse. But we're not done yet. There could be spelling mistakes. Using two VLOOKUPs simplifies that problem considerably. We just need to match the cities having #N/A on both lists to check for alternate spellings of cities -- which is a lot less work! So prepare a separate list: unmatched cities from Bank A, and unmatched cities from Bank B. (See the section on removing unwanted rows to simplify this.)
Excel snapshot

Sort both the lists while remembering the original order. You'll want to remember the original order often -- so just add a column, number it sequentially (1,2,3... use Alt-E-I-S), and sort the city names along with the numbers. When you want to get back the original order, just sort by the numbers again. To avoid distraction, you can move or hide these numbers. Now, you have a sorted list of unmatched cities. Notice that you can visually match many of these cities. There's nothing easier to search (visually) than a sorted list.

Finally, when you've mapped all your columns, the ones that are remaining are the ones where there is no overlap.

Excel - Avoid manual labour

17 Nov 2005 | comments
Rule #3: Avoid manual labour. Use Excel to automate the task.

I use Excel's formulas to speed up repetitous tasks. These techniques are powerful, meaning, you can do a lot with a little, but can have unforeseen consequences.

Excel can find and replace formulas. If you had hardcoded formulas and wanted to change =B1*3.14 to =B1*3.1416 across all rows, just find "*3.14" and replace it with "*3.1416". Find and replace works in formulas. This is very powerful. You can use it to change the source (e.g. change the source from column B to C by finding "=B" and replacing with "=C") or even the formula (find and replace "SUM(" with "SUBTOTAL(9,").

You can also search and replace for errors (like #N/A, #REF, etc). In the Find dialog options, select "Values" under the Look in" option. To replace these cells, copy and paste the cells by value (Ctrl-C, go where you want to paste, Alt-E-S-V-Enter). Now you can search and replace #N/A just like any other value.

Find external links. If you have links to other Excel files, and one of them is missing, you'll get an error saying "This workbook contains links ..." It's annoying, and difficult to trace the source. But since links to external files have the formula =Path\[file]Sheet!Cell, just search for "[" across sheets (you can search across sheets using the Options button).

Format based on value. Conditional formatting (Alt-O-D) accepts formulas. You can set a cell's background to red, yellow or green if it's value is low, medium or high. Pick a cell, say D3. In conditional formatting, select "Formula is" instead of "Cell value is", and type "=D3<10". Set the format to a red background. Copy the formatting to all the cells (use the Format Painter button, or Edit-Paste Special-Formats). All cells less than 10 will have a red background. This is powerful because you can use any formula based on any cell. For example, you could pick the conditional formatting formula "=$C3<10" for the cell D3. The cell becomes red if the cell to its left is less than 10. Best of all, you can say "=$C3<$A$1". As you change the value in cell A1, the colours change automatically. Since you can copy and paste the formatting alone (Edit - Paste Special - Format), you can set the conditional format in one cell, and copy & paste it across any selection.

Remove unwanted rows. Sometimes, when you import data, you have empty rows, or errors or whatever. To delete empty rows, select the data, go to Data - Filter - Autofilter. Click on one of the arrow buttons, and select the "(Blank)". This will display all the blank rows. Select them all and delete them. If you want to delete rows with errors, click the arrow button, select whichever values are errors, and delete those rows. If you want to delete rows based on some other criteria, create a new column that shows TRUE or FALSE based on the criteria, do the Autofilter, and delete the rows.

An alternative to Autofilter is to sort the data based on the column you want. All blank rows (or errors, etc) will be grouped together, and you can delete them at one shot.

Excel - Never type in data

08 Nov 2005 | comments
Rule #2: Never type in data in Excel.

You rarely spend time creating voluminous data. Usually, you're just processing it (copying, transforming, whatever).

Sometimes data is on a web page -- typically tables. To copy such data, open the page in Internet Explorer and paste it in Excel. You won't like the formatting. So copy the cells you just pasted, go to a different sheet, and Edit-Paste Special just the values (Alt-E-S-V-Enter).

Sometimes data is on a text file. You can open text files directly in Excel. Each line becomes a row. You can split lines into columns if there is a "delimiter" between any two cells. Just load a text file, select all the rows, and play with the Data - Text to Columns menu (Alt-D-E).

Sometimes, data is on a PDF file. Usually, such data is in a table. If you have Adobe Reader, tough luck. Just select and copy the table, paste it into Notepad, manually format it (painful), copy again from Notepad and paste in Excel. If you have Adobe Acrobat, it's slightly better. You can use the "Select Column" tool to select and copy entire columns of the table in one shot.

Sometimes, data is on paper. Scanner often come with an optical character recognition (OCR) software. If not, Microsoft Office 2003 comes with a Microsoft Office Document Imaging tool has OCR. Just scan the image, open it in the Microsoft Document Imaging tool, go to the Tools - Recognize Text Using OCR... menu, and pray.

After all this importing, the data is never "clean". Errors due to unintended delimiters, extraneous blank lines, etc are fairly frequent. I'll talk about how to manage this when discussing Rule #3: Automate the task

Excel - Never use the mouse

01 Nov 2005 | comments
I spend a lot of time building models on Excel. I have 4 rules that help me get things done fast.
Rule #1. Never use the mouse. The keyboard is much faster.
Rule #2: Never type in data. You can always import it.
Rule #3: Avoid manual labour. Use Excel to automate the task.
Rule #4: Make your data visually obvious.

Let's look at Rule #1: Never use the mouse.

Using the keyboard can be 10 times faster than the mouse. It takes time to move one hand from the keyboard to the mouse, locate the item you want to click at, move the mouse there, adjust it finely so it's pointing at the exact spot, and then click it. For example, to insert text without formatting, I'd just go Alt-E, S, enter. It takes half a second. It took me 5 seconds with a mouse. (I timed 10 continuous attempts in both cases.)

A factor of 10 speed advantage like that is good for two reasons: it saves you time, and it doesn't distract you from what you're doing (provided the keyboard shortcuts have become a habit.)

For newbies: To use menus using keyboard shortcuts, first go to Start - Settings - Control Panel - Display - Appearance tab - Effects button - Hide underlined letters for keyboard navigation until I press the Alt key. Make sure it is turned off. To use a menu, let's say "Insert - Row". look for the underlined letter on the menu bar (the "I" on "Insert"), press Alt and the underlined letter (Alt-I in this case), and look for the underlined letter on the next menu item ("R" on the "Rows" in this case) and press that letter. So, Alt-I-R is the shortcut to insert a row. Now, just practice Alt-I-R, Alt-I-R, Alt-I-R repeatedly.

Shortcuts next to the menu are quicker, where they exist. For example, the Ctrl-C next to the Edit-Copy menu.

Apart from the arrow keys, Ctrl-S, Ctrl-X, Ctrl-C, Ctrl-V, the keys I use frequently are:
F2: edit the current cell
F4: repeat the last action (very useful)
Ctrl-Z, Ctrl-Y: Undo, Redo
Ctrl-1: Format cells
Ctrl-Shift-down arrow: Select all filled cells below selection (also works with other arrow keys)
Ctrl-PgUp/PgDn: Shift between tabs
Ctrl-Home/End: Go to top-left, or bottom-right of the sheet (or cell, if you're editing a cell)
Ctrl-`: Show formulae
Shift-Space, Ctrl-Space: Select row, select column
Alt-Enter: To create a new line while you're typing in a cell
Alt-E-I-S-Enter: Select a set of cells and fill a continuous series of numbers in it
Alt-E-S-Enter: Paste unformatted text
Alt-D-G-G: Group a set of rows (use Alt-D-G-U for ungroup)
Alt-D-G-S: Show a collapsed group (use Alt-D-G-H to collapse a group)
Alt-O-D: Conditional formatting
Alt-F8: Macros
S Anand, Infosys Consulting, London UK. +44 7957 440 260