I'd written earlier about Web lookup in Excel. I showed an example how you could create a movie wishlist that showed the links to the torrents from Mininova.
You can do that even easier on Google Spreadsheets. It has 4 functions that let you import external data:
Firstly, you can see straight off why it's easy to view RSS feeds in Google Spreadsheets. Just use the importFeed function straight away. So, for example, if I wanted to track all 8GB iPods on Google Base, I can import its feed in Google Spreadsheets.
This automatically creates a list of the latest 8GB iPods.
Incidentally, the "Price" column doesn't appear automatically. It's a part of the description. But it's quite easy to get the price using the standard Excel functions. Let's say the description is in cell C1. =MID(C1, FIND("Price", C1), 20) gets you the 20 characters starting from "Price". Then you can sort and play around as usual.
The other powerful thing about Google Spreadsheets is the CONTINUE function. The importFeed function creates a multi-dimensional array. You can extract any cell from the array (for example, row 3, column 2 from cell C1) using CONTINUE(C1, 3, 2). So you can just pick up the title and description, or only alternate rows, or put all rows and columns in a single column -- whatever.
The most versatile of the import functions is the importXML function. It lets you import any URL (including an RSS feed), filtering only the XPath you need. As I mentioned earlier, you can scrape any site using XPath.
For example, =importXML("http://www.imdb.com/chart/top", "//table//table//table//a") imports the top 250 movies from the IMDb Top 250. the second parameter says, get all links (a) inside a table inside a table inside a table. This populates a list with the entire Top 250.
Now, against each of these, we could get a feed of Mininova's torrents. Mininova's RSS URL is http://www.mininova.org/rss/search_string. So, in cell B1, I can get a torrent for the cell A1 (The Godfather) using the importFeed function. (Note: you need to replace spaces with a + symbol. These functions don't like invalid URLs.).
Just copy this formula down to get a torrent against each of the IMDb Top 250 movies!
Check out the sheet I've created. (You need a Google account to see the sheet. If you don't want have one, you can view the sheet.)
Now, that's still not the best of it. You can extract this file as an RSS feed! Google lets you publish your sheets as HTML, PDF, Text, XLS, etc. and RSS and Atom are included as well. Here's the RSS feed for my sheet above.
Think about it. We now have an application that sucks in data from a web page, does a web-based vlookup on another service, and returns the results as an RSS feed!
There are only two catches to this. The first is that Google has restricted us to 50 import functions per sheet. So you can't really have the IMDb Top 250 populated here -- only the top 49. The second is that the spreadsheet updates only when you open it again. So it's not really a dynamically updating feed. You need to open the spreadsheet to refresh it.
But if you really wanted these things, there's always Yahoo! Pipes.
Take a look at the Excel screenshot below.
Yes, that's right. I have a user-defined function called AMAZONPRICE. And it returns these cameras' prices directly from Amazon.com. (Given the category and some keywords, it returns the price of the bestselling item on Amazon.com.)
Here's the code behind the function.
Function AmazonPrice(index As String, keywords As String) As String
Dim xDoc As MSXML2.DOMDocument30
Set xDoc = New MSXML2.DOMDocument30
xDoc.async = False
If xDoc.Load("http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService" _
"&Version=2005-03-23&Operation=ItemSearch&SubscriptionId=03SDGQDFEB455W53SB82" _
"&AssociateTag=sanand-20&MinimumPrice=10000&ResponseGroup=OfferSummary,Small" _
"&Sort=salesrank&SearchIndex=" & index & "&Keywords=" & keywords) Then
xDoc.setProperty "SelectionLanguage", "XPath"
xDoc.setProperty "SelectionNamespaces", _
"xmlns:a=""http://webservices.amazon.com/AWSECommerceService/2005-03-23"""
AmazonPrice = xDoc.selectSingleNode("/a:ItemSearchResponse//a:Amount").Text
End If
End Function
This is how it all started...
Flickr has a camera finder that shows the most popular cameras in the Flickr community.
I love comparing gadgets, I'd been doing some research around these cameras, and the Fuji series (because I own a Fuji Finepix S5600). I'd normally make a spreadsheet that compares these cameras on various parameters, including price.
Since I believe in never typing in data, wondered if there was a way to get the prices in automatically...
Two things made this possible.
The first task is to get the XML feed for a product you want. Amazon lets you do that through by typing in a URL. The best way to construct the URL is through AWSZone. I picked the US ItemSearch method, which searches for a title or keyword within a category, and returns the matches. The feed for the Canon EOS Digital Rebel XT, based on this, would be at:
http://ecs.amazonaws.com/onca/xml? Service=AWSECommerceService& Version=2005-03-23& Operation=ItemSearch& SubscriptionId=0525E2PQ81DD7ZTWTK82& SearchIndex=Electronics& Keywords=Canon EOS Digital Rebel XT& Sort=salesrank& ResponseGroup=Offers,Small
(You really need to replace the Subscription ID with your own.)
If you retrieved this URL, you have an XML file containing the details of all Canon EOS Digital Rebel XTs, sorted by sales rank.
To load this in Excel, you need to create a UDF in Visual Basic. First, go to Tools - References and enable Microsoft XML, v3.0 or v4.0. Now, to load an XML document, do this:
Dim xDoc As MSXML2.DOMDocument30 Set xDoc = New MSXML2.DOMDocument30 xDoc.async = False xDoc.Load(url)
If the load succeeds, then you can extract the information fairly easily, using XPath.
xDoc.setProperty "SelectionLanguage", "XPath" xDoc.setProperty "SelectionNamespaces", _ "xmlns:a=""http://webservices.amazon.com/AWSECommerceService/2005-03-23"""
The first line says we'll be searching using XPath. The second line is a workaround to support default namespaces. (Don't worry about it. I don't quite get it either.)
Finally, you get the price from the XML tree. In this case, it's under ItemSearchResponse/Items/Item/OfferSummary/LowestNewPrice/Amount, and it's in cents.
AmazonPrice = xDoc.selectSingleNode("/a:ItemSearchResponse//a:Amount").Text
That's it! Now that this function is defined, just pass it the category and keywords, and you have the price of the first matching product. You can retrieve any other information about products as well -- like sales rank, weight, ratings, reviews, etc.
Here's the spreadsheet for you to play around with.
The technique of Web lookups in Excel I described yesterday is very versatile. I will be running through some of the practical uses it can be put to over the next few days
TO generalise things beyond just getting the Amazon price, I created a user-defined function called XPATH. It takes two parameters:
URL of the XML feed to read
Search XPath list string (separated by spaces)
This function can be used to extract information out of any XML file on the Web and get it out as a table. For example, if you wanted to watch the Top 10 movies on the IMDb Top 250, and were looking for torrents, an RSS feed is available from mininova. The URL http://www.mininova.org/rss/movie_name/4 gives you an RSS file matching all movies with "movie_name". From this, we need to extract the <item><title> and <item><link> elements. That's represented by "//item title link" on my search string.
The formula becomes XPath2( "http://www.mininova.org/rss/"&A2&"/4", "//item title link"). The result is a 2-dimensional array returning individual items in rows, and the columns are title and link. Pulling it all together, you can get the sheet above.
All of this could be done using a command-line program. Excel has one huge advantage though. It's one of the most powerful user-interfaces. Increasingly, I'm beginning to rely on just two user interfaces for almost any task. One is the browser, and the other is Excel. With Excel, I could have a sheet that has my movie wishlist (which changes often), and add check to see if the torrent exists. Every time I add a bunch of movies to the wishlist, it's just a matter of copying the formula down. No need to visit a torrent search engine and typing each movie in, one by one.
Another example. Someone suggests 10 movies to watch. I'd rather watch the ones with a higher IMDb rating. Again, enter the Web lookup. Type in the movie names. Use a function like this to look up the rating on IMDb, and sort by the rating.
The possibilities are endless.
Many languages have functions to process lists (array). These functions usually return a list, so you can pass that to another list function. This chaining of functions is really powerful.
UNIX provides this sort of chaining capability. If I had a cities (with some repetitions) and I wanted to find out how many started with the letter 'A', I'd just type:
cat cities | sort | uniq | grep "^A" | wc
cat: types the cities.
sort: sorts the cities alphabetically.
uniq: finds unique cities (works only if sorted - that's why we had to sort the list).
grep: filters the cities. Only allows cities beginning with A.
wc: word count
To do this on Excel, the only way is to
But ideally, I'd like a 1-line formula like
=LENGTH(UNIQUE(GREP("^A", range)))
Excel doesn't provide these functions by default, but you can add them as user defined functions. Doing this lets you condense several cells into one. Instead of having to copy all your data into a set of unique values, and then adding a column for the first cell, the entire operation can be condensed into one formula.
I consider the following functions the a basic set for list processing.
I created these UDFs. You can download the functions and play with them. Below are some tasks that you can do with them, that are difficult otherwise.
=INDEX(REVERSE(STRSPLIT("\", filename)), 1)=LENGTH(UNIQUE(range))
=LENGTH(INTERSECTION(range1, range2))
=LENGTH(STRSPLIT(" ", string))=SMALL(UNIQUE(range), 5)
=COUNT(Range1)+COUNT(Range2) - COUNT(INTERSECTION(Range1,Range2))
=STRJOIN(",",UNION(DIFFERENCE(Range1,Range2),
DIFFERENCE(Range2,Range1)))=LENGTH(Range)-LENGTH(UNIQUE(Range))
=MVLOOKUP(A1,Lookup_Range,Return_Range)
=COUNT(UNIQUE(MVLOOKUP(A1,Lookup_Range,Return_Range)
This is a small sample. The power of list processing is phenomenal, especially when combined with array formulas. Download these macros and play with them!
Firstly, believe that Excel can do anything.
It's true. Excel is a functional programming language. Not with the same power as some programming languages, maybe. But power is just a way of making a little go a long way (power = succinctness, according to Paul Graham). And Fred Brooks, in No Silver Bullet, argues:
I believe the single most powerful software-productivity strategy for many organizations today is to equip the computer-naive intellectual workers who are on the firing line with personal computers and good generalized writing, drawing, file, and spreadsheet programs and then to turn them loose.
Next, believe that Excel probably already has the function you're looking for. Excel 2003 has over 300 functions. Presumably these are the most popular functions people use. Fair chance your function is one of them. Excellent chance that you don't know about it.
So first, search through Excel's help. I'll admit, it's not the best way to do it. I've learnt a trick to help me out. I search for a function that does similar stuff, and see the "See Also" section. Let me give you an example.
Once, we were modelling the revenues of a leasing company. Their finance manager had prepared a model to calculate the interest accruing from a lease. We needed the interest across several leases. With his model, we'd have to create 1 sheet for each lease. We were going to model thousands of leases. Clearly impossible.
Since I knew PMT could calculate the EMI, I checked the help on PMT, clicked the "See Also" link, and found a bunch of related functions. This, among others, lists the IPMT function, which can be used to calculate the interest at a single stroke, and a bunch of other useful functions. (That's how I first learnt about IPMT.
But the really useful link is the "Financial functions" one, which lists every single financial function in Excel. That's worth going through in detail. In fact, there are many such categories that are useful: database functions, information functions, lookup and reference functions and text functions have some unexplored gems. Check out the List of worksheet functions on Excel.
The other option, of course, is to search Google for what you're looking for. I've pulled together a custom search engine for Excel sites. Try it out below.
I mentioned that it's possible to create variwides using X-Y scatter plots. The video below shows how.
If you're not able to access the video (I remember Google Video was not accessible from India), please let me know in the comments box below.
Sometimes you want to check the colour of a cell, or whether a cell is bold. This can be easily done with user-defined functions (UDFs). (To create a UDF, press Alt-F11, Alt-I-M, and type the code below.)
You can use ISBOLD(cell) to check if a cell is bold, and BGCOLOR(cell) to get the background colour of the cell. This lets you selectively process bold or shaded cells. The two examples below show how you can add only the cells in bold, or only the shaded cells.
Rather than use an additional column for ISBOLD or BGCOLOR, you can use an array formula, like below. (Remember to press Ctrl-Shift-Enter instead of Enter after typing this formula)
But first, you need to change the UDF to return an array rather than a single value. So IsBold will have to be modified as shown.
User-defined functions that process arrays can be very powerful. It can bring the full power of functional programming into Excel. I'll describe some next week.
PS: In case you're wondering, Application.Volatile tells Excel to recalculate the function every time the worksheet is recalculated. When a cell is made bold, or shaded, the value of the cell doesn't change. So Excel doesn't recalculate any formulas. You'll have to manually press F9 every time to recalculate these cells. And Application.Volatile ensures that when you press F9, these functions are recalculated.
Excel lets you create your own functions. If you wanted to create a function that returned the distance between two points (x1,y1) and (x2,y2), you can create a function DIST that takes these 4 parameters, and use it as shown below.
To create such a function,
Anything you declare as a "Function" in Excel's Visual Basic automatically becomes visible in the Insert-Function dialog box under the category "User Defined" (see examples). The function is normally saved with the file. This is a good idea if you're going to distribute the file. You can also save your functions in your personal.xls file and load it on startup.
There are 3 places where I suggest using UDFs.
Let's take the first one: replace Excel errors with empty strings. Normally, you'd store the results in a cell (say A2), and have another cell with the formula =IF(ISERROR(A2),"",A2). Instead, create this function NOERROR:
Now you can enclose any Excel function inside a NOERROR() and it'll filter out the errors.
Notice that cell E2 would've had a #N/A error if you'd just used the VLOOKUP. This function also filters out #REF, #DIV/0!, #NAME? and all other errors.
BTW, you see column F displaying the formula in column E. I didn't type it out. That's another UDF.
I will talk about the other two places where you use UDFs tomorrow.
Juice analytics has some Excel graphing tips. You can make charts like below without using charts, using just text.
These are useful because the charts are aligned with the data.
I once used a similar technique to display people's staffing position. The sheet below lists people, projects they're on and how long they'll be on. The coloured cells to the right are a calendar display of the same stuff. Makes it easy to read.
The trick is to place each week for each person as a thin cell, like below. Then the cell is populated with a formula that makes it 0 or 1 depending on whether the person is available that week or not. (The blue row #2 stores the start date of the week, and I compare this with the end date of each person's project.)
And then, you can turn on conditional formatting.
A few tips, without getting into the details. Array formulas let you perform any calculation on an entire list, and get the result in a single formula. Database functions like DSUM can manipulate tables of data very flexibly. Using these and Pivot tables, you can do anything with any tabular data.