Web lookup using Google Spreadsheets

I’d written earlier about Web lookup in Excel. I showed an example how you could create a movie wishlist that showed the links to the torrents from Mininova.

You can do that even easier on Google Spreadsheets. It has 4 functions that let you import external data:

  • =importData(“URL of CSV or TSV file”).
    Imports a comma-separated or tab-separated file.
  • =importFeed(URL).vLets you import any Atom or RSS feed.
  • =importHtml(URL, “list” | “table”, index).
    Imports a table or list from any web page.
  • =importXML(“URL”,”query”).
    Imports anything from any web page using XPath.

Firstly, you can see straight off why it’s easy to view RSS feeds in Google Spreadsheets. Just use the importFeed function straight away. So, for example, if I wanted to track all 8GB iPods on Google Base, I can import its feed in Google Spreadsheets.

Google Spreadsheets ImportFeed

This automatically creates a list of the latest 8GB iPods.

Incidentally, the “Price” column doesn’t appear automatically. It’s a part of the description. But it’s quite easy to get the price using the standard Excel functions. Let’s say the description is in cell C1. =MID(C1, FIND("Price", C1), 20) gets you the 20 characters starting from “Price”. Then you can sort and play around as usual.

The other powerful thing about Google Spreadsheets is the CONTINUE function. The importFeed function creates a multi-dimensional array. You can extract any cell from the array (for example, row 3, column 2 from cell C1) using CONTINUE(C1, 3, 2). So you can just pick up the title and description, or only alternate rows, or put all rows and columns in a single column — whatever.

Google Spreadsheets CONTINUE

The most versatile of the import functions is the importXML function. It lets you import any URL (including an RSS feed), filtering only the XPath you need. As I mentioned earlier, you can scrape any site using XPath.

For example, =importXML("http://www.imdb.com/chart/top", "//table//table//table//a") imports the top 250 movies from the IMDb Top 250. the second parameter says, get all links (a) inside a table inside a table inside a table. This populates a list with the entire Top 250.

Google Spreadsheets - ImportXML

Now, against each of these, we could get a feed of Mininova’s torrents. Mininova’s RSS URL is http://www.mininova.org/rss/search_string. So, in cell B1, I can get a torrent for the cell A1 (The Godfather) using the importFeed function. (Note: you need to replace spaces with a + symbol. These functions don’t like invalid URLs.).

Google Spreadsheets - Import Mininova Feed

Just copy this formula down to get a torrent against each of the IMDb Top 250 movies!

Check out the sheet I’ve created. (You need a Google account to see the sheet. If you don’t want have one, you can view the sheet.)


Now, that’s still not the best of it. You can extract this file as an RSS feed! Google lets you publish your sheets as HTML, PDF, Text, XLS, etc. and RSS and Atom are included as well. Here’s the RSS feed for my sheet above.

Think about it. We now have an application that sucks in data from a web page, does a web-based vlookup on another service, and returns the results as an RSS feed!

There are only two catches to this. The first is that Google has restricted us to 50 import functions per sheet. So you can’t really have the IMDb Top 250 populated here — only the top 49. The second is that the spreadsheet updates only when you open it again. So it’s not really a dynamically updating feed. You need to open the spreadsheet to refresh it.

But if you really wanted these things, there’s always Yahoo! Pipes.

  1. Gregory Dillon says:

    Hey, thanks. Your page was the most helpful of all. In the example with the movie names, how did you figure out that it was a table x3. or table within table within table.

  2. S Anand says:

    Looks like the IMDb top 250 page format has changed. =importXML(“http://www.imdb.com/chart/top”, “//div[@id=””main””]//table//font/a”) seems to work now.

  3. Sintl says:

    Hi

    Is there a possibility to get selective text from webpage by using import url function.

    =ImportHtml(URL; “list” | “table”; index)

    I want to pull following text Updated Sunday, December 19, 2010 from a html page

    Thnx

  4. akc says:

    Hey

    Very Helpful website it is……….Please help me

    I wan to import data from (http://dsebd.org/latest_share_price_scroll_l.php ) . to google spreadsheet . I wan to import share price .

    Now tell me which way i will use. I m failed

  5. Siamak says:

    I have an excel file with the name of about 300 businesses that I want to look up their phone numbers or websites. Any idea how I can automate this, rather than type each entry in Google?

    Thanks