ImportHtml doesn’t auto-refresh

A cool thing about Google Spreadsheets is that you can scrape websites using external data functions like importHtml. It’s really easy to use. The formula:

=importHtml("http://www.imdb.com/chart/top", "table", 1)

imports the Internet Movie Database top 250 table on to Google Spreadsheets.

Since you can publish these as RSS feeds, it ought to, in theory, be a great way of generating RSS feeds out of arbitrary content.

There’s just one problem: it doesn’t auto update.

There are claims that it does every hour. Maybe it does when the sheet is open. I don’t know. But it definitely does not when the sheet is closed. I wrote a simple script that logs the time at which the script was accessed, and prints the log every time it is accessed.

#!/usr/bin/env python
 
import datetime, os.path
 
print 'Content-Type: text/plain; charset=utf-8'
print ''
 
logfile = 'timenow.log'
try:    timelog = open(logfile).readlines()
except: timelog = []
timelog.append(str(datetime.datetime.now()) + '\n')
open(logfile, 'w').writelines(timelog)
print ''.join(timelog)

Then I importHtml’ed it into Google spreadsheets, and left it on for the night. Result: absolutely no hits when the document is closed.

Pity. Guess YQL is still the best option.

  1. I’ve just started using google spreadsheets. Really impressed with how user friendly they are. I went ine expecting the worse and was very suprised.

  2. Barry says:

    I am running into the same thing. Can’t get importHtml() to refresh when google spreadsheet is closed. Was thinking I could hammer force a cell value to equal the IH() function via a script but that seems awfully inelegant.

  3. alois says:

    so what happens when you add a parameter to the url with a value coming from a field which is set to show the current time – minute or hour ?