ImportHtml doesn’t auto-refresh

A cool thing about Google Spreadsheets is that you can scrape websites using external data functions like importHtml. It’s really easy to use. The formula:

=importHtml("http://www.imdb.com/chart/top", "table", 1)

imports the Internet Movie Database top 250 table on to Google Spreadsheets.

Since you can publish these as RSS feeds, it ought to, in theory, be a great way of generating RSS feeds out of arbitrary content.

There’s just one problem: it doesn’t auto update.

There are claims that it does every hour. Maybe it does when the sheet is open. I don’t know. But it definitely does not when the sheet is closed. I wrote a simple script that logs the time at which the script was accessed, and prints the log every time it is accessed.

#!/usr/bin/env python
 
import datetime, os.path
 
print 'Content-Type: text/plain; charset=utf-8'
print ''
 
logfile = 'timenow.log'
try:    timelog = open(logfile).readlines()
except: timelog = []
timelog.append(str(datetime.datetime.now()) + '\n')
open(logfile, 'w').writelines(timelog)
print ''.join(timelog)

Then I importHtml’ed it into Google spreadsheets, and left it on for the night. Result: absolutely no hits when the document is closed.

Pity. Guess YQL is still the best option.

3 thoughts on “ImportHtml doesn’t auto-refresh”

  1. I am running into the same thing. Can’t get importHtml() to refresh when google spreadsheet is closed. Was thinking I could hammer force a cell value to equal the IH() function via a script but that seems awfully inelegant.

  2. so what happens when you add a parameter to the url with a value coming from a field which is set to show the current time – minute or hour ?

Leave a Comment

Your email address will not be published. Required fields are marked *