ImportHtml doesn’t auto-refresh

A cool thing about Google Spreadsheets is that you can scrape websites using external data functions like importHtml. It’s really easy to use. The formula:

=importHtml("http://www.imdb.com/chart/top", "table", 1)

imports the Internet Movie Database top 250 table on to Google Spreadsheets.

Since you can publish these as RSS feeds, it ought to, in theory, be a great way of generating RSS feeds out of arbitrary content.

There’s just one problem: it doesn’t auto update.

There are claims that it does every hour. Maybe it does when the sheet is open. I don’t know. But it definitely does not when the sheet is closed. I wrote a simple script that logs the time at which the script was accessed, and prints the log every time it is accessed.

#!/usr/bin/env python
 
import datetime, os.path
 
print 'Content-Type: text/plain; charset=utf-8'
print ''
 
logfile = 'timenow.log'
try:    timelog = open(logfile).readlines()
except: timelog = []
timelog.append(str(datetime.datetime.now()) + '\n')
open(logfile, 'w').writelines(timelog)
print ''.join(timelog)

Then I importHtml’ed it into Google spreadsheets, and left it on for the night. Result: absolutely no hits when the document is closed.

Pity. Guess YQL is still the best option.

  1. I’ve just started using google spreadsheets. Really impressed with how user friendly they are. I went ine expecting the worse and was very suprised.