A cool thing about Google Spreadsheets is that you can scrape websites using external data functions like importHtml. It’s really easy to use. The formula:
=importHtml("http://www.imdb.com/chart/top", "table", 1)
imports the Internet Movie Database top 250 table on to Google Spreadsheets.
Since you can publish these as RSS feeds, it ought to, in theory, be a great way of generating RSS feeds out of arbitrary content.
There’s just one problem: it doesn’t auto update.
There are claims that it does every hour. Maybe it does when the sheet is open. I don’t know. But it definitely does not when the sheet is closed. I wrote a simple script that logs the time at which the script was accessed, and prints the log every time it is accessed.
#!/usr/bin/env python
import datetime, os.path
print 'Content-Type: text/plain; charset=utf-8'
print ''
logfile = 'timenow.log'
try: timelog = open(logfile).readlines()
except: timelog = []
timelog.append(str(datetime.datetime.now()) + '\n')
open(logfile, 'w').writelines(timelog)
print ''.join(timelog)
Then I importHtml’ed it into Google spreadsheets, and left it on for the night. Result: absolutely no hits when the document is closed.
Pity. Guess YQL is still the best option.
Comments