If a site doesn’t have an RSS feed, your simplest option is to use Page2Rss, which gives a feed of what’s changed on a page.
Sometimes my needs are more specific. For example, I want to track new movies on the IMDb Top 250. IMDb doesn't offer a feed for it, and I don't want to track all the other junk on that page, just the top 250.
There's a standard called XPath that lets you search an HTML document in a pretty straightforward way. Here are some examples:
| XPath | What it matches |
|---|---|
| `//a` | All `<a>` links |
| `//p/b` | All `<b>` bold items in a `<p>` paragraph (the `<b>` must be immediately under the `<p>`) |
| `//table//a` | All links inside a table (the links need not be immediately inside the table; anywhere inside it works) |
You get the idea. It's like a folder structure: `/` matches a tag immediately below, and `//` matches a tag anywhere below. You can play around with XPath using the Firefox XPath Checker add-on. Try it: it's much easier to try it than to read the documentation.
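To see the `/` versus `//` distinction outside a browser, here's a small sketch using Python's standard `xml.etree.ElementTree`. Two assumptions: the sample page is made up and well-formed, and ElementTree implements only a subset of XPath, so paths start with `.//` rather than `//`. Real-world HTML usually isn't valid XML, so for actual pages an HTML-aware parser such as `lxml.html` would likely be needed instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (real HTML usually needs an HTML parser).
doc = ET.fromstring("""<html><body>
  <p><b>direct</b> <i><b>nested</b></i></p>
  <table><tr><td><span><a href="/x">deep link</a></span></td></tr></table>
  <a href="/y">top link</a>
</body></html>""")

def texts(elements):
    """Return the text of each matched element, in document order."""
    return [e.text for e in elements]

# ElementTree wants paths relative to an element, so //a is written .//a.
all_links = texts(doc.findall(".//a"))                 # //a
direct_bold = texts(doc.findall(".//p/b"))             # //p/b: <b> directly under <p>
table_links = texts(doc.findall(".//table//a"))        # //table//a: any depth
direct_table_links = texts(doc.findall(".//table/a"))  # //table/a: direct children only

print(all_links)           # ['deep link', 'top link']
print(direct_bold)         # ['direct']
print(table_links)         # ['deep link']
print(direct_table_links)  # []
```

The nested `<b>` inside `<i>` is skipped by `//p/b`, and the table link is found by `//table//a` but not by `//table/a`, which is exactly the folder-like behaviour described above.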
The following XPath matches the IMDb Top 250 exactly.
(It’s a link inside the 3rd column in a table row in a table row in a table row.)
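The XPath itself isn't reproduced here, so the path in this sketch is only my reading of that description, applied to a made-up page with the same nesting. It uses ElementTree's XPath subset, where `td[3]` selects the third `<td>` child of each row. The page, `PAGE` and `CHART_PATH` are hypothetical, and IMDb's actual markup may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified layout: a chart table nested three tables deep,
# with each movie's link in the 3rd cell of its row. Not IMDb's real markup.
PAGE = """<html><body>
<table><tr><td>
  <a href="/nav">Home</a>
  <table><tr><td>
    <table>
      <tr><td>1.</td><td>9.0</td><td><a href="/title/tt0000001/">Movie A</a></td></tr>
      <tr><td>2.</td><td>8.9</td><td><a href="/title/tt0000002/">Movie B</a></td></tr>
    </table>
  </td></tr></table>
</td></tr></table>
</body></html>"""

doc = ET.fromstring(PAGE)

# "A link inside the 3rd column in a row in a row in a row", written for
# ElementTree's XPath subset (relative .// prefix, td[3] = third <td> child).
CHART_PATH = ".//tr//tr//tr/td[3]//a"

top_links = [a.text for a in doc.findall(CHART_PATH)]
print(top_links)  # ['Movie A', 'Movie B']
```

The "Home" link sits in the outermost row only, so the triple-nested row condition filters it out.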
Now all I need is something that converts those matches into an RSS feed. I couldn't find anything on the Web, so I wrote my own XPath server. The URL:
When I subscribe to this URL on Google Reader, I get to know whenever there’s a new movie on the IMDb Top 250.
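The server's code isn't part of this post. As a rough idea of what the conversion step involves, here's a minimal sketch that wraps a list of matched strings in an RSS 2.0 document using Python's standard library. The function name `titles_to_rss`, the feed title and the `example.com` link are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

def titles_to_rss(feed_title, feed_link, titles):
    """Build a minimal RSS 2.0 document with one <item> per matched title."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Nodes matched by an XPath"
    for title in titles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        # Feed readers typically use <guid> to spot new items; using the title
        # means a new movie name shows up as a new item.
        ET.SubElement(item, "guid", isPermaLink="false").text = title
    return ET.tostring(rss, encoding="unicode")

# Hypothetical input: the texts an XPath matched on the page.
feed_xml = titles_to_rss("IMDb Top 250", "https://example.com/", ["Movie A", "Movie B"])
print(feed_xml)
```

Regenerating this document on each request and letting the feed reader poll it is enough to get notified when a title enters the list.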
This gives only the names of the movies, though, and I'd like the links as well. The XPath server supports this: it accepts a root XPath and a set of sub-XPaths that are evaluated relative to each node the root matches. So you can say something like:
This says three things:
| Part | Meaning |
|---|---|
| `//tr//tr//tr` | Pick all rows in a row in a row |
| `title->./td//a` | For each row, set the title to the link text in the 3rd column |
| `link->./td//a` | … and the link to the link href in the 3rd column |
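Here's one way the root-plus-sub-XPath idea could be sketched, again with ElementTree on a made-up page. This is not the XPath server's code: `extract_items` and the `(sub_path, attribute)` pairs are hypothetical stand-ins for the `name->xpath` syntax, and reading the `href` attribute for the link is my assumption about what "the link href" means. I've pinned the sub-paths to `td[3]` to match the "3rd column" description.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested-table page (not IMDb's real markup).
PAGE = """<html><body>
<table><tr><td>
  <table><tr><td>
    <table>
      <tr><td>1.</td><td>9.0</td><td><a href="/title/tt0000001/">Movie A</a></td></tr>
      <tr><td>2.</td><td>8.9</td><td><a href="/title/tt0000002/">Movie B</a></td></tr>
    </table>
  </td></tr></table>
</td></tr></table>
</body></html>"""

def extract_items(doc, root_path, fields):
    """For each node matched by root_path, evaluate every sub-path relative to it.

    fields maps an output name to (sub_path, attribute). attribute None means
    "use the element's text"; otherwise the named attribute is read.
    """
    items = []
    for node in doc.findall(root_path):
        item = {}
        for name, (sub_path, attribute) in fields.items():
            match = node.find(sub_path)  # first match, relative to this row
            if match is None:
                item[name] = None
            elif attribute is None:
                item[name] = (match.text or "").strip()
            else:
                item[name] = match.get(attribute)
        items.append(item)
    return items

items = extract_items(
    ET.fromstring(PAGE),
    ".//tr//tr//tr",                    # root: rows in a row in a row
    {
        "title": ("./td[3]//a", None),   # link text in the 3rd cell
        "link": ("./td[3]//a", "href"),  # link href in the 3rd cell
    },
)
print(items)
```

Each resulting dict maps directly onto an RSS `<item>` with `<title>` and `<link>`. The hrefs here are relative, so a real feed would probably resolve them against the page URL with `urllib.parse.urljoin` first.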
That provides a more satisfactory RSS feed, one that I've subscribed to, in fact. Another one I track is mininova's list of top-seeded movies.
You can whip up more complex examples. Give it a shot. Start simple, with something that works, and move up to what you need. Use XPath Checker liberally. Let me know if you have any issues. Enjoy!