Scraping RSS feeds using XPath
If a site doesn't have an RSS feed, your simplest option is Page2Rss, which gives you a feed of what's changed on a page. Sometimes, though, my needs are a bit more specific. For example, I want to track new movies on the IMDb Top 250. IMDb doesn't offer a feed for it, and I don't want to track all the other junk on that page, just the top 250. This is where XPath comes in. It's a standard for selecting parts of an XML document, and it can be used to search an HTML document in a pretty straightforward way. Here are some examples: ...
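To make the idea concrete, here is a minimal Python sketch of picking just the chart entries out of a page with an XPath expression. The sample markup, the `chart` and `title` class names, and the `extract_titles` helper are all made up for illustration; IMDb's real page is structured differently. The sketch uses the standard library's `xml.etree.ElementTree`, which only supports a subset of XPath and needs well-formed input.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a chart page (not IMDb's real markup).
SAMPLE_PAGE = """
<html>
  <body>
    <div class="sidebar"><a href="/news">Other junk</a></div>
    <table class="chart">
      <tr><td class="title"><a href="/title/1">First Movie</a></td></tr>
      <tr><td class="title"><a href="/title/2">Second Movie</a></td></tr>
    </table>
  </body>
</html>
"""


def extract_titles(page_source):
    """Return (name, link) pairs for every entry in the chart table."""
    root = ET.fromstring(page_source)
    # Select only the links inside title cells of the chart table,
    # skipping everything else on the page (such as the sidebar link).
    links = root.findall(".//table[@class='chart']//td[@class='title']/a")
    return [(link.text, link.get("href")) for link in links]


titles = extract_titles(SAMPLE_PAGE)
for name, href in titles:
    print(name, href)
```

Real pages are rarely well-formed XML, so in practice you would probably want a forgiving HTML parser with full XPath support, such as `lxml`. The extracted pairs are the raw material for feed items: anything that shows up in the list but wasn't there on the last run is a new entry.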