Automating Internet Explorer with jQuery
Most of my screen-scraping so far has been through Perl (typically WWW::Mechanize). The big problem is that it doesn't support JavaScript, which can often be an issue:

- The content may be JavaScript-based. For example, Amazon.com shows the bestseller book list only if you have JavaScript enabled. So if you're scraping the Amazon main page for the books bestseller list, you won't get it from the static HTML.
- The navigation may require JavaScript. Instead of links or buttons in forms, you might have JavaScript functions. Many pages use these, and not all of them degrade gracefully into HTML. (Try using Google Video without JavaScript.)
- The login page may use JavaScript. It creates some crazy session ID, and you need JavaScript to reproduce what it does.
- You might be testing a JavaScript-based web page. This was my main problem: how do I automate testing my pages, given that I make a lot of mistakes?

There are many approaches to overcoming this. The easiest is to use Win32::IE::Mechanize, which uses Internet Explorer in the background to actually load the page and do the scraping. It's a bit slower than scraping just the HTML, but it'll get the job done. ...
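To make the first problem concrete, here is a minimal sketch in Python (not the Perl tooling discussed above) of what a static scraper sees when a list is filled in by a script. The page markup, the `bestsellers` id, and the helper names `ListItemCollector` and `scrape_list_items` are all hypothetical, made up for illustration; this is not Amazon's actual markup.

```python
from html.parser import HTMLParser

# Hypothetical page: the bestseller list is empty in the markup and is
# only filled in when a browser actually runs the script.
STATIC_HTML = """
<html><body>
  <h1>Books</h1>
  <ul id="bestsellers"></ul>
  <script>
    var titles = ["Book A", "Book B"];
    var list = document.getElementById("bestsellers");
    for (var i = 0; i < titles.length; i++) {
      var li = document.createElement("li");
      li.appendChild(document.createTextNode(titles[i]));
      list.appendChild(li);
    }
  </script>
</body></html>
"""


class ListItemCollector(HTMLParser):
    """Collects the text of every <li> element present in the markup."""

    def __init__(self):
        super().__init__()
        self.in_li = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False

    def handle_data(self, data):
        # Script bodies arrive here too, but never while inside an <li>.
        if self.in_li:
            self.items[-1] += data


def scrape_list_items(html):
    """Return the stripped text of each <li> found without running scripts."""
    parser = ListItemCollector()
    parser.feed(html)
    return [item.strip() for item in parser.items]


if __name__ == "__main__":
    # The static parse finds no list items, because the script never ran.
    print(scrape_list_items(STATIC_HTML))
```

The parser only sees the markup as delivered, so the list comes back empty. A browser-driven approach gets to read the page after the script has run.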