I'm a big fan of Perl, but seem to use it for just one reason these days: reading and processing various HTML files out on the web (ie writing a spider). The LWP Perl modules makes this easy, and can honor robots.txt automatically. I've found, though, especially during development, that I don't want to access the live web page again and again as I develop the rest of the script. This is even more relevant for HTML pages that take a long time to return, or when I'm developing the script offline, or without intranet access.
My solution? Adding some HTML cache code in my object that can save a copy of the HTML locally the first time it is accessed, then automatically use that file each subsequent run.