There's been a lot of fuss over on Stack Overflow, and consequently on MetaFilter and on Jeff Atwood's Twitter, about people parsing HTML with regular expressions, along with the advice to never do that and tales of how Cthulhu will eat your soul. 
In general, "never parse HTML with regular expressions" is good advice. 
But sometimes it isn't. I'll give an example of a case where you shouldn't follow it. You may find that it applies to you. 
A while back, I had over 2GB of HTML to parse - 77,000 files. Every file had exactly the same structure. I only wanted to extract two pieces of data from each file - the contents of the h1 element and the contents of a div with the class of 'author' or something similar. 
I wrote some Ruby code to parse each page using Nokogiri or Hpricot or whatever was then the preferred HTML parsing library. But this was slow. It was taking about 4 or 5 seconds to parse each file. For a single file, that's fine, but when you've got 77,000 to do, that's not so good. At 4.5 seconds a file, that's roughly four days. 
I rewrote the code in Java so that it would open each file with a BufferedReader, call readLine for each line of the file, use the String startsWith method to see if it's the right line, and then use regexes to extract the stuff we are interested in. I compiled and ran this code: it went from four days to about ten minutes. That turned out to matter, because I had made a goof-up in the code that I only discovered after running it - if I had only discovered that goof-up four days later, I would have been a lot more angry than I was discovering it after ten minutes. 
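For illustration, here is a minimal sketch of that line-based approach. It is not the original code: the class name LineExtractor, the exact tag text, and the assumption that the h1 and the author div each sit on a single line are all made up for the example, and it trims leading whitespace before calling startsWith, which the original may not have done.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: treats a page as a bag of lines rather than a DOM.
class LineExtractor {
    // Assumed markup: each element opens and closes on the same line.
    private static final String TITLE_PREFIX = "<h1>";
    private static final String AUTHOR_PREFIX = "<div class=\"author\">";
    private static final Pattern TITLE = Pattern.compile("<h1>(.*?)</h1>");
    private static final Pattern AUTHOR =
            Pattern.compile("<div class=\"author\">(.*?)</div>");

    /** Returns {title, author}; an entry is null if its line was never found. */
    static String[] extract(Reader source) {
        String title = null;
        String author = null;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                // Cheap prefix check first; only run a regex on candidate lines.
                if (title == null && trimmed.startsWith(TITLE_PREFIX)) {
                    Matcher m = TITLE.matcher(trimmed);
                    if (m.find()) {
                        title = m.group(1);
                    }
                } else if (author == null && trimmed.startsWith(AUTHOR_PREFIX)) {
                    Matcher m = AUTHOR.matcher(trimmed);
                    if (m.find()) {
                        author = m.group(1);
                    }
                }
                if (title != null && author != null) {
                    break; // both values found, skip the rest of the file
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String[] { title, author };
    }
}
```

To run it over a directory, you would call LineExtractor.extract(new FileReader(path)) once per file. Note that this only works because every file is known to share the same layout; on arbitrary HTML it would silently miss anything formatted differently.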
I've told this story to people, and there seem to be two possible reactions. There is the "OMG Ruby is so slow, I knew that not learning it and sticking with Java was sensible" reaction, and there's the sensible reaction - I could have rewritten it in Ruby and gotten the same performance benefits by using plain IO rather than the XML/HTML parsing library - it just happens that I know the Java IO library better than I know the Ruby IO library. Part of what was probably taking the time in Ruby was the fact that I was constructing a large number of objects extremely quickly, and Ruby's GC is notoriously painful in a non-generational way compared to the JVM's generational GC. 
The key thing is whether or not you are working with files that are all structured in a broadly similar way. If you've got 77,000 files that are all very similar and you know exactly what you want from them, then sometimes, for performance, treating each one as a bag of lines and strings is much more sensible than parsing it into a DOM. These very limited circumstances really are the exception that proves the rule. If you don't have a very good reason to be parsing XML or HTML using regexes rather than an XML or HTML parsing library, you shouldn't be doing so. (The same is true with RDF: use the right level of abstraction - unless you are logged into the swig IRC room all day every day and know the RDF specs like the back of your hand, you should be using an RDF library, not an XML library, to parse RDF documents.) 
