Shelley Powers has weighed in on XHTML and strict parsing. At risk of getting flamed, I'll give some personal opinions. Remember: reasonable people can disagree, and I Am Not A Heretic. 
Personally, I think that it's not a clear-cut thing. I think that if an XHTML page is not well-formed, it should still render using tag soup, even if it is served as application/xhtml+xml. If I'm reading someone's crappy MySpace page, I don't care about validation. 
Now, if I'm getting financial information that may have a huge bearing on, say, whether I have a roof over my head next week - like a legal declaration or contract or something equally important - you bet your arse I want my computer to flash a big red light if the XML isn't well-formed. This is an important piece of context-sensitivity. 
I don't care about the MIME type; I care about the fact that there are a lot of things which really ought to be well-formed to avoid ambiguity. If I get a file containing my student loan agreement over the web, you can bet I want it 'conservatively' parsed. One misplaced apostrophe in the legal world where English is the lingua franca can cost you thousands, and online, one misplaced semicolon or unescaped ampersand can also cost you thousands. XML - whatever its shortcomings - solves this. It can quite easily be the lingua franca of the Web. 
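To make 'conservative' parsing concrete, here's a minimal sketch using Python's standard-library XML parser. The function name and the two sample strings are my own inventions for illustration; the point is only that a strict parser refuses a bare ampersand rather than guessing what was meant.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the string parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        # A strict parser stops at the first well-formedness error
        # instead of trying to recover.
        return False


# Hypothetical fragments of a loan agreement: one escapes the ampersand,
# the other leaves it bare.
good = "<p>Repayments &amp; interest: 4000</p>"
bad = "<p>Repayments & interest: 4000</p>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Whether the caller then flashes a big red light or quietly falls back to tag soup is a separate decision, and that's rather the point: the parser reports the problem, and the context decides what to do about it.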
Why don't we go further? If 'forgiveness' or 'liberal parsing' is acceptable in HTML, maybe even XML, why not JSON? Surely, not everyone gets their JSON right all the time, so shouldn't we be a little bit more tolerant about it? Let's have some smart algorithm that works out how the object should have been serialised and rearrange it for us. And as for C++, let's just let you type anything. Let's have fault-tolerant compilers. That'd make the software experience just dandy for everybody. Don't worry yourself about the fact that leaving that semi-colon off line 42 means that Carl's computer in Accounts now burps on alternate Thursdays. Error? What error? If I don't see an error, it doesn't exist! 
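For what it's worth, JSON parsers already behave the strict way. A rough sketch with Python's standard json module (the helper name and sample inputs are made up for this post): a trailing comma or single-quoted key is rejected outright, with no smart algorithm trying to work out what you meant.

```python
import json


def try_parse(text: str):
    """Return (True, value) on success, or (False, error message) on failure."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        # No recovery is attempted: the parser reports where it gave up.
        return False, str(err)


# Hypothetical inputs: one valid, two with the kind of slip a human makes.
print(try_parse('{"amount": 4000}'))
print(try_parse('{"amount": 4000,}'))   # trailing comma
print(try_parse("{'amount': 4000}"))    # single quotes
```

Nobody seems to think this makes JSON unusable.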
The problem with the discourse about XHTML is that people look at IE, Safari, Firefox and Mozilla and think "that's all there is". There is a long tail of applications which parse the web, not only for human consumption - screen readers and other tools to aid accessibility - but for 'web of data' consumption too. 
A healthy platform is one where it's as easy to write a parser as it is to write a document. Parsing HTML is bloody hard work, and we have to use inelegant hacks like Tidy, Hpricot and BeautifulSoup to get data out of HTML, when we should be using XML-based methods like XPath, XQuery and XSLT which are quicker, sexier and lead to far less hair loss. 
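Here's roughly what I mean, as a sketch rather than a recipe. It assumes the page is well-formed XHTML in the usual XHTML namespace; the sample page and the extract_links helper are invented for illustration. Python's ElementTree only supports a limited subset of XPath, but that's enough to pull every link out in a couple of lines with no clean-up step beforehand.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the XHTML namespace; the prefix "x" is an arbitrary choice.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# A made-up, well-formed XHTML page.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p><a href="/one">One</a> and <a href="/two">Two</a></p>
  </body>
</html>"""


def extract_links(xhtml: str) -> list[str]:
    """Return the href of every <a> element that has one, in document order."""
    root = ET.fromstring(xhtml)
    return [a.get("href") for a in root.findall(".//x:a[@href]", XHTML_NS)]


print(extract_links(page))
```

Try the same thing on tag soup and ET.fromstring raises a ParseError before you get anywhere, which is exactly why the Tidy-style hacks exist.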
The argument that we shouldn't bother with XHTML because there's so much tag soup out there is, frankly, a bad argument. It's a bit like saying "well, we shouldn't bother to punish criminals because there's so much crime out there". The fact that something is either popular or unpopular doesn't mean it's worth or not worth doing. 
Lowering the barrier to making tools that use content on the Web is a valuable thing, but the average web designer doesn't see it as one. Even if we don't get to a well-formed web, it's still worth striving for. 
Human authors are only part of the picture. It's quite easy for tool makers to spend an extra few minutes and provide XHTML. The argument about browsers is a distraction from the key part of this. Well-formed XML is good in and of itself because it makes the process of making tools to consume data easier. 
Perhaps there is another reason why the browser manufacturers (whose interests guide the W3C) are dragging their heels on XHTML adoption - because it would make competing with the currently established interests a bit easier. (Then again, as an 'XML guy', perhaps there are motives for me too...) 
When I ask for XHTML, I'm not saying that having pages break if they have a misplaced ampersand is a good thing. I'm saying that the eventual replacement of the SGML-based web with an XML-based one would be a Good Thing because every webpage could become an API and not just a document. There are going to be engineering problems involved in that (and a boatload of clueless folks in suits getting het up about stuff), but not switching to XML is putting off the inevitable. We need some blue sky thinking in the standards area, some vision of where we are going - otherwise we get stuck in the trap of minutiae. 