Yesterday, James asked the perfect question: 
Interesting post Tom, but can you explain further why this means that hReview won't scale. I still struggling to get my head around this SemWeb stuff!

I wrote up a long answer yesterday, but I'm still having a few little teething troubles with my laptop's power management and the response got lost. I've subsequently written up a presentation I'm going to give at BarCamp, so hopefully I'll be able to explain it better this time around. 
Microformats don't scale because you can't change their definitions. Imagine you were having a conversation with someone, and every time you talked about a person, they asked "are they male or female?" and the answer could only be "male" or "female" - no scope for "I don't know, I've only spoken to them on IRC" or "they're going through a sex change operation as we speak", or "gender is only a social construct, you unenlightened cretin". 
Similarly, with hReview, the rating attribute can be only a number from zero to five (or ‘best’ or ‘worst’ which map to 5 and 0 respectively). But the way that we rate movies is far more complex than choosing a number between zero and five. 
Let’s look at some of the ways a movie gets rated, using the soppy James Cameron drama Titanic (1997) as an example. 
As you can see, there are many different ways of rating a movie. These ratings tell you different things. Most countries have a movie rating system that provides age ratings on movies. The classifications used by different countries are different. In the UK, the British Board of Film Classification (née British Board of Film Censors) use the following characters to rate movies: UC, U, PG, 12/12A, 15, 18, R18. There are also some films that are not rated - films which haven’t been released in this country are often unrated and require cinemas that are playing imported prints to seek permission from the local council. 
There are other ways of categorising movies, and I’ve included a few of them. Most of the above are to do with age appropriateness or censorship, but some deal with the quality of the movie - Boyo-2’s IMDB review says that the movie is worth 10 stars. The Academy Awards gave the movie eleven Oscars. What if you want to sort reviews of movies on the basis of any of these categories? 
The graph above shows the main problem with microformats - they don’t go far enough. If you want to say something for which there isn’t a microformat category, you’re screwed. 
And if we ever want to extend microformats to cover some domain-specific knowledge, we are equally stuffed. Because microformats are agreed special formats, we can’t just say “Oh, let’s add the BBFC rating to movie reviews”, because we then have to rewrite all our scripts, parsers, stylesheets, generators and so on to include the new category. Once a microformat is agreed, we’re stuck with it. That’s why they can’t scale. 
But this is where RDF comes in. We already have ways of achieving the same goal as microformats, and there are a variety of them. Firstly, there is eRDF, which uses the same design principles as microformats but allows one to embed domain-specific knowledge in (X)HTML pages. Secondly, there is GRDDL, which allows you to specify a stylesheet to turn your domain-specific class names into RDF. Finally, there is RDFa, which uses new attributes that are being added to XHTML 2 in order to specify metadata. 
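To make the difference concrete, here is a rough Python sketch of the idea underneath all three approaches: every statement is a (subject, predicate, object) triple, and the predicate is a namespaced URI. This is not eRDF, GRDDL or RDFa syntax, and it uses no RDF library. The `example.org` namespaces, the `values_for` helper and the BBFC and MPAA values are all made up for illustration; only the ten IMDB stars and the eleven Oscars come from the examples above.

```python
# Hypothetical namespaces, invented for this sketch (not real vocabularies).
BBFC = "http://example.org/bbfc#"
IMDB = "http://example.org/imdb-user#"
OSCARS = "http://example.org/academy-awards#"

TITANIC = "http://example.org/film/titanic-1997"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (TITANIC, BBFC + "certificate", "12"),   # illustrative value
    (TITANIC, IMDB + "stars", 10),
    (TITANIC, OSCARS + "wins", 11),
]


def values_for(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Adding a new rating scheme means minting a new predicate.
# values_for() and the existing triples do not change.
MPAA = "http://example.org/mpaa#"
triples.append((TITANIC, MPAA + "rating", "PG-13"))  # illustrative value

print(values_for(triples, TITANIC, IMDB + "stars"))
print(values_for(triples, TITANIC, MPAA + "rating"))
```

The point of the sketch is the last few lines: a new kind of rating arrives as data, so nothing that reads the old ratings has to be rewritten.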
Ironically enough, the main charge against the Semantic Web is that it is trying to totalise all knowledge - that there is one ontology that specifies absolutely everything, and isn’t that ridiculous!? Yes, it is ridiculous that this myth has been perpetuated for so long by so many. The thing about the SemWeb is that you can remix categories. If you don’t like the way that something is done, you just crack open your text editor, write up a new ontology and change it. Perhaps that’s a bit too users-in-charge for the Web 2.0 crowd, who are only into users-in-charge when they can profit from it.

It is (some of) the microformats advocates - the people who are so against totalising ontologies of everything - that are producing a totalised ontology of everything. If you try and describe a relationship to a friend using a word that isn’t one of the following - contact acquaintance friend met co-worker colleague co-resident neighbor child parent sibling spouse kin muse crush date sweetheart me - you’re breaking the XFN standard (but if you use a namespace in RDF, then you’re free to describe your relationships however you like, and we can figure out the equivalence or disequivalence of relationships as a community). 
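Here is a rough Python sketch of that contrast. The list of XFN values is the one quoted above; everything else (the `example.org` namespaces, the `SAME_AS` table, the `is_valid_xfn` and `canonical` helpers) is hypothetical and only there to show the shape of the idea, not how any real XFN or RDF tool works.

```python
# The closed vocabulary, as listed above.
XFN_VALUES = frozenset(
    "contact acquaintance friend met co-worker colleague co-resident "
    "neighbor child parent sibling spouse kin muse crush date sweetheart me".split()
)


def is_valid_xfn(rel):
    """True only if every space-separated token is in the fixed list."""
    tokens = rel.split()
    return bool(tokens) and all(t in XFN_VALUES for t in tokens)


# The open alternative: anyone can mint a relationship under their own
# (hypothetical) namespace, and equivalences are just more data.
XFN = "http://example.org/xfn#"
MY = "http://example.org/my-rels#"

SAME_AS = {MY + "flatmate": XFN + "co-resident"}


def canonical(uri):
    """Follow stated equivalences until no further mapping applies."""
    seen = set()
    while uri in SAME_AS and uri not in seen:
        seen.add(uri)
        uri = SAME_AS[uri]
    return uri


print(is_valid_xfn("friend met"))    # every token is on the list
print(is_valid_xfn("nemesis"))       # not on the list, so rejected
print(canonical(MY + "flatmate"))    # mapped to the co-resident URI
print(canonical(MY + "nemesis"))     # no mapping stated, kept as-is
```

In the closed version, "nemesis" is simply an error. In the open version it is a perfectly good term that nobody has mapped to anything yet.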
Microformats are a good start, but, dude, domain-specific information exists. Pretending it isn’t there doesn’t help anybody. None of the above should be taken as an argument against using microformats. I’m a pragmatist, and if microformats are a way to get people to join the data web, then that’s great. When I say “trousers” and an American says “pants”, we mean the same thing, and our computers should understand that, goddamnit! 
The flipside of the Pareto distribution, where most people only want 80% of the functionality, is that almost everyone has a few little somethings in the other 20%. 
There are a lot of good things about microformats - the community process and simple specifications are a breath of fresh air when compared to the musty heaps of utter bullshit that many large companies turn out and call specifications (FYI: Microsoft’s Office Open XML format has a specification document that is 6,000 pages long). The focus on getting things working here and now is great. The many implementations are great. The outreach to designers and developers is fantastic. But I do feel that RDF is the way forward. And RDFa is one of the best reasons to get XHTML 2 standardised. One of the first things I’m going to be working on is a whole set of parsers that one can use to turn microformats into either XML or RDF. 
For anyone going to BarCamp London 2 next weekend: I’m going to be giving a content-rich presentation on the Semantic Web. Hopefully, I’ll record it as an MP3, and the (link-heavy) slides will be available too. 
Update: bengee in #microformats tells me that you can specify a wider range of numerical values by using the ‘best’ and ‘worst’ classes. That’s a bloody silly oversight of mine. There are examples of this on the hReview specification. You still can’t specify other types of ratings like “PG”. 
Another update: Kevin Marks has just told me on IRC that I’m wrong. Apparently, there is a way of doing movie ratings within hReview, which is tags. It doesn’t solve the namespace issue, but it’s at least somewhat extensible. 
Tags: microformats, rdf, erdf, rdfa, semweb, hreview, barcamplondon2 

