I tried to write this up yesterday and didn't get very far (I was trying to do it in the Apple Store in London, which is crazy, mad busy and where coherent thought is made almost impossible by the constant din of people trying out iPod speakers). 
I want to introduce an idea I've been working on which is a simple namespace extension to OPML which I am calling so - short for semantic outlining. 
What semantic outlining does is allow one to add attributes or elements to an OPML file that describe semantic relationships. It’s intentionally a hacky language because OPML is kind of an odd format when you look at it from the perspective of semantics. 
I am going to write a parser in PHP to turn OPML with semantic additions into a set of RDF triples or XML-serialised RDF. 
There are some native rules that I am writing to make semantic inferences from OPML. 
First of all, so:xmlUrl. This may be familiar if one has looked at OPML code. It has the same name as OPML’s xmlUrl attribute, but it serves a different function. so:xmlUrl is a simple inference that the parser makes on outline elements with the type attribute set to “rss”. It basically asserts “this HTML URL has an XML version at this URL”. 
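As a rough illustration of that inference, here is a minimal sketch in Python (not the planned PHP parser). The namespace URI bound to "so" is a placeholder invented for the example, since no URI has been chosen yet, and the function name is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: no URI for the "so" prefix has been chosen yet.
SO = "http://example.org/semantic-outlining#"

OPML = """<opml version="2.0">
  <head><title>Feeds</title></head>
  <body>
    <outline text="Example Blog" type="rss"
             htmlUrl="http://example.org/blog/"
             xmlUrl="http://example.org/blog/feed.xml"/>
    <outline text="Just a note"/>
  </body>
</opml>"""


def infer_xml_urls(opml_text):
    """Return (subject, predicate, object) triples for outlines of type rss."""
    root = ET.fromstring(opml_text)
    triples = []
    for node in root.iter("outline"):
        if node.get("type") != "rss":
            continue
        html_url, xml_url = node.get("htmlUrl"), node.get("xmlUrl")
        # Only assert the relationship when both ends are present.
        if html_url and xml_url:
            triples.append((html_url, SO + "xmlUrl", xml_url))
    return triples


triples = infer_xml_urls(OPML)
for triple in triples:
    print(triple)
```

Outlines without type="rss", or missing either URL, simply produce no triple, so ordinary OPML passes through untouched.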
How will writing OPML change in order to make it semantic? 
Well, first of all, one will be able to set various parser rules in the head of the OPML file. The element will be of the sort so:parserRules and contain sub-elements that define what the parser should do. For instance, I am thinking that if there is an outline element containing the attribute type set to rss or link, then the parser ought to declare a new triple to state that the text attribute of the outline node ought to be set as the dc:title. 
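To make the idea concrete, here is a hedged Python sketch of how a head-level rule might drive the dc:title inference. Only so:parserRules comes from the description above. The sub-element so:titleFromText, its types attribute, and the namespace URI for "so" are all invented for this example.

```python
import xml.etree.ElementTree as ET

SO = "http://example.org/semantic-outlining#"  # hypothetical namespace URI
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

OPML = f"""<opml version="2.0" xmlns:so="{SO}">
  <head>
    <title>Links</title>
    <so:parserRules>
      <!-- so:titleFromText and its "types" attribute are invented here -->
      <so:titleFromText types="rss link"/>
    </so:parserRules>
  </head>
  <body>
    <outline text="Example Site" type="link" url="http://example.org/"/>
    <outline text="A plain heading"/>
  </body>
</opml>"""


def titles_from_rules(opml_text):
    """Emit dc:title triples for outline types named by the head rule."""
    root = ET.fromstring(opml_text)
    rule = root.find("head/so:parserRules/so:titleFromText", {"so": SO})
    if rule is None:
        return []  # no rule declared, so nothing is inferred
    wanted = set(rule.get("types", "").split())
    triples = []
    for node in root.iter("outline"):
        url = node.get("url") or node.get("htmlUrl")
        if node.get("type") in wanted and url and node.get("text"):
            triples.append((url, DC_TITLE, node.get("text")))
    return triples


title_triples = titles_from_rules(OPML)
print(title_triples)
```

Keeping the rule in the head means a file with no so:parserRules element yields no inferred triples at all.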
Where am I going with this? Well, the first thing is to make the explication part work. This means basically defining a new element that allows one to add explicit semantic markup to existing elements. This is done by making a subnode of the outline element called so:resource. Various simple inferential rules then figure out what the resource is: basically, the parser looks at the parent node, figures out whether there is a URL there (looking at the url, htmlUrl and xmlUrl attributes in that order) and uses that. If the parent node doesn’t have a URL, the RDF library will give the resource a “genid” type name on the graph when it parses. 
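The subject-resolution rule can be sketched roughly as follows, in Python rather than PHP. The namespace URI is a placeholder, and the "_:genidN" label only imitates what an RDF library might generate for a node without a URL. The real name would come from whichever library is used.

```python
import itertools
import xml.etree.ElementTree as ET

SO = "http://example.org/semantic-outlining#"  # hypothetical namespace URI
NS = {"so": SO}

OPML = f"""<opml version="2.0" xmlns:so="{SO}">
  <head><title>Resources</title></head>
  <body>
    <outline text="Feed" type="rss" htmlUrl="http://example.org/blog/"
             xmlUrl="http://example.org/blog/feed.xml">
      <so:resource/>
    </outline>
    <outline text="An idea with no URL">
      <so:resource/>
    </outline>
  </body>
</opml>"""


def resolve_subjects(opml_text):
    """Return one subject per outline that carries a so:resource child."""
    root = ET.fromstring(opml_text)
    counter = itertools.count(1)
    subjects = []
    for parent in root.iter("outline"):
        if parent.find("so:resource", NS) is None:
            continue
        # Check url, htmlUrl and xmlUrl in that order; first non-empty wins.
        url = next(
            (parent.get(a) for a in ("url", "htmlUrl", "xmlUrl") if parent.get(a)),
            None,
        )
        # Fall back to a generated label standing in for the library's genid.
        subjects.append(url or f"_:genid{next(counter)}")
    return subjects


subjects = resolve_subjects(OPML)
print(subjects)
```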
The second step is to do semi-explicit markup. This is a way of simply adding extra namespaced attributes to an outline element. For instance, if one were linking to a website using type="link", one could add dc:language="en" on the end to declare the language of the resource one is linking to. One then declares in the head an element called so:parseNSatts with the namespace prefix in there (e.g. ‘dc’) and the parser will pull those attributes out of the document and apply them. 
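Here is a hedged Python sketch of that attribute-harvesting step. It assumes that so:parseNSatts holds whitespace-separated prefixes as its text content and that the subject is the outline's url attribute. Both are guesses made for illustration, as is the "so" namespace URI.

```python
import io
import xml.etree.ElementTree as ET

SO = "http://example.org/semantic-outlining#"  # hypothetical namespace URI

OPML = f"""<opml version="2.0" xmlns:so="{SO}"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head>
    <title>Links</title>
    <so:parseNSatts>dc</so:parseNSatts>
  </head>
  <body>
    <outline text="Example Site" type="link" url="http://example.org/"
             dc:language="en"/>
  </body>
</opml>"""


def namespace_attribute_triples(opml_text):
    """Turn attributes in the declared namespaces into triples."""
    data = opml_text.encode("utf-8")
    # First pass: recover the prefix-to-URI bindings declared in the document,
    # because ElementTree replaces prefixes with URIs when it builds the tree.
    prefixes = dict(
        item for _, item in ET.iterparse(io.BytesIO(data), events=("start-ns",))
    )
    root = ET.fromstring(data)
    declared = root.findtext("head/so:parseNSatts", default="", namespaces={"so": SO})
    wanted = {prefixes[p] for p in declared.split() if p in prefixes}
    triples = []
    for node in root.iter("outline"):
        subject = node.get("url")
        if not subject:
            continue
        for name, value in node.attrib.items():
            if name.startswith("{"):  # namespaced attributes look like {uri}local
                uri, local = name[1:].split("}", 1)
                if uri in wanted:
                    triples.append((subject, uri + local, value))
    return triples


ns_triples = namespace_attribute_triples(OPML)
print(ns_triples)
```

Attributes in namespaces that are not listed in so:parseNSatts are left alone, which keeps the behaviour opt-in.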
The third step is to come up with a way of doing implied description from OPML. Once I get to this stage, I am going to set up a group to discuss the best ways of doing this. Looking at the OPML 2.0 specification, I think that there is a lot of value in pulling in the created attribute, and the little-used category attribute. 
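Purely as a starting point for that discussion, here is a Python sketch of what pulling in created and category might look like. The mapping to dcterms:created and dc:subject is an assumption made for this example, not a settled design. It also assumes created is an RFC 822 date and category is a comma-separated list, as in the OPML 2.0 specification.

```python
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

# Assumed target properties; the eventual vocabulary is still undecided.
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"
DCTERMS_CREATED = "http://purl.org/dc/terms/created"

OPML = """<opml version="2.0">
  <head><title>Links</title></head>
  <body>
    <outline text="Example Site" type="link" url="http://example.org/"
             created="Mon, 31 Oct 2005 18:21:33 GMT"
             category="/Boston/Weather,/Harvard/Berkman"/>
  </body>
</opml>"""


def implied_triples(opml_text):
    """Derive triples from the created and category attributes."""
    triples = []
    for node in ET.fromstring(opml_text).iter("outline"):
        subject = node.get("url")
        if not subject:
            continue
        if node.get("created"):
            # Convert the RFC 822 date into an ISO 8601 string.
            when = parsedate_to_datetime(node.get("created"))
            triples.append((subject, DCTERMS_CREATED, when.isoformat()))
        for cat in (node.get("category") or "").split(","):
            if cat.strip():
                triples.append((subject, DC_SUBJECT, cat.strip()))
    return triples


implied = implied_triples(OPML)
for triple in implied:
    print(triple)
```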
I’ve written up some sample markup, but I’m not very happy with it. I’m hoping to eventually write up a schema in RELAX NG. 
The important question is whether or not what I’m doing meets the OPML 2.0 specification. As far as I can see, it does. Here is what the OPML 2.0 specification says about extending the standard: 
An OPML file may contain elements and attributes not described on this page, only if those elements are defined in a namespace, as specified by the W3C.

Do I expect software like Grazr to read the Semantic Outlining extensions? No. You guys can rest easy. If you are already parsing OPML, and you haven’t got time to parse the extensions, just ignore all the namespaced elements and attributes that I am adding. I’ve built a few test files and found that this is basically the behaviour that Grazr exhibits. I need to build a few more test files and try them with a few more OPML readers. 
As for OPML writers - I’d really love to work with those of you who make them to implement Semantic Outlining functions into your systems. Once I’ve got my JavaScript skills going, I’m hoping to make an outline editor in the browser to help add semantic markup to OPML. This is something I am trying to avoid though. 
Last month, Danny Ayers produced a script to go from SPARQL to OPML. With a bit of work, it will soon be possible to have OPML and all the RDF/SemWeb technology integrated with one another. 
It is a profound fallacy to think that “SynWeb” and “SemWeb” are contradictory. With a bit of work, they can be partners. 
Perhaps I’m doing this all the wrong way. My basic design goals are simple: add a highly flexible and extensible ability to turn OPML into RDF triples without breaking anything that OPML is already doing or going against the OPML 2.0 specification. Certain people don’t like the OPML specification, but that is not the issue. We’ve already got an ecosystem of tools that read and write OPML, and, more importantly, we’ve got users who enjoy using them. 
I’m hoping to have a functioning prototype of the parser ready in a few weeks, and at least some sample documents (for me the design process is simple: get something working, get some examples, then write a schema - it may be messy, but it’s far better than doing it the other way around - writing the schema should be the last thing to do, not the first thing). 
Tags: rdf, opml, semanticweb, semweb, outlining 


