Last month, Danny Ayers described some sparql2opml noodling to take some data out of FOAF and turn it into OPML. It got a good reception from us OPML folks - including Adam Green, Richard Edwards, James Corbett and, er, me. 
I've spent the last few weeks playing with RDF stuff, and last Monday bought Shelley Powers' book. I've also got a podcast which I recorded yesterday, which I need to post (there are complications). 
Danny is using the Web to build a pipeline for his SPARQL query and the transformation to OPML. I want to do it on the server side, which is a bit tougher. There are lots of different RDF libraries for different languages. I didn't particularly want to learn Java just to chuck some RDF around, but Jena does provide a way for people who are already using Tomcat/JSP/servlets etc. to join the RDF game. 
Instead, I decided to use RAP, the RDF API for PHP. It's fully acronym compliant - it supports RDF/XML, N3 and GRDDL. It's not the most intuitive API around, but it all kind of works. I've cursed a little bit less with RAP than I have with the DOM (which must be the most cursed-about framework around!). 
I can now do on my server everything that Danny was doing with things bouncing around between sparql.org, w3.org and his XSLT file. And more. The first test project I've done is to try and describe the semantic relationships between public institutions - namely, universities. 
I can write a whole load of RDF (either as XML or as N3) and load it into a database. To do that, I write the XML in oXygen on the Mac (or write triples and convert them using this converter), post the result as a file on my web server, and then tell RAP to read it into a MySQL database. RAP pulls out the triples and stores them all in a table called "statements", and stores the prefix/namespace data in another table. 
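To make the storage step concrete, here is a minimal Python sketch of a triple store kept in a relational table. It is not RAP and not RAP's real schema: only the table name "statements" comes from the setup above, while the column names, the "namespaces" table, the `http://example.org/` URIs and the use of SQLite in place of MySQL are all assumptions for illustration.

```python
import sqlite3

EX = "http://example.org/schema#"  # hypothetical vocabulary, not a real schema

# Hypothetical triples as (subject, predicate, object) rows.
TRIPLES = [
    ("http://example.org/uni/oxford", EX + "hasCollege", "http://example.org/college/merton"),
    ("http://example.org/uni/oxford", EX + "hasCollege", "http://example.org/college/balliol"),
    ("http://example.org/college/balliol", EX + "name", "Balliol College"),
    ("http://example.org/college/merton", EX + "name", "Merton College"),
]


def create_store(conn):
    """Create one table for triples and one for prefix/namespace pairs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statements "
        "(subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS namespaces (prefix TEXT PRIMARY KEY, uri TEXT)"
    )


def load(conn, triples, prefixes):
    """Insert triples and prefix mappings using parameterised statements."""
    conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", triples)
    conn.executemany(
        "INSERT OR REPLACE INTO namespaces VALUES (?, ?)", list(prefixes.items())
    )
    conn.commit()


def objects_of(conn, subject, predicate):
    """Return every object for a subject/predicate pair, sorted alphabetically."""
    rows = conn.execute(
        "SELECT object FROM statements "
        "WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")  # in-memory database, nothing touches disk
create_store(conn)
load(conn, TRIPLES, {"ex": EX})
print(objects_of(conn, "http://example.org/uni/oxford", EX + "hasCollege"))
```

A single three-column table is the simplest layout that works; a real store would likely add indexes and distinguish literals from resources.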
Then I write a SPARQL query as basically plain text or plain text with some PHP logic added on top (so, for instance, if I want to change a variable, I can specify it in the URL). This means that instead of sending a SPARQL query as really long encoded data in the URL, I can simply point to a file which will be loaded up, read in and then executed. This makes the system quite a bit more modular. The PHP is there simply so that you can specify things like search queries. 
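The query-file-plus-URL-variable idea can be sketched in Python like this. The real setup uses PHP and a query stored in a file; here the template is an inline string so the example stands alone. The `ex:` vocabulary, the `university` parameter name and the whitelist pattern are assumptions, not what my scripts actually use.

```python
import re
from string import Template
from urllib.parse import parse_qs, urlparse

# Hypothetical query template; in practice this text would live in its own file.
QUERY_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/schema#>
SELECT ?college ?url
WHERE {
  ?uni ex:name "$university" .
  ?uni ex:hasCollege ?c .
  ?c ex:name ?college .
  ?c ex:homepage ?url .
}
ORDER BY ?college
""")

# Only let through characters that cannot close the quoted string in the query.
SAFE_VALUE = re.compile(r"[A-Za-z0-9 ,.'-]{1,80}")


def build_query(request_url, default="University of Oxford"):
    """Fill the template from the 'university' parameter of the request URL."""
    params = parse_qs(urlparse(request_url).query)
    value = params.get("university", [default])[0]
    if not SAFE_VALUE.fullmatch(value):
        raise ValueError("unsupported characters in 'university' parameter")
    return QUERY_TEMPLATE.substitute(university=value)


print(build_query("http://example.org/colleges?university=University+of+London"))
```

Checking the value before substituting it is the important part: anything pasted straight from the URL into a query string is an injection risk.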
How am I using this in action? Well, I've written an example script. It is simply a list of colleges within a university. I've added two files to the database - one for the University of Oxford and one for the University of London (and, yes, I know that an Oxford college is different from a London college - that's something I'm keeping in mind while designing the RDF schema which will go public - er - whenever). 
You can see the results of the Oxford and London queries by visiting those links. Warning - they are just XML, remember. You should see a list of all of the colleges ordered alphabetically with the URL for their websites. 
As Danny's XSLT shows, it's really quite easy to turn this kind of thing into OPML for display in Grazr. 
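For anyone who would rather not write XSLT, the same transformation can be sketched in Python. This is not Danny's stylesheet, just a rough equivalent: it reads the standard SPARQL XML results format and writes one OPML outline per result. The sample data, the `?college` and `?url` variable names from my example query, and the choice of `version="2.0"` with `type="link"` outlines are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
SR = "{http://www.w3.org/2005/sparql-results#}"

# Hypothetical result document with made-up URLs.
SAMPLE_RESULTS = """\
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="college"/><variable name="url"/></head>
  <results>
    <result>
      <binding name="college"><literal>Balliol College</literal></binding>
      <binding name="url"><uri>http://example.org/balliol</uri></binding>
    </result>
    <result>
      <binding name="college"><literal>Merton College</literal></binding>
      <binding name="url"><uri>http://example.org/merton</uri></binding>
    </result>
  </results>
</sparql>"""


def results_to_opml(results_xml, title):
    """Turn SPARQL XML results into an OPML string, one outline per result."""
    root = ET.fromstring(results_xml)
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for result in root.iter(SR + "result"):
        row = {}
        for binding in result.findall(SR + "binding"):
            # Each binding wraps a single <uri>, <literal> or <bnode> element.
            value = binding[0] if len(binding) else None
            row[binding.get("name")] = (value.text or "") if value is not None else ""
        ET.SubElement(body, "outline", {
            "text": row.get("college", ""),
            "type": "link",
            "url": row.get("url", ""),
        })
    return ET.tostring(opml, encoding="unicode")


print(results_to_opml(SAMPLE_RESULTS, "Colleges"))
```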
Because this approach is quick to develop with and each piece is interchangeable, building REST APIs on top of it is very easy. Of course, there are security issues which one has to deal with, which is why I've only made a limited set of data available (the above two links). 
Now, here's where it gets interesting - imagine you've got lots of raw data that you want to make available in OPML format so that it can be included in directories - this approach makes a lot of sense. You store the data as either flat-file RDF or in a relational database (RAP supports MySQL and Microsoft Access - other libraries offer different choices), and then just query the data out. OPML provides the structure and RDF provides the data that gets included within the structure. 
It means that the OPML folks can get what they want, but you don't have to specially code anything. 
The next thing one could do is replace the query-specific variable names in SPARQL (in this case, ?url and ?college) with generic ones. Instead of hand-writing a stylesheet to turn ?url into an outline component, we define some new standard variable names - I'm thinking ?text, ?htmlUrl, ?xmlUrl, ?linkUrl and ?includeUrl (the latter being for OPML 2.0 only). ?text would be required, and the URL variable names would take the place of type. So, if you just want a type="link", you'd simply specify a ?linkUrl. If you wanted a feed, you'd bash out at least an ?xmlUrl and maybe an ?htmlUrl too. 
Stylesheets or processors would then be able to pick these up and churn out flat OPML files without needing any query-specific logic. 
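Here is a rough Python sketch of what such a generic processor might do with one row of results. The variable names are the ones proposed above; the precedence when more than one URL variable is present, the `type="rss"` and `type="include"` values, and the function name are my assumptions rather than anything agreed.

```python
import xml.etree.ElementTree as ET


def outline_from_binding(row):
    """Build an OPML <outline> from a dict of generic SPARQL variable bindings.

    Recognised keys: text (required), xmlUrl, htmlUrl, linkUrl, includeUrl.
    The outline type is inferred from which URL variable is present.
    """
    if not row.get("text"):
        raise ValueError("every result needs a ?text binding")
    attrs = {"text": row["text"]}
    if "xmlUrl" in row:
        # A feed: xmlUrl is the feed itself, htmlUrl the site it belongs to.
        attrs["type"] = "rss"
        attrs["xmlUrl"] = row["xmlUrl"]
    elif "includeUrl" in row:
        # OPML 2.0 only: pull in another OPML file at this point.
        attrs["type"] = "include"
        attrs["url"] = row["includeUrl"]
    elif "linkUrl" in row:
        attrs["type"] = "link"
        attrs["url"] = row["linkUrl"]
    if "htmlUrl" in row:
        attrs["htmlUrl"] = row["htmlUrl"]
    return ET.Element("outline", attrs)


link = outline_from_binding({"text": "Balliol College", "linkUrl": "http://example.org/balliol"})
feed = outline_from_binding({
    "text": "College news",
    "xmlUrl": "http://example.org/news.rss",
    "htmlUrl": "http://example.org/news",
})
print(ET.tostring(link, encoding="unicode"))
print(ET.tostring(feed, encoding="unicode"))
```

Because the function only looks at variable names, the same code works for any query that sticks to the convention.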
This way, the whole process of producing OPML boils down to formulating a SPARQL query and pointing the results in the right direction. And formulating a SPARQL query need only be as complicated as coming up with one standard one and letting the user change a few variables. 
People who aren't XML geeks may read this and think "so what?". I sympathise. What this means (and, again, I don't claim originality - Danny Ayers has prior art on this) is that we can build applications that bridge between the RDF space and the OPML space (or the RSS space), and we can develop them relatively quickly. 
And for the mashup makers, this should mean more public data available - more APIs and suchlike. 
I'm excited by this stuff. Hopefully soon we'll have a fair few more toys to play with!
