<?xml version="1.0" encoding="ISO-8859-1"?>
<opml version="2.0">
<head>
<title>24.opml</title>
<dateCreated>Mon, 24 Sep 2007 21:03:04 GMT</dateCreated>
<dateModified>Mon, 24 Sep 2007 21:03:04 GMT</dateModified>
<ownerName>Tom Morris</ownerName>
</head>
<body>
<outline text="Simple RDF Querying with Python" created="Mon, 24 Sep 2007 21:03:04 GMT"><outline text="Oh boy. RDF and Python together. Add an unhealthy dose of chocolate and hyper-paranoid military-strength public key encryption and I'm in ecstasy.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="Seriously though. One of the dullest complaints I get is &quot;RDF is sooooo hard! My poor little head will never cope!&quot; Usually, though, as with all such complaints, it's voiced in the third person. Expressing other people's ignorance is a lot easier than expressing your own. That's why, for instance, we always say &quot;How will people ever cope without religion?&quot; but never put in the key data point that the person saying it gets on quite well without religion, thank you very much. (In addition, the person bemoaning the complexity of RDF is usually doing so having mastered far more cryptic things, like getting pixel-perfect CSS in IE5.)&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="The thing is that RDF isn't necessarily very complex at all. Done right, RDF can be tremendously simple, and it can also be a very simple way of doing what is otherwise complex. If you can grok the basics of what a directed graph is, you've got most of the way there. There are bits which are slightly more irritating - &lt;a href=&quot;http://www.w3.org/TR/rdf-mt/#ReifAndCont&quot;&gt;reification&lt;/a&gt; and &lt;a href=&quot;http://www.w3.org/TR/rdf-mt/#unlabel&quot;&gt;blank nodes&lt;/a&gt;, for instance.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="Let's take an example: taking a list of people on a social networking site and finding out which friends they have in common. 
You could do this by collecting together a list of all the people from the social networking site's API, deciphering the site-specific XML or JSON format they use, and then iterating over the lot and joining them all together. Dull. You write lots of code just to perform a simple query. You have to assign them all places in some internal hierarchy-of-doom. Boring.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="In the case of Twitter, I've done a lot of it for you. I've written an XSLT transformation to take Twitter's API data and make it available as RDF/XML. You send one request to tools.opiumfield.com/twitter/[$username]/rdf and you get back an RDF file with all the stuff you need, including who that user knows, expressed with the FOAF foaf:knows property. You then load that into an in-memory graph and query it.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="Let me walk you through some Python code that demonstrates it. It uses &lt;a href=&quot;http://www.rdflib.net&quot; rev=&quot;vote-for&quot;&gt;RDFLib&lt;/a&gt;. 
If you are on OS X, you may need a newer Python: RDFLib requires 2.4, but if you are on 10.4, you will only have 2.3.5. Run &quot;sudo fink install python-2.5&quot; and then &quot;easy_install -U rdflib&quot; to add RDFLib.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="Once you've got the upgrade and the library, we can step through the code line by line and see what it does:&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&lt;code class=&quot;python&quot;&gt;import rdflib, sys&lt;/code&gt;&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="This imports the rdflib module and the sys module, which gives us access to the command-line arguments.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&lt;code class=&quot;python&quot;&gt;ts = rdflib.ConjunctiveGraph()&lt;/code&gt;&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="This creates a new object called 'ts', which is a ConjunctiveGraph: a graph that can hold the triples loaded from several documents and be queried across all of them at once. You know all that 'social graph' stuff that people have been waffling about? This is one of those graphs. A graph model - a 'network'. Which is ideal, really, for a network of friends.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&lt;code class=&quot;python&quot;&gt;querystring = &quot;&quot;&lt;/code&gt;&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="We initialise this as an empty string because we are going to append to it in a loop in a moment. 
God bless dynamic typing, right?&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&lt;code class=&quot;python&quot;&gt;for i in sys.argv[1:]:&lt;/code&gt;&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="Here, we iterate over each item in the list of arguments except the first one, which is the name of the script itself.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&lt;code class=&quot;python&quot;&gt;[tab] ts.parse(&quot;http://tools.opiumfield.com/twitter/&quot; + i + &quot;/rdf&quot;)&lt;/code&gt;&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="Here, we fetch the RDF data for each user and add its triples to the graph.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&lt;code class=&quot;python&quot;&gt;[tab] querystring = querystring + &quot;&amp;lt;http://twitter.com/&quot; + i + &quot;&amp;gt; &amp;lt;http://xmlns.com/foaf/0.1/knows&amp;gt; ?person . &quot;&lt;/code&gt;&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="Here we build up what will become the WHERE clause of the query. Each pass through the loop appends one triple pattern: the user's Twitter URI as the subject, the foaf:knows property as the predicate and the variable ?person as the object. Because every pattern uses the same ?person variable, a person only matches if every user we passed in knows them. In other words, the query finds the people we all know. 
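Conceptually, a group of patterns that all share ?person works like a set intersection: a person survives only if they turn up in every user's foaf:knows list. Here is that logic as a rough sketch in plain Python. It is not how RDFLib's query engine is actually implemented, and the dictionary of hypothetical usernames stands in for the triples loaded into the graph:

```python
def common_friends(knows, users):
    '''Return the set of people that every user in users knows.

    knows maps a username to an iterable of the people that user knows,
    standing in (hypothetically) for the foaf:knows triples in the graph.
    '''
    if not users:
        # No patterns at all: keep the sketch simple and return nobody.
        return set()
    # One set per user, like one triple pattern per user in the WHERE clause.
    contact_sets = [set(knows.get(user, ())) for user in users]
    # A person only survives if they are in every set, just as a ?person
    # binding must satisfy every pattern. Sets also hold each person once,
    # much as SELECT DISTINCT does.
    return set.intersection(*contact_sets)


# Hypothetical sample data, not real Twitter accounts.
knows = {
    'ann': ['bob', 'cat', 'dan', 'eve'],
    'fay': ['bob', 'dan', 'eve', 'gus'],
    'hal': ['dan', 'bob', 'ivy'],
}

for person in sorted(common_friends(knows, ['ann', 'fay', 'hal'])):
    print(person)
```

Passing a single username simply returns that user's contacts, which matches what a query with just one pattern would give. An unknown username contributes an empty set, so nobody matches.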
Each 'clause' (a triple pattern, in SPARQL terms) ends with a full stop, which separates it from the next pattern, and a space.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&lt;code class=&quot;python&quot;&gt;res = ts.query(&quot;SELECT DISTINCT ?person WHERE { &quot; + querystring + &quot; }&quot;)&lt;/code&gt;&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="This is where we run the query. It wraps the querystring in a SELECT query that asks for each DISTINCT ?person WHERE all of those patterns match. DISTINCT tells the query engine to drop any duplicate rows, so each person is listed only once.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="The res variable then holds the results, which we can loop over one row at a time. What is there left to do? Print 'em out, of course. Since we are just doing a demo, we'll print them to the shell.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&lt;code class=&quot;python&quot;&gt;for i in res:&lt;br /&gt;[tab] print(str(i[0]))&lt;/code&gt;&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="The reason it's i[0] is that each result row is a tuple-like object holding one value per variable in the SELECT clause. We only asked for ?person, so i[0] is that person's URI. 
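If you prefer, you can unpack a one-column row instead of indexing it. In current RDFLib releases, result rows are a tuple subclass, so this should work on res directly. The sketch below uses plain tuples as hypothetical stand-ins for rows, so it runs without RDFLib or network access:

```python
def person_uris(rows):
    '''Return the ?person value from each one-column result row as a string.'''
    # Unpacking (person,) rather than indexing row[0] raises a ValueError
    # if a row ever has more than one column, making the assumption explicit.
    return [str(person) for (person,) in rows]


# Hypothetical stand-in rows: one tuple per result, one entry per SELECT variable.
rows = [
    ('http://example.org/people/bob',),
    ('http://example.org/people/ann',),
]

for uri in sorted(person_uris(rows)):
    print(uri)
```

With real RDFLib results, the same idea would be written as a loop of the form for (person,) in res, printing person on each pass.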
If you run it interactively, you'll see.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="Let's see this script in action:&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&lt;code class=&quot;shell&quot;&gt;darwin:~/bin tom$ python friendscmp.py tommorris adactio t&lt;/code&gt;&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="We invoke the script with a list of arguments - in this case, &lt;a href=&quot;http://www.adactio.com&quot; rel=&quot;met friend&quot;&gt;Jeremy Keith&lt;/a&gt;, &lt;a href=&quot;http://tantek.com&quot; rel=&quot;met acquaintance&quot;&gt;Tantek &amp;#x00C7;elik&lt;/a&gt; and myself. The script goes off, gathers the RDF representation of each person's friends list and then queries it for the people we all know (that is, the people all three of us follow on Twitter).&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&lt;samp class=&quot;shell&quot;&gt;http://twitter.com/BenWard&lt;br /&gt;http://twitter.com/cackhanded&lt;br /&gt;http://twitter.com/cubicgarden&lt;br /&gt;http://twitter.com/arielwaldman&lt;br /&gt;http://twitter.com/drewm&lt;br /&gt;http://twitter.com/codepo8&lt;br /&gt;http://twitter.com/briansuda&lt;/samp&gt;&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="Consider this a kind of Hello World of RDF querying. Where to go from here? Well, you can beef up your SPARQL-fu so you can make more elaborate queries. 
I'd suggest you start with &lt;a href=&quot;http://www.xml.com/pub/a/2005/11/16/introducing-sparql-querying-semantic-web-tutorial.html&quot; rev=&quot;vote-for&quot;&gt;Leigh Dodds's tutorial on XML.com&lt;/a&gt;, and then maybe punish yourself with the &lt;a href=&quot;http://www.w3.org/TR/rdf-sparql-query/&quot; rev=&quot;vote-for&quot;&gt;specification document&lt;/a&gt; if that's your kind of thing.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="What else is there to query? Well, today I've been working on mapping last.fm data. For instance, here's &lt;a href=&quot;http://rdf.opiumfield.com/lastfm/friends/tommorris&quot;&gt;my friends on last.fm in RDF/XML&lt;/a&gt;. You could play about with mashing up data between services. How about &lt;a href=&quot;http://www.dbpedia.org&quot; rev=&quot;vote-for&quot;&gt;dbPedia&lt;/a&gt;? Just as you can query Twitter friends lists, you can do the same for structured data extracted from - oh - the whole of Wikipedia. If you are playing with dbPedia, be sure to do it interactively in the Python shell so you can discover things like the language tags built into RDF and used heavily in the dbPedia dataset. Yep: any plain string literal can carry a language tag, so i18n is built in. And Unicode. Unicode rocks. And if you are a Pythonista, you can grok Unicode a lot more easily than everybody else, since your language of choice has native Unicode support.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="This is all well and good for data which we've published explicitly as RDF. But what about &lt;a href=&quot;http://www.microformats.org&quot; rev=&quot;vote-for&quot;&gt;Microformats&lt;/a&gt;? Microformats embed data into the HTML of web pages. Well, if you've got well-formed XHTML, you can run it through &lt;a href=&quot;http://triplr.org&quot;&gt;Triplr&lt;/a&gt; and get data out. 
RDFLib-compatible GRDDL is something I may work on soon.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="As for what &lt;em&gt;I&lt;/em&gt; want? I'd really like someone to port RDFLib to Ruby. Come on, we've all got a deep, burning &lt;a href=&quot;http://www.railsenvy.com&quot; rev=&quot;vote-for&quot;&gt;Rails envy&lt;/a&gt;. There are lots of Rails developers we can infect with this sordid, evil RDF stuff.&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="&#13;" created="Mon, 24 Sep 2007 21:03:04 GMT"/><outline text="You can download the source code for the script used here: &lt;a href=&quot;http://tommorris.org/code/friendscmp_py.txt&quot;&gt;friendscmp.py&lt;/a&gt; (consider it GPLed)." created="Mon, 24 Sep 2007 21:03:04 GMT"/></outline></body>
</opml>
