Tom Morris



2009.08.31

  No. 988 

What Murdoch calls unfair competition, I call just being sensible. The BBC are paying journalists to write stories for TV and radio, so it makes no sense not to publish those stories on the web too. I do wish the Murdoch companies would get on with it and start charging for their online properties. It'll be good to see how few people choose to pay for Murdoch's dreck on the web.

Untitled entry (2009-08-31T12:50:59Z)

Time in RDF - An Opinion (2009-08-31T10:30:04Z)

Ian Davis has a series of blog posts about different ways of modelling temporal relationships in RDF. It's a tough problem, but it needs to be solved. You can solve it at the semantic level or maybe at the syntactic level, but it has to be solved somewhere, otherwise you're just left with a big fucking sticky mess. My own opinion is that temporal problems need to be solved in the same way that spatial problems have been solved in RDF: by common agreement. For spatial things, that agreement came in the form of the SpatialThing class and the WGS84 vocabulary.

This seems to match Ian's Time Slice solution, which I prefer on technical grounds: named graphs and reification are a bit of a clumsy solution, and creating new classes seems like a real pain.
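To make the time-slice idea a bit more concrete, here is a minimal sketch in plain Python rather than a real RDF library. Triples are just tuples, and the `ex:` terms (`ex:timeSliceOf`, `ex:from`, `ex:until`) are hypothetical names made up for illustration, not terms from Ian's posts or any published vocabulary. Each slice is its own resource carrying the properties that held during its interval, and a query asks which slices cover a given date.

```python
from datetime import date

# Triples as (subject, predicate, object) tuples. The ex: terms are
# hypothetical; only the foaf: property names come from FOAF.
TRIPLES = [
    ("ex:alice-2005", "ex:timeSliceOf", "ex:alice"),
    ("ex:alice-2005", "ex:from", date(2005, 1, 1)),
    ("ex:alice-2005", "ex:until", date(2007, 12, 31)),
    ("ex:alice-2005", "foaf:workplaceHomepage", "http://old-employer.example/"),
    ("ex:alice-2008", "ex:timeSliceOf", "ex:alice"),
    ("ex:alice-2008", "ex:from", date(2008, 1, 1)),
    ("ex:alice-2008", "foaf:workplaceHomepage", "http://new-employer.example/"),
    # A property you keep for life can stay on the enduring resource itself.
    ("ex:alice", "foaf:schoolHomepage", "http://school.example/"),
]


def objects(subject, predicate):
    """All objects for a given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def slices_of(thing):
    """All time slices attached to an enduring thing."""
    return [s for s, p, o in TRIPLES if p == "ex:timeSliceOf" and o == thing]


def value_at(thing, predicate, when):
    """Values of `predicate` on whichever slices of `thing` cover `when`.

    A slice with no ex:from or ex:until is treated as open on that side.
    """
    results = []
    for ts in slices_of(thing):
        start = objects(ts, "ex:from")
        end = objects(ts, "ex:until")
        if start and start[0] > when:
            continue
        if end and end[0] < when:
            continue
        results.extend(objects(ts, predicate))
    return results
```

Asking `value_at("ex:alice", "foaf:workplaceHomepage", date(2006, 6, 1))` picks out the 2005 slice and returns the old employer's homepage, while the same question for 2009 returns the new one. The cost of the approach is visible too: every query about a changeable property has to go through the slices.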

Let's look at the practicality of this by taking a popular ontology and seeing what's likely to change. FOAF: properties are accountName, accountServiceHomepage, aimChatID, based_near, birthday, currentProject, depiction, depicts, dnaChecksum, family_name, firstName, fundedBy, geekcode, gender, givenname, holdsAccount, homepage, icqChatID, img, interest, isPrimaryTopicOf, jabberID, knows, logo, made, maker, mbox, mbox_sha1sum, member, membershipClass, msnChatID, myersBriggs, name, nick, openid, page, pastProject, phone, plan, primaryTopic, publications, schoolHomepage, sha1, surname, theme, thumbnail, tipjar, title, topic, topic_interest, weblog, workInfoHomepage, workplaceHomepage, yahooChatID.

Which of those aren't changeable? I'd say pretty much all of them are changeable. depiction/depicts may not be, actually: if a resource depicts you, it depicts you at a particular place and time, but so long as that resource continues to exist, and some idiot doesn't go and replace the image with something else, it will continue to depict you. The same goes for schoolHomepage: you are allowed to have as many as you like, and so long as you spent at least some time enrolled as a student at that school (or college or university), you pretty much have that schoolHomepage for life. The same goes for workplaceHomepage.

dnaChecksum depends on which checksum procedure you use: retroviruses can insert their own genetic material into a host's DNA, so a checksum procedure that doesn't account for them means a DNA checksum might actually change over time.
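A tiny illustration of that point. Nothing here says which checksum procedure is meant, so this sketch simply assumes a plain SHA-1 over the base sequence; the sequence and the "inserted" bases are arbitrary stand-ins, not real genetic data.

```python
import hashlib


def dna_checksum(sequence: str) -> str:
    """Hypothetical procedure: SHA-1 over the upper-cased base sequence."""
    return hashlib.sha1(sequence.upper().encode("ascii")).hexdigest()


before = "ACGTTGCAACGT"
# Stand-in for a retroviral insertion: some extra bases spliced into the
# middle of the sequence (the bases themselves are arbitrary).
inserted = "GATTACA"
after = before[:6] + inserted + before[6:]

# The naive checksum takes no account of the insertion, so it changes.
print(dna_checksum(before) == dna_checksum(after))  # False
```

A procedure that wanted a stable value would have to decide up front what counts as "your" DNA, which is exactly the kind of temporal question the property hides.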

How do we represent this? Philosophically, I lean towards a model based on events. I think about the world like this: you have particular things, and those particular things have properties. Some of those properties are relations to other things; others are just instantiations of some universal. The particular properties that exist do so in some state of affairs, and facts are just expressions of those states of affairs. Suppose some particular p has properties W, X and Y at time t1 (when state of affairs s1 obtains), and at time t2 (when state of affairs s2 obtains) it no longer has W and instead has X, Y and Z. Then between t1 and t2 there is some event e. Event e is a relation between s1 and s2, but e has no relationship to the properties W, X, Y or Z themselves.

Certain events might have commonalities, even properties. In the context of a person, say, we might say that some events are of a similar type: for instance, changing one's name ("deed poll") or changing one's gender ("gender reassignment surgery"). We might identify those types by certain properties: a Resignation-event, by virtue of being the type of event it is, affects only certain relations. Other events may be knock-on effects of a first event. There may be some causal relationship between one event and another, which would best be understood by reference to a counterfactual: if e had not happened, f would not have happened.
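One way to sketch that picture in code, under my own assumptions rather than anything worked out above: a state of affairs is a set of (property, value) pairs for one particular, an event is nothing more than a pair of states, and an event type such as a name change is recognised by which properties it affects. The property names and the `is_of_type` test are hypothetical.

```python
from dataclasses import dataclass

# A state of affairs for one particular: a set of (property, value) pairs.
State = frozenset


@dataclass(frozen=True)
class Event:
    """An event relates two states of affairs; it stores no properties itself."""
    before: State
    after: State

    def changed_properties(self) -> set:
        """Names of properties whose values differ between the two states."""
        return {prop for prop, _ in self.before ^ self.after}


def is_of_type(event: Event, affected: set) -> bool:
    """Hypothetical event-type test: the event changes only the given properties."""
    changed = event.changed_properties()
    return bool(changed) and changed <= affected


s1 = State({("name", "Jane Smith"), ("employer", "Acme"), ("gender", "female")})
s2 = State({("name", "Jane Jones"), ("employer", "Acme"), ("gender", "female")})

e = Event(s1, s2)
print(e.changed_properties())       # {'name'}
print(is_of_type(e, {"name"}))      # True: looks like a deed poll
print(is_of_type(e, {"employer"}))  # False
```

The design choice worth noticing is that `changed_properties` is derived from the two states on demand rather than stored on the event, which keeps e as a relation between s1 and s2 and nothing more.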

There seem to be two ways we can represent this in RDF (and two ways in natural language): either we represent the states of affairs and infer the events from the temporal ordering of those states, or we represent the events and infer the states of affairs from them. Which one better fits the open world assumption of the Semantic Web? As I said, I lean towards events philosophically. Practically, I'm not totally sure, and I'll leave it to others to work out. I put together some examples: state of affairs v. event-driven.
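To show the two directions side by side, here is a sketch in plain Python with a made-up event log. The dates, property names and the convention that `None` means a property stopped holding are all assumptions for illustration, and nothing here is RDF yet.

```python
from functools import reduce

# Hypothetical event log for one person: (date, property, new value).
# None means the property stopped holding.
EVENTS = [
    ("2001-05-01", "name", "Jane Smith"),
    ("2001-05-01", "employer", "Acme"),
    ("2006-09-15", "name", "Jane Jones"),
    ("2008-02-01", "employer", None),
]


def apply_event(state, event):
    """Event-driven direction: one event turns a state into the next state."""
    _, prop, value = event
    new_state = dict(state)
    if value is None:
        new_state.pop(prop, None)
    else:
        new_state[prop] = value
    return new_state


def state_at(events, when):
    """Replay events dated on or before `when` (ISO dates sort as strings).

    Assumes the log is already in chronological order.
    """
    return reduce(apply_event, (e for e in events if e[0] <= when), {})


def changes_between(s1, s2):
    """State-driven direction: infer what changed between two states."""
    return [(prop, s2.get(prop))
            for prop in sorted(s1.keys() | s2.keys())
            if s1.get(prop) != s2.get(prop)]
```

Replaying to 2007 gives `{"name": "Jane Jones", "employer": "Acme"}`; comparing that with the 2009 state recovers the single change `("employer", None)`. Under the open world assumption, though, a property missing from a replayed state isn't thereby false: it may just be that nobody asserted the relevant event, which is part of why the practical choice isn't obvious.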

Have I got anywhere? No, I've just stated my philosophical opinion. I hope you find that vaguely interesting. Maybe it'll help you if you are trying to solve the temporality problem in RDF. And, well, let's actually call it what it is: a problem. Not a big problem, but a problem that needs fixing.

A scanner fit for a busy person (2009-08-31T19:20:35Z)

Scanners are a pain in the arse. Someone should really design one that isn't.

The problem with scanners is they are designed for every purpose, and so suck at whatever purpose you put them to. In my ideal world, I'd love to have a really simple flatbed document scanner. Here's how it would work:

1. You'd approach the scanner and hit a button on it that says "New Job".

2. You'd load anything you like onto the platen and press "Go", just like with a photocopier.

3. You'd keep on doing this until you had scanned the whole document.

4. You'd hit "Job Complete". It would then compile the job into a PDF and host it on a local HTTP server.

5. It would also e-mail you a link containing the URI of the job. You'd open the e-mail, follow the link (an HTTP GET request) and the PDF would be returned to you.

6. Once you had finished, you'd send an HTTP DELETE request with the relevant pre-shared authentication, and the PDF would be removed from the server.

7. In addition, an RDF/XML file would be prepared and made available using Content Negotiation (or by sticking ".rdf" on the end of the URI). It would contain all the metadata about the scans: when they were done, by whom, how many individual pages the document contains and when each of them was scanned. This would serve as a manifest, and it would also be available in plain text by sticking ".txt" on the end instead of ".rdf".

8. As the individual scans would be stored in a neutral, raw format, you could access each individual scan by HTTP in whatever format you like, also done using HTTP content negotiation and file extension mapping.

9. In addition, you could insert a USB thumb drive and the scanner would attempt to copy onto it a folder containing all the scans done in that session.
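The HTTP side of steps 4 to 8 is simple enough to sketch. This is a minimal, in-memory Python sketch under my own assumptions: the `/jobs/<id>` URI layout, the `JobStore` class, comparing a raw pre-shared secret as the "authentication" and the status codes chosen are all hypothetical, and there's no real HTTP server, scanning or PDF compilation. It shows the two bits that make the design work: extension-or-Accept content negotiation, and an authenticated DELETE.

```python
import hmac
import posixpath

# Hypothetical suffix-to-media-type map. ".rdf" and ".txt" follow the post;
# ".pdf" and ".html" are assumed additions.
EXTENSIONS = {
    ".pdf": "application/pdf",
    ".rdf": "application/rdf+xml",
    ".txt": "text/plain",
    ".html": "text/html",
}
DEFAULT_TYPE = "application/pdf"


def negotiate(path, accept=""):
    """Return (job_id, media_type) for a GET on /jobs/<id>[.ext].

    An explicit extension wins. Otherwise the first supported type listed in
    the Accept header is used (q-values are ignored, a simplification), and
    if nothing matches, the PDF is returned.
    """
    stem, ext = posixpath.splitext(posixpath.basename(path))
    if ext in EXTENSIONS:
        return stem, EXTENSIONS[ext]
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in EXTENSIONS.values():
            return stem, media_type
    return stem, DEFAULT_TYPE


class JobStore:
    """In-memory stand-in for the scanner's storage; no scanning or PDFs."""

    def __init__(self, shared_secret: bytes):
        self._secret = shared_secret
        self._jobs = {}  # job_id -> {"owner": ..., "pages": [...], "complete": ...}
        self._next_id = 1

    def new_job(self, owner):
        """The "New Job" button: allocate an ID for a fresh job."""
        job_id = str(self._next_id)
        self._next_id += 1
        self._jobs[job_id] = {"owner": owner, "pages": [], "complete": False}
        return job_id

    def scan_page(self, job_id, scanned_at):
        """The "Go" button: record one page (only its timestamp, here)."""
        job = self._jobs[job_id]
        if job["complete"]:
            raise ValueError("job is already complete")
        job["pages"].append(scanned_at)

    def complete_job(self, job_id):
        """The "Job Complete" button: return the URI path to e-mail out.

        A real device would compile the PDF at this point.
        """
        self._jobs[job_id]["complete"] = True
        return f"/jobs/{job_id}"

    def manifest_txt(self, job_id):
        """Plain-text manifest: who scanned it, page count, per-page times."""
        job = self._jobs[job_id]
        lines = [f"job: {job_id}", f"scanned by: {job['owner']}",
                 f"pages: {len(job['pages'])}"]
        lines += [f"page {n}: {ts}" for n, ts in enumerate(job["pages"], start=1)]
        return "\n".join(lines)

    def delete(self, job_id, token: bytes):
        """Handle DELETE: remove the job only if the pre-shared secret matches.

        Returns an HTTP status code.
        """
        if not hmac.compare_digest(token, self._secret):
            return 403
        if self._jobs.pop(job_id, None) is None:
            return 404
        return 204
```

A GET on `/jobs/1.rdf` resolves to the RDF manifest, a bare `/jobs/1` with `Accept: text/plain` resolves to the text one, and anything else falls back to the PDF. Wiring `negotiate` and `JobStore` into an actual HTTP server would be a thin layer on top; the point is that a client needs nothing more than GET and DELETE.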

Of course, all of this could be made transparent by having some software available for Windows/Mac/Linux desktop machines. But having the HTTP infrastructure there makes it possible for anyone to write really simple clients for the scanner. Call it Open Scanner if you like. Heh.

It would be as fast as a photocopier, and it would be used for the same purpose as a photocopier: making copies of documents, sometimes documents whose printed copies you only have temporary access to. The only difference is that it shoves everything into a PDF file rather than printing it onto paper.

In addition, hopefully some super-smart open source OCR software would be available, so you could swap the extension for ".xml", ".xhtml" or ".html" and get back a reasonably rich text document that you could store alongside the PDF and search.

What it would not have: a requirement that you push a button on a computer to scan anything. The scanner would be completely free-standing and would interact with other computers on the network or across the Internet. Authorisation for those interactions could be managed using a built-in web management system or over SSH.

A scanner that's as easy to use as a photocopier. That would be truly excellent. Until it comes along, I shall struggle by with this thing and all the dreadfully shit scanner software that's churned out by manufacturers. And, no, I still don't touch inkjet printers with a barge pole.




Tom Morris
Currently in: East Sussex, England
Usually in: East Sussex, United Kingdom
AIM: tommorris
YIM: tom.morris

I am a , an , like to code in and (and Java, but let’s not talk about that), and noodle about with and the .

I have an MA in philosophy from Heythrop College, University of London. My philosophical interests are in analytic metaphysics, ontology, modality, the work of , , , and . I have a strange, unfulfilled interest in . I’ve been influenced by Gadamer, by , , and .

Musically, I like jazz fusion, soul and P-Funk. My musical nirvana would be a mixture of Beethoven, Miles Davis and George Clinton topped with a side-serving of Erykah, Jill and Angie.

I also write for the Citizendium, an online encyclopedia project. If you know about stuff, you should join in. I occasionally produce audio recordings for The Pod Delusion.

Elsewhere:

  • GPG Key
  • del.icio.us
  • Flickr
  • Twitter
  • Jaiku
  • LinkedIn
  • ma.gnolia
  • blip.tv
  • upcoming.org
  • MetaFilter
  • LiveJournal
  • CiteULike
  • Technorati Profile
