tommorris.org

Discussing software, the web, politics, sexuality and the unending supply of human stupidity.


On Attention

Ian Forrester has been putting together an idea he calls Lightweight Attention Preference Markup, using RDFa. I'm glad to see Ian is experimenting and building with RDF.

The problem is that I am not sure what the point of attention formats is. I can see the point of attention, sure. That's easy. But for me, attention is a set of algorithms which sit above the data layer. When building applications, you try hard to separate the business process from the database. Attention is broadly about aggregating someone's actions online and then inferring things based on them.

The sort of things we are aggregating are, for instance, pages which a person has viewed, actions taken inside 'attention' applications like RSS feed readers, e-mail and IM clients and now social networks like Facebook, and object-directed social networks like last.fm (for music) and del.icio.us (for bookmarks). Similarly, sites across the network often have common actions like posting to a blog, commenting, voting or purchasing. Ideally, an attention engine would be able to pull in data like who I'm talking to, what products I've bought on sites like Amazon, what music I'm listening to, who I add on social networking services and when, and then make rules-based guesses as to how to direct my attention to further my goals.

That's the idea. But I don't think that a particular format will solve it. Not a microformat, not Attention.XML, not APML, not a new RDF format, not Ian's new format. I think that each service will be a new battle (in terms of adoption) and will map to different parts of the semantic/data stack (which includes the full range: JSON/YAML/language-specific array/object representations, 'plain old' XML, official microformats, domain-specific and use-specific microformats, the RDF stack, GRDDL etc.).

This is one of the reasons that I've been working on RDFizing Twitter and Last.fm, and putting together things like the OpenID detector for FOAF profiles. The co-ordination cost for getting Twitter, Last.fm and OpenID to all talk a common 'attention' format is an impossibly high one. Whether you do that work in a W3C committee or a new-fangled place like a wiki, it's going to be a sucky process.

Instead, in RDF, we have a way to represent all the data in a format that could quite feasibly scale up. Through GRDDL, XSLT and microformats, we have a relatively straightforward process to move data in. What we get for very little work is the potential of a relational database where all the relationships are URLs.
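To make "a relational database where all the relationships are URLs" a bit more concrete, here is a minimal sketch in plain Python, with no RDF library involved. The example.org identifiers and the `match` helper are assumptions made up for illustration; only the FOAF property URLs are real vocabulary terms.

```python
# A toy triple store: every statement is (subject, predicate, object),
# and every predicate is a URL. Illustrative only; not a real RDF toolkit.
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical identifiers, invented for this example.
ME = "http://example.org/people/tom#me"
FRIEND = "http://example.org/people/alice#me"

triples = {
    (ME, FOAF + "name", "Tom"),
    (ME, FOAF + "knows", FRIEND),
    (ME, FOAF + "openid", "http://example.org/openid/tom"),
    (FRIEND, FOAF + "name", "Alice"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who does ME know? The predicate URL says what the relationship means.
known = [o for (_, _, o) in match(triples, s=ME, p=FOAF + "knows")]

# Data from another source merges by plain set union: no shared schema
# has to be negotiated first, only shared URLs.
more = {(FRIEND, FOAF + "knows", ME)}
merged = triples | more
```

The point of the sketch is the merge on the last line: because the relationships are named by URL, adding data from a new service is a union, not a schema migration.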

The problem with hitching data formats to specific use cases is that nobody knows what the use cases will be. Technology doesn't exist in a vacuum. Look at SMS. Wikipedia puts it quite aptly: "few [people writing the GSM spec] believed that SMS would be used as a means for sending text messages from one mobile user to another". If the GSM specification writers had written a spec specifically for what they envisioned as the primary use case - namely, operators sending notifications to users - then a future use case could potentially have been walled off.

Nobody foresees the eventual use of technology. Alan Turing could probably not predict the IBM PC, the Macintosh, Quake or the World Wide Web, even though the work of Turing had an enormous effect on making all of the subsequent history of computing possible.

Therefore, when we design specifications, we need to make them open and extensible - at least until we reach omniscience. This is something I think is a flaw in attention metadata formats. I already publish data about which songs I listen to. Last.fm have an API available so you can get the data in XML or RDF. Why do we need another format to represent that?

The attention 'value' that is placed on particular sources or feeds is usually represented as a floating point rating. This is almost like a cryptographic hash - it's one way. If software A thinks that based on my actions I like one feed more than another and pops that ranking into an APML feed, say, then I still sit unenlightened. Why the ranking? It's a one-way hash. A different attention tracker is meant to trust this, even though the process that is used to calculate it may as well have been Mystic Meg's bloody tarot cards.
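A rough sketch of why a bare rating behaves like a one-way function. The scoring formula, the action names and the weights below are all hypothetical; nothing here is taken from APML or from any real attention tracker.

```python
# Hypothetical scoring function, invented for illustration. It only shows
# how collapsing behaviour into a single float throws the reasons away.
def attention_score(events):
    """Collapse a list of (action, weight) events into one rating in [0, 1]."""
    total = sum(weight for _, weight in events)
    return min(total / 10.0, 1.0)


# Two very different reading histories for two different feeds...
skimmed_daily = [("opened", 1.0)] * 5
read_deeply_once = [("read_in_full", 3.0), ("bookmarked", 2.0)]

score_a = attention_score(skimmed_daily)
score_b = attention_score(read_deeply_once)

# ...end up with the same number, and the number alone cannot tell
# a second tracker which history produced it.
same_rating = score_a == score_b
```

Anyone consuming only `score_a` and `score_b` sees two identical ratings and has no way back to the actions, which is the sense in which the rating is one-way.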

We can own our attention data all we like, but we need open attention algorithms too, if we want to do anything truly useful with it. If Particls ranks my RSS reading in one way and BlogBridge another, how do I make a decision? Just based on gut feeling? By open attention algorithms, I mean inference, not algorithms in any complex mathematical sense. But statements like "If you read TechCrunch, you are probably interested in Web 2.0, Web 2.0 Startups and Silicon Valley" etc. - we need a common format for these. (The hedge "probably" is an important part of these statements: remember "my TiVo thinks I'm gay"!)
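As a sketch of what an open inference rule might look like, here is the TechCrunch statement written as data, with a tiny function that applies it. The rule format, the field names and the `infer_interests` helper are assumptions invented for this example, not an existing specification.

```python
# Hypothetical rule format: the field names and the confidence label are
# made up for illustration, not drawn from any attention specification.
RULES = [
    {
        "if_reads": "TechCrunch",
        "then_interested_in": ["Web 2.0", "Web 2.0 Startups", "Silicon Valley"],
        "confidence": "probably",
    },
]


def infer_interests(feeds_read, rules):
    """Return (topic, confidence, because) tuples so each guess is explainable."""
    inferred = []
    for rule in rules:
        if rule["if_reads"] in feeds_read:
            for topic in rule["then_interested_in"]:
                inferred.append((topic, rule["confidence"], rule["if_reads"]))
    return inferred


guesses = infer_interests({"TechCrunch"}, RULES)
```

Each guess carries its hedge and the feed that triggered it, so a second tool (or the reader) can see why the guess was made and discard the rule if it is wrong.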

A summary, then. I publish a shedload of data in machine-readable form. My site contains lots of microformats dotted around - from simple things like tags on up. I publish a signed FOAF file. I publish an RSS feed. My pages support GRDDL, and are mostly written in XHTML 1.0. I have an OpenID. I publish more data through third party services like Flickr, last.fm, del.icio.us, Twitter, Jaiku and so on. If you want to play the attention game, work from that. We don't need to build a new attention infrastructure. We've got HTTP, (X)HTML, RDF and microformats. Make your attention inferences from that lot.