tommorris.org

Discussing software, the web, politics, sexuality and the unending supply of human stupidity.


Hari-gate: behind the scenes at Wikipedia

I was asked by David Allen Green, the writer behind the Jack of Kent blog, to write about the situation with Johann Hari, who recently apologised for various acts of journalistic malpractice including substituting interview copy with background material, and editing articles on Wikipedia using at least one pseudonymous account (User:David r from meth productions - hereafter ‘David r’). I was going to write various things about it when the story first broke. At the time I had some doubts about some of the evidence that was presented linking accounts on two different wikis with an IP address, but further evidence turned up to show that there was indeed a link.

Instead, I thought I’d give a more general introduction as to how the Wikipedia administrative system works in cases like this. I should first make a disclaimer: I’m just interested in showing the issues as they relate to Wikipedia policies, rather than making a political point about Hari or whatnot. That said, it would be pointless to deny that my politics line up very much with the political views of Johann Hari and indeed David Allen Green.

I’m just using the Hari/David r case to show roughly how these things work out, because behind the scenes at Wikipedia, there’s quite an interesting system for how these kinds of disputes and cases are resolved. There’s a whole hidden iceberg of complexity that the average web user doesn’t see. For those who keep track of things like the Hari case, it’s worth having some background on how it works so they can have a better shot at uncovering wrongdoing in the future.

The first thing to understand about Wikipedia is that there is an important difference between blocking and banning. A block is a technical measure designed to prevent someone from editing Wikipedia. If you turn up and start vandalising pages, adding the typical “Jimmy in year six at Suchandsuch school is gay!”-type vandalism or whatever, you are likely to be warned a few times and then blocked. This is normal, everyday practice on Wikipedia: I’ve been responsible for at least 80 or so of these kinds of blocks. Blocks are not supposed to be punitive: the first level of justice on Wikipedia is always preventative.

Users are blocked only long enough to stop them from vandalising and no longer. Of course, repeated violations earn one a progressively longer block. But the block is just that: a technical measure. Let’s say you are at Suchandsuch school, someone on your school network goes to Wikipedia and starts informing all the readers of the article on giraffes or whatever that poor Jimmy is in fact a homosexual, and the school gets blocked. There’s nothing to stop you from going home and editing. Or getting an account and editing. That’s all fine and dandy. A block is not necessarily a big deal: it’s more like an ASBO.

David r was banned in July. Banning is a social measure where the community decides that the person’s invitation to edit the site has been rescinded. You can read David r/Hari’s ban discussion: there were fifteen users who supported a ban (myself included), and three who opposed the ban. David r is banned indefinitely from editing anything on Wikipedia. As we now have confirmation that David r is Johann Hari, Johann Hari is indefinitely banned from Wikipedia.[1] This means that if he pops up with a new account and someone can confirm that the account is a “sockpuppet” used by Hari, that account will be blocked indefinitely on sight.

Who does all this? Volunteers. As I said, I supported the ban when it was proposed in July. I’m not an administrator, just an experienced user of Wikipedia. The blocking is done by administrators, who are sort of the caretakers and cops of Wikipedia. If we wanted to map existing categories of political governance onto Wikipedia, the legislature is everybody (anyone can propose changes to policy, and then we try to use consensus to develop that into a policy or into guidelines). The administrators are really magistrate, policeman and executioner rolled into one. They are the ones who hand out the immediate justice. If you get blocked, you can appeal that by making an unblock request which gets handled by a different administrator. Complex cases end up at the Arbitration Committee, an elected panel of seventeen experienced users who hear cases and then have the ability to determine bans, blocks and other measures.

Bans, of the sort that David r/Johann Hari got, can be given out through three processes: by the Arbitration Committee (as described above), by Jimmy Wales (who still holds the power in a constitutional monarchist type of situation but is generally expected not to use it, in the same way as the Queen is expected not to send people to the Tower of London), and by a “community ban” (which is what David r got). In a community ban, the community is presented with the option to ban someone and then holds a consensus-based discussion in which users provide arguments. These look like votes, but aren’t. On Wikipedia, we hold that polls are evil. If you look at that ban discussion, it looks like a vote, but you’ll notice that in addition to saying “Support” or “Oppose”, each user provides reasons. Once the discussion has run for a certain period, an administrator “closes” the discussion, sums up all the arguments made for and against, weighs them up and comes to a decision.

That’s how we administer justice, then, but what about evidence collection?

Here there are some other special powers worth knowing about. There are two in particular: Oversight and CheckUser. These are so-called “advanced permissions” and are held by a very small number of people. Those people have to identify with the Wikimedia Foundation: that is, they have to send proof of their real-life identity to the Foundation. The powers are given only very rarely, and users with those powers have significant levels of community trust. Oversighters have the ability to delete content and wipe it from the historical log. This sounds very much like the “memory hole” from Orwell’s 1984, right? Yes, but it’s a very rare thing. I’ve had to call on it only once or twice in the last year. The time that sticks out is when a kid from a school in the United States vandalised a page and inserted the phone number of a schoolfriend along with, of course, the claim that he was gay. Ordinary vandalism, sure. But it is ordinary vandalism that potentially reveals personal information about a minor. That’s kind of a big problem. I fixed the vandalism very quickly and got an admin to block the school’s IP address to prevent them from posting any more. But anyone who checked through the historical log could still see this poor kid’s phone number. I e-mailed an oversighter and within an hour or so the edit was wiped from history.

The other power is more important for our purposes: the CheckUser. CheckUsers have the ability to see what IP addresses a particular user has been using for a period of a few months after they used that IP. A CheckUser can see if two users have been using the same computer. The power that CheckUsers and Oversighters have is kept in check by an auditing process. Every time they run a CheckUser on someone or Oversight something, ideally someone from the Audit Subcommittee checks that the use of the advanced permissions is handled well. Here we have the same issues we have with law and police processes in real life: to get a CheckUser to potentially infringe on someone’s privacy, someone from the community needs to present some reasonable suspicion that two users are in fact one and the same. If you’ve watched legal shows on TV, the same kind of language comes up around the use of CheckUser. CheckUsers aren’t supposed to go on a fishing expedition. I’ve been involved in one or two situations where a CheckUser had to look into something, and the only information you get back is basically a yes or no: they either tell you that your suspicions have been “confirmed” or that the check didn’t turn anything up.

Part of the problem with doing this kind of investigation is that not all of the evidence is always available to all users: if someone has had pages deleted, only admins can see that. If someone has had edits oversighted, admins won’t be able to see that. If they have been misusing multiple accounts, we need reasonable grounds before a CheckUser will do an investigation. Wikipedia’s social processes are interesting here: in ten years, the community has crafted a system that has some elements of a formal legal system.

What are the practical lessons from this that non-Wikipedians should take to heart to help keep powerful people (journalists, politicians and others) from abusing Wikipedia in the same way they often manipulate other media?

  1. Look at editing patterns overall. Individual edits aren’t great signifiers of guilt, but edits over extended periods of time can be.
  2. Collect “diffs”: diffs are the individual edits made to the page. To give an example, here’s a diff of an edit I made a week or so back. The section marked in green is the text I added to the page. If there are sections marked in red, that is what has been removed from the page. But be aware that very occasionally the stream of diffs gets manipulated (by the Oversighters and through revision deletion) for privacy and other reasons.
  3. When you see vandalism and problematic edits, please report them. Post about it on Editor Assistance, explain what is going wrong and what needs doing, and provide links to diffs and user pages. If you don’t understand that, you can always leave me a message. I can’t promise to respond quickly, but if action needs to be taken, I’ll try and poke the relevant people into action.
  4. If you see someone inserting pseudo-scientific bunk, post about it on the Fringe Theories Noticeboard. There’s a community of people who deal with pseudoscience and fringe theories.
  5. Johann Hari edited lots of articles about living people: if you see someone adding unsourced or potentially defamatory material about living people, report it to the biographies of living persons noticeboard. We have processes to handle this.
  6. The sooner we learn about these things, the sooner we can handle them and the less complicated the process becomes. If you’ve got suspicions, don’t wait. CheckUser data is kept for a certain period: after a certain amount of time, the data the CheckUsers can get to is thrown away.
  7. Learn how to read Wikipedia’s Logs. The Log is where all non-editing actions are recorded. If someone moves a page or blocks someone or gets blocked or whatever, it all goes in the log. Here’s the log of all actions done to my user account and here’s the log of all actions I’ve done. The latter is rather boring. It’s mostly me uploading images of think tanks and public policy organisations and renaming pages and files. The former is far more interesting: it has all the user rights I’ve been granted (rollback is an anti-vandalism tool, reviewer is a hang-over from the pending changes trial, and file mover is exactly what it says it is), the fact that my user page was protected because people have vandalised it (a common enough occurrence if you fight vandalism), and the fact that I was blocked for five minutes back in March (I was hit by friendly fire in an anti-vandalism shootout!). Now, mine aren’t that contentious. But look at David r’s. There are numerous blocks and unblocks in there. If you were trying to investigate the history of the David r account, you might want to email those admins or seek out exactly what it was around those dates that led to the blocks. They are likely to be the more interesting and sordid bits of the user’s editing history.
  8. Check the time and date of the edits and graph them out. Remember that all times on Wikipedia are displayed in UTC (effectively GMT) by default, so you need to account for time zone and daylight saving time. Then see whether the edits are spread across the whole week or made just during office hours. If someone is editing just from the office, you shouldn’t expect to see edits at the weekend or in the evenings. On the other hand, if there are lots of edits with no discernible pattern of when they are sleeping, you might have a situation with more than one person using the same account (or just someone with a lot of time on their hands).
  9. Check contributions to the other projects. There’s a tool that does just that. You put their username in and it’ll show you what edits they’ve done to other Wikimedia projects. This will help when snooping out problem users. I’ve had users who have been blocked or banned from English Wikipedia and who have then gone on to edit at other projects. I haven’t seen any evidence that Hari/David r has been editing other projects or other language versions of Wikipedia. As the other projects aren’t used as much, it isn’t so much of a concern about them influencing the content (“Johann Hari vandalised Wiktionary!” isn’t quite as sexy a headline as “Johann Hari vandalised Wikipedia!”) but the cross-wiki edits can be useful evidence in working out who someone is.
  10. Check out LinkSearch. It is a very useful tool. You pop in a web address and it’ll show you all the pages which have links to that page or domain. If you are interested in where Hari’s edits have been discussed, the LinkSearch pages for Jack of Kent’s blog and for johannhari.com make interesting reading.
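
The timing check in point 8 is the easiest of these to do with a few lines of code rather than by eye. What follows is only a rough sketch in Python, not a tool Wikipedia provides. It assumes you have already copied an account's edit timestamps out of its contributions history as UTC strings in the form 2011-07-11T08:30:00Z. The function names and the sample timestamps are invented for illustration, and the fixed one-hour offset stands in for British Summer Time.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def to_local(stamp, tz):
    """Parse a UTC timestamp string (assumed format) and shift it into tz."""
    utc_time = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return utc_time.replace(tzinfo=timezone.utc).astimezone(tz)


def edit_time_profile(utc_stamps, tz):
    """Count edits per local weekday and per local hour of the day."""
    by_weekday, by_hour = Counter(), Counter()
    for stamp in utc_stamps:
        local = to_local(stamp, tz)
        by_weekday[WEEKDAYS[local.weekday()]] += 1
        by_hour[local.hour] += 1
    return by_weekday, by_hour


def office_hours_share(utc_stamps, tz, start=9, end=18):
    """Fraction of edits made Monday to Friday between start and end (local time)."""
    stamps = list(utc_stamps)
    if not stamps:
        return 0.0
    in_office = 0
    for stamp in stamps:
        local = to_local(stamp, tz)
        if local.weekday() < 5 and start <= local.hour < end:
            in_office += 1
    return in_office / len(stamps)


# Hypothetical data: these timestamps are made up, not taken from any real account.
BST = timezone(timedelta(hours=1))  # fixed offset; does not follow clock changes
SAMPLE = [
    "2011-07-11T08:30:00Z",
    "2011-07-11T13:05:00Z",
    "2011-07-12T16:45:00Z",
    "2011-07-16T22:10:00Z",
]

by_weekday, by_hour = edit_time_profile(SAMPLE, BST)
print(dict(by_weekday))
print(sorted(by_hour.items()))
print(office_hours_share(SAMPLE, BST))
```

A fixed offset is only right for one half of the year. If the edits span a clock change, passing a proper time zone object (for example `zoneinfo.ZoneInfo("Europe/London")` on Python 3.9 or later) instead of `BST` should handle daylight saving automatically. A high office-hours share is a hint worth following up, nothing more: treat it as one piece of evidence alongside the diffs and logs.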

Given all of that, you may want to see what is currently going on in the Hari/David r affair: there’s a debate on the Administrators’ Noticeboard for Incidents called Johann Hari sockpuppetry which alleges (although not with much evidence in my view) that there are more accounts belonging to David r/Johann Hari.

If anyone interested in the Johann Hari affair following the published apology has any questions about the Wikipedia side of it, feel free to post a comment, send me a tweet, or post on my Wikipedia user talk page.

P.S. If that wasn’t enough Johann Hari-on-Wikipedia action, you might also want to read William Beutler’s post on the same topic.

P.P.S. I just looked through Velvet Glove, Iron Fist’s post about the Hari/David r affair. One interesting thing: David r made an off-wiki legal threat to someone on SourceWatch. If that had been on English Wikipedia rather than SourceWatch, he could have been blocked under Wikipedia’s no legal threats policy.

Coverage: Adam Tinworth at One Man and His Blog.

  1. If Hari wishes to make the community aware of problems with the page about him, he can still e-mail OTRS, essentially Wikipedia’s confidential volunteer-run customer service department, at info-en@wikimedia.org