Discussing software, the web, politics, sexuality and the unending supply of human stupidity.

Wikipedia: ten years on, it's more like a library than you think

Today and yesterday, I’ve been at the British Library for the Editathon. We are celebrating ten years of Wikipedia and also supporting the GLAM Project.

Yesterday, I helped a curator from the British Library learn about editing Wikipedia, and tried to give them a flavour of what it’s like to edit Wikipedia and be involved in other similar projects. People were interested and excited and showed a real willingness to work around the bureaucracy that prevents people from doing what has become the cultural and academic version of open source. Just as IBM and even (in suitably small and safe doses) Microsoft do open source now, museums and universities are doing Wikipedia and other free culture projects more and more.

When you think of Wikipedia, you may think of someone like this:


That’s the image anyway. An anonymous mob of Internet geeks who are making an encyclopedia? And, well, you’d probably be right. Research on systemic bias in Wikipedia has shown that the average Wikipedian tends to be a white, male Internet geek from a rich northern hemisphere country. And this leads to all sorts of ‘wikigroaners’ where the article on some important topic is significantly smaller than the article on some pop culture nonsense.1

But something that librarians and people in educational and cultural institutions should be aware of is how similar the goals and values of Wikipedia are to those of their institutions, despite the systemic bias and the obvious content bias it inadvertently introduces. But more: Wikipedia really isn’t that different from traditional knowledge institutions.

If you focus too much on who writes Wikipedia, you’ll miss how Wikipedia structures itself. It would probably take a lifetime to really be aware of exactly what goes on behind the scenes at Wikipedia. As numerous sociologists and social scientists have observed, there’s a thriving culture here. And when you start looking, even though people like Clay Shirky, David Weinberger and Yochai Benkler2 see it as profoundly new and significantly different from anything that has gone before, underneath there is a whole system that makes it a lot less anarchic than it seems.

While smart folk like Benkler may talk about this as being “emergent” behaviour, and while not so interesting people (social media consultants, futurists and so on) talk about Wikipedia as being an example of crowdsourcing, the more one jumps into Wikipedia, the less radical it seems.

Sure, there’s egalitarianism, and there’s a radical commitment to openness (and a very strong resistance to censorship, as we saw playing out in the scandal following the publication of the cartoons of Muhammad in Denmark).

If you’ve never edited Wikipedia, you’ve probably never clicked over to the Community Portal. You’ve probably never seen the discussions that go on over at the Village Pump. Did you know there’s a whole community of people who do nothing but welcome new people to Wikipedia? They’re called the Welcoming Committee. They go around putting welcome notices and plates of cookies on people’s user talk pages. When our party of Wikimedians3 were given a tour around the British Library today, we were shown round by someone from the ‘welcome team’. Walmart famously has people whose job is to say “welcome to Walmart”.4

There’s a little swarm of people who produce templates. All those infoboxes and navboxes don’t just magically come into existence by themselves. Your company has an IT department, right? Well, Wikipedia has a tech team, and developers who work on MediaWiki.

And what about branch offices? Wikimedia has chapters around the world, with dedicated volunteers taking on the burden of accountancy, company registration and other such bureaucracy. This is partly so they can reach out to institutions that can help with the mission of producing a giant encyclopedia. Wikimedia UK has been doing this with museums and other cultural institutions. The work that has been done with the British Museum has become a template for how other cultural institutions can get involved with Wikipedia.

I’ve barely scratched the surface. There’s a whole cluster of people who take photographs to post on Commons. There are people who do nothing but add IPA symbols to articles to show how you pronounce words. Then there’s the Spoken Wikipedia crowd who record everything from pronunciations all the way up to full articles. There are people working hard to bring Wikipedia content to those without Internet access through Wikipedia 1.0, which hopes to edit down important Wikipedia articles into the form of books, CDs/DVDs and so on. What’s the point, eh? Well, go to sub-Saharan Africa and you’ll find plenty of schoolrooms that could greatly benefit from an encyclopedia. And iPads and Kindles don’t tend to work in countries where potable water is still an issue. Yes, I’m talking to you, ‘First World Problem’ crowd.

There’s translators too. Pages needing translation into English get listed in a central place. And if you need someone who can speak a particular language, Babel boxes allow users to clearly show who speaks what, and there’s a whole set of user categories which show you who speaks what and with what degree of fluency (wait, Wikipedia tells me I’m wrong: I meant proficiency). Today at the British Library event, there was a guy who was translating the article on the British Library into Chinese. Wikipedians are doing that every day. I’m not, but that’s because I’m part of that 62% of the British population who are monolingual. I’m not proud of that.

Wikipedia has a Reference Desk, which helps people find things—on Wikipedia and in other sources. It’s just like a library reference desk. The difference is that unlike your local library reference desk, they sort of hope you’ve searched Wikipedia and Google first!

I hear scepticism. Anyone can edit it, right? Well, yes, but there are people who take it upon themselves to fix that too. If I’m watching television, I tend to do new page patrolling. This is an initial check on articles to make sure that they meet the most basic criteria for continuing to exist as Wikipedia articles. Many will not. Many are just vandalism or attack pages: my favourites being teenagers putting up Wikipedia pages saying that their classmates are “gay”, or the bands nobody has heard of launching themselves out into the world not by producing great music but by putting up a Wikipedia article (there’s a policy about that called, rather cruelly, Nobody cares about your garage band).

If your new page makes it past the new page patroller without someone CSDing it—that is, nominating it for deletion under one of the “Criteria for speedy deletion” (the aforementioned attack pages, patent nonsense or, worse, complete bollocks and a few other criteria)—it will face the wrath of the cleanup taggers. New page patrollers often do this, but so do recent change patrollers. If the article isn’t complete bollocks, someone will add a cleanup template to the top of the page. You’ve probably seen them: “This article may require cleanup to meet Wikipedia’s quality standards”. It shouts out to you, the reader, as a stark warning: here lie dragons. But Wikipedians don’t just do it for fun: behind the scenes, these templates are there to help contributors fix those problems. Will they ever go away? Maybe, maybe not.

Your article may get forgotten. In come the Categories WikiProject who try to ensure all articles get put in a category, and that the categories follow some kind of sane organisation. They do an amazing job: Wikipedia has 3,529,634 articles, and there are only 6,226 articles that haven’t been placed in a category. To add articles to categories, people often use HotCat which makes it a lot easier.

But what about the hordes, the great unwashed, who’ll come and make a mess? Well, there are plans to solve that too. Firstly, there are the many anti-vandalism bots, which are getting more and more sophisticated, employing machine learning. There are programming competitions to build better anti-vandal bots. There’s even a project to study Wikipedia vandalism. Unlike a few years ago, Wikipedia bots seem to handle vandals very well these days. Wikipedia bots? Yes. There are hundreds doing all sorts of useful tasks. If you don’t sign a comment on a talk page, SineBot will come and add your signature for you. SineBot has made over a million contributions to English Wikipedia, but he doesn’t get RSI or want to chill out and watch House. SmackBot has done over 3 million edits doing such important tasks as formatting ISBNs. ClueBot NG edits thousands of articles every minute to remove vandalism. But to do this, it has people who have to tell it what is and isn’t vandalism (you can report false positives). All these bots require programmers, and policy people and admins to keep an eye on them.

Bots can’t catch everything. That’s why more and more Wikipedia language versions and sister projects are running patrolled edits, which lets people mark a particular edit as approved. If you are just reading Wikipedia, this is completely invisible to you, but it is a very useful tool in ensuring that rubbish doesn’t get onto Wikipedia. Currently, English Wikipedia is just doing this for New Pages, but maybe one day it’ll run on Recent Changes too.

Finally, there’s Pending Changes, which was tried on English Wikipedia last year. The point of this is to make sure the reader never ends up seeing the sort of thing which made it into the Metro this week as examples of hilarious Wikipedia vandalism, like Tony Blair having “posters of Adolf Hitler on his bedroom wall as a teenager”. With Pending Changes, problem articles get protected from editing by anonymous and new editors. Instead, they submit changes for approval and someone with a special “reviewer” status allows them to go on the site. Anyone can still edit, but not every edit will see the light of day. I’m firmly in favour of this, and it’s already being used on some Wikipedia language versions and on sister projects. This seems to provide a lot of what Citizendium was intended to do. Currently, Pending Changes is still not running on English Wikipedia though.

While Wikipedia may be a bit like The Borg (although, as a Slashdot reader of old, I’d say Microsoft was one of the best Borgs of all time. OF ALL TIME!), I’d like to think that it is more like an amazing Escher painting:

Or a bustling city. Or a game of Twister for octopuses. Aged ten, it has grown to be far more like a real encyclopedia than anyone could ever have imagined. It’s much closer to Britannica than it is to the Time Cube. The Internet would be a poorer place without it. And if humanity all bloody well abided by WP:AGF, WP:EQ, WP:CITE, WP:LOL, WP:CIVIL and, of course, Don’t Be A Dick (or jerk as is now preferred, despite ‘jerk’ being an ugly Americanism compared to the exact British English word that is required, namely ‘arsehole’), the world would quite probably be a better place.

One last thing. I hope you join me in saying three cheers for ten years of Wikipedia![original research?]

  1. There is a significant problem with this kind of criticism: often it doesn’t take into account article splits. Yes, you may go to the article on ‘Halo’ (the Xbox game) and then go to the article on ‘Judaism’, run both through word count and find that the former is longer than the latter. But the problem is that articles of significance often end up getting split out into a wide variety of articles. You’ll then go to a short summary section and there’ll be a link to a longer article. So if you were to go to an article on, say, Judaism, you might find that there’s a page that’s been split out on Jewish culture, and only a brief summary of issues about Israel because that’s all been split out into Israel and Zionism and countless articles on all aspects of two-state solutions and resolutions and peace accords. Sometimes people putting these wikigroaners together take account of this, but often they don’t.

  2. See the magisterial The Wealth of Networks, an allusion, of course, to Smith’s The Wealth of Nations. Benkler’s basic interest is in producing a theory of what he calls commons-based peer production and then describing how it is really significantly and interestingly different from either the market-based production of capitalism or the economics of state actors. At the SXSW Interactive conference a few years back, Bruce Sterling described Benkler’s book as being a sort of Das Kapital of the Internet.

  3. Wikimedians are made up of Wikipedians and contributors to other Wikimedia projects like Wikinews, Wikibooks, Wikiversity, Commons, Wikisource, Wikiquote, Wikispecies and so on. Apparently, being a Wikimedian is decided solely on a personal basis: if you think of yourself as a Wikimedian, congratulations, you are!

  4. It’s an American thing. It’d never work over here. We’re too cynical.