tommorris.org

Discussing software, the web, politics, sexuality and the unending supply of human stupidity.


python


Django: automatically testing admin pages are working

Today, I was working on a simple Django app. I was cranking away on something, then went to the admin panel and… something wasn’t working. I had made a typo and written foriegn rather than foreign in formfield_for_foreignkey.

And I hadn’t noticed. Computers are supposed to notice these things. Test suites and CI servers are supposed to catch my errors.

I realised then that having something that just checks to make sure that the admin panel is working is useful.

Something like this.
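A minimal sketch of the idea, assuming a stock modern Django project (current import paths; the superuser credentials are placeholders):

```python
from django.contrib import admin
from django.contrib.auth.models import User
from django.test import TestCase
from django.urls import reverse


class AdminSmokeTest(TestCase):
    """Request every registered admin page and check it doesn't blow up."""

    def setUp(self):
        # The admin bounces anonymous users, so log in as a superuser first.
        User.objects.create_superuser("admin", "admin@example.org", "secret")
        self.client.login(username="admin", password="secret")

    def test_admin_pages_render(self):
        # Walk every model registered with the admin and hit its pages.
        for model in admin.site._registry:
            opts = model._meta
            for view in ("changelist", "add"):
                url = reverse("admin:%s_%s_%s" % (opts.app_label, opts.model_name, view))
                self.assertEqual(self.client.get(url).status_code, 200)
```

(`admin.site._registry` is technically private API, and some ModelAdmins legitimately refuse the add view, so treat this as a starting point rather than gospel.)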

You could automate this some more: have it so it probes through your admin panel and clicks links for you. This will do for now though.

One could also go further and have the tests put data in the forms and so on. But this is good enough. It’s likely to blow up if you’ve made a mistake when you are writing the Python code that defines the admin panels.


Jeremy Keith has smart things to say about browser support.

I’ve been screwed over in the past by saying “we’re going to support IE9, IE10 and all versions of Firefox, Safari and Chrome that people actually use”. I’ve done this based on solid data—basically, going to Google Analytics, dumping the data out and applying the Pareto Principle and a craptastic Python script I have that extracts the information I need out of the morass of Google Analytics mess.

The problem occurs because then (a) we don’t build the back-end in a sane, reasonable, agile1 way (like, say, having good programmers building it in Django, instead management decides that it needs to be built by enterprise Java devs in some monstrosity of a CMS that refers to itself as an “enterprise portal engine” or some other bullshit) and (b) the front-end gets built in some crack-addled, buzzword-compliant JavaScript framework picked because it is on Hacker News and is sexy. If we built websites in a sane and rational way, this shit would be so much less complicated.

  1. By ‘agile’ in this context, I mean simply that it is built with technologies that make it easy to adapt to change based on feedback from design and front-end developers. Like, say, Django or Rails rather than Spring, and deployed on Heroku (at least during development) rather than some half-baked ops process.


amatch, Jaro-Winkler and dataset merging

amatch looks like a ludicrously useful Ruby library for string matching. It implements a variety of algorithms including Levenshtein, Hamming, longest subsequence, longest substring and Jaro/Jaro-Winkler. Jaro-Winkler is particularly useful for short strings like business names because it weights matches at the start of the string more heavily than matches at the end. Amatch also lets you set options for matching, including custom case sensitivity.

Here’s the difference illustrated:

Imagine your target string is “McDonalds”; here are the results Amatch will give you using Levenshtein and Jaro-Winkler:

Match string            Levenshtein similarity  Jaro-Winkler
McDonald’s              0.9                     0.98
McDonald’s Restaurant   0.4285714285714286      0.8857142857142858
Burger King             0.0                     0.0

Note the difference in the second case.
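For the curious, the Levenshtein column is easy to reproduce by hand — a back-of-envelope Python sketch of edit distance, normalised the way amatch's similarity appears to be (1 minus distance over the longer string's length):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    # Normalise: 1 - distance / length of the longer string.
    return 1 - levenshtein(a, b) / max(len(a), len(b))

print(similarity("McDonalds", "McDonald's"))             # → 0.9
print(similarity("McDonalds", "McDonald's Restaurant"))  # ≈ 0.4286
```

Jaro-Winkler is fiddlier, which is exactly why a library like amatch (or Jellyfish in Python) earns its keep.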

HP Labs have a very interesting paper on company name matching. Their problem space is somewhat different from mine: they are trying to match company names solely based on strings and then clustering, while I have other factors I can use (namely, address, postcode and geographic location). A Jaro-Winkler similarity of >0.8 combined with a distance of under 50m is pretty much all I need to conclude a match.

Update: Jellyfish looks like a good Python library for the same job.


Can you hack offline?

Here’s a challenge.

Take your mouse or trackpad or whatever, slide it over to that little wi-fi control in the corner of your screen, click and choose “Turn Wi-Fi Off” or whatever the equivalent is.

Now start a new project in your favourite programming language, framework, IDE or whatever. Can you do it?

This isn’t some macho “how much of an awesome hacker dude are you?” test. It’s not a test of you, it’s a test of your tools.

In most cases, the answer will be “yes, but not as easily as it should be”.

In an ideal world, I shouldn’t have to have Google at my beck and call in order to hack.

IDEs should make this easier. But the IDEs are not yet quite as simple as they could be. This evening, I’ve been working on a project to take two XML files, churn through them and basically merge the two (yeah, the XML community sold everyone a bill of goods on that; try RDF if you want a data model that trivially merges together). This shouldn’t be too difficult, right?

I have IntelliJ installed and I have Eclipse installed. I fire up Eclipse and realise that I don’t have the Scala module installed. I’m on a lousy 3G connection. I fire up IntelliJ, and everything has been buggered around with since I last used it for Scala development, and it doesn’t want to properly talk to Maven. Hate.

I could start a Java project, but I actually want to spend my time using the XML I’ve parsed rather than setting up an XML parser.

I eventually resort to just firing up MacVim and a Scala REPL and cmd-tabbing between the two. I’ll turn the results into a proper Maven project when I get home. But it’s still sort of a compromise.

I could have used Ruby and Nokogiri, but the query I’m running on this XML is complex and slow enough as it is without adding the penalty of Ruby’s shitty GC.

Ruby has lost a lot of the offline hackability that it once had. It used to be that when you installed gems, RubyGems would generate RDocs. You could fire up gem serve and browse them in your browser offline.

And documentation is necessary in dynamic land. Consider something like this:

XML.load(file)

That seems pretty straightforward, right? Well, what does it take? A file name? A file handler object or reference? Something that satisfies some arbitrary and unspecified file-like interface (it has a readlines method, maybe)? A URL that it will proceed to load from the Internet? Damnit, documentation needed. If we were in a language like Java or Scala, we’d get documentation in the form of the type annotation:

XML.load(file: File)

XML.load(file: URL)

But without that type annotation, we need documentation. So where is it?

On the Internet, obviously. You know, the Internet you can’t access reliably when you aren’t either at home or in the office. The Internet we rely on for all our cloud services but which stops working when we’re inside train tunnels.

If I run gem serve, almost all of the gems have no documentation. I bet Bundler has been set by default not to store documentation because the process of downloading or compiling the RDoc is “too slow” or something. Well, what’s slower than that? Not being able to do any programming because you can’t actually figure out the calling syntax of a method you are trying to use.

Python gets a lot closer. The help() function in Python’s REPL is pretty damn compelling. If you are hacking away in the REPL, you don’t have to shift context to use the help function.
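The standard library docs ship with the interpreter, so this works in a train tunnel; pydoc.render_doc returns the same text help() pages interactively:

```python
import json
import pydoc

# help(json.loads) would page this in the REPL; render_doc hands it back
# as a string -- no network connection involved at any point.
doc = pydoc.render_doc(json.loads)
print(doc.splitlines()[0])  # e.g. "Python Library Documentation: function loads in module json"
```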

Is it too much to ask that I can read documentation for the standard libraries of the programming languages I’m using on my own computer when I’m not connected to the Internet? Is it too much to ask that offline documentation should be a default?

You can turn wi-fi back on now.


HackDiary: Scala now more comfortable than Ruby?

A few years back, Python was the first programming language I’d pull out of the toolbox. It was the second programming language I learned that is actually still used (BBC BASIC, AMSBASIC, PHP, then Python). Then I learned how to use Ruby, and while it has some shortcomings compared to Python, it has some real advantages as a hacking language. The Perl regex heritage was one, as were the functional programming constructs like map and filter, and the REPL.

But tonight, I think I may have reached the point where the balance has tipped towards Scala.

The problem was simple. I had a text file consisting of times in the format HH:MM. I wanted to get an idea of the distribution of those times across the day. I opened a Ruby REPL:

a = "17:05
16:53
16:29
16:13
14:19
14:14
14:09
14:08
14:02
13:39
13:38
13:36
13:28
13:26
13:26
13:25
13:11
13:08
12:57
12:44
12:40
12:37
12:32
12:21
12:11
12:02
11:45
11:28
11:25"

Err, now what. I guess I need to reverse it. Okay.

a.reverse!

No, no, I shouldn’t be using bang methods. Stateful is hateful or some other Haskelly slogan. Naughty me.

a = a.reverse

I know. I’m fussy.

Err, now what. I want to group ‘em.

a.methods.sort

What? Ruby doesn’t have a grouping method? It must have. I’m not writing a for loop. It’s 2011. Fuck that.

Tab over to Google Chrome.

Wait, no Internet connection.

Open up a Scala shell. Paste the same lot in with some Python-style triple-quotes around them. Yeah, Scala has that. That’s pretty much one enormous reason to use it as opposed to Java, right? The REPL assigns them to res0.

var times = res0.split("\n").reverse.toList

There we go. A List[java.lang.String]. Scala’s immutable lists rock.

And then doing the data mangling is pretty easy. Type times., press tab, look down the method list. The Scala REPL puts all the methods in an alphabetical, pretty-printed list—like ls does in the bash shell.

“groupBy”. That looks right. I haven’t used this before. Better check the Scaladoc. Oh, wait, offline.

cp /usr/local/Cellar/scala/2.8.0/src/scala-library-src.jar ~/tmp/scaladocs/
cd ~/tmp/scaladocs
unzip scala-library-src.jar
scaladoc scala/collection/immutable/List.scala
open index.html

(That’s for Scala on OS X installed using Homebrew. The location of scala-library-src.jar will vary on other systems.)

Click around. Look at List. Here we go “groupBy”. What’s it tell me?

def groupBy[K](f: (A) ⇒ K): Map[K, Repr]

That’s pretty cryptic, right? It makes sense eventually, but it takes time to learn how to read it. It says that groupBy takes a function (f) from the list’s element type (A) to a key of some type (K), and gives you back a Map[K, Repr]. Basically, you hand it a function which takes each member of your list and produces a key from it, and it gives you back a map from each key to all the members that produced that key.

times.groupBy(x => x.split(":").head)

Now I’ve got a map of, like, “12” with a list of all the stuff that begins with “12”, and it’s in, you know, res1 or something.

res1 toList

Now we’ve got a List[(java.lang.String, List[java.lang.String])] in res2. For Scala-newbs, that’s a List of a 2-tuple made up of a String and a list of Strings.

So, where were we?

res2 sortBy(_._1) foreach { x => println(x._1 + ": " + x._2.length + " " + x._2.map(x => "#").mkString) }

And here’s the result:

11: 3 ###
12: 8 ########
13: 9 #########
14: 5 #####
16: 3 ###
17: 1 #

Not the prettiest graph, but, you know, it’s a good start. It wouldn’t be very hard to turn this kind of data into a Google Charts API graph and open it in your web browser. Yeah, this is the JVM: there are ways. Horrible, evil ways.
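To be fair, Python would have made short work of it too — here’s a sketch with collections.Counter (a few of the times inlined), not what I ran that night:

```python
from collections import Counter

raw = """17:05
16:53
14:19
13:39
12:44
12:21
11:45
11:28"""

# Group by the hour part and draw the same crude bar chart.
counts = Counter(t.split(":")[0] for t in raw.splitlines())
for hour in sorted(counts):
    print("%s: %d %s" % (hour, counts[hour], "#" * counts[hour]))
```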

The sort of code I’ve written isn’t pretty. It won’t win any awards from those clean code people. But that’s not the point. Nobody’s REPL code is pretty: that’s why it’s REPL code, not bloody production code.

The point is quick hacks and data processing. How does Scala do for that? Let’s see: type safety, Java libraries, a ridiculously awesome collections library, a permissive and enjoyable syntax (especially for higher-order functions—filter { _ > 5 }, anyone?). The only thing I really miss is some of the Perl heritage in Ruby ($_, regex syntax). There’ll be tasks that Ruby will probably be better for, and I’ll probably still write a lot of scripts in Ruby (a large chunk of my life consists of crappy little Ruby scripts talking to each other, and typically filling up my inbox with a bunch of shit when they don’t). But Scala may have tipped over into being the first thing I type into my terminal when starting to hack.

Or you could take another reading: all you need to do to get me to use your programming language is build a better collections library and a better REPL. I’m just fickle, a higher-order function whore (as opposed to a functioning whore of a higher order). If that’s so, great. Get on with building better REPLs and better collections libraries for your language and you’ll get my total, unerring loyalty until someone builds something better.


Loosely-coupled tag metadata: I bet your Tumblr blog can't do this!

I’ve been feeling really crap this week. I’ve had a throbbing headache and my teeth have been playing up again. I’ve been taking a wide variety of things for it: ibuprofen for the pain, lots of orange juice for the vitamin C and some surprisingly effective home remedies. But then I saw a tweet from fowlduck:

sometimes when you’re not feeling awesome the only cure is to do something awesome. time to code

Hell yeah. So yesterday, in addition to some work code I started on a little bit of personal coding. I was up most of the night doing this. And I think it is pretty cool.

Want to see it in action? Take a look at http://blog.tommorris.org/tagged/metaphysics

It is a tag page on this, my Tumblr blog, but look – it has metadata!

And not just any metadata, but RDF-backed metadata, and the ability to turn out JSON metadata for the RDF-phobic. (I haven’t decided whether I want to help said people yet.)

Of course, it is loosely coupled. Here’s the publishing process:

On my computer, I have a folder called “blog-annotations”. This is a version controlled repository of data written in Notation-3. I’ve got a few shell scripts to help manage this: one to copy a template into place and open up Vim, for instance.

The N3 is turned into RDF/XML using rapper, the command line utility that comes with Redland. This was a bit of a fiddle to get right, but has the dramatic upside of being ridiculously fast. I’ve used Cwm for converting N3 into RDF/XML before, but Redland/rapper is so much faster. I mean, noticeably faster even with only a small amount of data. So that rocks.

Once the RDF/XML is sitting there in a folder, I have a Python script which parses the RDF/XML and produces HTML. Basically, what that does is build up a Python dictionary of the various bits of the RDF/XML I want. Why this intermediate step? Well, I’m publishing the RDF as RDF/XML, so I need the data in RDF. But, also, the RDF contains pointers across the web. My script can follow those pointers if I want it to and retrieve up-to-date information and build up the HTML description using that. Each of these steps is made ridiculously easy by the Python library “rdflib”. It’s a bit of a fiddle to learn how to use it, and the code you end up writing isn’t exactly pretty, but it works pretty well.
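The shape of that step, with plain ElementTree standing in for rdflib and a made-up tag description (my real script does rather more):

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"

doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://tommorris.org/tags/metaphysics">
    <rdfs:comment xml:lang="en">The study of what there is.</rdfs:comment>
  </rdf:Description>
</rdf:RDF>"""

# Build a dictionary of subject URI -> comment, ready to template into HTML.
tags = {}
for desc in ET.fromstring(doc).iter(RDF + "Description"):
    tags[desc.get(RDF + "about")] = desc.findtext(RDFS + "comment")

print(tags)
```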

Then the rest of the process is uploading the .rdf and .html files to an agreed place. I use scp -r for this. Some may prefer rsync or unison or WebDAV even – god forbid – crappy old insecure FTP.1

Then it is pretty easy to get the HTML onto the website: Ajax. Except for the cross-domain policy stuff: the HTML is hosted on “tommorris.org” while the blog is on “blog.tommorris.org”. So, I set up a proxy that would take the HTML and serve it up as JSON-P. The folks at Yahoo! call this process JSON-P-X. I used a script called Simple PHP Proxy on the server and a bit of jQuery on the client side.
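The trick the proxy performs is tiny: wrap the payload in a function call so the browser can load it from another domain via a script tag. A stand-in for what Simple PHP Proxy does, sketched in Python:

```python
import json

def jsonp(callback: str, payload: dict) -> str:
    # Wrap JSON in a callback invocation; a <script src="..."> can then load
    # it cross-domain, sidestepping the same-origin policy.
    return "%s(%s)" % (callback, json.dumps(payload))

print(jsonp("renderTag", {"html": "<p>metaphysics</p>"}))
# → renderTag({"html": "<p>metaphysics</p>"})
```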

Now, what sort of data am I using to get the metadata on tags? First of all, I’m adding rdfs:comment properties (with an @en language tag, of course) which provide a short description of the topic in question.

To link to other entities on the web of linked data, I’m using owl:sameAs. This is primarily to link to things like dbpedia and dewey.info. I might also start adding Library of Congress Subject Headings, and maybe Freebase and Yago, OpenCyc and whatever else comes to mind. For people, obviously, there are a variety of databases to link to: MusicBrainz, Bibliographica, OpenLibrary and LIBRIS (for books and music), BBC (for TV people) and so on. And for academic and scientific stuff, there’s plenty of databases to link to.

Now, one of the real benefits of this kind of approach is that I can use a tag in my own unique way. Take the tag “pepsi”. I don’t drink cola drinks like Pepsi Cola or Coca-Cola. In the few instances I’m likely to ever use the word “pepsi”, it is probably regarding the actions of the makers of Pepsi Cola, PepsiCo. So if you look at the tag page for pepsi, it links to the Wikipedia page for PepsiCo. The RDF that sits underneath it says that the skos:Concept that accompanies the tag page is the owl:sameAs the dbpedia resource for PepsiCo. I say “pepsi”, I mean PepsiCo.

Unlike some of the folksonomy-are-awesome crew I don’t think that when I say “pepsi” I necessarily mean the same thing as you mean when you say “pepsi”. If this was a blog reviewing cola drinks, I might say “pepsi” to mean the actual drink rather than the manufacturer of the drink. In that case, I’d point it to a different meaning. And unlike on Wikipedia, I don’t have to do all this “Pepsi (20th century drinks manufacturing company in the United States)” (a slight exaggeration of course). I use the most natural phrase when tagging the stuff, but underneath it points to an exact meaning.

To link to documents about the skos:Concept (i.e. the tag) I am using dbpprop:reference, which is what Dbpedia uses to link resources to documents about the resource. That seems easy enough. The Python script then decides which of these are going on the tag page. Currently, it looks for certain whitelisted domains but eventually it will probably use some slightly more complex rule system for determining this. I’m not currently storing link titles, which is probably a bit silly. I’m looking into using POWDER as a way of storing these rules about which domains I link to.
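A whitelist like that is a one-liner to apply — the domains here are hypothetical, not my actual list:

```python
from urllib.parse import urlparse

ALLOWED = {"en.wikipedia.org", "plato.stanford.edu"}  # hypothetical whitelist

links = [
    "http://en.wikipedia.org/wiki/PepsiCo",
    "http://spam.example.com/buy-now",
]
# Keep only links whose host is on the whitelist.
kept = [u for u in links if urlparse(u).netloc in ALLOWED]
print(kept)  # only the Wikipedia link survives
```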

What else am I doing? I’m using SKOS to link together tags on broader and narrower categories, and related categories. I’m not yet surfacing this in the HTML, but soon if you go to “metaphysics”, it’ll have a link up the hierarchical chain to “philosophy” and one down to “ontology”. If you then go to “idiocy”, you may see a related link to “bullshit”. I’m not sure whether to present the skos:broader, skos:narrower and skos:related categories as just an amalgam of “related tags” or to separate them out. I’m not sure what the right design is.

I’m not currently doing any inference on the data. If A has a skos:narrower relationship to B, I’m not inferring that B has a skos:broader relationship to A. I may start doing this in the future, but then I’d have to basically add an OWL reasoner to the build process, which I don’t really want to do. If I do this, I’m not sure whether I want to use something like Pellet or FaCT++ or just write a little Python script that does it for me.
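The broader/narrower case wouldn’t actually need a full reasoner — skos:broader and skos:narrower are inverses, so one pass over the triples materialises the missing direction (toy triples here, not my real data):

```python
# Tiny triple store as a set of (subject, predicate, object) tuples.
triples = {
    ("philosophy", "skos:narrower", "metaphysics"),
    ("metaphysics", "skos:narrower", "ontology"),
}

# For every A skos:narrower B, assert B skos:broader A.
inferred = {(o, "skos:broader", s) for (s, p, o) in triples if p == "skos:narrower"}
print(sorted(triples | inferred))
```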

One thing I’m not sure about also is there are some sort of ‘category’ tags I use like “hackdiary”. Unlike something like “pepsi”, saying something is “hackdiary” isn’t saying it is related to some entity, and is barely even a concept. It is just really a label for a set of documents that group them together. At most, it may be possible to say that (∀x)(x has tag “hackdiary” → Fx ∧ Gx ∧ Hx etc.) but I’m not sure that makes something a skos:Concept. But to encode that sort of thing I’d need to use something like RIF which is too heavyweight for my blog’s tags! Instead, I’ll probably just have a rather bare skos:Concept for those, pretty much just defining an rdfs:comment and that’s about it. Much as lots of predicate logic is fun for the philosopher in me, the programmer in me is saying “no, stop, too much already”.2

I should note that this is all very alpha at the moment: it is part of a plan to add lots more metadata to my blog. Eventually, I may make it so that I can add metadata to posts as well as tags.

The code for the project is up at:

http://code.tommorris.org/blog-annotations/

I’d like to express how awesome Python is again. Although it’s been so long since I used Python regularly that I still need about four or five tabs open in my browser to docs.python.org to get anything done, the language really is quite superb.

  1. Stop. Please just stop with the FTP thing now. SCP/SFTP is what you need. FTP needs to die.

  2. I know, critics of the Semantic Web will find laughable the idea that there is a limit to the amount of complexity people like me will take. I don’t touch WS-* with a bargepole. And I steer clear of reasoners, complex OWL logic and much else. This makes me pretty much standard for Linked Data hackers.


Hack diary: Vim, Command-T and the Linux escape hatch

So, Apple announced today that Mac OS is going to get an ‘app store’. It already has one, of course: homebrew. This is one of those signs that Apple are going to start fiddling with the Mac platform, and so I thought I’d check out how well I can get on with my Linux setup (netbook: running Xubuntu 10.04).

I managed to get the Fn+F3 command working so that I can type without the trackpad jiggling around, following the instructions found here.

The one thing I’ve been missing from Vim is TextMate’s Cmd+T functionality. I looked at PeepOpen, which isn’t open source and is Mac only, but very very cool (except, gah, it uses menu bar icons, my personal bugbear with OS X). That doesn’t help me if I want to run away from OS X if Steve Jobs decides to become the tyrant he always seems on the borderline of becoming.

Instead, there’s Command-T which isn’t quite as pretty, but is open source and is cross-platform.

There’s some interesting discussion about Command-T/PeepOpen/FuzzyFinder Vim plugins and Mac-to-Linux cross-platform issues in this Hacker News thread.

I also came across an interesting post called Configuring Vim right. I haven’t done all of the things in it, nor do I endorse all the advice. But there are some good suggestions in there that’ll probably make their way into my vimrc file.

This is all getting a bit complicated now: I’ve got a Mac laptop, a Linux desktop machine, an account on a shared Hackintosh, a Linux netbook and so on. So, I’ve started working on a little project called ‘buildup’. It is a set of post-OS-install setup scripts that are there to be run on either OS X or Ubuntu/Debian Linux. Using apt-get and/or homebrew, it installs all the packages I need, and has all my dotfiles which can be installed by running a script called something imaginative like “dotfiles”. It has been a fun little project because it has allowed me to offload a lot of tacit knowledge about the crapness of all build/packaging systems into a trusted, version controlled script.
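The dotfiles half of a script like that is mostly symlinking. A sketch — the repo layout (a dotfiles/ directory of un-dotted files) is hypothetical, not necessarily how buildup does it:

```python
from pathlib import Path

def link_dotfiles(repo: Path, home: Path) -> None:
    # Symlink every file in the repo's dotfiles/ directory into $HOME,
    # prefixed with a dot, skipping anything that already exists.
    for f in sorted((repo / "dotfiles").iterdir()):
        target = home / ("." + f.name)
        if not target.exists():
            target.symlink_to(f.resolve())
```

Run it from the checkout and dotfiles/vimrc becomes ~/.vimrc, and so on — with the version-controlled copy remaining the single source of truth.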

It has also been fun because it has let me experience two tools I haven’t really used much: Python and Mercurial. I’m usually a Git user and I generally prefer using Ruby over Python (despite a number of significant shortcomings I think Ruby has). The reason I’ve done the project in Python is because Python has become a sort of lowest common denominator scripting language on all computers. Both OS X and most sane Linux distros come with some version of Python pre-installed. It may be a crappy out-of-date one with a completely arse-over-backwards relationship between the system package manager and the language package manager, but it is there nonetheless. Ruby is there on OS X, but Python is more universal in Linuxland (partly because a lot of the default stuff that comes with GNOME and Ubuntu is written in Python, for instance).

It is slightly annoying because the sort of thing I’m doing would really match what Rake does. But if Ruby isn’t installed by default on the machine, you can bet RubyGems and Rake aren’t. So, Python. I might look into Waf but I’m actually quite happy with using just plain Python.

The other thing I’m doing with Buildup is using Mercurial. As I said, I’m a Git user mostly, but I’m trying Mercurial on this project because it is a private project and since Atlassian bought BitBucket, they are now offering unlimited free private Mercurial repositories while GitHub charge for hosting private repos. Mercurial is okay. It is doing what I need, but my mind has been so warped into the ways of Git that I’m finding it a bit hard to think in the way Mercurial is making me. This is not a problem because Buildup is not very complicated, nor am I collaborating with anyone. The things which annoy me about Mercurial are really the lack of an equivalent thing to git commit --amend and the lack of an index. I really like the index. I’m told you can get something like an index back by installing some plugin or the other, but I can’t be bothered. I’m using Git for public projects and for work, and I’m just using Mercurial and BitBucket for a few small private projects. I’m not dissing Mercurial at all though: it is an excellent piece of software, but I’m just a bit too familiar with Git and I haven’t learned Mercurial in any depth.

One of the other things I may be using Mercurial for is for dissertation work if and when that starts happening. Private repositories plus LaTeX, BibTeX and possibly LyX make a happy philosopher. Much happier than some nonsense with Word or OpenOffice or whatnot. I really need to learn LaTeX properly: LyX is okay, but it’d be nice to churn it out in Vim (or Emacs, I hear it is good for writing LaTeX) rather than be reliant on an often rather temperamental GUI app.

Anyway, I’m learning lots in Linux using it on the netbook. As I said, I’m using Xubuntu, and Xfce is really the least unpleasant desktop environment I’ve found. I tried Awesome a while back, and Xfce is about halfway between Awesome and GNOME. It reminds me of Windows 95: it is pretty lightweight and uncluttered. I’ve already learned about a whole stash of keyboard commands. Alt+F10 (maximize) and Alt+F11 (full screen) are worth knowing. I’m using GNOME Do, but I seem to have two ‘competing’ terminal emulators (the Xfce Terminal and GNOME Terminal) installed both called ‘Terminal’ in GNOME Do, one of which isn’t actually set up properly. That has been a bit confusing. It’d be nice if I could set it to just always load Xfce Terminal. (I got help from some Xfce developers on IRC a while back on how to make Xfce Terminal have really short tabs, which is kind of useful on the netbook.)

As for the other stuff on Linux: I can’t stand Empathy. I’ll probably replace it with something that runs in the shell. The annoying thing is that the open source community can be proud that the best IM app ever made is in fact open source. It just happens to run on OS X and be called Adium. Heh. And Transmission too: that was on OS X before other platforms if I recall correctly.

Will I end up moving to Linux? Maybe. Maybe not. But having it there as an option is important to me. It is an escape hatch if Mac OS X veers too far off into App Store, DRM land. If being an Apple customer becomes more of a burden than it is being a joy, it is very, very useful to have an escape hatch. For me, it is like assisted dying. I once read that a lot of people who seek the services of euthanasia clinics like Dignitas never use them, but it serves as a very useful psychological tool to know that if they want to commit suicide, they have the right to do so. A slightly less gloomy example perhaps: in the argument over social contracts, I think Hume said that only when you can emigrate can you claim the possibility of a tacit social contract. If you can’t leave, you can’t really be said to be consenting (tacitly or otherwise) to being there.

I’m planning to work over the next six months or so to see where I am tied into OS X and then either find or build escape hatches. Not that I necessarily plan to stop using OS X, but just in case. The first of these is an application I use called CDFinder which is a “digital asset manager”. I’ve looked and tried to find a good open source alternative to CDFinder but I just can’t. I’ve looked into building my own, but I couldn’t really find a good way of storing the data. These days, I think the answer may be here in all these NoSQL stores. I can see something like MongoDB or CouchDB being a good example of a reasonably lightweight way of storing an index of CD and DVD volumes. The key to something like this is making it nice and flexible: an open source backend on a platform like MongoDB and then provide a simple web front-end with something like Sinatra, and on Linux maybe a GTK app either written in C/C++, Python or C#. It may be possible to separate out the client side and the server side: the client side simply needs to run the indexing process on the disk and produce a data file representing the disk’s contents. Then it pushes it up to the server which could be running on another machine. You could then query it over the web, or from a terminal or anywhere.
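The client half of that split is almost trivially small. A stdlib-only sketch of the indexing step (MongoDB, the web front-end and the upload left out), using a throwaway directory standing in for a mounted disc:

```python
import json
import os
import tempfile
from pathlib import Path

def index_volume(root: Path) -> dict:
    # Walk a mounted volume and record relative path and size for each file;
    # the resulting dict is what you'd push to the server as JSON.
    files = []
    for dirpath, _, names in os.walk(root):
        for name in sorted(names):
            p = Path(dirpath) / name
            files.append({"path": str(p.relative_to(root)),
                          "bytes": p.stat().st_size})
    return {"volume": root.name, "files": files}

# Demo: index a temporary directory pretending to be a CD.
disc = Path(tempfile.mkdtemp(prefix="backups-2010-"))
(disc / "notes.txt").write_text("remember the milk")
print(json.dumps(index_volume(disc)))
```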

I’m obviously tied into iTunes, which is a bit of an annoyance. Everyone bitches and moans about iTunes, but I think it is actually one of the better things Apple does. The people who complain about user experience don’t seem to actually provide many reasons why iTunes is such a terrible piece of software; they just sort of expect me to know by osmosis that it sucks. The lack of iTunes-style resume support in the alternatives is one of the reasons I’m still stuck with iTunes. It looks like a new bug has been raised for Banshee requesting this. One day it’ll happen: probably when I learn C# properly and code it myself, damnit.

There’s other stuff I’ve been hacking on: some of it is behind the company firewall, some of it is turning up in the Open Plaques SVN. All good fun.


Learning Python the Hard Way is Zed Shaw's introduction to Python for non-programmers. Think of it as being like the Poignant Guide to Ruby but written by someone who will tell you to shut the fuck up and write some code rather than throw OMG SO RANDOM memes at you. I approve wholeheartedly.

I already know how to code, and although I'm a bit out of practice with Python, I have no need of this personally. But it is worth scanning through for anyone who ever needs to teach others to code: it'll give you some valuable insights into how to communicate various programming concepts to newbies. In the same category, I have to say Chris Pine's Ruby intro Learn to Program isn't bad, and while not quite as absolute beginner friendly as other texts, David Pollak's Beginning Scala is good too.

(Personally, my introduction to programming was the BBC BASIC manual. Fortunately, early exposure to LOGO should negate any damage that BASIC may have wrought, if you believe Dijkstra.)


Things.app, AppleScript and Ruby

Let me make a controversial statement: Apple should get rid of AppleScript and just replace it with Ruby (or Python, maybe - if it was Python, I'd be perfectly happy, although I'd prefer Ruby). Really, there is no contest. AppleScript is an atrocious mess of a language. But it's also bloody slow.

Here is my example. I use a task management app on Mac called Things.app. It's one of those Getting Things Done-inspired apps. It's pretty nice, and it syncs well with the iPod touch. They have an AppleScript API, and so a lot of people have started building little AppleScripts that do stuff with it. I started using Today Reminder. I downloaded it, and had to hack around a bit with it to get it to work. But here was the main problem: it's slow and intrusive. I set it to run every half an hour. Every half hour, an application would spawn, grab focus away from whatever I was doing, replace my menu bar with its own and then do whatever it needs to do. I mean, all it does is pop up some Growl notifications. Why does it need to spawn its own application chrome to do that? It also had a problem with crashing.

It's been bugging me for weeks. I've told myself that I'm going to sort it out. I finally have, and here are the results - things_growl_today.rb. As a point of comparison, here is the original AppleScript version. It's a perfect case study in the unadulterated shitness of AppleScript. The Ruby version is less than half the length and a hell of a lot more readable. Okay, the Ruby version does something slightly different from what the AppleScript version does - but the Ruby version does precisely what I want, so that doesn't matter. I'll probably hack it a bit later to use proper process objects (.NET kids got this right: System.Diagnostics.Process - if you are Apple or Sun or anyone else maintaining a programming platform, steal this fucking thing and implement the same API. If you are using an OO language, ps hacking ought to be a thing of the past.)

A lot of people seem to think that because AppleScript is written in pseudo-English, it's easier to understand. I don't buy this. But I may just be used to reading Ruby and have developed some particular neurosis about AppleScript. Are my intuitions right? The Ruby version is a lot nicer, right? Is there anyone out there who really understands the AppleScript version but doesn't understand the Ruby version? The only thing that seems particularly complex about the Ruby version is the regex on line 4. You'd have to know about ps, I guess. But in AppleScript land, you have to know about System Events and the dictionary around application processes and so on.

If my intuitions about AppleScript are correct, is there any good reason why AppleScript is not dead? The language is shit. The editor sucks (seriously: you can't save if the code doesn't compile? What kind of retardation is that?). The runtimes it creates are slow. Why don't Apple just kill it? There are plenty of awesome languages you could use instead: Ruby, Python, Lua. Whatever. Just get on with it and stop moaning. I firmly believe in the idea of AppleScript: users should be able to easily hack their applications and desktop environment together in a better way than the designers of Windows/Mac OS/GNOME etc. originally planned. The best way for Apple to pursue the AppleScript dream is to kill AppleScript and build something good.

And, yes, I know I'm wrong. You don't have to remind me.


Shit, it’s 1:15am and I’ve got to get up in the morning for OverTheAir. Need paracetamol and for Python to behave.