
Simon Willison’s Weblog

On javascript, python, security, django, xss, ...

 

Recent entries

jQuery style chaining with the Django ORM 16 days ago

Django’s ORM is, in my opinion, the unsung gem of the framework. For the subset of SQL that’s used in most web applications it’s very hard to beat. It’s a beautiful piece of API design, and I tip my hat to the people who designed and built it.

Lazy evaluation

If you haven’t spent much time with the ORM, two key features are lazy evaluation and chaining. Consider the following statement:

entries = Entry.objects.all()

Assuming you have created an Entry model of some sort, the above statement will create a Django QuerySet object representing all of the entries in the database. It will not result in the execution of any SQL—QuerySets are lazily evaluated, and are only executed at the last possible moment. The most common situation in which SQL will be executed is when the object is used for iteration:

for entry in entries:
    print(entry.title)

This usually happens in a template:

<ul>
{% for entry in entries %}
  <li>{{ entry.title }}</li>
{% endfor %}
</ul>

Lazy evaluation works nicely with template fragment caching—even if you pass a QuerySet to a template it won’t be executed if the fragment it is used in can be served from the cache.

You can modify QuerySets as many times as you like before they are executed:

import datetime

entries = Entry.objects.all()
today = datetime.date.today()
entries_this_year = entries.filter(
    posted__year = today.year
)
entries_last_year = entries.filter(
    posted__year = today.year - 1
)

Again, no SQL has been executed, but we now have two QuerySets which, when iterated, will produce the desired result.
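To make the idea concrete without a database, here is a toy sketch in plain Python. It is not how Django implements QuerySets; LazyQuery, its log attribute and the sample rows are all made up for illustration. Each filter() call just records a condition and returns a new object, and the work only happens when something iterates.

```python
class LazyQuery:
    """Toy stand-in for a QuerySet: records filters, runs them only on iteration."""

    def __init__(self, rows, predicates=(), log=None):
        self._rows = rows
        self._predicates = tuple(predicates)
        # Shared list that records each time the "query" actually runs.
        self.log = log if log is not None else []

    def filter(self, **conditions):
        # Build a new object with one more condition; nothing is evaluated here.
        def predicate(row):
            return all(row.get(key) == value for key, value in conditions.items())

        return LazyQuery(self._rows, self._predicates + (predicate,), self.log)

    def __iter__(self):
        # Evaluation happens only once iteration starts.
        self.log.append("executed")
        for row in self._rows:
            if all(check(row) for check in self._predicates):
                yield row


# Hypothetical sample data standing in for a database table.
rows = [
    {"title": "First post", "year": 2007},
    {"title": "Second post", "year": 2008},
    {"title": "Third post", "year": 2008},
]

entries = LazyQuery(rows)
entries_2008 = entries.filter(year=2008)
entries_2007 = entries.filter(year=2007)
executions_before = len(entries.log)  # nothing has run yet

titles_2008 = [row["title"] for row in entries_2008]
executions_after = len(entries.log)  # iterating triggered one evaluation
```

Building two filtered objects from the same starting point costs nothing until one of them is actually looped over, which is the property the template fragment caching trick relies on.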

Chaining

Chaining comes in when you want to apply multiple modifications to a QuerySet. Here are blog entries from 2006 that weren’t posted in January:

Entry.objects.filter(
    posted__year = 2006
).exclude(posted__month = 1)

And here are the entries from that year posted to the category named “Personal”, ordered by title:

Entry.objects.filter(
    posted__year = 2006
).filter(
    category__name = "Personal"
).order_by('title')

The above can also be expressed like this:

Entry.objects.filter(
    posted__year = 2006,
    category__name = "Personal"
).order_by('title')

Chaining in jQuery

The parallels to jQuery are pretty clear. The jQuery API is built around chaining, and the jQuery animation library even uses a form of lazy evaluation to automatically queue up effects to run in sequence:

jQuery('div#message').addClass(
    'borderfade'
).animate({
    'borderWidth': '+10px'
}, 1000).fadeOut();

One of the neatest things about jQuery is the plugin model, which takes advantage of JavaScript’s prototype inheritance and makes it trivially easy to add new chainable methods. If we wanted to package the above dumb effect up as a plugin, we could do so like this:

jQuery.fn.dumbBorderFade = function() {
    return this.addClass(
        'borderfade'
    ).animate({
       'borderWidth': '+10px'
    }, 1000).fadeOut();
};

Now we can apply it to an element like so:

jQuery('div#message').dumbBorderFade();

Custom QuerySet methods in Django

Django supports adding custom methods for accessing the ORM through the ability to implement a custom Manager. In the above examples, Entry.objects is the Manager. The downside of this approach is that methods added to a manager can only be used at the beginning of the chain.

Luckily, Managers also provide a hook for returning a custom QuerySet. This means we can create our own QuerySet subclass and add new methods to it, in a way that’s reminiscent of jQuery:

from django.db import models
from django.db.models.query import QuerySet
import datetime

class EntryQuerySet(QuerySet):
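    def on_date(self, date):
        # Half-open range: from the start of the day up to, but not
        # including, the start of the next day.
        next_day = date + datetime.timedelta(days = 1)
        return self.filter(
            posted__gte = date,
            posted__lt = next_day
        )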

class EntryManager(models.Manager):
    def get_query_set(self):
        return EntryQuerySet(self.model)

class Entry(models.Model):
    ...
    objects = EntryManager()

The above gives us a new method on the QuerySets returned by Entry.objects called on_date(), which lets us filter entries down to those posted on a specific date. Now we can run queries like the following:

Entry.objects.filter(
    category__name = 'Personal'
).on_date(datetime.date(2008, 5, 1))
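The reason this chains in either order is that every step hands back an object of the same custom class. Here is a database-free sketch of that idea; ToyQuerySet, ToyEntryQuerySet, _clone and the sample rows are hypothetical names for illustration, not Django internals.

```python
import datetime


class ToyQuerySet:
    """Minimal chainable filter object over a list of dicts."""

    def __init__(self, rows, predicates=()):
        self._rows = rows
        self._predicates = tuple(predicates)

    def _clone(self, predicate):
        # type(self) keeps subclass methods available after every step.
        return type(self)(self._rows, self._predicates + (predicate,))

    def filter(self, **conditions):
        return self._clone(
            lambda row: all(row[key] == value for key, value in conditions.items())
        )

    def __iter__(self):
        return (
            row for row in self._rows
            if all(check(row) for check in self._predicates)
        )


class ToyEntryQuerySet(ToyQuerySet):
    def on_date(self, date):
        start = datetime.datetime.combine(date, datetime.time.min)
        end = start + datetime.timedelta(days=1)
        # Half-open interval: start <= posted < end.
        return self._clone(lambda row: start <= row["posted"] < end)


# Hypothetical sample data.
rows = [
    {"title": "May Day", "category": "Personal",
     "posted": datetime.datetime(2008, 5, 1, 9, 30)},
    {"title": "Midnight post", "category": "Personal",
     "posted": datetime.datetime(2008, 5, 1, 0, 0)},
    {"title": "Work notes", "category": "Work",
     "posted": datetime.datetime(2008, 5, 1, 14, 0)},
    {"title": "Next day", "category": "Personal",
     "posted": datetime.datetime(2008, 5, 2, 0, 0)},
]

may_first = datetime.date(2008, 5, 1)
personal_titles = [
    row["title"]
    for row in ToyEntryQuerySet(rows).filter(category="Personal").on_date(may_first)
]
work_titles = [
    row["title"]
    for row in ToyEntryQuerySet(rows).on_date(may_first).filter(category="Work")
]
```

If filter() returned a plain ToyQuerySet instead of type(self), on_date() would only be usable as the first call in the chain, which is the same limitation that methods defined directly on a Manager have.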

Reducing the boilerplate

This method works fine, but it requires quite a bit of boilerplate code—a QuerySet subclass and a Manager subclass plus the wiring to pull them all together. Wouldn’t it be neat if you could declare the extra QuerySet methods inside the model definition itself?

It turns out you can, and it’s surprisingly easy. Here’s the syntax I came up with:

from django.db import models
from django.db.models.query import QuerySet

class Entry(models.Model):
    ...
    objects = QuerySetManager()
    ...
    class QuerySet(QuerySet):
        def on_date(self, date):
            return self.filter(
                ...
            )

Here I’ve made the custom QuerySet class an inner class of the model definition. I’ve also replaced the default manager with a QuerySetManager. All this class does is return the QuerySet inner class for the current model from get_query_set. The implementation looks like this:

class QuerySetManager(models.Manager):
    def get_query_set(self):
        return self.model.QuerySet(self.model)

I’m pretty happy with this; it makes it trivial to add custom QuerySet methods and does so without any monkeypatching or deep reliance on Django ORM internals. I think the ease with which this can be achieved is a testament to the quality of the ORM API.
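The wiring is small enough to mimic in plain Python. The sketch below uses \_\_set\_name\_\_ as a stand-in for however the manager learns which model it belongs to; that part is an assumption for illustration, not Django's mechanism, and all the Toy names are hypothetical.

```python
class ToyQuerySet:
    def __init__(self, model):
        self.model = model


class ToyQuerySetManager:
    def __set_name__(self, owner, name):
        # Plain-Python stand-in: remember the class this manager is attached to.
        self.model = owner

    def get_query_set(self):
        # Look up the inner QuerySet class on the model at call time.
        return self.model.QuerySet(self.model)


class ToyEntry:
    objects = ToyQuerySetManager()

    class QuerySet(ToyQuerySet):
        def describe(self):
            return "custom queryset for " + self.model.__name__


queryset = ToyEntry.objects.get_query_set()
description = queryset.describe()
```

Because the inner class is looked up when get_query_set() is called rather than when the manager is created, it does not matter that QuerySet is declared after objects in the class body.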

wikinear.com, OAuth and Fire Eagle one month ago

I’m pleased to announce wikinear.com. It’s a simple site that does just one thing: show you a list of the five Wikipedia pages that are geographically closest to your current location. It’s designed (or not-designed) to be used mainly from mobile phones.

You’ll need a Fire Eagle invitation code to use the site. My invites are all accounted for. If you don’t have a Fire Eagle account you’ll have to make do with this screenshot instead.

The idea for the site came from living in Oxford for a year. The city is full of beautiful old historic buildings (many of them colleges), but very few of them are labelled or signposted. With wikinear.com and a GPS hooked up to Fire Eagle, I can pull out my phone and see a list of the closest points of interest, plotted on a handy map.

Under the hood the site combines a number of interesting technologies: OAuth, Fire Eagle, GeoNames and the new Google Static Maps API.

OAuth

OAuth was originally designed to solve a problem with OpenID: in an authentication protocol based on browser redirects, how do you authenticate a desktop or command-line application? As it turned out, the solution to that problem solved a bunch of other problems that are unrelated to OpenID, so OAuth now exists as very much its own thing. In essence, it lets users delegate permission to perform actions on their behalf, without having to hand their regular authentication credentials (e.g. username and password) over to a third-party piece of software.

If you’ve ever used a Flickr application that sends you back to Flickr to ask permission to view your private photos you’ll understand what OAuth does straight away. Before OAuth, sites had to invent their own solutions to this problem—complete with smart security measures, their own UI flow and libraries for developers wishing to access their protected APIs. OAuth provides a ready-made solution, complete with tested libraries in a bunch of languages.

If you want to securely expose your users’ private data via an API, OAuth is a no-brainer. I expect to see a lot more of it over the next year.
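One concrete piece of how this works: an OAuth 1.0 consumer signs each request using a secret tied to the application and a secret tied to the token the user approved, so the user's password never enters the picture. Below is a simplified sketch of the HMAC-SHA1 signing step. It leaves out nonce and timestamp generation and URL normalisation, and every key, secret and URL in it is a made-up placeholder.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    # OAuth-style encoding: letters, digits and - . _ ~ are left as-is.
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret=""):
    # 1. Encode the parameters, sort them, and join them as key=value pairs.
    encoded = sorted(
        (percent_encode(key), percent_encode(value)) for key, value in params.items()
    )
    param_string = "&".join(key + "=" + value for key, value in encoded)
    # 2. Signature base string: METHOD & encoded URL & encoded parameter string.
    base_string = "&".join(
        [method.upper(), percent_encode(url), percent_encode(param_string)]
    )
    # 3. The key combines the application's secret with the token's secret.
    key = percent_encode(consumer_secret) + "&" + percent_encode(token_secret)
    digest = hmac.new(
        key.encode("ascii"), base_string.encode("ascii"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical request values, for illustration only.
params = {
    "oauth_consumer_key": "example-consumer-key",
    "oauth_token": "example-access-token",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1210000000",
    "oauth_nonce": "example-nonce",
    "oauth_version": "1.0",
}
signature = sign_request(
    "GET",
    "https://api.example.com/user/location",
    params,
    consumer_secret="example-consumer-secret",
    token_secret="example-token-secret",
)
```

Revoking the token on the provider's side invalidates the token secret, which is what lets a user cut off one application without changing their password.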

Fire Eagle

Launched at ETech a few weeks ago, Fire Eagle is a service with enormous potential. You can watch Tom Coates explain it in ten minutes in this video from the conference, but the short version is that Fire Eagle acts as a location broker. It consists of two key OAuth-protected APIs: one for setting the geographical location of a user, and another for retrieving that location.

This leads to a neat separation of concerns. On the one hand are the applications that attempt to figure out your location—GPS receivers, WiFi maps, mobile phones that triangulate nearby cell towers, or even sites that know where you are because you told them (Dopplr and Upcoming, for example, or the Fire Eagle site itself). On the other hand are the applications that do something useful with your location—from restaurant review sites, traffic alert services, friend finders and ARGs down to trivial applications like wikinear.com.

As a developer, this is really exciting. I can build location-based services without having to solve the much bigger problem of figuring out where my users are. Even better, wikinear.com becomes incrementally more useful every time someone builds a new tool for passing location information to Fire Eagle, without me having to do anything at all.

Obviously privacy is a huge concern when dealing with this kind of data. That’s where the Fire Eagle application itself comes in: it provides a simple suite of tools for users to manage the applications that can access their location. Applications can be permitted to access different levels of accuracy or disabled entirely, and there’s a “Hide” button for disabling all applications at once.

Disclaimer: I worked on an early prototype of Fire Eagle as my last project at Yahoo! before leaving in January 2007, but the product that has launched has changed enormously and is entirely the work of the current Fire Eagle team. wikinear.com is inspired by part of that early prototype.

Wikipedia and GeoNames

Wikipedia has a thriving community of geo-hackers, mainly focused around the Maps, Geographical coordinates and Wikipedia-World wiki projects. Many Wikipedia pages (Brighton, for example) have their co-ordinates in the top-right, added using a bewildering array of macros and markup extensions. You can browse through the huge collection of geotagged pages using this KML-powered Google Maps tool—zoom in and wait a few seconds to load in more markers.

The wonderful GeoNames (also used on djangopeople.net) includes an API for querying Wikipedia by location, based on 610,000 articles extracted from a Wikipedia data dump. This was a huge relief when I found it, as “order by distance from X” is actually pretty tricky to do efficiently; I’ve used expanding bounding box searches in the past but I’d love to hear about more effective solutions.
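For reference, an expanding bounding box search looks roughly like this. It is a sketch over an in-memory list with approximate, hand-typed coordinates; in a real application the box test would be an indexed range query on latitude and longitude columns. It ignores the antimeridian and behaves poorly near the poles, and the function and parameter names are my own.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points given in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(points, lat, lon, limit=5, start_km=1.0, max_km=20000.0):
    """points is a list of (name, lat, lon); returns [(distance_km, name), ...]."""
    radius_km = start_km
    while True:
        # Size of a box, in degrees, that roughly covers radius_km around the centre.
        dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
        cos_lat = max(math.cos(math.radians(lat)), 1e-6)
        dlon = math.degrees(radius_km / (EARTH_RADIUS_KM * cos_lat))
        # Cheap box filter first (this is the part an index can answer).
        candidates = [
            p for p in points
            if abs(p[1] - lat) <= dlat and abs(p[2] - lon) <= dlon
        ]
        # Exact distance only for the candidates; drop the box corners.
        within = sorted(
            (dist, name)
            for dist, name in (
                (haversine_km(lat, lon, p[1], p[2]), p[0]) for p in candidates
            )
            if dist <= radius_km
        )
        if len(within) >= limit or radius_km >= max_km:
            return within[:limit]
        radius_km *= 2  # not enough hits yet: widen the box and retry


# Approximate coordinates, for illustration only.
places = [
    ("Oxford", 51.752, -1.258),
    ("Reading", 51.45, -0.97),
    ("London", 51.51, -0.13),
    ("Brighton", 50.82, -0.14),
]
closest_two = nearest(places, 51.75, -1.25, limit=2)
closest_names = [name for _, name in closest_two]
```

Discarding candidates that fall inside the box but outside the radius matters: without it, a point in a far corner of a small box could be returned ahead of a closer point that a slightly bigger box would have found.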

Google Static Maps

A long-term criticism of the Google Maps API is that it requires JavaScript to display anything at all—once you’ve committed to using it, you’re going to have trouble implementing unobtrusive scripting (although you can work around the problem to some extent). Yahoo! Maps has long been better in this regard, but their map image API is a bit of a pain to use—you have to do an initial call to get back the URL to an image embedded in an XML file, then extract that URL and send it to the browser.

Launched last month, Google’s Static Maps API is a big improvement. As with Google Charts, you need only construct a URL to the image to have it dynamically generated on the fly. You can also specify markers, and optionally omit the initial latitude/longitude/zoom to indicate that you want a best fit for the markers you are displaying. There’s even a flag for a “mobile optimised” image which I’m using for wikinear.com.

Mixing it all together

Excluding templates, the entire application comes in at less than 200 lines of code and took around two hours to build. The only persistence is a couple of cookies for storing Fire Eagle tokens; Django’s database layer isn’t even configured (and user locations aren’t logged anywhere, which is great from a privacy point of view). I suppose it’s a classic mashup: Fire Eagle + OAuth + Wikipedia + GeoNames + Google Static Maps = wikinear.com. Despite its simplicity (or maybe because of it), I think it’s a neat demonstration of the kind of applications Fire Eagle enables.

Django People: OpenID and microformats three months ago

In hindsight, it was a mistake to launch Django People without support for OpenID. It was on the original feature list, but in the end I decided to cut any feature that wasn’t completely essential in order to get the site launched before it drowned in an ocean of “wouldn’t-it-be-cool-ifs”.

I thought that, once launched, the site would see a small amount of activity from a few interested parties and I’d have plenty of time to catch up on the feature backlog. What I didn’t expect was that over 750 people would create profiles within the first 24 hours!

So, I spent a few hours this evening integrating my current development version of django-openid, which thankfully had about 80% of the code needed to integrate with Django’s built-in authentication mechanism already written. Sadly the other 20% is either incomplete or a bit of a mess, but I’ve checked it in to a branch on Google Code for anyone who’s interested.

Anyway, there are a few new features on the site of interest to OpenID users:

  1. When signing up for a new account, you now have the option to start by signing in with an OpenID. If you do this, you’ll be able to complete the signup form without having to pick a password. If your OpenID provider supports simple registration, the name, e-mail address and username fields will be filled in for you.
  2. If you already have an account, you can associate one or more OpenIDs with it. You’ll then be able to use any of them to sign in to the account. Why multiple OpenIDs instead of just one? Two reasons: firstly, it opens the potential for doing interesting things with multiple OpenIDs from different providers in the future; secondly, it gives you a fallback in case one of your OpenID providers becomes unavailable.
  3. You can freely add and remove OpenIDs from your associations, with one exception: the site won’t let you delete your last OpenID if your account doesn’t also have a password associated with it, to prevent you from locking yourself out.
  4. While I decided that I didn’t want Django People to become yet another OpenID provider, I do want to give people the ability to use their profile page on the site as an OpenID—so that they can prove that they own it (see my recent post on identity projection). To that end, the new account settings page lets advanced OpenID users set up an openid.server and openid.delegate for their profile page, as described in my blog entry from just over a year ago.
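Delegation in OpenID 1.1 comes down to two link elements in the head of the page being claimed. As a rough illustration, a helper that renders them might look like this; the function name and the example URLs are hypothetical, and this is not the code running on the site.

```python
from html import escape


def delegation_links(server_url, delegate_url):
    """Render the two <link> elements used for OpenID 1.1 delegation."""
    return "\n".join([
        # Where relying parties should send the authentication request.
        '<link rel="openid.server" href="%s">' % escape(server_url, quote=True),
        # The identity at that provider which this page delegates to.
        '<link rel="openid.delegate" href="%s">' % escape(delegate_url, quote=True),
    ])


# Hypothetical provider and identity URLs.
head_markup = delegation_links(
    "https://openid.example.com/server",
    "https://alice.openid.example.com/",
)
```

With those two elements in place, the profile page URL itself can be entered as an OpenID, and the provider named in openid.server does the actual authentication.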

One caveat: the site only supports OpenID 1.1, at least for the moment. I had originally planned to go for OpenID 2.0, but demand was such that I decided to get what I had up and running rather than digging in to the OpenID 2.0 libraries.

Microformats

While I was messing around with OpenID, Natalie was updating the site’s templates to clean up the crufty code I’d introduced and add some microformatted goodness. The site now uses hCard where you would expect it (country listing pages, skill listing pages and the new search interface) and the profile pages have been updated with a healthy dose of XFN (just rel=“me”, since there isn’t a relevant microformat for “people who live nearby”) and Rel-Tag. On Jeremy Keith’s suggestion, the profile pages also use hResume—all the more reason to add the Django projects you’ve worked on to your profile’s portfolio.

As usual, post feedback and bug reports as comments on this entry.

Elsewhere

Today

HSBC internet banking is a joke

  • A McAfee spokeswoman said the company rates XSS vulnerabilities less severe than SQL injections and other types of security bugs. “Currently, the presence of an XSS vulnerability does not cause a web site to fail HackerSafe certification,” she said. “When McAfee identifies XSS, it notifies its customers and educates them about XSS vulnerabilities.”

    Dan Goodin 0

Yesterday

  • Firebug Command Line API. Another thing I didn’t know about Firebug: you can set a breakpoint at the start of a function with “debug(fn)” and log all calls to it with “monitor(fn)”. 0

15th May 2008

  • Using Git as a versioned data store in Python. gitshelve supports the same interface as Python’s built-in shelve module but stores things to a versioned Git repository instead of just a pickled dictionary. I’ve been casually wondering what a Git-powered CMS would look like. 5
  • Cubescape. Beautiful isometric cube building tool by Cameron Adams, written in JavaScript and jQuery. 0
  • Crossdomain.xml Invites Cross-site Mayhem. A useful reminder that crossdomain.xml files should be treated with extreme caution. Allowing access from * makes it impossible to protect your site against CSRF attacks, and even allowing from a “circle of trust” of domains can be fatal if just one of those domains has an XSS hole. 0
  • Engineering @ Facebook: Facebook Chat. The new Facebook Chat uses Comet (long polling with a hidden iframe) against a custom web / chat server written in Erlang, designed to handle a launch to all 70 million users at once. It was tested using a “dark launch” period where live pages simulated chat request traffic without showing any visible UI. 0

14th May 2008

  • goog/useragent/iphoto.js. The Goog library includes code to detect the user’s installed version of iPhoto, based on reverse engineering the Mac.com Gallery RSS feeds. This has Mark Pilgrim written all over it. 0
  • Doctype: /trunk/goog. Google’s newly released JavaScript library (pure JavaScript, so more along the lines of YUI and jQuery than GWT). I haven’t found the documentation for it yet, but the code is extremely well commented. UPDATE: The documentation is spread throughout Doctype. 0
  • Doctype on Google Code. Alternative way of browsing Google Doctype—if you link to articles here instead of using the permalinks in the official version non-JavaScript user agents will be able to access the content you’ve linked to. 0
  • Google Doctype. So now we know what Mark Pilgrim’s been doing at Google... heading up a project to create an encyclopaedia of web development. The JavaScript UI for browsing it is a bit weird (though you do at least get real pages if you disable JavaScript in your browser). 3
  • Google Maps now shows photos and Wikipedia articles. Click the “More...” button. My first thought was “how do they get so many photo markers on the map?”—Firebug shows that they’re generating tiles on the server containing multiple photo markers, then when you click on one an Ajax call checks which photo is in that particular spot. 0
  • Django: security fix released. XSS hole in the Admin application’s login page—updates and patches are available for trunk, 0.96, 0.95 and 0.91. 0

13th May 2008

  • Session variables without cookies. Brilliant but terrifying hack—you can store up to 2 MB of data in window.name and it persists between multiple pages, even across domains. Doesn’t work with new tabs though, and storing JSON in it and eval()ing it is a bad idea—a malicious site could populate it before sending the user to you. 1
  • Graffletopia. Huge collection of free OmniGraffle stencils. 0
  • Django admin OmniGraffle stencil. Alex Lee put together a beautiful stencil for OmniGraffle containing all of the common UI elements seen in the Django admin interface, as a tool for wireframing. 1
  • Hey Google: any chance we can all build the social web together without requiring JavaScript?

    Me 8

  • Persevere adds Comet Support. Persevere sounds neat: a RESTful HTTP/JSON data store (the interface reminds me of CouchDB) which recently gained the ability to “subscribe” to a resource and receive notifications of updates via comet. 1
  • django-db-log. Middleware that logs Django exceptions to the database, using a clever scheme based on an MD5 of the traceback text to group duplicate errors in to batches. 0
  • Something you had, Something you forgot, Something you were

    Nick Mathewson 0
