Building a higher-level query API: the right way to use Django's ORM

This blog post is based on a talk given at the Brighton Python User Group on April 10th, 2012.

Note: this post refers to a very old version of Django. The functionality provided by the libraries below is now built into Django.

Summary

In this article, I'm going to argue that using Django's low-level ORM query methods (filter, order_by etc) directly in a view is (usually) an anti-pattern. Instead, we should be building custom domain-specific query APIs at the level of the model layer, where our business logic belongs. Django doesn't make this particularly easy, but by taking a deep dive into the internals of the ORM, I'll show you some neat ways to accomplish it.

Overview

When writing Django applications, we're accustomed to adding methods to our models to encapsulate business logic and hide implementation details. This approach feels completely natural and obvious, and indeed is used liberally throughout Django's built-in apps:

>>> from django.contrib.auth.models import User
>>> user = User.objects.get(pk=5)
>>> user.set_password('super-sekrit')
>>> user.save()

Here set_password is a method defined on the django.contrib.auth.models.User model, which hides the implementation details of password hashing. The code looks something like this (edited for clarity):

from django.contrib.auth.hashers import make_password
from django.db import models

class User(models.Model):

    # fields go here..

    def set_password(self, raw_password):
        self.password = make_password(raw_password)

We're building a domain-specific API on top of the generic, low-level object-relational mapping tools that Django gives us. This is basic domain modelling: we're increasing the level of abstraction, making any code that interacts with this API less verbose. The result is more robust, reusable and (most importantly) readable code.

So, we already do this for individual model instances. Why not apply the same idea to the APIs you use to select collections of model instances from the database?

A toy problem: the Todo List

To illustrate the approach, we're going to use a simple todo list app. The usual caveats apply: this is a toy problem. It's hard to show a real-world, useful example without huge piles of code. Don't concentrate on the implementation of the todo list itself: instead, imagine how this approach would work in one of your own large-scale applications.

Here's our application's models.py:

from django.db import models

PRIORITY_CHOICES = [(1, 'High'), (2, 'Low')]

class Todo(models.Model):
    content = models.CharField(max_length=100)
    is_done = models.BooleanField(default=False)
    owner = models.ForeignKey('auth.User', on_delete=models.CASCADE)
    priority = models.IntegerField(choices=PRIORITY_CHOICES,
                                   default=1)

Now, let's consider a query we might want to make across this data. Say we're creating a view for the dashboard of our Todo app. We want to show all of the incomplete, high-priority Todos that exist for the currently logged-in user. Here's our first stab at the code:

from django.shortcuts import render

from .models import Todo

def dashboard(request):

    todos = Todo.objects.filter(
        owner=request.user
    ).filter(
        is_done=False
    ).filter(
        priority=1
    )

    return render(request, 'todos/list.html', {
        'todos': todos,
    })

(And yes, I know this can be written as request.user.todo_set.filter(is_done=False, priority=1). Remember, toy example!)

Why is this bad?

First, it's verbose. Seven lines (depending on how you prefer to deal with newlines in chained method calls) just to pull out the rows we care about. And, of course, this is just a toy example. Real-world ORM code can be much more complicated.
It leaks implementation details. Code that interacts with the model needs to know that there exists a field called is_done, and that it's a BooleanField. If you change the implementation (perhaps you replace the is_done boolean with a status field that can have multiple values) then this code will break.
It's opaque - the meaning or intent behind it is not clear at a glance (which can be summarised as "it's hard to read").
Finally, it has the potential to be repetitive. Imagine you are given a new requirement: write a management command, called via cron every week, to email all users their list of incomplete, high-priority todo items. You'd have to essentially copy-and-paste these seven lines into your new script. Not very DRY.

Let's summarise this with a bold claim: using low-level ORM code directly in a view is (usually) an anti-pattern.

So, how can we improve on this?

Managers and QuerySets

Before diving into solutions, we're going to take a slight detour to cover some essential concepts.

Django has two closely related constructs for table-level operations: managers and querysets.

A manager (an instance of django.db.models.manager.Manager) is described as "the interface through which database query operations are provided to Django models." A model's Manager is the gateway to table-level functionality in the ORM (model instances generally give you row-level functionality). Every model class is given a default manager, called objects.

A queryset (django.db.models.query.QuerySet) represents "a collection of objects from your database." It is essentially a lazily-evaluated abstraction of the result of a SELECT query, and can be filtered, ordered and generally manipulated to restrict or modify the set of rows it represents. It's responsible for creating and manipulating django.db.models.sql.query.Query instances, which are compiled into actual SQL queries by the database backends.

Phew. Confused? While the distinction between a Manager and a QuerySet can be explained if you're deeply familiar with the internals of the ORM, it's far from intuitive, especially for beginners.

This confusion is made worse by the fact that the familiar Manager API isn't quite what it seems...

The `Manager` API is a lie

QuerySet methods are chainable. Each call to a QuerySet method (such as filter) returns a cloned version of the original queryset, ready for another method to be called. This fluent interface is part of the beauty of Django's ORM.
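To make the "clone and chain" idea concrete, here is a toy, in-memory stand-in for a QuerySet. ToyQuerySet is a hypothetical class written only for illustration, not Django's implementation: it collects filter() conditions, returns a fresh object on every call instead of mutating itself, and only looks at the rows when iterated. Unlike Django, it generates no SQL and does no result caching.

```python
class ToyQuerySet:
    """Hypothetical in-memory stand-in for a QuerySet (not Django's code)."""

    def __init__(self, rows, conditions=()):
        self._rows = rows              # the "table": a list of dicts
        self._conditions = conditions  # filter() lookups collected so far

    def filter(self, **kwargs):
        # Return a *new* object with one more condition; self is left untouched.
        return self.__class__(self._rows, self._conditions + (kwargs,))

    def __iter__(self):
        # Rows are only examined here, when the queryset is evaluated.
        for row in self._rows:
            if all(row.get(key) == value
                   for condition in self._conditions
                   for key, value in condition.items()):
                yield row


rows = [
    {"content": "Write talk", "is_done": False, "priority": 1},
    {"content": "Buy milk", "is_done": True, "priority": 1},
    {"content": "Tidy desk", "is_done": False, "priority": 2},
]
base = ToyQuerySet(rows)
urgent = base.filter(is_done=False).filter(priority=1)  # nothing evaluated yet

# A row added after the chain was built still shows up: evaluation is lazy.
rows.append({"content": "Book venue", "is_done": False, "priority": 1})

print([row["content"] for row in urgent])  # ['Write talk', 'Book venue']
print(len(list(base)))                     # 4: the unfiltered queryset is unchanged
```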

But the fact that Model.objects is a Manager (not a QuerySet) presents a problem: we need to start our chain of method calls on objects, but continue the chain on the resulting QuerySet.

So how does Django's codebase solve this problem? This is where the API lie is exposed: all of the QuerySet methods are reimplemented on the Manager. The versions of these methods on the Manager simply proxy to a newly created QuerySet via self.get_query_set():

class Manager(object):

    # SNIP some housekeeping stuff..

    def get_query_set(self):
        return QuerySet(self.model, using=self._db)

    def all(self):
        return self.get_query_set()

    def count(self):
        return self.get_query_set().count()

    def filter(self, *args, **kwargs):
        return self.get_query_set().filter(*args, **kwargs)

    # and so on for 100+ lines...

To see the full horror, take a look at the Manager source code.

We'll return to this API sleight-of-hand shortly...

Back to the todo list

So, let's get back to solving our problem of cleaning up a messy query API. The approach recommended by Django's documentation is to define custom Manager subclasses and attach them to your models.

You can either add multiple extra managers to a model, or you can redefine objects, maintaining a single manager but adding your own custom methods.

Let's try each of these approaches with our Todo application.

Approach 1: multiple custom Managers

class IncompleteTodoManager(models.Manager):
    def get_query_set(self):
        return super(IncompleteTodoManager, self).get_query_set().filter(is_done=False)

class HighPriorityTodoManager(models.Manager):
    def get_query_set(self):
        return super(HighPriorityTodoManager, self).get_query_set().filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = models.Manager() # the default manager

    # attach our custom managers (defined above, not part of django.db.models):
    incomplete = IncompleteTodoManager()
    high_priority = HighPriorityTodoManager()

The API this gives us looks like this:

>>> Todo.incomplete.all()
>>> Todo.high_priority.all()

Unfortunately, there are several big problems with this approach.

The implementation is very verbose. You need to define an entire class for each custom piece of query functionality.
It clutters your model's namespace. Django developers are used to thinking of Model.objects as the "gateway" to the table. It's a namespace under which all table-level operations are collected. It'd be a shame to lose this clear convention.
Here's the real deal breaker: it's not chainable. There's no way of combining the managers: to get todos which are incomplete and high-priority, we're back to low-level ORM code: either Todo.incomplete.filter(priority=1) or Todo.high_priority.filter(is_done=False).

I think these downsides completely outweigh any benefits of this approach, and having multiple managers on a model is almost always a bad idea.

Approach 2: Manager methods

So, let's try the other Django-sanctioned approach: multiple methods on a single custom Manager.

class TodoManager(models.Manager):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = TodoManager()

Our API now looks like this:

>>> Todo.objects.incomplete()
>>> Todo.objects.high_priority()

This is better. It's much less verbose (only one class definition) and the query methods remain namespaced nicely under objects.

It's still not chainable, though. Todo.objects.incomplete() returns an ordinary QuerySet, so we can't then call Todo.objects.incomplete().high_priority(). We're stuck with Todo.objects.incomplete().filter(priority=1). Not much use.

Approach 3: custom QuerySet

Now we're in uncharted territory. You won't find this in Django's documentation...

class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class TodoManager(models.Manager):
    def get_query_set(self):
        return TodoQuerySet(self.model, using=self._db)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = TodoManager()

Here's what this looks like from the point of view of code that calls it:

>>> Todo.objects.get_query_set().incomplete()
>>> Todo.objects.get_query_set().high_priority()
>>> # (or)
>>> Todo.objects.all().incomplete()
>>> Todo.objects.all().high_priority()

We're nearly there! This is not much more verbose than Approach 2, gives the same benefits, and additionally (drumroll please...) it's chainable!

>>> Todo.objects.all().incomplete().high_priority()
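Why does Approach 3 chain when Approach 2 doesn't? Each QuerySet method returns a clone built from the instance's own class (Django's internal _clone() uses self.__class__), so a TodoQuerySet keeps producing TodoQuerySets. The sketch below shows the difference with hypothetical in-memory classes rather than real Django ones.

```python
class ToyQuerySet:
    """Hypothetical in-memory stand-in for Django's QuerySet."""

    def __init__(self, rows, conditions=()):
        self._rows = rows
        self._conditions = conditions

    def filter(self, **kwargs):
        # The clone is created from self.__class__, so subclasses survive filtering.
        return self.__class__(self._rows, self._conditions + (kwargs,))

    def __iter__(self):
        for row in self._rows:
            if all(row.get(key) == value
                   for condition in self._conditions
                   for key, value in condition.items()):
                yield row


class TodoQuerySet(ToyQuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


rows = [
    {"content": "Write talk", "is_done": False, "priority": 1},
    {"content": "Buy milk", "is_done": True, "priority": 1},
    {"content": "Tidy desk", "is_done": False, "priority": 2},
]

# Approach 2 in miniature: the manager method hands back a *plain* queryset.
plain = ToyQuerySet(rows).filter(is_done=False)
print(hasattr(plain, "high_priority"))  # False: the chain stops here

# Approach 3: every step returns a TodoQuerySet, so domain methods keep chaining.
custom = TodoQuerySet(rows).incomplete().high_priority()
print(type(custom).__name__, [row["content"] for row in custom])
# TodoQuerySet ['Write talk']
```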

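However, it's still not perfect. The custom Manager is nothing more than boilerplate, and that all() is a wart: it's annoying to type but, more importantly, it's inconsistent. Built-in methods like filter() can be called directly on Todo.objects, but our own methods can't, which makes the calling code look weird.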

Approach 3a: copy Django, proxy everything

Now our discussion of the "Manager API lie" above becomes useful: we know how to fix this problem. We simply redefine all of our QuerySet methods on the Manager, and proxy them back to our custom QuerySet:

class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class TodoManager(models.Manager):
    def get_query_set(self):
        return TodoQuerySet(self.model, using=self._db)

    def incomplete(self):
        return self.get_query_set().incomplete()

    def high_priority(self):
        return self.get_query_set().high_priority()

This gives us exactly the API we want:

>>> Todo.objects.incomplete().high_priority() # yay!

Except that's a lot of typing, and very un-DRY. Every time you add a new method to your QuerySet, or change the signature of an existing method, you have to remember to make the same change on your Manager, or it won't work properly. This is a recipe for problems.

Approach 3b: django-model-utils

Python is a dynamic language. Surely we can avoid all this boilerplate? It turns out we can, with a little help from a third-party app called django-model-utils. Just run pip install django-model-utils, then:

from model_utils.managers import PassThroughManager

class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = PassThroughManager.for_queryset_class(TodoQuerySet)()

This is much nicer. We simply define our custom QuerySet subclass as before, and attach it to our model via the PassThroughManager class provided by django-model-utils.

The PassThroughManager works by implementing the __getattr__ method. Python only calls __getattr__ when normal attribute lookup fails, so any method the Manager doesn't define itself (such as incomplete) is fetched from a freshly created QuerySet instead. There's a bit of careful checking to ensure that we don't get infinite recursion on some attributes (which is why I recommend using the tried-and-tested implementation supplied by django-model-utils rather than hand-rolling your own).
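For illustration only, here is the core of the __getattr__ trick in plain Python. PassThroughSketch and ToyQuerySet are hypothetical, in-memory names, and this is not django-model-utils' actual code; it includes just one simple recursion guard (refusing to forward private names), which is far less than a battle-tested implementation would check.

```python
class ToyQuerySet:
    """Hypothetical in-memory stand-in for Django's QuerySet."""

    def __init__(self, rows, conditions=()):
        self._rows = rows
        self._conditions = conditions

    def filter(self, **kwargs):
        return self.__class__(self._rows, self._conditions + (kwargs,))

    def __iter__(self):
        for row in self._rows:
            if all(row.get(key) == value
                   for condition in self._conditions
                   for key, value in condition.items()):
                yield row


class TodoQuerySet(ToyQuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


class PassThroughSketch:
    """Hypothetical manager that forwards unknown attributes to a new queryset."""

    def __init__(self, queryset_class, rows):
        self._queryset_class = queryset_class
        self._rows = rows

    def get_query_set(self):
        return self._queryset_class(self._rows)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so get_query_set() and the
        # attributes set in __init__ never come through here. Refusing private
        # names stops infinite recursion if _queryset_class is missing
        # (e.g. on an instance created without running __init__).
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.get_query_set(), name)


rows = [
    {"content": "Write talk", "is_done": False, "priority": 1},
    {"content": "Buy milk", "is_done": True, "priority": 1},
    {"content": "Tidy desk", "is_done": False, "priority": 2},
]
objects = PassThroughSketch(TodoQuerySet, rows)
print([row["content"] for row in objects.incomplete().high_priority()])  # ['Write talk']
```

Without the startswith("_") check, looking up any attribute on an instance that never ran __init__ would call get_query_set(), which would look up _queryset_class, which would call __getattr__ again, and so on until Python raised RecursionError.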

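Note: this functionality is now built into Django. Since Django 1.7, QuerySet.as_manager() (for example, objects = TodoQuerySet.as_manager()) and Manager.from_queryset() create a Manager that exposes your custom QuerySet methods directly, so PassThroughManager is no longer needed. Also, since Django 1.6, get_query_set is called get_queryset.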

How does this help?

Remember that view code from earlier?

def dashboard(request):

    todos = Todo.objects.filter(
        owner=request.user
    ).filter(
        is_done=False
    ).filter(
        priority=1
    )

    return render(request, 'todos/list.html', {
        'todos': todos,
    })

With a bit of work, we could make it look something like this:

def dashboard(request):

    todos = Todo.objects.for_user(
        request.user
    ).incomplete().high_priority()

    return render(request, 'todos/list.html', {
        'todos': todos,
    })

Hopefully you'll agree that this second version is much simpler, clearer and more readable than the first.
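As a sketch of what "a bit of work" might involve: for_user isn't defined anywhere above, so it is a hypothetical method here (with Django it would presumably be self.filter(owner=user)). The in-memory classes below stand in for a real QuerySet subclass, and the hypothetical weekly_digest function plays the role of the weekly email job from earlier, reusing the same vocabulary instead of copy-pasting the filter() chain.

```python
class ToyQuerySet:
    """Hypothetical in-memory stand-in for Django's QuerySet."""

    def __init__(self, rows, conditions=()):
        self._rows = rows
        self._conditions = conditions

    def filter(self, **kwargs):
        # Clone via self.__class__ so TodoQuerySet methods stay available.
        return self.__class__(self._rows, self._conditions + (kwargs,))

    def __iter__(self):
        for row in self._rows:
            if all(row.get(key) == value
                   for condition in self._conditions
                   for key, value in condition.items()):
                yield row


class TodoQuerySet(ToyQuerySet):
    def for_user(self, user):
        # Hypothetical helper; with Django this would be self.filter(owner=user).
        return self.filter(owner=user)

    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


def dashboard_todos(todos, user):
    # What the dashboard view would pass to its template.
    return todos.for_user(user).incomplete().high_priority()


def weekly_digest(todos, users):
    # Hypothetical cron job body: same one-line chain, no low-level lookups.
    return {
        user: [t["content"] for t in todos.for_user(user).incomplete().high_priority()]
        for user in users
    }


todos = TodoQuerySet([
    {"owner": "alice", "content": "Write talk", "is_done": False, "priority": 1},
    {"owner": "alice", "content": "Buy milk", "is_done": True, "priority": 1},
    {"owner": "bob", "content": "Book venue", "is_done": False, "priority": 1},
    {"owner": "bob", "content": "Tidy desk", "is_done": False, "priority": 2},
])
print(weekly_digest(todos, ["alice", "bob"]))
# {'alice': ['Write talk'], 'bob': ['Book venue']}
```

If is_done were later replaced by a status field, only incomplete() would need to change; both callers would keep working untouched.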

Can Django help?

Ways of making this whole thing easier have been discussed on the django-dev mailing list, and there's an associated ticket. Zachary Voase proposed the following:

class TodoManager(models.Manager):

    @models.querymethod
    def incomplete(query):
        return query.filter(is_done=False)

This single decorated method definition would make incomplete magically available on both the Manager and the QuerySet.
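One way such a decorator could work is sketched below. Everything here is hypothetical: querymethod, ToyManager and the in-memory ToyQuerySet are illustration-only names, not Django's API and not the code from the ticket. The decorator merely marks functions, and the manager base class uses __init_subclass__ (Python 3.6+) to copy them onto a generated QuerySet subclass and to replace them on the Manager with proxies that start from a fresh queryset.

```python
class ToyQuerySet:
    """Hypothetical in-memory stand-in for Django's QuerySet."""

    def __init__(self, rows, conditions=()):
        self._rows = rows
        self._conditions = conditions

    def filter(self, **kwargs):
        return self.__class__(self._rows, self._conditions + (kwargs,))

    def __iter__(self):
        for row in self._rows:
            if all(row.get(key) == value
                   for condition in self._conditions
                   for key, value in condition.items()):
                yield row


def querymethod(func):
    # Hypothetical decorator: it only marks the function; ToyManager does the work.
    func.is_querymethod = True
    return func


def _manager_proxy(func):
    # Manager-side version: build a fresh queryset, then call the function on it.
    def proxy(self, *args, **kwargs):
        return func(self.get_query_set(), *args, **kwargs)
    proxy.__name__ = func.__name__
    return proxy


class ToyManager:
    queryset_class = ToyQuerySet

    def __init__(self, rows):
        self.rows = rows

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        marked = {name: attr for name, attr in vars(cls).items()
                  if getattr(attr, "is_querymethod", False)}
        # QuerySet side: the functions become ordinary methods, so
        # qs.incomplete() calls incomplete(qs).
        cls.queryset_class = type(cls.__name__ + "QuerySet",
                                  (cls.queryset_class,), dict(marked))
        # Manager side: swap each function for a proxy on a fresh queryset.
        for name, func in marked.items():
            setattr(cls, name, _manager_proxy(func))

    def get_query_set(self):
        return self.queryset_class(self.rows)


class TodoManager(ToyManager):
    @querymethod
    def incomplete(query):
        return query.filter(is_done=False)

    @querymethod
    def high_priority(query):
        return query.filter(priority=1)


rows = [
    {"content": "Write talk", "is_done": False, "priority": 1},
    {"content": "Buy milk", "is_done": True, "priority": 1},
    {"content": "Tidy desk", "is_done": False, "priority": 2},
]
objects = TodoManager(rows)
print([row["content"] for row in objects.incomplete().high_priority()])  # ['Write talk']
```

Note how much machinery sits behind the one-line decorator in this sketch: a generated QuerySet subclass plus a proxy per method.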

Personally, I'm not completely convinced by the decorator-based idea. It obscures the details slightly, and feels a little "hacky". My gut feeling is that adding methods to a QuerySet subclass (rather than a Manager subclass) is a better, simpler approach.

Perhaps we could go further. By stepping back and re-examining Django's API design decisions from scratch, maybe we could make real, deep improvements. Can the distinction between Managers and QuerySets be removed (or at least clarified)?

I'm fairly sure that if a major reworking like that ever did happen, it would have to be in Django 2.0 or beyond.

So, to recap:

Using raw ORM query code in views and other high-level parts of your application is (usually) a bad idea. Instead, creating custom QuerySet APIs and attaching them to your models with a PassThroughManager from django-model-utils gives you the following benefits:

Makes code less verbose, and more robust.
Increases DRYness, raises abstraction level.
Pushes business logic into the domain model layer where it belongs.

Thanks for reading!
