This blog post is based on a talk given at the Brighton Python User Group on April 10th, 2012.
Note: this post refers to a very old version of Django. The functionality provided by the libraries below is now built in to Django.
In this article, I’m going to argue that using Django’s low-level ORM query methods (filter, order_by etc) directly in a view is (usually) an anti-pattern. Instead, we should be building custom domain-specific query APIs at the level of the model layer, where our business logic belongs. Django doesn’t make this particularly easy, but by taking a deep-dive into the internals of the ORM, I’ll show you some neat ways to accomplish it.
When writing Django applications, we’re accustomed to adding methods to our models to encapsulate business logic and hide implementation details. This approach feels completely natural and obvious, and indeed is used liberally throughout Django’s built-in apps:
>>> from django.contrib.auth.models import User
>>> user = User.objects.get(pk=5)
>>> user.set_password('super-sekrit')
>>> user.save()
Here set_password is a method defined on the django.contrib.auth.models.User model, which hides the implementation details of password hashing. The code looks something like this (edited for clarity):
from django.contrib.auth.hashers import make_password
from django.db import models

class User(models.Model):
    # fields go here..

    def set_password(self, raw_password):
        # Store only the hash, never the raw password.
        self.password = make_password(raw_password)
We’re building a domain-specific API on top of the generic, low-level object-relational mapping tools that Django gives us. This is basic domain modelling: we’re increasing the level of abstraction, making any code that interacts with this API less verbose. The result is more robust, reusable and (most importantly) readable code.
So, we already do this for individual model instances. Why not apply the same idea to the APIs you use to select collections of model instances from the database?
To illustrate the approach, we’re going to use a simple todo list app. The usual caveats apply: this is a toy problem. It’s hard to show a real-world, useful example without huge piles of code. Don’t concentrate on the implementation of the todo list itself: instead, imagine how this approach would work in one of your own large-scale applications.
Here’s our application’s models.py:
from django.db import models

PRIORITY_CHOICES = [(1, 'High'), (2, 'Low')]

class Todo(models.Model):
    content = models.CharField(max_length=100)
    is_done = models.BooleanField(default=False)
    owner = models.ForeignKey('auth.User')
    priority = models.IntegerField(choices=PRIORITY_CHOICES,
                                   default=1)
Now, let’s consider a query we might want to make across this data. Say we’re creating a view for the dashboard of our Todo app. We want to show all of the incomplete, high-priority Todos that exist for the currently logged in user. Here’s our first stab at the code:
def dashboard(request):
    todos = Todo.objects.filter(
        owner=request.user
    ).filter(
        is_done=False
    ).filter(
        priority=1
    )
    return render(request, 'todos/list.html', {
        'todos': todos,
    })
(And yes, I know this can be written as request.user.todo_set.filter(is_done=False, priority=1). Remember, toy example!)
This code works, but it has some problems.
It leaks implementation details: the calling code has to know that there is a field called is_done, and that it’s a BooleanField. If you change the implementation (perhaps you replace the is_done boolean with a status field that can have multiple values) then this code will break.
It’s also potentially repetitive. Imagine a new requirement: a script, run by cron every week, to email all users their list of incomplete, high-priority todo items. You’d have to essentially copy-and-paste these seven lines into your new script. Not very DRY.
Let’s summarise this with a bold claim: using low-level ORM code directly in a view is (usually) an anti-pattern.
So, how can we improve on this?
Before diving into solutions, we’re going to take a slight detour to cover some essential concepts.
Django has two closely related constructs for table-level operations: managers and querysets.
A manager (an instance of django.db.models.manager.Manager) is described as “the interface through which database query operations are provided to Django models.” A model’s Manager is the gateway to table-level functionality in the ORM (model instances generally give you row-level functionality). Every model class is given a default manager, called objects.
A queryset (django.db.models.query.QuerySet) represents “a collection of objects from your database.” It is essentially a lazily-evaluated abstraction of the result of a SELECT query, and can be filtered, ordered and generally manipulated to restrict or modify the set of rows it represents. It’s responsible for creating and manipulating django.db.models.sql.query.Query instances, which are compiled into actual SQL queries by the database backends.
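To make “lazily-evaluated” concrete, here is a minimal pure-Python sketch. It is not Django’s implementation: LazyRows, fake_table and fetch_log are hypothetical names invented for illustration, and the filtering happens in Python rather than being compiled to SQL. The point is only that chaining filters describes a query, and the data source is read once, when the result is first iterated.

```python
class LazyRows:
    """Toy stand-in for a lazily evaluated query; not Django's QuerySet."""

    def __init__(self, fetch, predicates=()):
        self._fetch = fetch                  # callable that "hits the database"
        self._predicates = tuple(predicates)
        self._cache = None                   # filled on first iteration

    def filter(self, predicate):
        # Describe a narrower query; nothing is fetched here.
        return LazyRows(self._fetch, self._predicates + (predicate,))

    def __iter__(self):
        if self._cache is None:
            self._cache = [
                row for row in self._fetch()
                if all(p(row) for p in self._predicates)
            ]
        return iter(self._cache)


fetch_log = []


def fake_table():
    """Pretend database read that records every time it is called."""
    fetch_log.append("SELECT")
    return [
        {"content": "Write talk", "is_done": False, "priority": 1},
        {"content": "Buy milk", "is_done": True, "priority": 2},
        {"content": "Fix bug", "is_done": False, "priority": 2},
    ]


query = (
    LazyRows(fake_table)
    .filter(lambda row: not row["is_done"])
    .filter(lambda row: row["priority"] == 1)
)
fetches_before = len(fetch_log)   # building the chain alone reads nothing

contents = [row["content"] for row in query]   # iteration triggers the read
fetches_after = len(fetch_log)
```

Iterating the same object a second time reuses the cached rows in this sketch, so the fake table is still only read once.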
Phew. Confused? While the distinction between a Manager and a QuerySet can be explained if you’re deeply familiar with the internals of the ORM, it’s far from intuitive, especially for beginners.
This confusion is made worse by the fact that the familiar Manager API isn’t quite what it seems…
The Manager API is a lie
QuerySet methods are chainable. Each call to a QuerySet method (such as filter) returns a cloned version of the original queryset, ready for another method to be called. This fluent interface is part of the beauty of Django’s ORM.
But the fact that Model.objects is a Manager (not a QuerySet) presents a problem: we need to start our chain of method calls on objects, but continue the chain on the resulting QuerySet.
So how is this problem solved in Django’s codebase? This is where the API lie is exposed: all of the QuerySet methods are reimplemented on the Manager. The versions of these methods on the Manager simply proxy to a newly-created QuerySet via self.get_query_set():
class Manager(object):
    # SNIP some housekeeping stuff..

    def get_query_set(self):
        return QuerySet(self.model, using=self._db)

    def all(self):
        return self.get_query_set()

    def count(self):
        return self.get_query_set().count()

    def filter(self, *args, **kwargs):
        return self.get_query_set().filter(*args, **kwargs)

    # and so on for 100+ lines...
To see the full horror, take a look at the Manager source code.
We’ll return to this API sleight-of-hand shortly…
So, let’s get back to solving our problem of cleaning up a messy query API. The approach recommended by Django’s documentation is to define custom Manager subclasses and attach them to your models.
You can either add multiple extra managers to a model, or you can redefine objects, maintaining a single manager but adding your own custom methods.
Let’s try each of these approaches with our Todo application.
class IncompleteTodoManager(models.Manager):
    def get_query_set(self):
        # Start from the default queryset and keep only unfinished todos.
        return super(IncompleteTodoManager, self).get_query_set().filter(is_done=False)

class HighPriorityTodoManager(models.Manager):
    def get_query_set(self):
        return super(HighPriorityTodoManager, self).get_query_set().filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..
    objects = models.Manager()  # the default manager
    # attach our custom managers:
    incomplete = IncompleteTodoManager()
    high_priority = HighPriorityTodoManager()
The API this gives us looks like this:
>>> Todo.incomplete.all()
>>> Todo.high_priority.all()
Unfortunately, there are several big problems with this approach.
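It’s verbose: every custom piece of query functionality needs its own class definition.
It clutters the model’s namespace. Django developers are used to thinking of Model.objects as the “gateway” to the table. It’s a namespace under which all table-level operations are collected. It’d be a shame to lose this clear convention.
It’s not chainable. There’s no way of combining the managers: to get todos that are both incomplete and high-priority, we’re back to low-level ORM code, either Todo.incomplete.filter(priority=1) or Todo.high_priority.filter(is_done=False).
I think these downsides completely outweigh any benefits of this approach, and having multiple managers on a model is almost always a bad idea.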
So, let’s try the other Django-sanctioned approach: multiple methods on a single custom Manager.
class TodoManager(models.Manager):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..
    objects = TodoManager()
Our API now looks like this:
>>> Todo.objects.incomplete()
>>> Todo.objects.high_priority()
This is better. It’s much less verbose (only one class definition) and the query methods remain namespaced nicely under objects.
It’s still not chainable, though. Todo.objects.incomplete() returns an ordinary QuerySet, which knows nothing about our custom methods, so we can’t then call Todo.objects.incomplete().high_priority(). We’re stuck with Todo.objects.incomplete().filter(priority=1). Not much use.
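The failure is easy to reproduce outside Django. The following is a toy sketch, assuming nothing about Django’s real classes: ToyQuerySet and ToyTodoManager are hypothetical stand-ins that operate on a list of dicts. Because the custom methods exist only on the manager, the object returned by the first call has no idea they exist.

```python
class ToyQuerySet:
    """Minimal stand-in for a generic QuerySet: it only knows filter()."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        return ToyQuerySet(
            row for row in self.rows
            if all(row[key] == value for key, value in conditions.items())
        )


class ToyTodoManager:
    """The custom query methods live on the manager only."""

    def __init__(self, rows):
        self._rows = rows

    def get_query_set(self):
        return ToyQuerySet(self._rows)

    def incomplete(self):
        return self.get_query_set().filter(is_done=False)

    def high_priority(self):
        return self.get_query_set().filter(priority=1)


objects = ToyTodoManager([
    {"content": "Write talk", "is_done": False, "priority": 1},
    {"content": "Buy milk", "is_done": True, "priority": 1},
    {"content": "Fix bug", "is_done": False, "priority": 2},
])

first_step = objects.incomplete()   # a plain ToyQuerySet
try:
    first_step.high_priority()      # the generic class has no such method
    chain_worked = True
except AttributeError:
    chain_worked = False

# The only way to continue is to drop back to the low-level call.
fallback = first_step.filter(priority=1)
```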
Now we’re in uncharted territory. You won’t find this in Django’s documentation…
class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class TodoManager(models.Manager):
    def get_query_set(self):
        return TodoQuerySet(self.model, using=self._db)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..
    objects = TodoManager()
Here’s what this looks like from the point of view of code that calls it:
>>> Todo.objects.get_query_set().incomplete()
>>> Todo.objects.get_query_set().high_priority()
>>> # (or)
>>> Todo.objects.all().incomplete()
>>> Todo.objects.all().high_priority()
We’re nearly there! This is not much more verbose than the previous approach (custom methods on a single Manager), gives the same benefits, and additionally (drumroll please…) it’s chainable!
>>> Todo.objects.all().incomplete().high_priority()
However, it’s still not perfect. The custom Manager is nothing more than boilerplate, and that all() is a wart, which is annoying to type but more importantly is inconsistent – it makes our code look weird.
Now our discussion of the “Manager API lie” above becomes useful: we know how to fix this problem. We simply redefine all of our QuerySet methods on the Manager, and proxy them back to our custom QuerySet:
class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class TodoManager(models.Manager):
    def get_query_set(self):
        return TodoQuerySet(self.model, using=self._db)

    def incomplete(self):
        return self.get_query_set().incomplete()

    def high_priority(self):
        return self.get_query_set().high_priority()
This gives us exactly the API we want:
>>> Todo.objects.incomplete().high_priority() # yay!
Except that’s a lot of typing, and very un-DRY. Every time you add a new method to your QuerySet, or change the signature of an existing method, you have to remember to make the same change on your Manager, or it won’t work properly. This is a recipe for problems.
Python is a dynamic language. Surely we can avoid all this boilerplate? It turns out we can, with a little help from a third-party app called django-model-utils. Just run pip install django-model-utils, then:
from model_utils.managers import PassThroughManager

class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..
    objects = PassThroughManager.for_queryset_class(TodoQuerySet)()
This is much nicer. We simply define our custom QuerySet subclass as before, and attach it to our model via the PassThroughManager class provided by django-model-utils.
The PassThroughManager works by implementing the __getattr__ method, which intercepts calls to non-existing methods and automatically proxies them to the QuerySet. There’s a bit of careful checking to ensure that we don’t get infinite recursion on some properties (which is why I recommend using the tried-and-tested implementation supplied by django-model-utils rather than hand-rolling your own).
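For readers curious about the mechanism, here is a rough pure-Python sketch of the __getattr__ idea. It is not the django-model-utils source: ToyQuerySet, ToyTodoQuerySet and ToyPassThroughManager are hypothetical classes working on a list of dicts, and the underscore check is just one simple way to guard against the recursion problem mentioned above.

```python
class ToyQuerySet:
    """Generic toy queryset over a list of dicts."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        # type(self) preserves the subclass, so custom methods survive chaining.
        return type(self)(
            row for row in self.rows
            if all(row[key] == value for key, value in conditions.items())
        )


class ToyTodoQuerySet(ToyQuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


class ToyPassThroughManager:
    """Forwards unknown attribute lookups to a fresh queryset."""

    def __init__(self, queryset_class, rows):
        self._queryset_class = queryset_class
        self._rows = rows

    def get_query_set(self):
        return self._queryset_class(self._rows)

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Refusing private and dunder
        # names matters: if _queryset_class were missing (say, on a
        # half-built copy), proxying it would call get_query_set(), which
        # would look it up again, and so on forever.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.get_query_set(), name)


objects = ToyPassThroughManager(ToyTodoQuerySet, [
    {"content": "Write talk", "is_done": False, "priority": 1},
    {"content": "Buy milk", "is_done": True, "priority": 1},
    {"content": "Fix bug", "is_done": False, "priority": 2},
])

# No incomplete() is defined on the manager, yet the chain starts there.
urgent = objects.incomplete().high_priority()
urgent_contents = [row["content"] for row in urgent.rows]
```

Adding a new method to ToyTodoQuerySet makes it available on the manager automatically, which is exactly the duplication the hand-written proxy methods could not avoid.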
Note: this functionality is now built in to Django.
Remember that view code from earlier?
def dashboard(request):
    todos = Todo.objects.filter(
        owner=request.user
    ).filter(
        is_done=False
    ).filter(
        priority=1
    )
    return render(request, 'todos/list.html', {
        'todos': todos,
    })
With a bit of work, we could make it look something like this:
def dashboard(request):
    todos = Todo.objects.for_user(
        request.user
    ).incomplete().high_priority()
    return render(request, 'todos/list.html', {
        'todos': todos,
    })
Hopefully you’ll agree that this second version is much simpler, clearer and more readable than the first.
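The for_user method isn’t defined anywhere above, so here is one way the whole API might hang together, sketched in plain Python rather than Django. Everything in it is an assumption made for illustration: TodoSet is an in-memory stand-in for a custom QuerySet, the status field is the hypothetical replacement for is_done mentioned earlier, and dashboard_todos and weekly_email_todos are made-up callers. Only incomplete() knows how “done” is stored, and both callers share the same query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Todo:
    content: str
    owner: str
    status: str     # hypothetical replacement for the is_done boolean
    priority: int   # 1 = High, 2 = Low


class TodoSet:
    """In-memory stand-in for a custom QuerySet (not Django code)."""

    def __init__(self, todos):
        self._todos = tuple(todos)

    def __iter__(self):
        return iter(self._todos)

    def _where(self, predicate):
        return TodoSet(todo for todo in self._todos if predicate(todo))

    def for_user(self, user):
        return self._where(lambda todo: todo.owner == user)

    def incomplete(self):
        # The only place that knows how "done" is represented.
        return self._where(lambda todo: todo.status != "done")

    def high_priority(self):
        return self._where(lambda todo: todo.priority == 1)


ALL_TODOS = TodoSet([
    Todo("Write talk", "alice", "open", 1),
    Todo("Buy milk", "alice", "done", 1),
    Todo("Fix bug", "bob", "open", 2),
    Todo("Ship release", "bob", "open", 1),
])


def dashboard_todos(user):
    """What the dashboard view would display for one user."""
    query = ALL_TODOS.for_user(user).incomplete().high_priority()
    return [todo.content for todo in query]


def weekly_email_todos(users):
    """The weekly script reuses the same query instead of copying it."""
    return {user: dashboard_todos(user) for user in users}
```

If the status representation changes again, only incomplete() needs editing; neither caller is touched.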
Ways of making this whole thing easier have been discussed on the django-dev mailing list, and there’s an associated ticket. Zachary Voase proposed the following:
class TodoManager(models.Manager):
    @models.querymethod
    def incomplete(query):
        return query.filter(is_done=False)
This single decorated method definition would make incomplete magically available on both the Manager and the QuerySet.
Personally, I’m not completely convinced by the decorator-based idea. It obscures the details slightly, and feels a little “hacky”. My gut feeling is that adding methods to a QuerySet subclass (rather than a Manager subclass) is a better, simpler approach.
Perhaps we could go further. By stepping back and re-examining Django’s API design decisions from scratch, maybe we could make real, deep improvements. Can the distinction between Managers and QuerySets be removed (or at least clarified)?
I’m fairly sure that if a major reworking like that ever did happen, it would have to be in Django 2.0 or beyond.
Using raw ORM query code in views and other high-level parts of your application is (usually) a bad idea. Instead, creating custom QuerySet APIs and attaching them to your models with a PassThroughManager from django-model-utils makes your code less verbose and more robust, avoids repeating the same query in several places, and keeps business logic in the model layer, where it belongs.
Thanks for reading!