Building a higher-level query API: the right way to use Django's ORM

Jamie Matthews

This blog post is based on a talk given at the Brighton Python User Group on April 10th, 2012.

Note: this post refers to a very old version of Django. The functionality provided by the libraries below is now built in to Django.

Summary

In this article, I'm going to argue that using Django's low-level ORM query methods (filter, order_by etc) directly in a view is (usually) an anti-pattern. Instead, we should be building custom domain-specific query APIs at the level of the model layer, where our business logic belongs. Django doesn't make this particularly easy, but by taking a deep-dive into the internals of the ORM, I'll show you some neat ways to accomplish it.

Overview

When writing Django applications, we're accustomed to adding methods to our models to encapsulate business logic and hide implementation details. This approach feels completely natural and obvious, and indeed is used liberally throughout Django's built-in apps:

>>> from django.contrib.auth.models import User
>>> user = User.objects.get(pk=5)
>>> user.set_password('super-sekrit')
>>> user.save()

Here set_password is a method defined on the django.contrib.auth.models.User model, which hides the implementation details of password hashing. The code looks something like this (edited for clarity):

from django.contrib.auth.hashers import make_password

class User(models.Model):

    # fields go here..

    def set_password(self, raw_password):
        self.password = make_password(raw_password)

We're building a domain-specific API on top of the generic, low-level object-relational mapping tools that Django gives us. This is basic domain modelling: we're increasing the level of abstraction, making any code that interacts with this API less verbose. The result is more robust, reusable and (most importantly) readable code.

So, we already do this for individual model instances. Why not apply the same idea to the APIs you use to select collections of model instances from the database?

A toy problem: the Todo List

To illustrate the approach, we're going to use a simple todo list app. The usual caveats apply: this is a toy problem. It's hard to show a real-world, useful example without huge piles of code. Don't concentrate on the implementation of the todo list itself: instead, imagine how this approach would work in one of your own large-scale applications.

Here's our application's models.py:

from django.db import models

PRIORITY_CHOICES = [(1, 'High'), (2, 'Low')]

class Todo(models.Model):
    content = models.CharField(max_length=100)
    is_done = models.BooleanField(default=False)
    owner = models.ForeignKey('auth.User')
    priority = models.IntegerField(choices=PRIORITY_CHOICES,
                                   default=1)

Now, let's consider a query we might want to make across this data. Say we're creating a view for the dashboard of our Todo app. We want to show all of the incomplete, high-priority Todos that exist for the currently logged in user. Here's our first stab at the code:

def dashboard(request):

    todos = Todo.objects.filter(
        owner=request.user
    ).filter(
        is_done=False
    ).filter(
        priority=1
    )

    return render(request, 'todos/list.html', {
        'todos': todos,
    })

(And yes, I know this can be written as request.user.todo_set.filter(is_done=False, priority=1). Remember, toy example!)

Why is this bad?

  • First, it's verbose. Seven lines (depending on how you prefer to deal with newlines in chained method calls) just to pull out the rows we care about. And, of course, this is just a toy example. Real-world ORM code can be much more complicated.
  • It leaks implementation details. Code that interacts with the model needs to know that there exists a property called is_done, and that it's a BooleanField. If you change the implementation (perhaps you replace the is_done boolean with a status field that can have multiple values) then this code will break.
  • It's opaque - the meaning or intent behind it is not clear at a glance (which can be summarised as "it's hard to read").
  • Finally, it has the potential to be repetitive. Imagine you are given a new requirement: write a management command, called via cron every week, to email all users their list of incomplete, high-priority todo items. You'd have to essentially copy-and-paste these seven lines into your new script. Not very DRY.

Let's summarise this with a bold claim: using low-level ORM code directly in a view is (usually) an anti-pattern.

So, how can we improve on this?

Managers and QuerySets

Before diving into solutions, we're going to take a slight detour to cover some essential concepts.

Django has two closely related constructs for table-level operations: managers and querysets.

A manager (an instance of django.db.models.manager.Manager) is described as "the interface through which database query operations are provided to Django models." A model's Manager is the gateway to table-level functionality in the ORM (model instances generally give you row-level functionality). Every model class is given a default manager, called objects.

A queryset (django.db.models.query.QuerySet) represents "a collection of objects from your database." It is essentially a lazily-evaluated abstraction of the result of a SELECT query, and can be filtered, ordered and generally manipulated to restrict or modify the set of rows it represents. It's responsible for creating and manipulating django.db.models.sql.query.Query instances, which are compiled into actual SQL queries by the database backends.

Phew. Confused? While the distinction between a Manager and a QuerySet can be explained if you're deeply familiar with the internals of the ORM, it's far from intuitive, especially for beginners.

This confusion is made worse by the fact that the familiar Manager API isn't quite what it seems...

The Manager API is a lie

QuerySet methods are chainable. Each call to a QuerySet method (such as filter) returns a cloned version of the original queryset, ready for another method to be called. This fluent interface is part of the beauty of Django's ORM.
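To make the cloning behaviour concrete, here is a minimal sketch in plain Python. ToyQuerySet and its sample rows are invented for illustration and are not Django's implementation: the real QuerySet builds SQL rather than scanning a list. The sketch only aims to show that each filter call returns a fresh object, and that nothing is evaluated until the result is iterated.

```python
class ToyQuerySet:
    """In-memory stand-in for a queryset (hypothetical, not Django code)."""

    def __init__(self, rows, conditions=()):
        self._rows = rows                     # the "table": a list of dicts
        self._conditions = tuple(conditions)  # (field, value) pairs, ANDed together

    def filter(self, **kwargs):
        # Return a clone carrying the extra conditions; self is left untouched.
        return ToyQuerySet(self._rows, self._conditions + tuple(kwargs.items()))

    def __iter__(self):
        # Lazy: rows are only examined when the queryset is iterated.
        for row in self._rows:
            if all(row.get(field) == value for field, value in self._conditions):
                yield row


rows = [
    {"content": "write talk", "is_done": False, "priority": 1},
    {"content": "book venue", "is_done": True, "priority": 1},
    {"content": "buy biscuits", "is_done": False, "priority": 2},
]

everything = ToyQuerySet(rows)
urgent = everything.filter(is_done=False).filter(priority=1)

print([row["content"] for row in urgent])  # ['write talk']
print(len(list(everything)))               # 3: the original is unchanged
```

Because filter never mutates the object it is called on, a base queryset can be kept around and refined in several different directions.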

But the fact that Model.objects is a Manager (not a QuerySet) presents a problem: we need to start our chain of method calls on objects, but continue the chain on the resulting QuerySet.

So how does Django's codebase solve this problem? Here the API lie is exposed: all of the QuerySet methods are reimplemented on the Manager. The Manager versions of these methods simply proxy to a newly created QuerySet via self.get_query_set():

class Manager(object):

    # SNIP some housekeeping stuff..

    def get_query_set(self):
        return QuerySet(self.model, using=self._db)

    def all(self):
        return self.get_query_set()

    def count(self):
        return self.get_query_set().count()

    def filter(self, *args, **kwargs):
        return self.get_query_set().filter(*args, **kwargs)

    # and so on for 100+ lines...

To see the full horror, take a look at the Manager source code.

We'll return to this API sleight-of-hand shortly...

Back to the todo list

So, let's get back to solving our problem of cleaning up a messy query API. The approach recommended by Django's documentation is to define custom Manager subclasses and attach them to your models.

You can either add multiple extra managers to a model, or you can redefine objects, maintaining a single manager but adding your own custom methods.

Let's try each of these approaches with our Todo application.

Approach 1: multiple custom Managers

class IncompleteTodoManager(models.Manager):
    def get_query_set(self):
        return super(IncompleteTodoManager, self).get_query_set().filter(is_done=False)

class HighPriorityTodoManager(models.Manager):
    def get_query_set(self):
        return super(HighPriorityTodoManager, self).get_query_set().filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = models.Manager() # the default manager

    # attach our custom managers:
    incomplete = IncompleteTodoManager()
    high_priority = HighPriorityTodoManager()

The API this gives us looks like this:

>>> Todo.incomplete.all()
>>> Todo.high_priority.all()

Unfortunately, there are several big problems with this approach.

  • The implementation is very verbose. You need to define an entire class for each custom piece of query functionality.
  • It clutters your model's namespace. Django developers are used to thinking of Model.objects as the "gateway" to the table. It's a namespace under which all table-level operations are collected. It'd be a shame to lose this clear convention.
  • Here's the real deal breaker: it's not chainable. There's no way of combining the managers: to get todos which are incomplete and high-priority, we're back to low-level ORM code: either Todo.incomplete.filter(priority=1) or Todo.high_priority.filter(is_done=False).

I think these downsides completely outweigh any benefits of this approach, and having multiple managers on a model is almost always a bad idea.

Approach 2: Manager methods

So, let's try the other Django-sanctioned approach: multiple methods on a single custom Manager.

class TodoManager(models.Manager):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = TodoManager()

Our API now looks like this:

>>> Todo.objects.incomplete()
>>> Todo.objects.high_priority()

This is better. It's much less verbose (only one class definition) and the query methods remain namespaced nicely under objects.

It's still not chainable, though. Todo.objects.incomplete() returns an ordinary QuerySet, so we can't then call Todo.objects.incomplete().high_priority(). We're stuck with Todo.objects.incomplete().filter(priority=1). Not much use.
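The reason the chain breaks can be reproduced without Django. The sketch below uses made-up stand-ins (ToyQuerySet and ToyTodoManager are assumptions for illustration, not Django classes): the custom methods exist only on the manager, so whatever they return no longer knows about them.

```python
class ToyQuerySet:
    """Plain queryset stand-in with no todo-specific methods (hypothetical)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        return ToyQuerySet([
            row for row in self.rows
            if all(row.get(k) == v for k, v in conditions.items())
        ])


class ToyTodoManager:
    """Mimics Approach 2: the custom methods live on the manager only."""

    def __init__(self, rows):
        self._rows = rows

    def filter(self, **conditions):
        return ToyQuerySet(self._rows).filter(**conditions)

    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


objects = ToyTodoManager([
    {"content": "write talk", "is_done": False, "priority": 1},
    {"content": "buy biscuits", "is_done": False, "priority": 2},
])

result = objects.incomplete()
print(type(result).__name__)             # ToyQuerySet
print(hasattr(result, "high_priority"))  # False: the chain stops here
```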

Approach 3: custom QuerySet

Now we're in uncharted territory. You won't find this in Django's documentation...

class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class TodoManager(models.Manager):
    def get_query_set(self):
        return TodoQuerySet(self.model, using=self._db)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = TodoManager()

Here's what this looks like from the point of view of code that calls it:

>>> Todo.objects.get_query_set().incomplete()
>>> Todo.objects.get_query_set().high_priority()
>>> # (or)
>>> Todo.objects.all().incomplete()
>>> Todo.objects.all().high_priority()

We're nearly there! This is not much more verbose than Approach 2, gives the same benefits, and additionally (drumroll please...) it's chainable!

>>> Todo.objects.all().incomplete().high_priority()

However, it's still not perfect. The custom Manager is nothing more than boilerplate, and that all() is a wart, which is annoying to type but more importantly is inconsistent - it makes our code look weird.

Approach 3a: copy Django, proxy everything

Now our discussion of the "Manager API lie" above becomes useful: we know how to fix this problem. We simply redefine all of our QuerySet methods on the Manager, and proxy them back to our custom QuerySet:

class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class TodoManager(models.Manager):
    def get_query_set(self):
        return TodoQuerySet(self.model, using=self._db)

    def incomplete(self):
        return self.get_query_set().incomplete()

    def high_priority(self):
        return self.get_query_set().high_priority()

This gives us exactly the API we want:

>>> Todo.objects.incomplete().high_priority() # yay!

Except that's a lot of typing, and very un-DRY. Every time you add a new method to your QuerySet, or change the signature of an existing method, you have to remember to make the same change on your Manager, or it won't work properly. This is a recipe for problems.

Approach 3b: django-model-utils

Python is a dynamic language. Surely we can avoid all this boilerplate? It turns out we can, with a little help from a third-party app called django-model-utils. Just run pip install django-model-utils, then..

from model_utils.managers import PassThroughManager

class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = PassThroughManager.for_queryset_class(TodoQuerySet)()

This is much nicer. We simply define our custom QuerySet subclass as before, and attach it to our model via the PassThroughManager class provided by django-model-utils.

The PassThroughManager works by implementing the \_\_getattr\_\_ method, which intercepts calls to non-existing methods and automatically proxies them to the QuerySet. There's a bit of careful checking to ensure that we don't get infinite recursion on some properties (which is why I recommend using the tried-and-tested implementation supplied by django-model-utils rather than hand-rolling your own).
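The proxying idea can be sketched in a few lines of plain Python. This is not the django-model-utils source: ToyQuerySet, ToyPassThroughManager and the sample rows are assumptions made up for illustration, and the real implementation has to handle more edge cases. The sketch only shows how \_\_getattr\_\_ forwards unknown attribute lookups to a fresh queryset, and one simple way to guard against recursion.

```python
class ToyQuerySet:
    """In-memory stand-in for a queryset (hypothetical, not Django code)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        matching = [
            row for row in self.rows
            if all(row.get(k) == v for k, v in conditions.items())
        ]
        # type(self) keeps the subclass, so custom methods survive chaining.
        return type(self)(matching)


class TodoQuerySet(ToyQuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


class ToyPassThroughManager:
    def __init__(self, queryset_cls, rows):
        self._queryset_cls = queryset_cls
        self._rows = rows

    def get_query_set(self):
        return self._queryset_cls(self._rows)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. Private names are
        # refused so a missing internal attribute cannot recurse forever.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.get_query_set(), name)


objects = ToyPassThroughManager(TodoQuerySet, [
    {"content": "write talk", "is_done": False, "priority": 1},
    {"content": "book venue", "is_done": True, "priority": 1},
    {"content": "buy biscuits", "is_done": False, "priority": 2},
])

todos = objects.incomplete().high_priority()
print([row["content"] for row in todos.rows])  # ['write talk']
```

Note that the manager defines neither incomplete nor high_priority: adding a method to TodoQuerySet makes it available on the manager with no further changes.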

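Note: this functionality is now built in to Django. Since Django 1.7, a custom QuerySet can be attached to a model directly with objects = TodoQuerySet.as_manager().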

How does this help?

Remember that view code from earlier?

def dashboard(request):

    todos = Todo.objects.filter(
        owner=request.user
    ).filter(
        is_done=False
    ).filter(
        priority=1
    )

    return render(request, 'todos/list.html', {
        'todos': todos,
    })

With a bit of work, we could make it look something like this:

def dashboard(request):

    todos = Todo.objects.for_user(
        request.user
    ).incomplete().high_priority()

    return render(request, 'todos/list.html', {
        'todos': todos,
    })

Hopefully you'll agree that this second version is much simpler, clearer and more readable than the first.
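The view above assumes a for_user method that hasn't been defined yet. One plausible definition is simply one more method on TodoQuerySet that wraps filter(owner=user). The sketch below tries this out with an in-memory stand-in so that it runs without Django; ToyQuerySet, ROWS and dashboard_todos are made up for illustration.

```python
class ToyQuerySet:
    """In-memory stand-in for a queryset (hypothetical, not Django code)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        # type(self) preserves the subclass so custom methods stay chainable.
        return type(self)([
            row for row in self.rows
            if all(row.get(k) == v for k, v in conditions.items())
        ])


class TodoQuerySet(ToyQuerySet):
    def for_user(self, user):
        return self.filter(owner=user)

    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


ROWS = [
    {"content": "write talk", "owner": "alice", "is_done": False, "priority": 1},
    {"content": "book venue", "owner": "alice", "is_done": True, "priority": 1},
    {"content": "buy biscuits", "owner": "bob", "is_done": False, "priority": 1},
    {"content": "tidy desk", "owner": "alice", "is_done": False, "priority": 2},
]


def dashboard_todos(user):
    # The caller names intent only; field names stay inside TodoQuerySet.
    return TodoQuerySet(ROWS).for_user(user).incomplete().high_priority()


print([row["content"] for row in dashboard_todos("alice").rows])  # ['write talk']
```

In a real project the three method bodies could sit on the Django TodoQuerySet unchanged, since they only call self.filter. If is_done were later replaced by a status field, only the body of incomplete would need to change.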

Can Django help?

Ways of making this whole thing easier have been discussed on the django-dev mailing list, and there's an associated ticket. Zachary Voase proposed the following:

class TodoManager(models.Manager):

    @models.querymethod
    def incomplete(query):
        return query.filter(is_done=False)

This single decorated method definition would make incomplete magically available on both the Manager and the QuerySet.

Personally, I'm not completely convinced by the decorator-based idea. It obscures the details slightly, and feels a little "hacky". My gut feeling is that adding methods to a QuerySet subclass (rather than a Manager subclass) is a better, simpler approach.

Perhaps we could go further. By stepping back and re-examining Django's API design decisions from scratch, maybe we could make real, deep improvements. Can the distinction between Managers and QuerySets be removed (or at least clarified)?

I'm fairly sure that if a major reworking like that ever did happen, it would have to be in Django 2.0 or beyond.

So, to recap:

Using raw ORM query code in views and other high-level parts of your application is (usually) a bad idea. Instead, creating custom QuerySet APIs and attaching them to your models with a PassThroughManager from django-model-utils gives you the following benefits:

  • Makes code less verbose, and more robust.
  • Increases DRYness, raises abstraction level.
  • Pushes business logic into the domain model layer where it belongs.

Thanks for reading!

Note: this post refers to a very old version of Django. The functionality provided by django-model-utils is now built in to Django.

If you're interested in getting your teeth into some big Django projects (as well as all sorts of other interesting stuff), we're hiring.

  • Noah Yetter

    There is no "right" way to use an ORM. Put it down and step away.

    • david_a_r_kemp

      It's really nice when people contribute constructive comments. Calling Django's Data Mapper an ORM is perhaps stretching the point, but show me how your DAL does this.

      • Andrey Popp

        Django's ORM isn't Data Mapper, it's Active Record.

  • Sakti Dwi Cahyono

    Nice post, thanks for sharing.

  • Nishad Musthafa

    I am convinced it is not a good idea to have query logic in views at all. Particularly for reusability. But components like class based views which are provided by django encourage adding queries to the view. Is there anything that django provides which can help structure your project in a way to separate query logic from views. Like 1. Defining an api set for your models 2. Writing all the required queries here 3. Calling these api's and retrieving data in the views.

    I know it is possible to use something like tasty pie(which I am really fond of now). But i was just wondering if there is something baked into django's core libraries.

  • Thierry Schellenbach

    Love the approach.
    Will use some of this for Fashiolista.com

    Django 1.5 anyone?

  • Jamespic

    Very nice, thanks for publishing this article.

  • Federico Mendez

    awesome post, screw the haters. I know it's easy to just say "use SQLAlchemy and stop fooling around", but I found the article very interesting from a design point of view and a good way to tackle a problem within a framework that is much more than just a ORM where 10% of the things are difficult to do. I liked how you explained the whole problem and the process to solve it. Keep up the awesome work

  • Henrique

    "It's still not chainable, though. Todo.objects.incomplete() returns an ordinary QuerySet, so we can't then call Todo.objects.incomplete().high_priority(). We're stuck with Todo.objects.incomplete().filter(is_done=False). Not much use."

    You miss the fact you can work with QuerySets just like set() objects, so:

    incomplete_and_high_priority = Todo.objects.incomplete() & Todo.objects.high_priority()

    • Doug

      Confirmed the SQL output from both techniques is exactly the same.

      Nice catch, Henrique.

    • Brandon Rhodes

      But the set operations DO require you to repeat the table you are searching, and otherwise break the chaining convention that the query set tries so hard to make convenient.

  • A Person

    I applaud the effort here. I've seen these techniques elsewhere, but nice to see them all in one place.

  • Julien

    I would love to see a solution for chainable managers in the next Django release.

  • Alex Ehlke

    Have a look at http://djangosnippets.org/s... for a solution I've been using for a couple years.

    • A Person

      Did you notice that snippet was incorporated into django-model-utils?

      • Alex Ehlke

        I did not, thanks!

  • Andrey Popp

    This post has the single right point hidden behind all the code snippets — "use SQLAlchemy", by the way using it with Django is straightforward and just easy.

  • leehinde

    in the 'tada' example:
    todos = Todo.objects.for_user(
        request.user
    ).incomplete().high_priority()

    where's objects.for_user come from?

  • Guest

    Excuse me for asking a stupid question but if I'm already capable of writing SQL, why even use an ORM and incur the additional computational overhead? Isn't ORM just a crutch? What's the benefit of using it? Thanks.

    • Guest

      Absolutely no benefit. It will only waste your time when you try to figure out why the generated SQL doesn't do what you think it should.

    • YurkshireLad

      ORMs let you work with instances of classes, instead of anonymous rows. It's more for convenience.

      (World record for digging up an old post? ;) )

  • Dave A

    This is a neat solution. An additional bonus is that the custom manager shows through relations, so in your example "request.user.todo_set.incomplete().high_priority()" would work too.

  • MyNameIss

    Thanks for post. But i have a question - how do you test these methods?

    simple example:

    def filter_by_published(self):
        return self.filter(is_published=True)

    That's OK: I just create instances with is_published==True and is_published==False, then check which of them appear in the queryset the method returns.

    More complicated:

    def filter_by_ready_for_sending(self):
        return self.filter_by_published().filter(date_send=now())

    How i do it now. I just mock filter_by_published with QuerySet.all method and testing second filter.

    class TestQuerySetMethods(TestCase):
        def test_filter_by_ready_for_sending(self):
            with patch('filter_by_published', QuerySet.all):
                # test that filtering by date works right
                # i dont care, how filter_by_published works,
                # because i've already tested it before

    Even more complicated:

    def filter_by_ready_for_buying(self):
        return (self.filter_by_published()
                .filter(price__isnull=False)
                .filter_by_delivered())

    And now i have to mock default .filter, and custom filter_by_published, filter_by_delivered:

    with patch('filter_by_published', QuerySet.all):
        with patch('filter', QuerySet.all):
            with patch('filter_by_delivered', Mock(return_value=SomeExpectedSet)):
                response = Books.objects.filter_by_ready_for_buying()
                self.assertEqual(set(response), set(SomeExpectedSet))

    I create 3 test methods to test every queryset method, which appeares in filter_by_ready_for_buying.
    But what if i'll add new filtering to this method? I'll have to add new mock for every test method.

    What if it will be more than 1 default filter method? My mock will mock them all and SomeExpectedSet will not
    be representative.

  • Martin Siniawski

    Great post, thanks for taking the time to explain the pattern, and the pros/cons of the alternatives!

    What about using custom manager methods to encapsulate the required functionality? In your example, it would mean creating a high_priority_incomplete() method on the TodoManager that would do the proper filtering on the fields. This method could be used all over the code that requires access to high priority incomplete todo items, so it would comply with DRY. What do you think?

    The downside of such an approach I think is that you might end up having as many manager methods as there are combinations of the fields (if you need to access that data). With the QuerySet approach, it seems you'd end up with less methods and would instead pick and match on the view code.

  • hackerluddite

    Here's a blog post I wrote exploring how to preserve high-level queries such as these across relations: https://hackerluddite.wordp...

  • Larry Pearson

    Thanks - this was really useful.

  • Łukasz Haze

    Ok, a practical problem: how to restrict common queries, i.e. filter out soft-deleted objects. In the vanilla ORM, you could just override get_query_set of your custom Manager to return super.get_query_set().filter(deleted=False).

    In django-model-utils flavor, QuerySet query methods have no central point which you could override. So, how to do that without nasty hacking?

  • Cal Leeming

    This is actually really neat.. thanks for sharing.

  • design/build syracuse ny

    Quite nice post.....

  • Hou GuoChen

    Great post Jamie! I can't believe I found this elegant solution only in 2014!

  • gordon

    there is another method, by using mixin.
    http://hunterford.me/django...

  • Tess

    Thank you very much for this. It'll make my life as a budding Django developer a lot easier.

  • F(log)

    extremely nice writing skill. Smooth read from the begin to the end. And it makes very clear the idea to where the code belong (manager / queryset / other).
    I'm very interested by splitting the code / logic etc. you should definitely check the video: http://mauveweb.co.uk/posts... by alex gaynor.
    I would split everything Django related (ORM), and the application logic.

  • Ellen Lippe

    Nice post, thanks for very valuable help :-)

  • Matthew Schinckel

    Since django 1.7 landed, we now have the equivalent of 3b in django itself. Any chance of an update to mention this? (I link to this post at least once a week on IRC)

  • Done Zero

    tks for the article, It helps me a lot.

  • Chatz

    This is a neat approach.

    If you want a 100% Opensource API Management solution please visit the following link.

    http://wso2.com/products/ap...

  • robertf57

    I always lose interest when the optimal solution involves a third-party library. That means that if a new Django release breaks that library, I have to wait until the maintainers update their library, IF they even update it. I'd rather take approach 3a. and have some additional control over my code, even if it requires a little more work.

  • Primož Kerin

    There is now an official solution for this with: "objects = TodoQuerySet.as_manager()" so you can skip using django-model-utils

  • Karanja Denis

    This might be 4 years old but a master piece

  • Sergey Istomin

    Awesome! Very useful feature. Thank you

  • thinknirmal

    I mean, just beautiful!
