Building a higher-level query API: the right way to use Django's ORM
Jamie Matthews
This blog post is based on a talk given at the Brighton Python User Group on April 10th, 2012.
Note: this post refers to a very old version of Django. The functionality provided by the libraries below is now built in to Django.
Summary
In this article, I'm going to argue that using Django's low-level ORM query methods (filter, order_by etc) directly in a view is (usually) an anti-pattern. Instead, we should be building custom domain-specific query APIs at the level of the model layer, where our business logic belongs. Django doesn't make this particularly easy, but by taking a deep-dive into the internals of the ORM, I'll show you some neat ways to accomplish it.
Overview
When writing Django applications, we're accustomed to adding methods to our models to encapsulate business logic and hide implementation details. This approach feels completely natural and obvious, and indeed is used liberally throughout Django's built-in apps:
>>> from django.contrib.auth.models import User
>>> user = User.objects.get(pk=5)
>>> user.set_password('super-sekrit')
>>> user.save()
Here set_password is a method defined on the django.contrib.auth.models.User model, which hides the implementation details of password hashing. The code looks something like this (edited for clarity):
from django.contrib.auth.hashers import make_password
from django.db import models

class User(models.Model):
    # fields go here..

    def set_password(self, raw_password):
        # store only the hash, never the raw password
        self.password = make_password(raw_password)
We're building a domain-specific API on top of the generic, low-level object-relational mapping tools that Django gives us. This is basic domain modelling: we're increasing the level of abstraction, making any code that interacts with this API less verbose. The result is more robust, reusable and (most importantly) readable code.
So, we already do this for individual model instances. Why not apply the same idea to the APIs you use to select collections of model instances from the database?
A toy problem: the Todo List
To illustrate the approach, we're going to use a simple todo list app. The usual caveats apply: this is a toy problem. It's hard to show a real-world, useful example without huge piles of code. Don't concentrate on the implementation of the todo list itself: instead, imagine how this approach would work in one of your own large-scale applications.
Here's our application's models.py:
from django.db import models

PRIORITY_CHOICES = [(1, 'High'), (2, 'Low')]

class Todo(models.Model):
    content = models.CharField(max_length=100)
    is_done = models.BooleanField(default=False)
    owner = models.ForeignKey('auth.User')
    priority = models.IntegerField(choices=PRIORITY_CHOICES, default=1)
Now, let's consider a query we might want to make across this data. Say we're creating a view for the dashboard of our Todo app. We want to show all of the incomplete, high-priority Todos that exist for the currently logged in user. Here's our first stab at the code:
def dashboard(request):
    todos = Todo.objects.filter(
        owner=request.user
    ).filter(
        is_done=False
    ).filter(
        priority=1
    )
    return render(request, 'todos/list.html', {
        'todos': todos,
    })
(And yes, I know this can be written as request.user.todo_set.filter(is_done=False, priority=1). Remember, toy example!)
Why is this bad?
- First, it's verbose. Seven lines (depending on how you prefer to deal with newlines in chained method calls) just to pull out the rows we care about. And, of course, this is just a toy example. Real-world ORM code can be much more complicated.
- It leaks implementation details. Code that interacts with the model needs to know that there exists a property called is_done, and that it's a BooleanField. If you change the implementation (perhaps you replace the is_done boolean with a status field that can have multiple values) then this code will break.
- It's opaque - the meaning or intent behind it is not clear at a glance (which can be summarised as "it's hard to read").
- Finally, it has the potential to be repetitive. Imagine you are given a new requirement: write a management command, called via cron every week, to email all users their list of incomplete, high-priority todo items. You'd have to essentially copy-and-paste these seven lines into your new script. Not very DRY.
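To make the "leaks implementation details" point concrete, here is a minimal pure-Python sketch with no Django involved. The names TodoRecord and incomplete, and the status values, are hypothetical and invented purely for illustration. It assumes the is_done boolean has already been replaced by a status string, and shows that callers going through a named helper never notice the change.

```python
from dataclasses import dataclass


@dataclass
class TodoRecord:
    """Stand-in for a database row; this is not a Django model."""
    content: str
    status: str  # hypothetical replacement for the old is_done boolean


def incomplete(todos):
    """The single place that knows how 'incomplete' is stored."""
    return [todo for todo in todos if todo.status != "done"]


todos = [
    TodoRecord("write talk", status="done"),
    TodoRecord("publish post", status="in_progress"),
    TodoRecord("answer comments", status="new"),
]

# Callers state their intent and never mention status (or is_done) themselves.
remaining = [todo.content for todo in incomplete(todos)]
print(remaining)
```

Code that tested is_done directly would have needed editing in every view and script; here only the body of incomplete() changed.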
Let's summarise this with a bold claim: using low-level ORM code directly in a view is (usually) an anti-pattern.
So, how can we improve on this?
Managers and QuerySets
Before diving into solutions, we're going to take a slight detour to cover some essential concepts.
Django has two closely related constructs for table-level operations: managers and querysets.
A manager (an instance of django.db.models.manager.Manager) is described as "the interface through which database query operations are provided to Django models." A model's Manager is the gateway to table-level functionality in the ORM (model instances generally give you row-level functionality). Every model class is given a default manager, called objects.
A queryset (django.db.models.query.QuerySet) represents "a collection of objects from your database." It is essentially a lazily-evaluated abstraction of the result of a SELECT query, and can be filtered, ordered and generally manipulated to restrict or modify the set of rows it represents. It's responsible for creating and manipulating django.db.models.sql.query.Query instances, which are compiled into actual SQL queries by the database backends.
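"Lazily-evaluated" is easier to see in a toy model. The sketch below is not Django code: LazyQuery, its fetch_count attribute and the list of dicts standing in for a table are all assumptions made for illustration. It only demonstrates the idea that building up filters does no work, and that the "database" is touched when the object is finally iterated.

```python
class LazyQuery:
    """Toy stand-in for a queryset: records filters, fetches on iteration."""

    def __init__(self, rows, predicates=()):
        self._rows = rows
        self._predicates = tuple(predicates)
        self.fetch_count = 0  # how many times the fake "database" was read

    def filter(self, **conditions):
        # Return a new object rather than mutating this one; nothing is fetched.
        def predicate(row):
            return all(row[key] == value for key, value in conditions.items())
        return LazyQuery(self._rows, self._predicates + (predicate,))

    def __iter__(self):
        # Evaluation happens here, and only here.
        self.fetch_count += 1
        matching = [row for row in self._rows
                    if all(check(row) for check in self._predicates)]
        return iter(matching)


rows = [
    {"content": "a", "is_done": False, "priority": 1},
    {"content": "b", "is_done": True, "priority": 1},
    {"content": "c", "is_done": False, "priority": 2},
]

query = LazyQuery(rows).filter(is_done=False).filter(priority=1)
fetches_before = query.fetch_count  # building the query did no fetching
results = list(query)               # iterating triggers the fetch
```

A real QuerySet does far more (it builds SQL and caches results, for a start), but the shape is the same: a description of a query that is only executed on demand.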
Phew. Confused? While the distinction between a Manager and a QuerySet can be explained if you're deeply familiar with the internals of the ORM, it's far from intuitive, especially for beginners.
This confusion is made worse by the fact that the familiar Manager API isn't quite what it seems...
The Manager API is a lie
QuerySet methods are chainable. Each call to a QuerySet method (such as filter) returns a cloned version of the original queryset, ready for another method to be called. This fluent interface is part of the beauty of Django's ORM.
But the fact that Model.objects is a Manager (not a QuerySet) presents a problem: we need to start our chain of method calls on objects, but continue the chain on the resulting QuerySet.
So how is this problem solved in Django's codebase? Here the API lie is exposed: all of the QuerySet methods are reimplemented on the Manager. The versions of these methods on the Manager simply proxy to a newly-created QuerySet via self.get_query_set():
class Manager(object):
    # SNIP some housekeeping stuff..

    def get_query_set(self):
        return QuerySet(self.model, using=self._db)

    def all(self):
        return self.get_query_set()

    def count(self):
        return self.get_query_set().count()

    def filter(self, *args, **kwargs):
        return self.get_query_set().filter(*args, **kwargs)

    # and so on for 100+ lines...
To see the full horror, take a look at the Manager source code.
We'll return to this API sleight-of-hand shortly...
Back to the todo list
So, let's get back to solving our problem of cleaning up a messy query API. The approach recommended by Django's documentation is to define custom Manager subclasses and attach them to your models.
You can either add multiple extra managers to a model, or you can redefine objects, maintaining a single manager but adding your own custom methods.
Let's try each of these approaches with our Todo application.
Approach 1: multiple custom Managers
class IncompleteTodoManager(models.Manager):
    def get_query_set(self):
        return super(IncompleteTodoManager, self).get_query_set().filter(is_done=False)

class HighPriorityTodoManager(models.Manager):
    def get_query_set(self):
        return super(HighPriorityTodoManager, self).get_query_set().filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = models.Manager()  # the default manager

    # attach our custom managers:
    incomplete = IncompleteTodoManager()
    high_priority = HighPriorityTodoManager()
The API this gives us looks like this:
>>> Todo.incomplete.all()
>>> Todo.high_priority.all()
Unfortunately, there are several big problems with this approach.
- The implementation is very verbose. You need to define an entire class for each custom piece of query functionality.
- It clutters your model's namespace. Django developers are used to thinking of Model.objects as the "gateway" to the table. It's a namespace under which all table-level operations are collected. It'd be a shame to lose this clear convention.
- Here's the real deal breaker: it's not chainable. There's no way of combining the managers: to get todos which are incomplete and high-priority, we're back to low-level ORM code: either Todo.incomplete.filter(priority=1) or Todo.high_priority.filter(is_done=False).
I think these downsides completely outweigh any benefits of this approach, and having multiple managers on a model is almost always a bad idea.
Approach 2: Manager methods
So, let's try the other Django-sanctioned approach: multiple methods on a single custom Manager.
class TodoManager(models.Manager):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = TodoManager()
Our API now looks like this:
>>> Todo.objects.incomplete()
>>> Todo.objects.high_priority()
This is better. It's much less verbose (only one class definition) and the query methods remain namespaced nicely under objects.
It's still not chainable, though. Todo.objects.incomplete() returns an ordinary QuerySet, so we can't then call Todo.objects.incomplete().high_priority(). We're stuck with Todo.objects.incomplete().filter(priority=1). Not much use.
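The failure is easy to reproduce in a toy model. This sketch uses no Django at all: PlainQuerySet and this stripped-down TodoManager are hypothetical stand-ins that operate on a list of dicts. The custom methods live only on the manager, so the object returned by the first call has no idea they exist.

```python
class PlainQuerySet:
    """Toy stand-in for a stock queryset: it only knows filter()."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        kept = [row for row in self.rows
                if all(row[key] == value for key, value in conditions.items())]
        return PlainQuerySet(kept)


class TodoManager:
    """Toy manager with the custom methods defined on the manager only."""

    def __init__(self, rows):
        self._rows = rows

    def filter(self, **conditions):
        return PlainQuerySet(self._rows).filter(**conditions)

    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


objects = TodoManager([
    {"content": "a", "is_done": False, "priority": 1},
    {"content": "b", "is_done": False, "priority": 2},
])

first_step = objects.incomplete()  # fine: this is a PlainQuerySet
try:
    first_step.high_priority()     # the custom method is not defined here
    chained = True
except AttributeError:
    chained = False
print(chained)
```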
Approach 3: custom QuerySet
Now we're in uncharted territory. You won't find this in Django's documentation...
class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class TodoManager(models.Manager):
    def get_query_set(self):
        return TodoQuerySet(self.model, using=self._db)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = TodoManager()
Here's what this looks like from the point of view of code that calls it:
>>> Todo.objects.get_query_set().incomplete()
>>> Todo.objects.get_query_set().high_priority()
>>> # (or)
>>> Todo.objects.all().incomplete()
>>> Todo.objects.all().high_priority()
We're nearly there! This is not much more verbose than Approach 2, gives the same benefits, and additionally (drumroll please...) it's chainable!
>>> Todo.objects.all().incomplete().high_priority()
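Why does the chain above keep working after the first call? Because each step hands back an instance of the same class it was called on. As far as I know, Django's own clone step builds the copy from the instance's class, so a filter() on a TodoQuerySet yields another TodoQuerySet. The sketch below is a Django-free toy that mimics just that behaviour; ToyQuerySet and its _clone helper are hypothetical names, not Django's implementation.

```python
class ToyQuerySet:
    """Toy stand-in for a queryset over a list of dict rows (not Django)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def _clone(self, rows):
        # type(self), not ToyQuerySet: a subclass stays a subclass.
        return type(self)(rows)

    def filter(self, **conditions):
        kept = [row for row in self.rows
                if all(row[key] == value for key, value in conditions.items())]
        return self._clone(kept)


class TodoQuerySet(ToyQuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


todos = TodoQuerySet([
    {"content": "a", "is_done": False, "priority": 1},
    {"content": "b", "is_done": False, "priority": 2},
    {"content": "c", "is_done": True, "priority": 1},
])

# Each call returns a TodoQuerySet, so the next custom method is available.
result = todos.incomplete().high_priority()
print(type(result).__name__, [row["content"] for row in result.rows])
```

Had _clone hard-coded ToyQuerySet, the second call in the chain would fail in exactly the way Approach 2 does.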
However, it's still not perfect. The custom Manager is nothing more than boilerplate, and that all() is a wart, which is annoying to type but more importantly is inconsistent - it makes our code look weird.
Approach 3a: copy Django, proxy everything
Now our discussion of the "Manager API lie" above becomes useful: we know how to fix this problem. We simply redefine all of our QuerySet methods on the Manager, and proxy them back to our custom QuerySet:
class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class TodoManager(models.Manager):
    def get_query_set(self):
        return TodoQuerySet(self.model, using=self._db)

    def incomplete(self):
        return self.get_query_set().incomplete()

    def high_priority(self):
        return self.get_query_set().high_priority()
This gives us exactly the API we want:
>>> Todo.objects.incomplete().high_priority() # yay!
Except that's a lot of typing, and very un-DRY. Every time you add a new method to your QuerySet, or change the signature of an existing method, you have to remember to make the same change on your Manager, or it won't work properly. This is a recipe for problems.
Approach 3b: django-model-utils
Python is a dynamic language. Surely we can avoid all this boilerplate? It turns out we can, with a little help from a third-party app called django-model-utils. Just run pip install django-model-utils, then:
from model_utils.managers import PassThroughManager

class TodoQuerySet(models.query.QuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)

class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    objects = PassThroughManager.for_queryset_class(TodoQuerySet)()
This is much nicer. We simply define our custom QuerySet subclass as before, and attach it to our model via the PassThroughManager class provided by django-model-utils.
The PassThroughManager works by implementing the __getattr__ method, which intercepts calls to non-existing methods and automatically proxies them to the QuerySet. There's a bit of careful checking to ensure that we don't get infinite recursion on some properties (which is why I recommend using the tried-and-tested implementation supplied by django-model-utils rather than hand-rolling your own).
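For the curious, the general shape of a __getattr__-based proxy looks something like the sketch below. This is emphatically not the django-model-utils source: PassThroughSketch, ToyQuerySet and the list-of-dicts "table" are hypothetical stand-ins, and the underscore check is just one simple way to guard against the recursion problem mentioned above.

```python
class ToyQuerySet:
    """Toy stand-in for a queryset over a list of dict rows (not Django)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        kept = [row for row in self.rows
                if all(row[key] == value for key, value in conditions.items())]
        return type(self)(kept)


class TodoQuerySet(ToyQuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


class PassThroughSketch:
    """Toy manager that forwards unknown attributes to a fresh queryset."""

    def __init__(self, queryset_class, rows):
        self._queryset_class = queryset_class
        self._rows = rows

    def get_query_set(self):
        return self._queryset_class(self._rows)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup has failed.
        # Underscore names are refused: if _queryset_class itself is missing
        # (say, on a half-built copy), get_query_set() would otherwise
        # re-enter this hook forever.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.get_query_set(), name)


objects = PassThroughSketch(TodoQuerySet, [
    {"content": "a", "is_done": False, "priority": 1},
    {"content": "b", "is_done": True, "priority": 1},
])

# incomplete() is not defined on the manager; __getattr__ finds it on the queryset.
result = objects.incomplete().high_priority()
print([row["content"] for row in result.rows])
```

The manager class never mentions incomplete or high_priority, so adding a method to the queryset needs no matching change anywhere else.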
Note: this functionality is now built in to Django.
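Concretely, Django 1.7 and later provide QuerySet.as_manager() and Manager.from_queryset(), so on a current Django the PassThroughManager line becomes objects = TodoQuerySet.as_manager(). The sketch below is not Django's code. It is a Django-free toy showing one way such a feature can work: generating a manager class with a proxy for each public queryset method. ToyManager and manager_from_queryset are hypothetical names.

```python
import inspect


class ToyQuerySet:
    """Toy stand-in for a queryset over a list of dict rows (not Django)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        kept = [row for row in self.rows
                if all(row[key] == value for key, value in conditions.items())]
        return type(self)(kept)


class TodoQuerySet(ToyQuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


class ToyManager:
    queryset_class = ToyQuerySet

    def __init__(self, rows):
        self._rows = rows

    def get_queryset(self):
        return self.queryset_class(self._rows)


def manager_from_queryset(queryset_class):
    """Build a ToyManager subclass with one proxy per public queryset method."""

    def make_proxy(name):
        def proxy(self, *args, **kwargs):
            return getattr(self.get_queryset(), name)(*args, **kwargs)
        proxy.__name__ = name
        return proxy

    attrs = {"queryset_class": queryset_class}
    for name, _ in inspect.getmembers(queryset_class, inspect.isfunction):
        if not name.startswith("_"):  # skip __init__ and other private names
            attrs[name] = make_proxy(name)
    return type(queryset_class.__name__ + "Manager", (ToyManager,), attrs)


TodoManager = manager_from_queryset(TodoQuerySet)
objects = TodoManager([
    {"content": "a", "is_done": False, "priority": 1},
    {"content": "b", "is_done": True, "priority": 1},
])
result = objects.incomplete().high_priority()
print([row["content"] for row in result.rows])
```

Unlike the __getattr__ approach, the proxies here are real attributes of the generated class, so they show up in dir() and in editor autocompletion.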
How does this help?
Remember that view code from earlier?
def dashboard(request):
    todos = Todo.objects.filter(
        owner=request.user
    ).filter(
        is_done=False
    ).filter(
        priority=1
    )
    return render(request, 'todos/list.html', {
        'todos': todos,
    })
With a bit of work, we could make it look something like this:
def dashboard(request):
    todos = Todo.objects.for_user(
        request.user
    ).incomplete().high_priority()
    return render(request, 'todos/list.html', {
        'todos': todos,
    })
Hopefully you'll agree that this second version is much simpler, clearer and more readable than the first.
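Note that for_user is not defined anywhere in this post; the "bit of work" is presumably one more method on the custom queryset, something like self.filter(owner=user) next to incomplete and high_priority. The toy below is Django-free and rests on that assumption: ToyQuerySet and the dict rows are hypothetical stand-ins, and plain strings stand in for User objects.

```python
class ToyQuerySet:
    """Toy stand-in for a queryset over a list of dict rows (not Django)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        kept = [row for row in self.rows
                if all(row[key] == value for key, value in conditions.items())]
        return type(self)(kept)


class TodoQuerySet(ToyQuerySet):
    def for_user(self, user):
        # Assumed definition: restrict to todos owned by the given user.
        return self.filter(owner=user)

    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


all_todos = TodoQuerySet([
    {"content": "a", "owner": "jamie", "is_done": False, "priority": 1},
    {"content": "b", "owner": "jamie", "is_done": True, "priority": 1},
    {"content": "c", "owner": "alex", "is_done": False, "priority": 1},
    {"content": "d", "owner": "jamie", "is_done": False, "priority": 2},
])

dashboard_todos = all_todos.for_user("jamie").incomplete().high_priority()
print([row["content"] for row in dashboard_todos.rows])
```

The weekly email command from earlier could reuse exactly the same chain, which is the DRY payoff.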
Can Django help?
Ways of making this whole thing easier have been discussed on the django-dev mailing list, and there's an associated ticket. Zachary Voase proposed the following:
class TodoManager(models.Manager):
    @models.querymethod
    def incomplete(query):
        return query.filter(is_done=False)
This single decorated method definition would make incomplete magically available on both the Manager and the QuerySet.
Personally, I'm not completely convinced by the decorator-based idea. It obscures the details slightly, and feels a little "hacky". My gut feeling is that adding methods to a QuerySet subclass (rather than a Manager subclass) is a better, simpler approach.
Perhaps we could go further. By stepping back and re-examining Django's API design decisions from scratch, maybe we could make real, deep improvements. Can the distinction between Managers and QuerySets be removed (or at least clarified)?
I'm fairly sure that if a major reworking like that ever did happen, it would have to be in Django 2.0 or beyond.
So, to recap:
Using raw ORM query code in views and other high-level parts of your application is (usually) a bad idea. Instead, creating custom QuerySet APIs and attaching them to your models with a PassThroughManager from django-model-utils gives you the following benefits:
- Makes code less verbose, and more robust.
- Increases DRYness, raises abstraction level.
- Pushes business logic into the domain model layer where it belongs.
Thanks for reading!
Note: this post refers to a very old version of Django. The functionality provided by django-model-utils is now built in to Django.
If you're interested in getting your teeth into some big Django projects (as well as all sorts of other interesting stuff), we're hiring.
Noah Yetter
There is no "right" way to use an ORM. Put it down and step away.
david_a_r_kemp
It's really nice when people contribute constructive comments. Calling Django's Data Mapper an ORM is perhaps stretching the point, but show me how your DAL does this.
Andrey Popp
Django's ORM isn't Data Mapper, it's Active Record.
Sakti Dwi Cahyono
Nice post, thanks for sharing.
Nishad Musthafa
I am convinced it is not a good idea to have query logic in views at all. Particularly for reusability. But components like class based views which are provided by django encourage adding queries to the view. Is there anything that django provides which can help structure your project in a way to separate query logic from views. Like 1. Defining an api set for your models 2. Writing all the required queries here 3. Calling these api's and retrieving data in the views.
I know it is possible to use something like tasty pie(which I am really fond of now). But i was just wondering if there is something baked into django's core libraries.
Thierry Schellenbach
Love the approach.
Will use some of this for Fashiolista.com
Django 1.5 anyone?
Jamespic
Very nice, thanks for publishing this article.
Federico Mendez
awesome post, screw the haters. I know it's easy to just say "use SQLAlchemy and stop fooling around", but I found the article very interesting from a design point of view and a good way to tackle a problem within a framework that is much more than just a ORM where 10% of the things are difficult to do. I liked how you explained the whole problem and the process to solve it. Keep up the awesome work
Henrique
"It's still not chainable, though. Todo.objects.incomplete() returns an ordinary QuerySet, so we can't then call Todo.objects.incomplete().high_priority(). We're stuck with Todo.objects.incomplete().filter(is_done=False). Not much use."
You miss the fact you can work with QuerySets just like set() objects, so:
incomplete_and_high_priority = Todo.objects.incomplete() & Todo.objects.high_priority()
Doug
Confirmed the SQL output from both techniques is exactly the same.
Nice catch, Henrique.
Brandon Rhodes
But the set operations DO require you to repeat the table you are searching, and otherwise break the chaining convention that the query set tries so hard to make convenient.
A Person
I applaud the effort here. I've seen these techniques elsewhere, but nice to see them all in one place.
Julien
I would love to see a solution for chainable managers in the next Django release.
Alex Ehlke
Have a look at http://djangosnippets.org/s... for a solution I've been using for a couple years.
A Person
Did you notice that snippet was incorporated into django-model-utils?
Alex Ehlke
I did not, thanks!
Andrey Popp
This post has the single right point hidden behind all the code snippets — "use SQLAlchemy", by the way using it with Django is straightforward and just easy.
leehinde
in the 'tada' example:
todos = Todo.objects.for_user(
    request.user
).incomplete().high_priority()
where's objects.for_user come from?
Guest
Excuse me for asking a stupid question but if I'm already capable of writing SQL, why even use an ORM and incur the additional computational overhead? Isn't ORM just a crutch? What's the benefit of using it? Thanks.
Guest
Absolutely no benefit. It will only waste your time when you try to figure out why the generated SQL doesn't do what you think it should.
YurkshireLad
ORMs let you work with instances of classes, instead of anonymous rows. It's more for convenience.
(World record for digging up an old post? ;) )
Dave A
This is a neat solution. An additional bonus is that the custom manager shows through relations, so in your example "request.user.todo_set.incomplete().high_priority()" would work too.
MyNameIss
Thanks for the post. But I have a question: how do you test these methods?
A simple example:
def filter_by_published(self):
    return self.filter(is_published=True)
That's OK: I just create instances with is_published == True and is_published == False, and then check which of them the queryset method returns.
More complicated:
def filter_by_ready_for_sending(self):
    return self.filter_by_published().filter(date_send=now())
How I do it now: I just mock filter_by_published with the QuerySet.all method and test the second filter.
class TestQuerySetMethods(TestCase):
    def test_filter_by_ready_for_sending(self):
        with patch('filter_by_published', QuerySet.all):
            # test that filtering by date works right
            # i dont care, how filter_by_published works,
            # because i've already tested it before
Even more complicated:
def filter_by_ready_for_buying(self):
    return (self.filter_by_published()
            .filter(price__isnull=False)
            .filter_by_delivered())
And now I have to mock the default .filter, and the custom filter_by_published and filter_by_delivered:
with patch('filter_by_published', QuerySet.all):
    with patch('filter', QuerySet.all):
        with patch('filter_by_delivered', Mock(return_value=SomeExpectedSet)):
            response = Books.objects.filter_by_ready_for_buying()
            self.assertEqual(set(response), set(SomeExpectedSet))
I create 3 test methods to test every queryset method which appears in filter_by_ready_for_buying.
But what if I add new filtering to this method? I'll have to add a new mock for every test method.
What if there is more than one default filter method? My mock will mock them all and SomeExpectedSet will not be representative.
MyNameIss
I just reposted question here http://pastebin.com/SdrK38Gc
Martin Siniawski
Great post, thanks for taking the time to explain the pattern, and the pros/cons of the alternatives!
What about using custom manager methods to encapsulate the required functionality? In your example, it would mean creating a high_priority_incomplete() method on the TodoManager that would do the proper filtering on the fields. This method could be used all over the code that requires access to high priority incomplete todo items, so it would comply with DRY. What do you think?
The downside of such an approach I think is that you might end up having as many manager methods as there are combinations of the fields (if you need to access that data). With the QuerySet approach, it seems you'd end up with less methods and would instead pick and match on the view code.
hackerluddite
Here's a blog post I wrote exploring how to preserve high-level queries such as these across relations: https://hackerluddite.wordp...
Larry Pearson
Thanks - this was really useful.
Guest
Ok, a practical problem.
Łukasz Haze
Ok, a practical problem: how to restrict common queries, i.e. filter out soft-deleted objects. In the vanilla ORM, you could just override get_query_set of your custom Manager to return super.get_query_set().filter(deleted=False).
In django-model-utils flavor, QuerySet query methods have no central point which you could override. So, how to do that without nasty hacking?
Cal Leeming
This is actually really neat.. thanks for sharing.
design/build syracuse ny
Quite nice post.....
Hou GuoChen
Great post Jamie! I can't believe I found this elegant solution only in 2014!
gordon
there is another method, by using mixin.
http://hunterford.me/django...
Tess
Thank you very much for this. It'll make my life as a budding Django developer a lot easier.
F(log)
extremely nice writing skill. Smooth read from the begin to the end. And it makes very clear the idea to where the code belong (manager / queryset / other).
I'm very interested by splitting the code / logic etc. you should definitely check the video: http://mauveweb.co.uk/posts... by alex gaynor.
I would split everything Django related (ORM), and the application logic.
Ellen Lippe
Nice post, thanks for very valuable help :-)
Matthew Schinckel
Since django 1.7 landed, we now have the equivalent of 3b in django itself. Any chance of an update to mention this? (I link to this post at least once a week on IRC)
amacfie
link: https://docs.djangoproject....
Done Zero
tks for the article, It helps me a lot.
Chatz
This is a neat approach.
If you want a 100% Opensource API Management solution please visit the following link.
http://wso2.com/products/ap...
robertf57
I always lose interest when the optimal solution involves a third-party library. That means that if a new Django release breaks that library, I have to wait until the maintainers update their library, IF they even update it. I'd rather take approach 3a. and have some additional control over my code, even if it requires a little more work.
Primož Kerin
There is now an official solution for this with: "objects = TodoQuerySet.as_manager()" so you can skip using django-model-utils
Karanja Denis
This might be 4 years old but a master piece
Sergey Istomin
Awesome! Very useful feature. Thank you
thinknirmal
I mean, just beautiful!