At DjangoCon US 2018 in San Diego, I’ll be speaking about Code Review Skills for Pythonistas.
During the talk, I’ll be deep diving into best practices for all Python projects, but I’d like to highlight the top three gotchas to look out for while code reviewing Django applications. By catching these problems early, you’ll save yourself from massive future headaches.
1. Inefficient Database Queries
The Django ORM can be both a blessing and a curse. It makes it easier and more intuitive to make queries and work with complex data sets. Unfortunately, it also obfuscates what’s going on under the hood, leaving an opening for inefficient queries to make their way into your application.
You might not notice these slowdowns at first, but as your application starts to scale, these minor inefficiencies may come back to haunt you. Code review is the best time to catch this type of slowdown before it ever makes its way into production code.
QuerySets
Optimizing QuerySets requires a basic understanding of how they work. A primer:
- QuerySets are lazy: the act of instantiating one doesn’t cause database access.
- They only cause database access when they’re evaluated. You evaluate a QuerySet by taking action on it, such as looping over it, slicing it with a step, printing it out, or calling len() on it, either in server-side code or via a template.
- Each QuerySet caches its results once it has been evaluated, to minimize database calls. See the docs for best practices to ensure you’re hitting the cache.
During code review, keep an eye out for these three common query optimizations you can make:
1. Access Foreign Keys
If you need the ID of a foreign key and not the associated object, use the implicit tweet.user_id instead of tweet.user.id, because the second form can run an additional query to load the related user.
2. Fetch Related Objects in a Single Query
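If you’re working with related objects, make sure you’re using select_related() and prefetch_related() where appropriate. select_related() follows foreign-key and one-to-one relationships with a SQL join, while prefetch_related() runs a separate query for each relationship and is the one to use for many-to-many and reverse foreign-key relationships.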
For example, let’s say you’re working on a site called Tweeter. We’d like to work with all the tweets written by each user. If we use prefetch_related(), we can let Django know to pre-fill the cache with the relevant results.
User.objects.all().prefetch_related('tweets')
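To make the saving concrete, here is a framework-free sketch using only the standard library's sqlite3 module. It is not how Django is implemented; the table names, sample rows, and the run() helper are assumptions made up for illustration. It compares the "one query per user" pattern with the two-query strategy that prefetch_related() follows (one query for the users, one for all of their tweets, grouped in Python).

```python
import sqlite3

# Hypothetical schema and data standing in for Tweeter's User and Tweet models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweeter_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tweeter_tweet (id INTEGER PRIMARY KEY,
                                user_id INTEGER REFERENCES tweeter_user(id),
                                text TEXT);
    INSERT INTO tweeter_user VALUES (1, 'ada'), (2, 'grace'), (3, 'guido');
    INSERT INTO tweeter_tweet VALUES (1, 1, 'hello'), (2, 1, 'pycon'),
                                     (3, 2, 'europython'), (4, 3, 'django');
""")

query_log = []


def run(sql, params=()):
    """Execute a query and record it, so we can count round trips."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


def tweets_by_user_naive():
    """One query for the users, then one more query per user."""
    result = {}
    for user_id, name in run("SELECT id, name FROM tweeter_user ORDER BY id"):
        rows = run(
            "SELECT text FROM tweeter_tweet WHERE user_id = ? ORDER BY id",
            (user_id,),
        )
        result[name] = [text for (text,) in rows]
    return result


def tweets_by_user_prefetched():
    """Two queries in total; the grouping happens in Python."""
    users = run("SELECT id, name FROM tweeter_user ORDER BY id")
    placeholders = ", ".join("?" for _ in users)
    rows = run(
        "SELECT user_id, text FROM tweeter_tweet "
        f"WHERE user_id IN ({placeholders}) ORDER BY id",
        [user_id for user_id, _ in users],
    )
    names = dict(users)  # maps user id -> name
    result = {name: [] for name in names.values()}
    for user_id, text in rows:
        result[names[user_id]].append(text)
    return result


query_log.clear()
naive = tweets_by_user_naive()
naive_count = len(query_log)

query_log.clear()
prefetched = tweets_by_user_prefetched()
prefetched_count = len(query_log)

print(naive_count, prefetched_count)  # prints: 4 2
```

With three users the difference is small, but the naive version grows by one query for every extra user while the prefetched version stays at two.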
3. Offload Complex Filtering to the Database
A good rule of thumb is to offload complex filtering work to the database instead of doing it in Python, because the database will be far more efficient in this role. An easy way to pass complex queries to the database is by using Django’s Q objects, which can be combined with the | (OR), & (AND), and ~ (NOT) operators to form very complex queries.
For example, if I wanted to find all the tweets about PyCon or EuroPython, I could write:
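from django.db.models import Q
# icontains matches case-insensitively, so 'PyCon' and 'pycon' are both found
Tweet.objects.filter(Q(text__icontains='pycon') | Q(text__icontains='europython'))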
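As a rough illustration of why this matters, the sketch below uses the standard library's sqlite3 module rather than the ORM. The table and sample rows are made up, and the WHERE clause only approximates the shape of SQL that two Q objects joined with | would produce. Both functions return the same tweets, but the first pulls every row into Python before discarding most of them.

```python
import sqlite3

# Hypothetical table and rows standing in for the Tweet model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweeter_tweet (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO tweeter_tweet (text) VALUES (?)",
    [
        ("see you at pycon",),
        ("europython was great",),
        ("lunch time",),
        ("django release day",),
    ],
)


def filter_in_python():
    """Fetch every row, then throw most of them away in application code."""
    rows = conn.execute("SELECT text FROM tweeter_tweet ORDER BY id").fetchall()
    return [text for (text,) in rows if "pycon" in text or "europython" in text]


def filter_in_database():
    """Let the database apply both conditions and return only the matches."""
    rows = conn.execute(
        "SELECT text FROM tweeter_tweet "
        "WHERE text LIKE '%pycon%' OR text LIKE '%europython%' "
        "ORDER BY id"
    ).fetchall()
    return [text for (text,) in rows]


print(filter_in_python() == filter_in_database())  # prints: True
```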
Keep an Eye Out for Queries During Development
When writing Django code, there are a few ways that you can keep an eye out for inefficient queries.
If you’re using the Django shell via python manage.py shell, you can see the raw queries Django runs with the following code and DEBUG set to True:
In [2]: tweet = Tweet.objects.get(id=8)
In [3]: from django.db import connection
In [4]: connection.queries
Out[4]:
[{'sql': 'SELECT "tweeter_tweet"."id", "tweeter_tweet"."user_id", "tweeter_tweet"."text", "tweeter_tweet"."timestamp" FROM "tweeter_tweet" WHERE "tweeter_tweet"."id" = 8',
'time': '0.001'}]
If you want to know what’s going on under the hood, Django can also log every database query made in DEBUG mode with the django.db.backends logger. Warning: this can quickly produce a large log file. To quickly set up query logging to a file, add the following to your settings and change /path/to/django to your project path.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'file': {
            'level': 'DEBUG',
            'class': 'logging.FileHandler',
            'filename': '/path/to/django/debug.log',
        },
    },
    'loggers': {
        'django.db.backends': {
            'handlers': ['file'],
            'level': 'DEBUG',
        },
    },
}
In production, you can set up custom middleware to keep an eye on database performance. See more tips on database optimization in the Django blog.
2. Check Your Configuration
You never want to commit sensitive information to your code base, and you also want to keep it out of logs, tracebacks, and error reports. Django has a few features that help you solve this problem, but it’s up to you as the developer to properly implement them.
In DEBUG mode, Django displays helpful information about errors and any associated tracebacks. In this view, Django will filter out the value of any setting whose name contains one of the following:
API
KEY
PASS
SECRET
SIGNATURE
TOKEN
That means that if you deviate from this naming scheme, for example through a typo in a setting name like MY_PASWORD (notice the missing S), you’ll be opening a window for sensitive information to be exposed on your screen during development.
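The name matching can be sketched in a few lines. This is an illustration only, not Django's implementation: the pattern is built from the list above, the case-insensitive substring match is an assumption, and the exact list may differ between Django versions. The is_masked() helper and the sample setting names are hypothetical.

```python
import re

# Substrings copied from the list above; treated here as a case-insensitive
# substring match (an assumption, the real filter lives inside Django).
SENSITIVE_NAME = re.compile("API|KEY|PASS|SECRET|SIGNATURE|TOKEN", re.IGNORECASE)


def is_masked(setting_name):
    """Return True if the setting name contains a sensitive substring."""
    return SENSITIVE_NAME.search(setting_name) is not None


for name in ("MY_PASSWORD", "MY_PASWORD", "STRIPE_API_KEY", "DATABASE_URL"):
    print(name, "masked" if is_masked(name) else "shown in full")
```

A check like this is easy to turn into a review habit: any setting that holds a credential but would be "shown in full" deserves a rename.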
When you’re running in production and DEBUG is set to False, you can still leak sensitive information. If the ADMINS setting is present, Django will send an email report to the listed email addresses on server errors. To filter sensitive information contained in tracebacks and error reports in views, you can use the sensitive_variables() and sensitive_post_parameters() decorators.
from django.views.decorators.debug import sensitive_variables

@sensitive_variables('user', 'pw', 'cc')
def process_info(user):
    pw = user.pass_word
    cc = user.credit_card_number
    name = user.name
    ...
In this example code, the sensitive values will be hidden and replaced with asterisks (*). The Django documentation provides these examples, and more.
Lastly, keep an eye on what makes it into your application server logs. These days, GDPR compliance means caring about what’s in your log files. The less customer data you store, the lower the risk will be to your organization. Use logging filters to keep control over what’s stored.
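Here is a minimal sketch of such a filter, built on the standard library's logging.Filter. The email pattern is deliberately simple, and the logger name, the ListHandler helper, and the sample address are all hypothetical. A real project would decide which fields count as personal data and would most likely attach the filter through its LOGGING settings.

```python
import logging
import re

# Simple pattern for things that look like email addresses (an assumption;
# real-world address matching is messier than this).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class RedactEmails(logging.Filter):
    """Scrub email addresses from a record before any handler emits it."""

    def filter(self, record):
        # Render the message with its arguments first so the args are scrubbed too.
        record.msg = EMAIL.sub("[redacted]", record.getMessage())
        record.args = None
        return True  # keep the record, now with the scrubbed message


class ListHandler(logging.Handler):
    """Collects formatted messages in memory so the example is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("tweeter.signup")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False

handler = ListHandler()
handler.addFilter(RedactEmails())
logger.addHandler(handler)

logger.info("New signup: %s", "ada@example.com")
print(handler.messages)  # prints: ['New signup: [redacted]']
```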
3. Know Your Dependencies
Each time you introduce a new dependency into your project, you also introduce a new vector for insecure code.
Django REST Framework is convenient and easy to use, but it does come with a few gotchas, especially around permissions.
Let’s say you’re working on a Twitter clone, and you’d like posts to only be created by the logged-in author. You allow the user_id to be passed into your API endpoint. You might write a permission like this:
from rest_framework import permissions

class IsAuthorOrReadOnly(permissions.BasePermission):
    def has_object_permission(self, request, view, obj):
        if request.method in permissions.SAFE_METHODS:
            return True
        else:
            return obj.user == request.user
Looks good, but unfortunately, there’s a gaping bug. We want to catch unauthorized users attempting to create tweets for others, but the way it’s currently written, this permission will let them breeze right through. Can you see why?
It’s because at the time of creation, there is no obj yet, and therefore, the has_object_permission method doesn’t get called. If we wanted to gate-keep authoring new tweets, we’d have to write a permission that looked like this:
class IsAuthorOrReadOnly(permissions.BasePermission):
    """
    Permission that allows only the author to create
    or edit tweets attributed to them
    """

    def has_permission(self, request, view):
        # If trying to create a new post, check that the
        # user creating the tweet is also the author
        # of the post
        if request.method == 'POST' and request.data:
            try:
                input_user_id = int(request.data.get('user'))
            except (TypeError, ValueError):
                # Missing or non-numeric user ID: deny instead of crashing
                return False
            return input_user_id == request.user.id
        return True

    def has_object_permission(self, request, view, obj):
        if request.method in permissions.SAFE_METHODS:
            # Allow read only permissions to view the object
            return True
        else:
            # Check that the request user owns the object
            # being edited
            return obj.user == request.user
In a real program, you’d probably catch this sort of gotcha with a thorough unit test or by setting the author to the user associated with the request instead of accepting a user_id for the author.
However, programmers aren’t infallible, and simple mistakes like this one slip through. Make sure you know the ins and outs of the frameworks that you’re using. As a reviewer, you can help by checking that the unit tests are thorough and cover edge cases such as object creation.
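The kind of edge-case tests worth asking for might look like the sketch below. To keep it runnable without Django REST Framework installed, it tests a framework-free stand-in that follows the permission logic described above, with a guard for missing or malformed user IDs added as an assumption. The stand-in class, the make_request() helper, and the fake users are hypothetical. In a real project you would import the actual permission class and build requests with the framework's own test tools.

```python
import unittest
from types import SimpleNamespace

SAFE_METHODS = ("GET", "HEAD", "OPTIONS")  # same values as DRF's SAFE_METHODS


class IsAuthorOrReadOnly:
    """Framework-free stand-in for the permission's logic."""

    def has_permission(self, request, view):
        if request.method == "POST" and request.data:
            try:
                input_user_id = int(request.data.get("user"))
            except (TypeError, ValueError):
                return False
            return input_user_id == request.user.id
        return True

    def has_object_permission(self, request, view, obj):
        if request.method in SAFE_METHODS:
            return True
        return obj.user == request.user


def make_request(method, user, data=None):
    """Build a minimal fake request with only the attributes the permission reads."""
    return SimpleNamespace(method=method, user=user, data=data or {})


class IsAuthorOrReadOnlyTests(unittest.TestCase):
    def setUp(self):
        self.permission = IsAuthorOrReadOnly()
        self.alice = SimpleNamespace(id=1)
        self.bob = SimpleNamespace(id=2)

    def test_author_can_create_own_tweet(self):
        request = make_request("POST", self.alice, {"user": "1", "text": "hi"})
        self.assertTrue(self.permission.has_permission(request, view=None))

    def test_cannot_create_tweet_for_someone_else(self):
        request = make_request("POST", self.alice, {"user": "2", "text": "hi"})
        self.assertFalse(self.permission.has_permission(request, view=None))

    def test_malformed_user_id_is_rejected(self):
        request = make_request("POST", self.alice, {"user": "not-a-number"})
        self.assertFalse(self.permission.has_permission(request, view=None))

    def test_only_owner_can_edit(self):
        tweet = SimpleNamespace(user=self.bob)
        as_alice = make_request("PUT", self.alice)
        as_bob = make_request("PUT", self.bob)
        self.assertFalse(self.permission.has_object_permission(as_alice, None, tweet))
        self.assertTrue(self.permission.has_object_permission(as_bob, None, tweet))

    def test_anyone_can_read(self):
        tweet = SimpleNamespace(user=self.bob)
        request = make_request("GET", self.alice)
        self.assertTrue(self.permission.has_object_permission(request, None, tweet))
```

Saved to a file, these tests can be run with python -m unittest. The second test is the one that would have failed against the original, object-only permission.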
Conclusion
When done right, code reviews can be an incredibly productive process that squashes subtle bugs in their tracks. When developing Django applications, remember to keep your eye out for the three pain points mentioned above. Also, don’t forget to catch my talk Code Review Skills for Pythonistas live in San Diego on Monday, October 15th 2018.
Python at Microsoft
VS Code is a lightweight, cross-platform, open source IDE with support for Python, Django, and Django templates. If you’re interested in learning more, check out the guide to getting started with Django on VS Code. You can also learn how to deploy a Python container to Azure. Check out the full list of features Azure offers to Python developers.