Should I avoid multi-table (concrete) inheritance in Django by any means?
Django Problem Overview
Many experienced developers recommend against using Django multi-table inheritance because of its poor performance:
- Django gotcha: concrete inheritance by Jacob Kaplan-Moss, a core contributor of Django.
> In nearly every case, abstract inheritance is a better approach for the long term. I’ve seen more than few sites crushed under the load introduced by concrete inheritance, so I’d strongly suggest that Django users approach any use of concrete inheritance with a large dose of skepticism.
> Multi-table inheritance, sometimes called “concrete inheritance,” is considered by the authors and many other developers to be a bad thing. We strongly recommend against using it.
> At all costs, everyone should avoid multi-table inheritance since it adds both confusion and substantial overhead. Instead of multi-table inheritance, use explicit OneToOneFields and ForeignKeys between models so you can control when joins are traversed.
But without multi-table inheritance, I can't easily:
- Reference the base model in another model (I would have to use a GenericForeignKey or a reverse dependency);
- Get all instances of the base model.
(feel free to add more)
So what is wrong with this kind of inheritance in Django? Why are explicit OneToOneFields better?
How much does performance suffer from JOINs? Are there any benchmarks that show the difference in performance?
Doesn't `select_related()` allow us to control when JOINs are invoked?
I have moved concrete examples to a separate question since this one is becoming too broad, and added a list of reasons for using multi-table inheritance instead.
Django Solutions
Solution 1 - Django
First of all, inheritance has no natural translation to relational database architecture (OK, I know, Oracle Type Objects and some other RDBMSs support inheritance, but Django doesn't take advantage of this functionality).
At this point, notice that Django generates new tables for subclasses and writes lots of left joins to retrieve data from these 'sub-tables'. And left joins are not your friends. In a high-performance scenario, like a game backend, you should avoid them and resolve inheritance 'by hand' with artifacts like nulls, OneToOne or foreign keys. In the OneToOne scenario, you can query the related table directly, and only when you need it.
... BUT ...
"In my opinion (TGW)", you should include model inheritance in your enterprise projects when it fits your universe of discourse. I do this, and I save my customers a lot of development hours thanks to this feature. Also, the code becomes clean and elegant, which means easy maintenance (note that these kinds of projects don't have hundreds of requests per second).
Question by question
Q: What is wrong with this kind of inheritance in Django?
A: A lot of tables, and a lot of left joins.
Q: Why are explicit OneToOneFields better?
A: You can access the related model directly, without left joins.
Q: Are there any illustrative examples (benchmarks)?
A: They are not comparable.
Q: Does not select_related() allow us to control when JOINs are invoked?
A: Django joins the tables it needs.
Q: What are the alternatives to multi-table inheritance when I need to reference a base class in another model?
A: Nullification, OneToOne relations, and lots of lines of code. It depends on the application's needs.
Q: Are GenericForeignKeys better in this case?
A: Not for me.
Q: What if I need OneToOneField to base model?
A: Write it. There is no problem with this. For example, you can extend the User model, and also have a OneToOne to the User base model for some users.
Conclusion
You should know both the cost of writing and maintaining code without model inheritance and the cost of the hardware needed to support applications that use model inheritance, and act accordingly.
Just a joke: you could write it in assembly, and it would run faster.
Quoting Trey Hunner:
>Your time is usually much more expensive than your CPU's time.
Solution 2 - Django
From what I understand, you are using a `OneToOneField` on the `RelatedModel` to the `BaseModel` because, ultimately, you want a one-to-one link between `RelatedModel` and each of `Submodel1` to `Submodel9`. If so, there's a more efficient way of doing that without either multi-table inheritance or generic relations.
Just get rid of the `BaseModel`, and in each `SubmodelX`, have a `OneToOneField` to `RelatedModel`:
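```python
from django.db import models

class Submodel1(models.Model):
    related_model = models.OneToOneField(RelatedModel, on_delete=models.CASCADE,
                                         null=True, blank=True, related_name='the_thing_1')
    some_field = models.TextField()
# ...
class Submodel9(models.Model):
    related_model = models.OneToOneField(RelatedModel, on_delete=models.CASCADE,
                                         null=True, blank=True, related_name='the_thing_9')
    another_field = models.TextField()
```
This would allow you to access each `SubmodelX` from an instance of `RelatedModel` through its own reverse accessor (`the_thing_1` to `the_thing_9` here), much as in the multi-table inheritance example you first gave. The reverse names must differ: if every submodel used `related_name='the_thing'`, Django's system checks would report clashing reverse accessors.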
Note that you may use abstract inheritance to factor out the `related_model` field and any other fields common to `Submodel1` to `Submodel9`.
Multi-table inheritance is inefficient because it generates an extra table for the base model, and therefore extra JOINs to access those fields. Using generic relations would be more efficient if you later find that you instead need a `ForeignKey` field from `RelatedModel` to each `SubmodelX`. However, Django does not support generic relations in `select_related()`, and you might end up having to build your own queries to do so efficiently. The tradeoff between performance and ease of coding is up to you, depending on how much load you expect on the server and how much time you want to spend optimizing.
Solution 3 - Django
The world has changed.
The first thing to note is that the article titled Django gotcha: concrete inheritance was nearly four years old when this question was asked, in 2014. Both Django and RDBMSs have come a long way since then (for example, MySQL 5.0 or 5.1 were the widely used versions, and 5.5 general availability was still one month away).
Joins to my left, joins to my right
It is true that multi-table inheritance does result in extra joins behind the scenes most of the time. But joins are not evil. It's worth noting that in a properly normalized database, you almost always have to join to fetch all the required data. When proper indexes are used, joins do not incur any significant performance penalty.
INNER JOIN vs LEFT OUTER JOIN
This is indeed the case against multi-table inheritance: with other approaches, it's possible to avoid a costly LEFT OUTER JOIN and do an INNER JOIN or perhaps a subquery instead. But with multi-table inheritance, you are denied that choice.
Solution 4 - Django
Whether the occurrence of `LEFT OUTER JOIN` is an issue in itself, I cannot say, but in any case it may be interesting to note in which cases these outer joins actually occur.
This is a naive attempt to illustrate the above, using some example queries.
Suppose we have some models using multi-table inheritance as follows:
```python
from django.db import models


class Parent(models.Model):
    parent_field = models.CharField(max_length=10)


class ChildOne(Parent):
    child_one_field = models.CharField(max_length=10)


class ChildTwo(Parent):
    child_two_field = models.CharField(max_length=10)
```
By default, the child instances get a `parent_ptr`, and parent instances can access child objects (if they exist) using `childone` or `childtwo`. Note that `parent_ptr` represents a one-to-one relation which is used as the primary key (the actual child tables have no `id` column).
Here's a quick-and-dirty unit test with some naive Django query examples, showing the corresponding number of occurrences of `INNER JOIN` and `OUTER JOIN` in the SQL:
```python
import re

from django.test import TestCase

from inheritance.models import (Parent, ChildOne, ChildTwo)

# Arbitrary filter values: only the generated SQL is inspected.
parent_value = 'parent'
child_value = 'child'


def count_joins(query, inner_outer):
    """ Count the occurrences of JOIN in the query """
    return len(re.findall('{} join'.format(inner_outer), str(query).lower()))


class TestMultiTableInheritance(TestCase):
    def test_queries(self):
        # get children (with parent info)
        query = ChildOne.objects.all().query
        self.assertEqual(1, count_joins(query, 'inner'))
        self.assertEqual(0, count_joins(query, 'outer'))
        # get parents
        query = Parent.objects.all().query
        self.assertEqual(0, count_joins(query, 'inner'))
        self.assertEqual(0, count_joins(query, 'outer'))
        # filter children by parent field
        query = ChildOne.objects.filter(parent_field=parent_value).query
        self.assertEqual(1, count_joins(query, 'inner'))
        self.assertEqual(0, count_joins(query, 'outer'))
        # filter parents by child field
        query = Parent.objects.filter(childone__child_one_field=child_value).query
        self.assertEqual(1, count_joins(query, 'inner'))
        self.assertEqual(0, count_joins(query, 'outer'))
        # get child field values via parent
        query = Parent.objects.values_list('childone__child_one_field').query
        self.assertEqual(0, count_joins(query, 'inner'))
        self.assertEqual(1, count_joins(query, 'outer'))
        # get multiple child field values via parent
        query = Parent.objects.values_list('childone__child_one_field',
                                           'childtwo__child_two_field').query
        self.assertEqual(0, count_joins(query, 'inner'))
        self.assertEqual(2, count_joins(query, 'outer'))
        # get child-two field value from child-one, through parent
        query = ChildOne.objects.values_list('parent_ptr__childtwo__child_two_field').query
        self.assertEqual(1, count_joins(query, 'inner'))
        self.assertEqual(1, count_joins(query, 'outer'))
        # get parent field value from parent, but through child
        query = Parent.objects.values_list('childone__parent_field').query
        self.assertEqual(0, count_joins(query, 'inner'))
        self.assertEqual(2, count_joins(query, 'outer'))
        # filter parents by parent field, but through child
        query = Parent.objects.filter(childone__parent_field=parent_value).query
        self.assertEqual(2, count_joins(query, 'inner'))
        self.assertEqual(0, count_joins(query, 'outer'))
```
Note, not all of these queries make sense: they are just for illustrative purposes.
Also note that this test code is not DRY, but that is on purpose.
Solution 5 - Django
Django implements multi-table inheritance via an automatically created `OneToOneField`, as its docs say. So either use abstract inheritance, or stay with multi-table inheritance: I don't think using explicit OneToOneFields or ForeignKeys instead makes any difference.