How do I force Django to ignore any caches and reload data?

Python, Django, Caching

Python Problem Overview


I'm using the Django database models from a process that's not called from an HTTP request. The process is supposed to poll for new data every few seconds and do some processing on it. I have a loop that sleeps for a few seconds and then gets all unhandled data from the database.

What I'm seeing is that after the first fetch, the process never sees any new data. I ran a few tests and it looks like Django is caching results, even though I'm building new QuerySets every time. To verify this, I did this from a Python shell:

>>> MyModel.objects.count()
885
# (Here I added some more data from another process.)
>>> MyModel.objects.count()
885
>>> MyModel.objects.update()
0
>>> MyModel.objects.count()
1025

As you can see, adding new data doesn't change the result count. However, calling the manager's update() method seems to fix the problem.

I can't find any documentation on that update() method and have no idea what other bad things it might do.

My question is: why am I seeing this caching behavior, which contradicts what the Django docs say? And how do I prevent it from happening?

Python Solutions


Solution 1 - Python

Having had this problem and found two definitive solutions for it, I thought it worth posting another answer.

This is a problem with MySQL's default transaction mode. Django opens a transaction at the start, which means that by default you won't see changes committed by other connections while that transaction is open.

You can demonstrate it like this.

Run a Django shell in terminal 1:

>>> MyModel.objects.get(id=1).my_field
u'old'

And another in terminal 2:

>>> MyModel.objects.get(id=1).my_field
u'old'
>>> a = MyModel.objects.get(id=1)
>>> a.my_field = "NEW"
>>> a.save()
>>> MyModel.objects.get(id=1).my_field
u'NEW'
>>> 

Back in terminal 1, demonstrate the problem: we still read the old value from the database.

>>> MyModel.objects.get(id=1).my_field
u'old'

Now, in terminal 1, demonstrate the solution:

>>> from django.db import transaction
>>> 
>>> @transaction.commit_manually
... def flush_transaction():
...     transaction.commit()
... 
>>> MyModel.objects.get(id=1).my_field
u'old'
>>> flush_transaction()
>>> MyModel.objects.get(id=1).my_field
u'NEW'
>>> 

The new data is now read.

Here is that code in an easy-to-paste block with a docstring:

from django.db import transaction

@transaction.commit_manually
def flush_transaction():
    """
    Flush the current transaction so we don't read stale data

    Use in long running processes to make sure fresh data is read from
    the database.  This is a problem with MySQL and the default
    transaction mode.  You can fix it by setting
    "transaction-isolation = READ-COMMITTED" in my.cnf or by calling
    this function at the appropriate moment
    """
    transaction.commit()
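
In a long-running poller like the one in the question, you would call flush_transaction() at the top of each iteration. Here is a minimal sketch of that usage; MyModel, the handled field and process() are hypothetical placeholders, not from the original code:

import time

while True:
    flush_transaction()  # end the stale transaction; the next query starts a fresh one
    for item in MyModel.objects.filter(handled=False):  # 'handled' is a hypothetical field
        process(item)                                   # process() stands in for your own logic
        item.handled = True
        item.save()
    time.sleep(5)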

The alternative solution is to change my.cnf for MySQL to change the default transaction mode

transaction-isolation = READ-COMMITTED

Note that this is a relatively new feature for MySQL and has some consequences for binary logging / replication. You could also put this in the Django connection preamble if you wanted.
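
If you would rather set this per connection than server-wide, something along these lines in settings.py should also work. This is a sketch relying on MySQLdb's init_command connection option, which Django passes through from OPTIONS; the database name and credentials are placeholders:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mydb',        # placeholder
        'USER': 'myuser',      # placeholder
        'PASSWORD': 'secret',  # placeholder
        'OPTIONS': {
            # Per-session equivalent of "transaction-isolation = READ-COMMITTED" in my.cnf
            'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED',
        },
    },
}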

Update 3 years later

Now that Django 1.6 has turned on autocommit in MySQL, this is no longer a problem. The example above now works fine without the flush_transaction() code, whether your MySQL is in REPEATABLE-READ (the default) or READ-COMMITTED transaction isolation mode.

What was happening in previous versions of Django, which ran in non-autocommit mode, was that the first SELECT statement opened a transaction. Since MySQL's default mode is REPEATABLE-READ, no updates to the database would be read by subsequent SELECT statements; hence the need for the flush_transaction() code above, which stops the transaction and starts a new one.

There are still reasons why you might want to use READ-COMMITTED transaction isolation, though. If you were to put terminal 1 in a transaction and you wanted to see the writes from terminal 2, you would need READ-COMMITTED.
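
Under Django 1.6+ you can see that difference with transaction.atomic(). A sketch, with MyModel as in the examples above; the comments describe MySQL's REPEATABLE-READ behaviour:

from django.db import transaction

# Under autocommit, each statement is its own transaction, so this always reads fresh data:
MyModel.objects.count()

# Inside an atomic block, REPEATABLE-READ pins a snapshot at the first read; writes
# committed by other connections stay invisible until the block exits. Under
# READ-COMMITTED the second count() would see them.
with transaction.atomic():
    MyModel.objects.count()  # snapshot established here
    MyModel.objects.count()  # same snapshot under REPEATABLE-READ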

The flush_transaction() code now produces a deprecation warning in Django 1.6, so I recommend you remove it.

Solution 2 - Python

We've struggled a fair bit with forcing Django to refresh the "cache", which it turns out wasn't really a cache at all but an artifact of transactions. This might not apply to your example, but certainly in Django views there is, by default, an implicit transaction, which MySQL then isolates from any changes that happen from other processes after you start.

We used the @transaction.commit_manually decorator and called transaction.commit() just before every occasion where we needed up-to-date info.

As I say, this definitely applies to views; I'm not sure whether it applies to Django code that isn't run inside a view.
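
In a pre-1.6 view, that pattern would look something like this. A sketch only; the view name, model and import path are made up:

from django.db import transaction
from django.http import HttpResponse

from myapp.models import MyModel  # placeholder import

@transaction.commit_manually
def poll_status(request):  # hypothetical view
    transaction.commit()  # discard the stale snapshot before reading
    count = MyModel.objects.count()  # now sees data committed by other processes
    transaction.commit()  # commit_manually requires an explicit commit before returning
    return HttpResponse(str(count))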

Detailed info here:

http://devblog.resolversystems.com/?p=439

Solution 3 - Python

I'm not sure I'd recommend it...but you can just kill the cache yourself:

>>> qs = MyModel.objects.all()
>>> qs.count()
1
>>> MyModel().save()
>>> qs.count()  # cached!
1
>>> qs._result_cache = None
>>> qs.count()
2

And here's a better technique that doesn't rely on fiddling with the innards of the QuerySet. Remember that the caching happens within a QuerySet, but refreshing the data simply requires the underlying Query to be re-executed. The QuerySet is really just a high-level API wrapping a Query object, plus a container (with caching!) for Query results. Thus, given a queryset, here is a general-purpose way of forcing a refresh:

>>> MyModel().save()
>>> qs = MyModel.objects.all()
>>> qs.count()
1
>>> MyModel().save()
>>> qs.count()  # cached!
1
>>> from django.db.models import QuerySet
>>> qs = QuerySet(model=MyModel, query=qs.query)
>>> qs.count()  # refreshed!
2
>>> party_time()

Pretty easy! You can, of course, implement this as a helper function and use it as needed.
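
For example (refresh_queryset is a name made up here for illustration):

from django.db.models import QuerySet

def refresh_queryset(qs):
    """
    Return a fresh QuerySet wrapping the same underlying Query.

    The copy starts with an empty result cache, so its next evaluation
    re-executes the query against the database.
    """
    return QuerySet(model=qs.model, query=qs.query)

# Usage:
#   qs = refresh_queryset(qs)
#   qs.count()  # hits the database again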

Solution 4 - Python

It seems like count() is served from the cache after the first call. This is the Django source for QuerySet.count:

def count(self):
    """
    Performs a SELECT COUNT() and returns the number of records as an
    integer.

    If the QuerySet is already fully cached this simply returns the length
    of the cached results set to avoid multiple SELECT COUNT(*) calls.
    """
    if self._result_cache is not None and not self._iter:
        return len(self._result_cache)

    return self.query.get_count(using=self.db)

update() does seem to be doing quite a bit of extra work besides what you need, but I can't think of any better way to do this, short of writing your own SQL for the count. If performance is not super important, I would just do what you're doing and call update() before count(). Note the transaction.commit() calls in the source below: ending the open transaction is likely what makes fresh data visible, as Solution 1 explains.

QuerySet.update:

def update(self, **kwargs):
    """
    Updates all elements in the current QuerySet, setting all the given
    fields to the appropriate values.
    """
    assert self.query.can_filter(), \
            "Cannot update a query once a slice has been taken."
    self._for_write = True
    query = self.query.clone(sql.UpdateQuery)
    query.add_update_values(kwargs)
    if not transaction.is_managed(using=self.db):
        transaction.enter_transaction_management(using=self.db)
        forced_managed = True
    else:
        forced_managed = False
    try:
        rows = query.get_compiler(self.db).execute_sql(None)
        if forced_managed:
            transaction.commit(using=self.db)
        else:
            transaction.commit_unless_managed(using=self.db)
    finally:
        if forced_managed:
            transaction.leave_transaction_management(using=self.db)
    self._result_cache = None
    return rows
update.alters_data = True

Solution 5 - Python

If you append .all() to a queryset, it will force a reread from the DB, because .all() returns a fresh copy of the QuerySet with an empty result cache. Try MyModel.objects.all().count() instead of MyModel.objects.count().
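
The same idea in shell form, continuing the kind of session shown in Solution 3 (a sketch):

>>> qs = MyModel.objects.all()
>>> qs.count()
1
>>> MyModel().save()
>>> qs.count()        # cached!
1
>>> qs.all().count()  # .all() returns a fresh copy with an empty result cache
2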

Solution 6 - Python

You can also use MyModel.objects._clone().count(). All of the methods on the QuerySet call _clone() before doing any work; that ensures that any internal caches are invalidated.

The root cause is that MyModel.objects is the same instance each time. By cloning it, you create a new instance without the cached value. Of course, you can always reach in and invalidate the cache if you'd prefer to use the same instance.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Question: scippy (View Question on Stackoverflow)
Solution 1 - Python: Nick Craig-Wood (View Answer on Stackoverflow)
Solution 2 - Python: hwjp (View Answer on Stackoverflow)
Solution 3 - Python: Chris Clark (View Answer on Stackoverflow)
Solution 4 - Python: adamJLev (View Answer on Stackoverflow)
Solution 5 - Python: Sarah Messer (View Answer on Stackoverflow)
Solution 6 - Python: Travis Swicegood (View Answer on Stackoverflow)