How to pull a random record using Django's ORM?

PythonDjangoDjango Models

Python Problem Overview


I have a model that represents paintings I present on my site. On the main webpage I'd like to show some of them: newest, one that was not visited for most time, most popular one and a random one.

I'm using Django 1.0.2.

While first 3 of them are easy to pull using django models, last one (random) causes me some trouble. I can ofc code it in my view, to something like this:

number_of_records = models.Painting.objects.count()
random_index = int(random.random()*number_of_records)+1
random_paint = models.Painting.get(pk = random_index)

It doesn't look like something I'd like to have in my view tho - this is entirely part of database abstraction and should be in the model. Also, here I need to take care of removed records (then number of all records won't cover me all the possible key values) and probably lots of other things.

Any other options how I can do it, preferably somehow inside the model abstraction?

Python Solutions


Solution 1 - Python

Simply use:

MyModel.objects.order_by('?').first()

It is documented in QuerySet API.

Solution 2 - Python

Using order_by('?') will kill the db server on the second day in production. A better way is something like what is described in Getting a random row from a relational database.

from django.db.models.aggregates import Count
from random import randint

class PaintingManager(models.Manager):
    def random(self):
        count = self.aggregate(count=Count('id'))['count']
        random_index = randint(0, count - 1)
        return self.all()[random_index]

Solution 3 - Python

The solutions with order_by('?')[:N] are extremely slow even for medium-sized tables if you use MySQL (don't know about other databases).

order_by('?')[:N] will be translated to SELECT ... FROM ... WHERE ... ORDER BY RAND() LIMIT N query.

It means that for every row in table the RAND() function will be executed, then the whole table will be sorted according to value of this function and then first N records will be returned. If your tables are small, this is fine. But in most cases this is a very slow query.

I wrote simple function that works even if id's have holes (some rows where deleted):

def get_random_item(model, max_id=None):
    if max_id is None:
        max_id = model.objects.aggregate(Max('id')).values()[0]
    min_id = math.ceil(max_id*random.random())
    return model.objects.filter(id__gte=min_id)[0]

It is faster than order_by('?') in almost all cases.

Solution 4 - Python

Here's a simple solution:

from random import randint

count = Model.objects.count()
random_object = Model.objects.all()[randint(0, count - 1)] #single random object

Solution 5 - Python

You could create a manager on your model to do this sort of thing. To first understand what a manager is, the Painting.objects method is a manager that contains all(), filter(), get(), etc. Creating your own manager allows you to pre-filter results and have all these same methods, as well as your own custom methods, work on the results.

EDIT: I modified my code to reflect the order_by['?'] method. Note that the manager returns an unlimited number of random models. Because of this I've included a bit of usage code to show how to get just a single model.

from django.db import models

class RandomManager(models.Manager):
    def get_query_set(self):
        return super(RandomManager, self).get_query_set().order_by('?')

class Painting(models.Model):
    title = models.CharField(max_length=100)
    author = models.CharField(max_length=50)

    objects = models.Manager() # The default manager.
    randoms = RandomManager() # The random-specific manager.

Usage

random_painting = Painting.randoms.all()[0]

Lastly, you can have many managers on your models, so feel free to create a LeastViewsManager() or MostPopularManager().

Solution 6 - Python

The other answers are either potentially slow (using order_by('?')) or use more than one SQL query. Here's a sample solution with no ordering and just one query (assuming Postgres):

random_instance_or_none = Model.objects.raw('''
    select * from {0} limit 1
    offset floor(random() * (select count(*) from {0}))
'''.format(Model._meta.db_table)).first()

Be aware that this will raise an index error if the table is empty. Write yourself a model-agnostic helper function to check for that.

Solution 7 - Python

Just a simple idea how I do it:

def _get_random_service(self, professional):
    services = Service.objects.filter(professional=professional)
    i = randint(0, services.count()-1)
    return services[i]

Solution 8 - Python

Randomization in DB feels nasty and better in python. But at the same time, it's not a good idea to bring all the data from DB to python memory just to ignore most of the results (especially in the production environment). we might need some sort of filtering also.

  1. So Basically we have data at DB,
  2. we wanna use the rand function of python
  3. and afterwords bring up the whole required data from DB.

Basically using 2 queries will be much less expensive than picking random in DB CPU (computing in DB) or loading whole data (heavy Network Utilization). Solutions explained must need a scalable nature trying to plan here won't work for a production environment espicially with filters, soft/hard deletes, or even with an is_public flag. because probably random id we generated might be deleted from the database or will be cut down in filters. Its a bad practice to assume max_id(records) == count(records).

(Ofcouce, If you do'not delete a percentage of data which is comparable to query uses, or if you dont wanna use any kond of filters, and if you are confident, random id which you can proceed with a random )

if you want only one items. Refer ( @Valter Silva )

import random

mgr = models.Painting.objects
qs = mgr.filter(...)
random_id = random.choice(1, qs.count())-1        # <--- [ First Query Hit ]

random_paint = qs[random_id] ## <-- [ Second Query Hit ]

if you want 'n' items.

import random

req_no_of_random_items = 8        ## i need 8 random items.
qs = models.Painting.objects.filter(...)

## if u prefer to use random values often, you can keep this in cache. 
possible_ids = list(qs.values_list('id', flat=True))        # <--- [ First Query Hit ]

possible_ids = random.choices(possible_ids, k=8)
random_paint = qs.filter(pk__in=possible_ids) ## in a generic case to get 'n' items.

or if you want to have a more optimized code for production, use a cachefunction to get ids of products:

from django.core.cache import cache

def id_set_cache(qs):
    key = "some_random_key_for_cache"
    id_set =  cache.get(key)
    if id_set is None:
        id_set = list(qs.values_list('id', flat=True)
        cache.set(key, id_set)
    retrun id_set

Solution 9 - Python

Hi I needed to select a random record from a queryset who's length I also needed to report (ie web page produced described item and said records left)

q = Entity.objects.filter(attribute_value='this or that')
item_count = q.count()
random_item = q[random.randomint(1,item_count+1)]

took half as long(0.7s vs 1.7s) as:

item_count = q.count()
random_item = random.choice(q)

I'm guessing it avoids pulling down the whole query before selecting the random entry and made my system responsive enough for a page that is accessed repeatedly for a repetitive task where users want to see the item_count count down.

Solution 10 - Python

Method for auto-incrementing primary key with no deletes

If you have a table where the primary key is a sequential integer with no gaps, then the following method should work:

import random
max_id = MyModel.objects.last().id
random_id = random.randint(0, max_id)
random_obj = MyModel.objects.get(pk=random_id)

This method is much more efficient than other methods here that iterate through all rows of the table. While it does require two database queries, both are trivial. Furthermore, it's simple and doesn't require defining any extra classes. However, it's applicability is limited to tables with an auto-incrementing primary key where rows have never deleted, such that there are no gaps in the sequence of ids.

In the case where rows have been deleted such that are gaps, this method could still work if it is retried until an existing primary key is randomly selected.

References

Solution 11 - Python

This is Highly recomended Getting a random row from a relational database

Because using django orm to do such a thing like that, will makes your db server angry specially if you have big data table :|

And the solution is provide a Model Manager and write the SQL query by hand ;)

Update:

Another solution which works on any database backend even non-rel ones without writing custom ModelManager. Getting Random objects from a Queryset in Django

Solution 12 - Python

Just to note a (fairly common) special case, if there is a indexed auto-increment column in the table with no deletes, the optimum way to do a random select is a query like:

SELECT * FROM table WHERE id = RAND() LIMIT 1

that assumes such a column named id for table. In django you can do this by:

Painting.objects.raw('SELECT * FROM appname_painting WHERE id = RAND() LIMIT 1')

in which you must replace appname with your application name.

In General, with an id column, the order_by('?') can be done much faster with:

Paiting.objects.raw(
        'SELECT * FROM auth_user WHERE id>=RAND() * (SELECT MAX(id) FROM auth_user) LIMIT %d' 
    % needed_count)

Solution 13 - Python

You may want to use the same approach that you'd use to sample any iterator, especially if you plan to sample multiple items to create a sample set. @MatijnPieters and @DzinX put a lot of thought into this:

def random_sampling(qs, N=1):
    """Sample any iterable (like a Django QuerySet) to retrieve N random elements

    Arguments:
      qs (iterable): Any iterable (like a Django QuerySet)
      N (int): Number of samples to retrieve at random from the iterable

    References:
      @DZinX:  https://stackoverflow.com/a/12583436/623735
      @MartinPieters: https://stackoverflow.com/a/12581484/623735
    """
    samples = []
    iterator = iter(qs)
    # Get the first `N` elements and put them in your results list to preallocate memory
    try:
        for _ in xrange(N):
            samples.append(iterator.next())
    except StopIteration:
        raise ValueError("N, the number of reuested samples, is larger than the length of the iterable.")
    random.shuffle(samples)  # Randomize your list of N objects
    # Now replace each element by a truly random sample
    for i, v in enumerate(qs, N):
        r = random.randint(0, i)
        if r < N:
            samples[r] = v  # at a decreasing rate, replace random items
    return samples

Solution 14 - Python

One much easier approach to this involves simply filtering down to the recordset of interest and using random.sample to select as many as you want:

from myapp.models import MyModel
import random

my_queryset = MyModel.objects.filter(criteria=True)  # Returns a QuerySet
my_object = random.sample(my_queryset, 1)  # get a single random element from my_queryset
my_objects = random.sample(my_queryset, 5)  # get five random elements from my_queryset

Note that you should have some code in place to verify that my_queryset is not empty; random.sample returns ValueError: sample larger than population if the first argument contains too few elements.

Solution 15 - Python

I got very simple solution, make custom manager:

class RandomManager(models.Manager):
    def random(self):
        return random.choice(self.all())

and then add in model:

class Example(models.Model):
    name = models.CharField(max_length=128)
    objects = RandomManager()

Now, you can use it:

Example.objects.random()

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionkenderView Question on Stackoverflow
Solution 1 - PythonmuhukView Answer on Stackoverflow
Solution 2 - PythonEmil IvanovView Answer on Stackoverflow
Solution 3 - PythonMikhail KorobovView Answer on Stackoverflow
Solution 4 - PythonMaulik PatelView Answer on Stackoverflow
Solution 5 - PythonSoviutView Answer on Stackoverflow
Solution 6 - PythonNelo MitranimView Answer on Stackoverflow
Solution 7 - PythonValter SilvaView Answer on Stackoverflow
Solution 8 - PythonjerinisreadyView Answer on Stackoverflow
Solution 9 - PythonpjmnobleView Answer on Stackoverflow
Solution 10 - PythonDaniel HimmelsteinView Answer on Stackoverflow
Solution 11 - PythonAlireza SavandView Answer on Stackoverflow
Solution 12 - PythonAmir Ali AkbariView Answer on Stackoverflow
Solution 13 - PythonhobsView Answer on Stackoverflow
Solution 14 - PythoneykanalView Answer on Stackoverflow
Solution 15 - PythonLagRangeView Answer on Stackoverflow