How To Count Your Postgres Tables In Django, Fast!

PUBLISHED ON 17.11.2020 — DATABASE, PYTHON

When you want to do a full count on your tables in django, you often do it like this:

MyModel.objects.count()

which will produce a sql like this:

SELECT COUNT(*) AS "__count" FROM "myapp_mymodel"

Which is pretty fine most of the time, but what happens if you have a lot of data and rows on your table? Then for instance in my experience it took 20 seconds to count all the rows in a table!! especially if you have a more complex query for your counts.

There’s an option in postgres called estimate counts, which will give you well, an estimate for your counts! So unless you want a very precise number of counts from your table, you’re better off this way:

The sql query would be like this:

SELECT reltuples as approximate_row_count FROM pg_class WHERE relname = 'table_name';

Now what I needed was a function which receives a custom query and gives me back the approximate row counts. One smart person called Michael Fuhr has come up with this function to achieve this. So for more sophisticated queries you can use this function:

CREATE FUNCTION count_estimate(query text) RETURNS integer AS
$func$
DECLARE
    rec   record;
    rows  integer;
BEGIN
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN rows IS NOT NULL;
    END LOOP;

    RETURN rows;
END
$func$ LANGUAGE plpgsql;

You first have to create this function in your postgres database and use it like this:

SELECT count_estimate('Your custom query here');

Now what I’ve done in django to achieve all this is that at first I made sure I’m gonna create this stored procedure in my database by creating it in a migration.

./manage.py makemigrations --empty myapp -n count_estimate_procedure
# Generated by Django 2.2.11 on 2020-11-17 15:30

from django.db import migrations

STORED_PROCEDURE_SQL = """
CREATE FUNCTION count_estimate(query text) RETURNS integer AS
$func$
DECLARE
    rec   record;
    rows  integer;
BEGIN
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN rows IS NOT NULL;
    END LOOP;

    RETURN rows;
END
$func$ LANGUAGE plpgsql;
"""


class Migration(migrations.Migration):

    dependencies = [
        ("myapp", "0005_some_other_migration_file"),
    ]

    operations = [
        migrations.RunSQL(
            (STORED_PROCEDURE_SQL), ("DROP FUNCTION IF EXISTS count_estimate;")
        )
    ]

and then create a helper function to call this stored procedure from django:

def estimate_count_fast(sql):
    """ postgres really sucks at full table counts, this is a faster version"""
    cursor = connection.cursor()
    cursor.execute(sql)
    row = cursor.fetchone()
    return int(row[0])

Now you just have to write your count sql and give it to this function, in my case it was something like this:

MYMODEL_COUNT_SQL = """
SELECT count_estimate('SELECT * FROM myapp_mymodel WHERE some_condition');
"""

Now it might not be a pretty sql, but you get my point. Don’t you?!

And use this query in my helper function:

estimate_count_fast(MYMODEL_COUNT_SQL)

And that’s pretty much it. Wish you a very fast performant system!