Recently, I had to deal with an issue on DMOJ, where a page displaying 100 out of around a million elements took over 10 seconds to load. Naturally, I started investigating the issue.

(Note: DMOJ uses MariaDB, but the same problem, as well as the eventual solution, should work the same on MySQL as well.)

The first course of action, of course, was to see what the database was trying to do, by running an EXPLAIN query. For those of you who don’t know, if you have a query of the form SELECT x FROM y WHERE z, running EXPLAIN SELECT x FROM y WHERE z would show what the query is doing, without actually executing the query.

A quick look at the EXPLAIN output showed that MariaDB first did a filter on a 2000 row table, and then joined in the table with a million elements. Then, the 100 wanted rows were filtered out. This query plan was quite horrifying.

A much better strategy would be to filter out the 100 rows we want, since all I was doing was a descending sort on the INTEGER PRIMARY KEY column, and then selecting 100 rows from that. This could quickly be very quickly using the primary key on the million-row table. Afterwards, we can join in the smaller tables.

It is unclear why MariaDB chose the suboptimal query plan in this case, even though the million-row table was listed first in the list of joins.

Fortunately, there is a way to override MariaDB’s attempted cleverness. To do this, we simply replace all occurrences of INNER JOIN with STRAIGHT_JOIN. This tells MariaDB to join tables in the order specified, resulting in the desired query plan.

As DMOJ uses the Django ORM to do most queries, I wrote some code to convince Django ORM to use STRAIGHT_JOIN for certain queries:

class CacheDict(dict):
    def __init__(self, func):
        super(CacheDict, self).__init__()
        self.func = func

    def __missing__(self, key):
        self[key] = value = self.func(key)
        return value


def make_straight_join_query(QueryType):
    class Query(QueryType):
        def join(self, join, reuse=None):
            alias = super(Query, self).join(join, reuse)
            join = self.alias_map[alias]
            if join.join_type == INNER:
                join.join_type = 'STRAIGHT_JOIN'
            return alias

    return Query


straight_join_cache = CacheDict(make_straight_join_query)


def use_straight_join(queryset):
    if connections[queryset.db].vendor != 'mysql':
        return
    queryset.query = queryset.query.clone(straight_join_cache[type(queryset.query)])


# Example:
queryset = Model.objects.select_related('foreign_key_one', 'foreign_key_two')
use_straight_join(queryset)
# Now when using `queryset`, the actual query will use STRAIGHT_JOIN.
# This is preserved even after further filtering and joining on the queryset.

I hope you find my article and code sample useful.