I'm modeling game results in a Django app I'm working on. A team consists of one or more players, and players may be on multiple teams – a ManyToManyField in Django. Then there are locations that maintain a list of players (not teams) who are authorized to play there, and players may be authorized for multiple locations – another ManyToManyField. What I wanted to find was, given a location, a list of teams whose players are all authorized to play at that location.

A naive solution in Django would grab all of the players for the given location, then grab all teams those players were part of, then check each team to make sure the other members were also authorized for the location. This is both ugly and slow. I wanted a single query, so this is what I came up with:

SELECT DISTINCT ptp.team_id
FROM players_team_players ptp
  INNER JOIN locations_location_players llp ON llp.player_id = ptp.player_id
WHERE llp.location_id = %(location_id)s AND (
  array(SELECT p.player_id FROM players_team_players p WHERE p.team_id = ptp.team_id) <@
  array(SELECT l.player_id FROM locations_location_players l WHERE l.location_id = %(location_id)s)
);

First, I join the many-to-many linking tables for teams and locations together using player_id and limit by location_id, leaving me with a set of potential teams for that location. I still need to check to make sure that, of those teams, all the players on each team are authorized to play at that location. PostgreSQL has some nifty array operators that do the trick nicely, since I basically just want a subset operator for two subqueries. So I select all the player_ids for each remaining team and transform it into an array. Then I select the entire list of player_ids authorized for the given location, and transform that into an array. From there it's a simple set comparison using the <@ operator, which means "is contained by."

At first glance, it seems like selecting the entire list of authorized players for the location each time would slow things down considerably. However, if you look at the query plan that is generated, PostgreSQL is smart enough to only perform this selection once, since it doesn't reference any outer tables. Once I had a list of team_ids, getting a list of Team objects was as simple as Team.objects.filter(pk__in=team_ids).