Demystifying SQL Queries
SQL is fundamentally a small simple syntax that mainly works with sets. Let's see how.
Let’s look at an average, everyday SQL query:
1 SELECT something
2 FROM table
3 WHERE conditions
Not hard to draw parallels to other known concepts:
FROM table: A set of elements
WHERE conditions: A filter operation on these elements
SELECT something: A presenter of an individual element
Tables are sets
The bare minimum way of interacting with a set is to look at some or all of its elements. That is precisely what this query is doing. We’re selecting a subset of a set based on a few conditions and representing each element of this subset in some format. Recall that a subset of a set is also a set in itself. This allows us to do something like:
1 SELECT something
2 FROM (
3 SELECT something
4 FROM table
5 WHERE conditions
6 )
7 WHERE conditions
We have a nested query, where instead of selecting from a table, we’re selecting from the result of selecting from a table.
Sets can have labels or aliases, so:
1 SELECT a_name.something
2 FROM (
3 SELECT something
4 FROM table
5 WHERE conditions
6 ) as a_name
7 WHERE conditions
We seem to be doing some recursive-looking stuff, and I find it beautiful.
This also means an assertion is in order: a select query operates upon a set and returns a set. Naturally, a set can be that of a single element as well.
1 SELECT COUNT(*) FROM table WHERE conditions
2 --- better yet:
3 SELECT COUNT(*) as something FROM table WHERE conditions
Let’s substitute this set in our original query:
1 SELECT the same something
2 FROM
3 (SELECT COUNT(*) as something FROM table WHERE conditions)
4 WHERE conditions
So a select query operates upon a set and returns a set, and we know that sets can be unioned or intersected:
1 SELECT something
2 FROM (a set)
3 WHERE conditions
4 UNION -- or INTERSECTION
5 SELECT the same something
6 FROM (a similar set)
7 WHERE conditions
Notice that “something” and “the same something” are important. We can only union or intersection similar sets. Apples and oranges can’t be unioned in the relational algebra land.
Joins are Sets
Let’s talk about joins. 😅 Chances are, at some point in life, you’ve written an INNER JOIN instead of an OUTER JOIN and got incorrect results. Or something along those lines. Joins can be very opaque, even to a regular practitioner.
A few things are essential when considering joins:
A join is a product of 2 sets. Always. Multi-table joins are “first join these two,” “take the result,” and “join the result with the next.”
Joins are always performed on sets. So you can “join” any of the above-mentioned sets, and you’re still good. Do note that language semantics dictate that you use aliases to disambiguate.
NULL is always a part of each set. Implicitly so, for practicality.
This is best explained through an example:
1 numbers = { 1, 2, 3 }
2 letters = { a, b, c }
Joining numbers
and letters
1 numbers | letters
2 --------|---------
3 NULL | NULL
4 NULL | a
5 NULL | b
6 NULL | c
7 1 | NULL
8 1 | a
9 1 | b
10 1 | c
11 2 | NULL
12 2 | a
13 2 | b
14 2 | c
15 3 | NULL
16 3 | a
17 3 | b
18 3 | c
Do you feel you are getting more than what you bargained for? Me too. Depending on the context, we’ll want different subsets of this mega-joined set. That’s precisely what different kinds of joins are for. These joins will determine what working set we’ll use.
Inner join: Do not consider the entries with NULL on either side.
Left outer join: Do not consider entries with NULL on the LEFT side.
Right outer join: Do not consider entries with NULL on the RIGHT side.
Full outer join: Consider all entries.
The entry where both sides are NULL is not considered for practical reasons. A slightly better example:
1 numbers = { [1, a], [2, b], [3, c] }
2 letters = { [a, x], [b, y], [c, z] }
1 JOIN
2 numbers AND letters
3 ON second element of number = first element of letter
Our working set:
1 numbers | letters
2 --------|---------
3 NULL | NULL
4 NULL | [a, x]
5 NULL | [b, y]
6 NULL | [c, z]
7 [1, a] | NULL
8 [1, a] | [a, x]
9 [1, a] | [b, y]
10 [1, a] | [c, z]
11 [2, b] | NULL
12 [2, b] | [a, x]
13 [2, b] | [b, y]
14 [2, b] | [c, z]
15 [3, c] | NULL
16 [3, c] | [a, x]
17 [3, c] | [b, y]
18 [3, c] | [c, z]
After applying conditions and removing both sides NULL
entry:
1 numbers | letters
2 --------|---------
3 NULL | [a, x]
4 NULL | [b, y]
5 NULL | [c, z]
6 [1, a] | NULL
7 [1, a] | [a, x]
8 [2, b] | NULL
9 [2, b] | [b, y]
10 [3, c] | NULL
11 [3, c] | [c, z]
Rearranging a little for better understanding:
1 numbers | letters | included in
2 --------|---------|-----------
3 [1, a] | [a, x] | Full, Inner, Left and Right
4 [2, b] | [b, y] | Full, Inner, Left and Right
5 [3, c] | [c, z] | Full, Inner, Left and Right
6 [1, a] | NULL | Full and Left
7 [2, b] | NULL | Full and Left
8 [3, c] | NULL | Full and Left
9 NULL | [a, x] | Full and Right
10 NULL | [b, y] | Full and Right
11 NULL | [c, z] | Full and Right
Left and Right are determined from the join syntax. When A joins B, A is Left, and B is Right.
While attempting to write a join query, I encourage you to work out your join on paper first with dummy data. Especially with multi-table joins.
Functions are Sets
That sounded nice, but it isn’t true. Functions aren’t sets; they operate on sets. Remember, a single value is also a set, so each function accepts a set as an argument and returns a set.
1 SELECT anything.today
2 FROM
3 ( SELECT now() AS today ) AS anything
My apologies for dropping a query on you without any domain context, but consider this slightly more complex function, which returns all the sibling branches of a given restaurant branch. The restaurant has many Restaurant Branches, and Restaurant Branch belongs to a Restaurant, to aid your understanding.
1 CREATE OR REPLACE FUNCTION co_branches(branch_id BIGINT)
2 RETURNS TABLE(id BIGINT)
3 AS $function$
4
5 SELECT b2.id
6 FROM
7 restaurant_branches b1
8 JOIN restaurant_branches b2 ON b1.restaurant_id = b2.restaurant_id
9 WHERE b1.id = branch_id;
10
11 $function$
12 LANGUAGE SQL;
13
14 --- used simply:
15 SELECT * FROM COBRANCHES(42); -- Co-branches of Branch#42
16 --- But more powerful, when used like:
17 SELECT id, (SELECT COUNT(*) FROM COBRANCHES(id)) as branch_count FROM restaurant_branches;
We’re replacing a query with a function. And we learned that both SELECT queries and Functions operate on sets and return sets.
This brings me to another assertion: Sets and set operations tend to compose well. This is very important! Functional programming nerds practically live by this motto. Once you have small units that compose well, you can build complex units with relative simplicity.
Fundamentally, SQL is not so different. An important thing to keep in mind is that this composing behavior is mainly about the data and how the data is interpreted and processed. The query language leaves much to be desired when it comes to composing. A lot of things like aliases and joins can easily be taken care of by a competent library. But, more on this and the advantages of using something like ARel in a later post.
Reading is destructuring; Writing is composing
Let’s collect all the set-like behavior we’ve seen so far:
We can interact with a set by looking at all or some of its elements
A subset of a set is also a set in itself
A set can have a label or an alias
A set can be that of a single element as well
A set can be unioned or intersected with another set
Reading or writing complex queries becomes much easier if we think of it as composing queries together or decomposing a large query into smaller parts.
How to read complex queries
Start with the innermost or smallest “SELECT” clauses
Replace them with an appropriately and descriptively named function, say
co_branches_of_given_branch
instead of justco_branches
.If these inner queries (now functions) use a column/value from the outer queries, treat them as function arguments. (
co_branches
usedbranch_id
)
Keep applying this method until you reach the outermost query.
How to write complex queries
This boils down to a top-down vs. bottom-up approach. If you’re a top-down person: – Write the top-most query, assume all the lower-level functions exist, with appropriate and descriptive names. – Recursively apply the same strategy to each lower-level function
Conversely, if you’re a bottom-up person: – Figure out the lowest level functions/queries you need, and write them – Build up your larger query by composing these functions.
💡 Food for thought: How many nested select queries does your favorite relational database allow?↩