# Demystifying SQL Queries

### SQL is fundamentally a small simple syntax that mainly works with sets. Let's see how.

Let’s look at an average, everyday SQL query:

```
1 SELECT something
2 FROM table
3 WHERE conditions
```

Not hard to draw parallels to other known concepts:

FROM table: A set of elements

WHERE conditions: A filter operation on these elements

SELECT something: A presenter of an individual element

## Tables are sets

The bare minimum way of interacting with a set is to look at *some or all* of its elements. That is precisely what this query is doing. We’re selecting a subset of a set based on a few conditions and representing each element of this subset in some format. Recall that a subset of a set is *also a set* in itself. This allows us to do something like:

```
1 SELECT something
2 FROM (
3 SELECT something
4 FROM table
5 WHERE conditions
6 )
7 WHERE conditions
```

We have a nested query, where instead of selecting from a table, we’re selecting from the result of selecting from a table.

Sets can have labels or aliases, so:

```
1 SELECT a_name.something
2 FROM (
3 SELECT something
4 FROM table
5 WHERE conditions
6 ) as a_name
7 WHERE conditions
```

We seem to be doing some recursive-looking stuff, and I find it beautiful.

This also means an assertion is in order: **a select query operates upon a set and returns a set**. Naturally, a set can be that of a single element as well.

```
1 SELECT COUNT(*) FROM table WHERE conditions
2 --- better yet:
3 SELECT COUNT(*) as something FROM table WHERE conditions
```

Let’s substitute *this set* in our original query:

```
1 SELECT the same something
2 FROM
3 (SELECT COUNT(*) as something FROM table WHERE conditions)
4 WHERE conditions
```

So a select query operates upon a set and returns a set, and we know that sets can be unioned or intersected:

```
1 SELECT something
2 FROM (a set)
3 WHERE conditions
4 UNION -- or INTERSECTION
5 SELECT the same something
6 FROM (a similar set)
7 WHERE conditions
```

Notice that “something” and “the same something” are important. We can only union or intersection similar sets. Apples and oranges can’t be unioned in the relational algebra land.

## Joins are Sets

Let’s talk about joins. 😅 Chances are, at some point in life, you’ve written an INNER JOIN instead of an OUTER JOIN and got incorrect results. Or something along those lines. Joins can be very opaque, even to a regular practitioner.

A few things are essential when considering joins:

A join is a product of 2 sets. Always. Multi-table joins are “first join these two,” “take the result,” and “join the result with the next.”

Joins are always performed on sets. So you can “join” any of the above-mentioned sets, and you’re still good. Do note that language semantics dictate that you use aliases to disambiguate.

NULL is always a part of each set. Implicitly so, for practicality.

This is best explained through an example:

```
1 numbers = { 1, 2, 3 }
2 letters = { a, b, c }
```

Joining `numbers`

and `letters`

```
1 numbers | letters
2 --------|---------
3 NULL | NULL
4 NULL | a
5 NULL | b
6 NULL | c
7 1 | NULL
8 1 | a
9 1 | b
10 1 | c
11 2 | NULL
12 2 | a
13 2 | b
14 2 | c
15 3 | NULL
16 3 | a
17 3 | b
18 3 | c
```

Do you feel you are getting more than what you bargained for? Me too. Depending on the context, we’ll want different subsets of this mega-joined set. That’s precisely what different kinds of joins are for. These joins will determine what working set we’ll use.

Inner join: Do not consider the entries with NULL on either side.

Left outer join: Do not consider entries with NULL on the LEFT side.

Right outer join: Do not consider entries with NULL on the RIGHT side.

Full outer join: Consider all entries.

The entry where both sides are NULL is not considered for practical reasons. A slightly better example:

```
1 numbers = { [1, a], [2, b], [3, c] }
2 letters = { [a, x], [b, y], [c, z] }
```

```
1 JOIN
2 numbers AND letters
3 ON second element of number = first element of letter
```

Our working set:

```
1 numbers | letters
2 --------|---------
3 NULL | NULL
4 NULL | [a, x]
5 NULL | [b, y]
6 NULL | [c, z]
7 [1, a] | NULL
8 [1, a] | [a, x]
9 [1, a] | [b, y]
10 [1, a] | [c, z]
11 [2, b] | NULL
12 [2, b] | [a, x]
13 [2, b] | [b, y]
14 [2, b] | [c, z]
15 [3, c] | NULL
16 [3, c] | [a, x]
17 [3, c] | [b, y]
18 [3, c] | [c, z]
```

After applying conditions and removing both sides `NULL`

entry:

```
1 numbers | letters
2 --------|---------
3 NULL | [a, x]
4 NULL | [b, y]
5 NULL | [c, z]
6 [1, a] | NULL
7 [1, a] | [a, x]
8 [2, b] | NULL
9 [2, b] | [b, y]
10 [3, c] | NULL
11 [3, c] | [c, z]
```

Rearranging a little for better understanding:

```
1 numbers | letters | included in
2 --------|---------|-----------
3 [1, a] | [a, x] | Full, Inner, Left and Right
4 [2, b] | [b, y] | Full, Inner, Left and Right
5 [3, c] | [c, z] | Full, Inner, Left and Right
6 [1, a] | NULL | Full and Left
7 [2, b] | NULL | Full and Left
8 [3, c] | NULL | Full and Left
9 NULL | [a, x] | Full and Right
10 NULL | [b, y] | Full and Right
11 NULL | [c, z] | Full and Right
```

Left and Right are determined from the join syntax. When A joins B, A is Left, and B is Right.

While attempting to write a join query, I encourage you to work out your join on paper first with dummy data. Especially with multi-table joins.

## Functions are Sets

That sounded nice, but it isn’t true. Functions aren’t sets; they *operate* on sets. Remember, a single value is also a set, so **each function accepts a set as an argument and returns a set**.

```
1 SELECT anything.today
2 FROM
3 ( SELECT now() AS today ) AS anything
```

My apologies for dropping a query on you without any domain context, but consider this slightly more complex function, which returns all the sibling branches of a given restaurant branch. The restaurant has many Restaurant Branches, and Restaurant Branch belongs to a Restaurant, to aid your understanding.

```
1 CREATE OR REPLACE FUNCTION co_branches(branch_id BIGINT)
2 RETURNS TABLE(id BIGINT)
3 AS $function$
4
5 SELECT b2.id
6 FROM
7 restaurant_branches b1
8 JOIN restaurant_branches b2 ON b1.restaurant_id = b2.restaurant_id
9 WHERE b1.id = branch_id;
10
11 $function$
12 LANGUAGE SQL;
13
14 --- used simply:
15 SELECT * FROM COBRANCHES(42); -- Co-branches of Branch#42
16 --- But more powerful, when used like:
17 SELECT id, (SELECT COUNT(*) FROM COBRANCHES(id)) as branch_count FROM restaurant_branches;
```

We’re replacing a query with a function. And we learned that both SELECT queries and Functions operate on sets and return sets.

This brings me to another assertion: **Sets and set operations tend to compose well.** This is very important! Functional programming nerds practically live by this motto. Once you have small units that compose well, you can build complex units with relative simplicity.

Fundamentally, SQL is not so different. An important thing to keep in mind is that this composing behavior is mainly about the data and how the data is interpreted and processed. The query language leaves much to be desired when it comes to composing. A lot of things like aliases and joins can easily be taken care of by a competent library. But, more on this and the advantages of using something like ARel in a later post.

## Reading is destructuring; Writing is composing

Let’s collect all the set-like behavior we’ve seen so far:

We can interact with a set by looking at all or some of its elements

A subset of a set is

*also a set*in itselfA set can have a label or an alias

A set can be that of a single element as well

A set can be unioned or intersected with another set

Reading or writing complex queries becomes much easier if we think of it as composing queries together or decomposing a large query into smaller parts.

### How to read complex queries

Start with the innermost or smallest “SELECT” clauses

Replace them with an appropriately and descriptively named function, say

`co_branches_of_given_branch`

instead of just`co_branches`

.If these inner queries (now functions) use a column/value from the outer queries, treat them as function arguments. (

`co_branches`

used`branch_id`

)

Keep applying this method until you reach the outermost query.

### How to write complex queries

This boils down to a top-down vs. bottom-up approach. If you’re a top-down person: – Write the top-most query, assume all the lower-level functions exist, with appropriate and descriptive names. – Recursively apply the same strategy to each lower-level function

Conversely, if you’re a bottom-up person: – Figure out the lowest level functions/queries you need, and write them – Build up your larger query by composing these functions.

💡 Food for thought: How many nested select queries does your favorite relational database allow?↩