How do you write a SQL statement?

This article covers some best practices for writing SQL queries for data analysts and data scientists. Most of our discussion will concern SQL in general, but we’ll include some notes on features specific to Metabase that make writing SQL a breeze.

Correctness, readability, then optimization: in that order

The standard warning against premature optimization applies here. Avoid tuning your SQL query until you know your query returns the data you’re looking for. And even then, only prioritize optimizing your query if it’s run frequently (like powering a popular dashboard), or if the query traverses a large number of rows. In general, prioritize accuracy (does the query produce the intended results), and readability (can others easily understand and modify the code) before worrying about performance.

Make your haystacks as small as possible before searching for your needles

Arguably, we’re already getting into optimization here, but the goal should be to tell the database to scan the minimum number of values necessary to retrieve your results.

Part of SQL’s beauty is its declarative nature. Instead of telling the database how to retrieve records, you need only tell the database which records you need, and the database should figure out the most efficient way to get that information. Consequently, much of the advice about improving the efficiency of queries is simply about showing people how to use the tools in SQL to articulate their needs with more precision.

We’ll review the general order of query execution, and include tips along the way to reduce your search space. Then we’ll talk about three essential tools to add to your utility belt: INDEX, EXPLAIN, and WITH.

First, get to know your data

Familiarize yourself with your data before you write a single line of code by studying the metadata to make sure that a column really does contain the data you expect. The SQL editor in Metabase features a handy data reference tab (accessible via the book icon), where you can browse through the tables in your database, and view their columns and connections (figure 1):

Fig. 1. Use the Data Reference sidebar to view a table's fields.

You can also view sample values for specific columns (figure 2).

Fig. 2. Use the Data Reference sidebar to view sample data.

Metabase gives you many different ways to explore your data: you can X-ray tables, compose questions using the query builder and Notebook Editor, convert a saved question to SQL code, or build from an existing native query. We cover this in other articles; for now, let’s go through the general workflow of a query.

Developing your query

Everyone’s method will differ, but here’s an example workflow to follow when developing a query.

  • As above, study the column and table metadata. If you’re using Metabase’s native query editor, you can also search for SQL snippets that contain SQL code for the table and columns you’re working with. Snippets allow you to see how other analysts have been querying the data. Or you can start a query from an existing SQL question.
  • To get a feel for a table’s values, SELECT * from the tables you’re working with and LIMIT your results. Keep the LIMIT applied as you refine your columns (or add more columns via joins).
  • Narrow down the columns to the minimal set required to answer your question.
  • Apply any filters to those columns.
  • If you need to aggregate data, aggregate a small number of rows and confirm that the aggregations are as you expect.
  • Once you have a query returning the results you need, look for sections of the query to save as a Common Table Expression (CTE) to encapsulate that logic.
  • With Metabase, you can also save code as a SQL snippet to share and reuse in other queries.
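The workflow above might look something like the following sketch. It assumes a hypothetical orders table with id, product_id, total, and created_at columns; adjust the names to match your own schema.

```sql
-- Step 1: peek at the table, capped with LIMIT.
SELECT *
FROM orders
LIMIT 10;

-- Step 2: narrow to the columns you need and apply filters, keeping the LIMIT.
SELECT id, product_id, total
FROM orders
WHERE created_at >= '2019-01-01'
LIMIT 10;

-- Step 3: aggregate a small slice (one month here) and sanity-check the numbers
-- before running the aggregation over the full date range.
SELECT product_id, COUNT(*) AS order_count, SUM(total) AS revenue
FROM orders
WHERE created_at >= '2019-01-01'
  AND created_at < '2019-02-01'
GROUP BY product_id;
```

Once the small-slice aggregation looks right, you can widen the date range and remove the LIMIT.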

The general order of query execution

Before we get into individual tips on writing SQL code, it’s important to have a sense of how databases will carry out your query. This differs from the reading order (left to right, top to bottom) you use to compose your query. Query optimizers can change the order of the following list, but this general lifecycle of a SQL query is good to keep in mind when writing SQL. We’ll use the execution order to group the tips on writing good SQL that follow.

The rule of thumb here is this: the earlier in this list you can eliminate data, the better.

  1. FROM (and JOIN) get(s) the tables referenced in the query. These tables represent the maximum search space specified by your query. Where possible, restrict this search space before moving forward.
  2. WHERE filters data.
  3. GROUP BY aggregates data.
  4. HAVING filters out aggregated data that doesn’t meet the criteria.
  5. SELECT grabs the columns (then deduplicates rows if DISTINCT is invoked).
  6. UNION merges the selected data into a results set.
  7. ORDER BY sorts the results.

And, of course, there will always be occasions where the query optimizer for your particular database will devise a different query plan, so don’t get hung up on this order.
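To make the logical order concrete, here is a sketch of a single query annotated with the step at which each clause is conceptually evaluated. It reuses the orders and products tables from the examples in this article; the total and vendor filters are illustrative assumptions.

```sql
SELECT p.vendor, COUNT(*) AS order_count       -- 5. SELECT picks the output columns
FROM orders AS o                               -- 1. FROM gathers the tables...
  JOIN products AS p ON o.product_id = p.id    --    ...and JOIN combines them
WHERE o.total > 20                             -- 2. WHERE filters rows
GROUP BY p.vendor                              -- 3. GROUP BY aggregates
HAVING COUNT(*) > 5                            -- 4. HAVING filters aggregates
ORDER BY order_count DESC;                     -- 7. ORDER BY sorts (step 6, UNION, is not used here)
```

Note that the clause you type first (SELECT) is evaluated fifth, which is why a filter in WHERE shrinks the data that every later step has to handle.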

Some query guidelines (not rules)

The following tips are guidelines, not rules, intended to keep you out of trouble. Each database handles SQL differently, has a slightly different set of functions, and takes different approaches to optimizing queries. And that’s before we even get into comparing traditional transactional databases with analytics databases that use columnar storage formats, which have vastly different performance characteristics.

Comment your code, especially the why

Help people out (including yourself three months from now) by adding comments that explain different parts of the code. The most important thing to capture here is the “why.” For example, it’s obvious that the code below keeps only orders with an ID greater than 10, but the reason it does so is that the first 10 orders are used for testing.

SELECT
  id,
  product
FROM
  orders
-- The first 10 orders are test orders, so exclude them
WHERE
  id > 10

The catch here is that you introduce a little maintenance overhead: if you change the code, you need to make sure that the comment is still relevant and up to date. But that’s a small price to pay for readable code.

SQL best practices for FROM

Join tables using the ON keyword

Although it’s possible to “join” two tables using a WHERE clause (that is, to perform an implicit join, like SELECT * FROM a, b WHERE a.foo = b.bar), you should instead prefer an explicit JOIN:

SELECT
  o.id,
  o.total,
  p.vendor
FROM
  orders AS o
  JOIN products AS p ON o.product_id = p.id

This is mostly for readability, as the JOIN + ON syntax distinguishes joins from WHERE clauses intended to filter the results.

Alias multiple tables

When querying multiple tables, use aliases, and employ those aliases in your select statement, so the database (and your reader) doesn’t need to parse which column belongs to which table. Note that if you have columns with the same name across multiple tables, you will need to explicitly reference them with either the table name or alias.

Avoid

SELECT
  title,
  last_name,
  first_name
FROM fiction_books
  LEFT JOIN fiction_authors
  ON fiction_books.author_id = fiction_authors.id

Prefer

SELECT
  books.title,
  authors.last_name,
  authors.first_name
FROM fiction_books AS books
  LEFT JOIN fiction_authors AS authors
  ON books.author_id = authors.id

This is a trivial example, but as the number of tables and columns in your query grows, aliases save your readers from having to track down which column is in which table. Your queries might also break if you join a table with an ambiguous column name (that is, a name that appears in both tables).

Note that field filters are incompatible with table aliases, so you’ll need to remove aliases when connecting filter widgets to your Field Filters.

SQL best practices for WHERE

Filter with WHERE before HAVING

Use a WHERE clause to filter superfluous rows, so you don’t have to compute those values in the first place. Only after removing irrelevant rows, and after aggregating those rows and grouping them, should you include a HAVING clause to filter out aggregates.
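As a sketch of the difference, assume a hypothetical orders table with product_id and total columns. Both queries below return the same rows, but the second states the row-level filter where it belongs. (Some query optimizers will rewrite the first form for you; it’s safer not to rely on that.)

```sql
-- Avoid: a row-level condition in HAVING. Conceptually, every product is
-- grouped and summed first, and only then are the unwanted groups discarded.
SELECT product_id, SUM(total) AS revenue
FROM orders
GROUP BY product_id
HAVING product_id > 10;

-- Prefer: remove irrelevant rows in WHERE, before grouping.
SELECT product_id, SUM(total) AS revenue
FROM orders
WHERE product_id > 10
GROUP BY product_id;

-- HAVING is still the right tool when the condition is on an aggregate.
SELECT product_id, SUM(total) AS revenue
FROM orders
WHERE product_id > 10
GROUP BY product_id
HAVING SUM(total) > 1000;
```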

Avoid functions on columns in WHERE clauses

Using a function on a column in a WHERE clause can really slow down your query, as the function makes the query non-sargable (i.e., it prevents the database from using an index to speed up the query). Instead of using the index to skip to the relevant rows, the function on the column forces the database to run the function on each row of the table.

And remember, the concatenation operator || is also a function, so don’t get fancy trying to concat strings to filter multiple columns. Prefer multiple conditions instead:

Avoid

SELECT hero, sidekick
FROM superheros
WHERE hero || sidekick = 'BatmanRobin'

Prefer

SELECT hero, sidekick
FROM superheros
WHERE
  hero = 'Batman'
  AND
  sidekick = 'Robin'
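The same reasoning applies to date functions. Here is a sketch, assuming a hypothetical orders table with an indexed created_at timestamp column. Whether the first form can use an index depends on your database, but a plain index on created_at typically can’t serve it.

```sql
-- Avoid: EXTRACT runs against created_at for every row.
SELECT id, total
FROM orders
WHERE EXTRACT(YEAR FROM created_at) = 2019;

-- Prefer: compare the bare column against a range, which an index
-- on created_at can satisfy.
SELECT id, total
FROM orders
WHERE created_at >= '2019-01-01'
  AND created_at < '2020-01-01';
```

The half-open range (>= the start, < the next year’s start) also avoids missing rows timestamped late on December 31.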

Prefer = to LIKE

This guideline doesn’t hold everywhere. LIKE compares characters, and can be paired with wildcard operators like %, whereas the = operator compares strings and numbers for exact matches, and = can take advantage of indexed columns. In some databases, though, LIKE can also use indexes (if they exist for the field), as long as you avoid prefixing the search term with the wildcard operator, %. Which brings us to our next point:

Avoid bookending wildcards in WHERE statements

Using wildcards for searching can be expensive. Prefer adding wildcards to the end of strings. Prefixing a string with a wildcard can lead to a full table scan.

Avoid

SELECT col FROM tbl WHERE col LIKE '%wizar%'

Prefer

SELECT col FROM tbl WHERE col LIKE 'wizar%'

Prefer EXISTS to IN

If you just need to verify the existence of a value in a table, prefer EXISTS to IN, as the EXISTS process exits as soon as it finds the search value, whereas IN may scan the entire table. IN should be used for finding values in lists.

Similarly, prefer NOT EXISTS to NOT IN.
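Here is a sketch of both forms, assuming hypothetical customers (id, name) and orders (id, customer_id) tables. Many optimizers treat the first two queries similarly, so check your query plan before assuming one is faster.

```sql
-- IN with a subquery: find customers who have placed at least one order.
SELECT c.id, c.name
FROM customers AS c
WHERE c.id IN (SELECT o.customer_id FROM orders AS o);

-- EXISTS with a correlated subquery: same result, and the subquery
-- can stop at the first matching order for each customer.
SELECT c.id, c.name
FROM customers AS c
WHERE EXISTS (
  SELECT 1
  FROM orders AS o
  WHERE o.customer_id = c.id
);

-- IN remains a natural fit for a short list of literal values.
SELECT id, name
FROM customers
WHERE id IN (1, 2, 3);
```

There is also a correctness reason to prefer NOT EXISTS: if the subquery in a NOT IN returns even one NULL, the NOT IN condition is never true, and the query returns no rows.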

SQL best practices for GROUP BY

Order multiple groupings by descending cardinality

Where possible, GROUP BY columns in order of descending cardinality. That is, group by columns with more unique values first (like IDs or phone numbers) before grouping by columns with fewer distinct values (like state or gender).
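For instance, with a hypothetical people table that has phone_number and state columns, this guideline suggests the following ordering. The result set is the same either way (absent extensions like ROLLUP); whether the order affects performance depends on your database.

```sql
SELECT phone_number, state, COUNT(*) AS row_count
FROM people
GROUP BY
  phone_number,  -- many distinct values: group by this first
  state;         -- few distinct values: group by this second
```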

SQL best practices for HAVING

Only use HAVING for filtering aggregates

Reserve HAVING for conditions on aggregated values. Before you reach for HAVING, filter out rows using a WHERE clause, so that they’re removed before aggregating and grouping.

SQL best practices for SELECT

SELECT columns, not stars

Specify the columns you’d like to include in the results (though it’s fine to use * when first exploring tables, as long as you remember to LIMIT your results).

SQL best practices for UNION

Prefer UNION ALL to UNION

If duplicates are not an issue, UNION ALL won’t discard them, and since UNION ALL isn’t tasked with removing duplicates, the query will be more efficient.
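As a sketch, assume two hypothetical tables with the same column layout, orders_2018 and orders_2019, whose rows don’t overlap:

```sql
-- UNION would deduplicate the combined rows; since the two tables
-- can't contain the same order, that work would be wasted.
SELECT id, total FROM orders_2018
UNION ALL
SELECT id, total FROM orders_2019;
```

If the two inputs can overlap and you need each row only once, plain UNION is still the correct choice.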

SQL best practices for ORDER BY

Avoid sorting where possible, especially in subqueries

Sorting is expensive. If you must sort, make sure your subqueries are not needlessly sorting data.
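Here is a sketch using the products table from earlier examples (with its vendor and title columns). The inner ORDER BY in the first query adds work without affecting the result, since the outer query is not required to preserve the subquery’s order. Some databases reject an ORDER BY in a subquery outright.

```sql
-- Avoid: sorting inside a subquery whose order the outer query ignores.
SELECT vendor, COUNT(*) AS product_count
FROM (
  SELECT vendor, title
  FROM products
  ORDER BY title
) AS sorted_products
GROUP BY vendor;

-- Prefer: sort once, at the end, and only if the results need it.
SELECT vendor, COUNT(*) AS product_count
FROM products
GROUP BY vendor
ORDER BY product_count DESC;
```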

SQL best practices for INDEX

This section is for the database admins in the crowd (and a topic too large to fit in this article). One of the most common things folks run into when experiencing performance issues in database queries is a lack of adequate indexing.

Which columns you should index usually depends on the columns you’re filtering by (i.e., which columns typically end up in your WHERE clauses). If you find that you’re always filtering by a common set of columns, you should consider indexing those columns.

Adding indexes

Indexing foreign key columns and frequently queried columns can significantly decrease query times. Here’s an example statement to create an index:

CREATE INDEX product_title_index ON products (title)

There are different types of indexes available; the most common index type uses a B-tree to speed up retrieval. Check out our article on making dashboards faster, and consult your database’s documentation on how to create an index.

Use partial indexes

For particularly large datasets, or lopsided datasets, where certain value ranges appear more frequently, consider creating an index with a WHERE clause to limit the number of rows indexed. Partial indexes can also be useful for date ranges, for example if you want to index the past week of data only.
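Here is a sketch of two partial indexes, using PostgreSQL and SQLite syntax (not every database supports partial indexes). The orders table, its total and created_at columns, and the thresholds are assumptions for illustration.

```sql
-- Index only the large orders, which a hypothetical report filters on.
CREATE INDEX orders_large_total_index
  ON orders (total)
  WHERE total > 100;

-- Index only recent rows. The cutoff is a fixed date because PostgreSQL
-- requires functions in an index predicate to be immutable, which rules
-- out calling now() here.
CREATE INDEX orders_recent_created_at_index
  ON orders (created_at)
  WHERE created_at >= '2019-12-25';
```

Because the cutoff date is fixed, an index meant to cover “the past week” has to be dropped and recreated periodically to stay current.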

Use composite indexes

For columns that typically go together in queries (such as last_name, first_name), consider creating a composite index. The syntax is similar to creating a single index. For example:

CREATE INDEX full_name_index ON customers (last_name, first_name)

EXPLAIN

Look for bottlenecks

Some databases, like PostgreSQL, offer insight into the query plan based on your SQL code. Simply prefix your code with the keywords EXPLAIN ANALYZE. You can use these commands to check your query plans and look for bottlenecks, or to compare plans from one version of your query to another to see which version is more efficient.

The output of EXPLAIN ANALYZE reports the milliseconds required for planning and execution, as well as the cost, rows, width, times, loops, memory usage, and more. Reading these analyses is somewhat of an art, but you can use them to identify problem areas in your queries (such as nested loops, or columns that could benefit from indexing) as you refine them.
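As a minimal sketch in PostgreSQL syntax, here is how you might inspect the plan for the join used earlier in this article. The filter on total is an assumption added for illustration, and the plan you get back will depend on your data, indexes, and database version.

```sql
-- EXPLAIN alone shows the estimated plan without running the query.
EXPLAIN
SELECT o.id, o.total, p.vendor
FROM orders AS o
  JOIN products AS p ON o.product_id = p.id
WHERE o.total > 100;

-- EXPLAIN ANALYZE runs the query and reports actual timings and row counts.
EXPLAIN ANALYZE
SELECT o.id, o.total, p.vendor
FROM orders AS o
  JOIN products AS p ON o.product_id = p.id
WHERE o.total > 100;
```

Keep in mind that EXPLAIN ANALYZE actually executes the statement, so be careful when using it with statements that modify data.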

For more detail, see PostgreSQL’s documentation on using EXPLAIN.

WITH

Organize your queries with Common Table Expressions (CTEs)

Use the WITH clause to encapsulate logic in a common table expression (CTE). For example, a query that looks for the products with the highest average revenue per unit sold in 2019, as well as max and min values, can compute the revenue per unit for each order in a CTE, and then aggregate over that CTE in the main query.

The WITH clause makes the code readable, as the main query (what you’re actually looking for) isn’t interrupted by a long subquery.
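Here is one way such a query could be sketched. The column names are assumptions: orders is taken to have created_at, subtotal, quantity, and product_id columns (with subtotal stored as a decimal, so the division doesn’t truncate), and products is taken to have id and title.

```sql
WITH product_orders AS (
  -- One row per order, with the revenue per unit already computed.
  SELECT
    o.created_at AS order_date,
    p.title AS product_title,
    o.subtotal / o.quantity AS revenue_per_unit
  FROM orders AS o
    JOIN products AS p ON o.product_id = p.id
  -- Skip zero-quantity orders to avoid dividing by zero
  WHERE o.quantity > 0
)
SELECT
  product_title,
  AVG(revenue_per_unit) AS avg_revenue_per_unit,
  MAX(revenue_per_unit) AS max_revenue_per_unit,
  MIN(revenue_per_unit) AS min_revenue_per_unit
FROM product_orders
WHERE order_date >= '2019-01-01'
  AND order_date < '2020-01-01'
GROUP BY product_title
ORDER BY avg_revenue_per_unit DESC;
```

The main query reads as a plain aggregation over product_orders, while the join and the per-unit arithmetic stay tucked away in the CTE.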

You can also use CTEs to make your SQL more readable if, for example, your database has fields that are awkwardly named, or that require a little bit of data munging to get the useful data. CTEs can be especially useful when working with JSON fields: a CTE can extract and convert fields from a JSON blob of user events, so that the main query works with ordinary columns.
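A sketch of the JSON case, in PostgreSQL syntax, might look like this. The user_events table, its payload column (of type json or jsonb), and the user_id and event_type keys are all hypothetical.

```sql
WITH events AS (
  -- Pull typed, well-named columns out of the JSON blob once.
  SELECT
    id,
    CAST(payload ->> 'user_id' AS integer) AS user_id,
    payload ->> 'event_type' AS event_type
  FROM user_events
)
SELECT event_type, COUNT(DISTINCT user_id) AS user_count
FROM events
GROUP BY event_type;
```

The ->> operator returns the value as text, which is why user_id is cast to an integer inside the CTE rather than in every query that uses it.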

Alternatively, you could save a subquery as a SQL snippet (figure 3 - note the parentheses around the snippet), to easily reuse that code in other queries.

Fig. 3. Storing a subquery in a snippet and using it in a FROM clause.

And yes, as you might expect, the Aerodynamic Leather Toucan fetches the highest average revenue per unit sold.

With Metabase, you don’t even have to use SQL

SQL is amazing. But so are Metabase’s Query Builder and Notebook Editor. You can compose queries using Metabase’s graphical interface to join tables, filter and summarize data, create custom columns, and more. And with custom expressions, you can handle the vast majority of analytical use cases, without ever needing to reach for SQL. Questions composed using the Notebook Editor also benefit from automatic drill-through, which allows viewers of your charts to click through and explore the data, a feature not available to questions written in SQL.

Glaring errors or omissions?

There are libraries of books on SQL, so we’re only scratching the surface here. You can share the secrets of your SQL sorcery with other Metabase users on our forum.

What is an example of an SQL statement?

An SQL SELECT statement retrieves records from a database table according to clauses (for example, FROM and WHERE) that specify criteria. The syntax is: SELECT column1, column2 FROM table1, table2 WHERE column2='value';

What are the basic SQL statements?

Some of the most important SQL commands:
SELECT - extracts data from a database.
UPDATE - updates data in a database.
DELETE - deletes data from a database.
INSERT INTO - inserts new data into a database.
CREATE DATABASE - creates a new database.
ALTER DATABASE - modifies a database.
CREATE TABLE - creates a new table.

What are the 5 SQL statements?

Types of SQL statements:
Data Definition Language (DDL) statements
Data Manipulation Language (DML) statements
Transaction control statements
Session control statements
System control statements
Embedded SQL statements

What is the syntax of SQL statement?

SQL statements start with a keyword such as SELECT, INSERT, UPDATE, DELETE, ALTER, DROP, CREATE, USE, or SHOW, and end with a semicolon (;). Note that SQL keywords are case insensitive, which means SELECT and select have the same meaning in SQL statements.