Skip to content

SQL Fundamentals

What is SQL?

SQL is a language used to access data stored in relational database.

Note

Some of the examples below use a general case, e.g., column_name and table_name, while others use specific examples, e.g., year and movies.

List of commands can be found here.

Databases that use SQL include, MySQL, SQLite, and PostgreSQL

When getting started, or for practice, you can checkout the sqlite3 Python library. Using this library, you can run SQL directly in application code using string literals.

The key data types to remember are INTEGER, REAL (float), VARCHAR (small-medium strings), TEXT (large strings), and DATE (string of form YYYY-MM-DD). It is also good to know NULL.

Basic CRUD Operations

Create

CREATE TABLE

Create a whole new table in the database, you probably shouldn't use this often unless starting a new database.

sql
CREATE TABLE table_name ( 
  column dtype, 
  ...
);
CREATE TABLE table_name ( 
  column dtype, 
  ...
);

INSERT INTO

Add a row to the table by specifying which columns to add and their values.

sql
INSERT INTO table_name (column_1, column_2, column_3)
VALUES (value_1, 'string_2', value_3);
INSERT INTO table_name (column_1, column_2, column_3)
VALUES (value_1, 'string_2', value_3);

Read

SELECT

Read a column, or set of columns, from the table.

sql
SELECT * FROM table_name;               -- Select all columns
SELECT column FROM table_name;     --Select one column
SELECT column1, column2 from table_name; --Select many columns
SELECT * FROM table_name;               -- Select all columns
SELECT column FROM table_name;     --Select one column
SELECT column1, column2 from table_name; --Select many columns

Update

ALTER TABLE

Adds a column to a table.

sql
ALTER TABLE table_name
ADD COLUMN column_name TEXT;
ALTER TABLE table_name
ADD COLUMN column_name TEXT;

UPDATE ... SET

Edit a row, or multiple rows, in a table.

sql
--e.g. Update a single row
UPDATE table_name
SET column_name = 'new_data'
WHERE row_name = id;
--e.g. Update a single row
UPDATE table_name
SET column_name = 'new_data'
WHERE row_name = id;
sql
--e.g. Conditionally swap an ENUM in all roles
UPDATE Salary
SET sex = CASE
    WHEN sex = "m" THEN "f"
    WHEN sex = "f" THEN "m"
	  END;
--e.g. Conditionally swap an ENUM in all roles
UPDATE Salary
SET sex = CASE
    WHEN sex = "m" THEN "f"
    WHEN sex = "f" THEN "m"
	  END;

Delete

DELETE FROM

Deletes a row, or multiple rows, from a table.

sql
DELETE FROM table_name
WHERE column_name IS NULL;
DELETE FROM table_name
WHERE column_name IS NULL;

Constraints

Add constraints to columns to enforce specific behaviour about those columns. Some important constraints are PRIMARY KEY, UNIQUE, NOT NULL, DEFAULT.

Modifiers & Filters

AS

Alias the result of a retrieved column.

sql
SELECT column AS "new_column_name" FROM table_name;
SELECT column AS "new_column_name" FROM table_name;
Attention

When aliasing something with AS, always put the alias in double quotes "<alias>", otherwise some SQL parsers may throw a runtime error.

DISTINCT

Filters all unique (de-duplicated) column entries.

sql
SELECT DISTINCT column FROM table_name;
SELECT DISTINCT column FROM table_name;

WHERE

Filters output based on the provided condition. WHERE supports the following operators =, !=, >, <, >=, <=.

sql
SELECT column1 FROM table_name WHERE column2 > n;
SELECT column1 FROM table_name WHERE column2 > n;
Attention

You cannot use aggregate functions in WHERE clauses. Use HAVING instead, upon grouping.

WHERE ... LIKE

Further filters outputs from a where clause using wildcards such as _ (single character) or % (zero or more characters).

sql
SELECT * FROM table_name WHERE column LIKE 'AB_DE'; --Selects rows matching `_` wildcard
SELECT * FROM table_name WHERE column LIKE '%aaa%'; --Selects rows containing `aaa` anywhere
SELECT * FROM table_name WHERE column LIKE 'AB_DE'; --Selects rows matching `_` wildcard
SELECT * FROM table_name WHERE column LIKE '%aaa%'; --Selects rows containing `aaa` anywhere

WHERE ... BETWEEN

Further filter outputs from a range of values.

sql
SELECT * FROM movies WHERE year BETWEEN 1970 AND 1979;
SELECT * FROM movies WHERE year BETWEEN 1970 AND 1979;

WHERE ... IS NULL

Use this or IS NOT NULL to query entries that are missing data.

sql
SELECT * FROM table_name where column IS NOT NULL; --Selects any non-NULL columns
SELECT * FROM table_name where column IS NOT NULL; --Selects any non-NULL columns
Tip

You can combine modifiers after a WHERE statement by using operators like AND or OR. Ensure that you specify the column and the glob pattern for each condition.

Example (see this problem)

sql
SELECT * FROM Patients 
WHERE 
  conditions LIKE 'DIAB1%' OR 
  conditions LIKE '% DIAB1%';
SELECT * FROM Patients 
WHERE 
  conditions LIKE 'DIAB1%' OR 
  conditions LIKE '% DIAB1%';

Example (see this problem)

sql
SELECT name, population, area FROM World
WHERE 
  population >= 25000000 OR
  area >= 3000000;
SELECT name, population, area FROM World
WHERE 
  population >= 25000000 OR
  area >= 3000000;

OFFSET

Return with an offset from the top index

sql
SELECT DISTINCT salary FROM Employee LIMIT 1 OFFSET 1 --Selects second highest salary
SELECT DISTINCT salary FROM Employee LIMIT 1 OFFSET 1 --Selects second highest salary

Sorting & Ordering

ORDER BY

Sorts the result by the column specified.

sql
SELECT column1 FROM table_name ORDER BY column2 DESC; --Select and order descending
SELECT column1 FROM table_name ORDER BY column2 ASC; --Select and order ascending
SELECT column1 FROM table_name ORDER BY column2 DESC; --Select and order descending
SELECT column1 FROM table_name ORDER BY column2 ASC; --Select and order ascending

GROUP BY

Returns the result in groupings. Typically used with aggregate functions. Should succeed WHERE (if used) and precede ORDER BY. You may group by multiple columns, such that GROUP BY X, Y will put all those with the same value for both X and Y in one group. See

sql
--Returns the total number of downloads per category
SELECT category, SUM(downloads) 
FROM fake_apps 
GROUP BY category;
--Returns the total number of downloads per category
SELECT category, SUM(downloads) 
FROM fake_apps 
GROUP BY category;
Attention

Whenever you use a GROUP BY clause you MUST also include an aggregate function in the SELECT statement.

GROUP BY ... HAVING

Similar to where, but filters groups instead of rows. All WHERE clauses work with HAVING.

sql
SELECT year,  
   genre,  
   COUNT(name)  
FROM movies  
GROUP BY 1, 2  
HAVING COUNT(name) > 10;
SELECT year,  
   genre,  
   COUNT(name)  
FROM movies  
GROUP BY 1, 2  
HAVING COUNT(name) > 10;

Limiting

LIMIT

Only returns the number of entries specified by the limit. LIMIT should always come last.

sql
SELECT * FROM movies LIMIT 10;
SELECT * FROM movies LIMIT 10;

Logic & Flow Control

IF()

Impose a conditional on a selection.

sql
SELECT
  IF(id="5", name, NULL) --If the ID is 5, then it will select the name, else NULL
FROM table_name;
SELECT
  IF(id="5", name, NULL) --If the ID is 5, then it will select the name, else NULL
FROM table_name;

CASE

One way to handle if-then logic in SQL, like a switch case.

sql
SELECT name, --This `,` is important!
  CASE
    WHEN genre = 'comedy' THEN 'Chill'
    WHEN genre = 'romance' THEN 'Chill'
    ELSE 'Intense'
  END AS 'Mood' --Good practice to rename the output column
FROM movies;
SELECT name, --This `,` is important!
  CASE
    WHEN genre = 'comedy' THEN 'Chill'
    WHEN genre = 'romance' THEN 'Chill'
    ELSE 'Intense'
  END AS 'Mood' --Good practice to rename the output column
FROM movies;

IFNULL()

Sets the row to another value if a null value is presently encountered.

sql
SELECT
	IFNULL(lastname, "Doe") --Sets the lastname to `Doe` if it's currently NULL
FROM table_name;
SELECT
	IFNULL(lastname, "Doe") --Sets the lastname to `Doe` if it's currently NULL
FROM table_name;

Aggregates & Calculations

Aggregate functions all use a similar syntax,

sql
SELECT FUNC(column_name) FROM table_name;
--For example
SELECT COUNT(column_name) FROM table_name; --Counts non-null entries in `column_name`
SELECT COUNT(*) FROM table_name;  --Counts all entries in table, including nulls
SELECT FUNC(column_name) FROM table_name;
--For example
SELECT COUNT(column_name) FROM table_name; --Counts non-null entries in `column_name`
SELECT COUNT(*) FROM table_name;  --Counts all entries in table, including nulls
Counting

You cannot count NULLs, they are omitted from counting calculations. This has an impact on how to solve LC 1581 Customer Who Visited but Did Not Make Any Transactions.

Common aggregate functions include

  • COUNT(col)
  • SUM(col)
  • MIN(col) / MAX(col)
  • AVG(col)
  • ROUND(col, n_decimals)

CONCAT()

Concatenates two or more strings.

sql
CONCAT(argument1, argument2, ...)
CONCAT(argument1, argument2, ...)

UPPER() and LOWER()

Converts a string into upper case or lower case lettering.

sql
UPPER(username)
UPPER(username)

SUBSTRING()

Used to slice strings into different sub-strings.

sql
--e.g. username = `qwerty`
SUBSTRING(username, 1, 3) --Returns from index 1 to index 3 = `qwe`
SUBSTRING(username, 3, LENGTH(username)) --Returns from index 3 to end of string = `erty`
--e.g. username = `qwerty`
SUBSTRING(username, 1, 3) --Returns from index 1 to index 3 = `qwe`
SUBSTRING(username, 3, LENGTH(username)) --Returns from index 3 to end of string = `erty`

STRING_AGG() and GROUP_CONCAT()

Used for collecting string entries into a single entry. See this thread on stack overflow. STRING_AGG() is typical usage, but GROUP_CONCAT() is used in MySQL. For example, the two below are equivalent.

sql
--PostgreSQL
SELECT product_id, STRING_AGG(DISTINCT product_names, ',' ORDER BY product_names ASC);
--MySQL
SELECT product_id, GROUP_CONCAT(DISTINCT product ORDER BY product ASC separator ',');
--PostgreSQL
SELECT product_id, STRING_AGG(DISTINCT product_names, ',' ORDER BY product_names ASC);
--MySQL
SELECT product_id, GROUP_CONCAT(DISTINCT product ORDER BY product ASC separator ',');

RANK() and DENSE_RANK()

Used to rank aggregate outputs.

sql
--In the query below, you may optionally add PARTITION BY to further group your rankings.
SELECT
  DENSE_RANK() OVER(ORDER BY score DESC) AS "rank"
FROM Scores;
--In the query below, you may optionally add PARTITION BY to further group your rankings.
SELECT
  DENSE_RANK() OVER(ORDER BY score DESC) AS "rank"
FROM Scores;

See LC 185 Department Top Three Salaries.

Date Functions

DATE_ADD()

Used to add a fixed interval to a date. Useful for checking for missing records in combination with LEAD() and LAG()

sql
DATE_ADD(column, INTERVAL 1 DAY)
DATE_ADD(column, INTERVAL 1 DAY)

Window Functions

See this cheat sheet for a good primer on windowing functions.

LEAD() and LAG()

Used to reference rows with a known offset to the current row.

sql
SELECT
  LEAD(column_name, offset) OVER(ORDER BY id) as "new_column_name"
FROM table_name;
SELECT
  LEAD(column_name, offset) OVER(ORDER BY id) as "new_column_name"
FROM table_name;

Multiple Tables

When you use an ID from one table to gather additional information from another table, this is called joining. Understanding how to join tables effectively is an important skill to have when working with SQL. Think of joining as SQL's analog to cross referencing two datasets. In order to create a join, you need to have two columns that reference the same underlying entity, such as a User ID.

Types of joins

Inner join: In this join, the rows of different tables will be combined if the join condition is true. However any fields that don't match the condition will be omitted in the returned value. In this way it's making the result set smaller, think of it like an inner product. Left join: Sometimes called an outer join. Here, any unmatched entries from the former table (FROM table1) will be kept, and unmatched entries from the latter table (LEFT JOIN table2) will be padded with NULL. Cross join: Joins all elements of one table with elements of the other, e.g., if you have 3 shirts in table1 and 2 pants in table2, cross joining these tables would return 6 entries for all the possible combinations.

Tip

You can filter on a JOIN by adding an AND after the ON keyword, for example, see the solution to LC 1158 Market Analysis I.

JOIN ... ON

Example of an inner join

sql
SELECT column_name(s)
FROM table_1
JOIN table_2
  ON table_1.column_name = table_2.column_name;
SELECT column_name(s)
FROM table_1
JOIN table_2
  ON table_1.column_name = table_2.column_name;

Example of an outer join

sql
SELECT column_name(s)
FROM table_1
LEFT JOIN table_2
  ON table_1.column_name = table_2.column_name;
SELECT column_name(s)
FROM table_1
LEFT JOIN table_2
  ON table_1.column_name = table_2.column_name;

Example of a cross join

sql
SELECT months.month, COUNT(*) as 'subscribers'
FROM newspaper
CROSS JOIN months
WHERE newspaper.start_month <= months.month AND newspaper.end_month >= months.month
GROUP BY months.month;
SELECT months.month, COUNT(*) as 'subscribers'
FROM newspaper
CROSS JOIN months
WHERE newspaper.start_month <= months.month AND newspaper.end_month >= months.month
GROUP BY months.month;

Primary and Foreign Keys

A primary key uniquely identifies each row in a table. When the primary key of one table is found in another table, it is called a foreign key and is used to reference rows in other tables. The most common types of joins will be joining a foreign key from one table with the primary key from another table.

UNION

Concatenates two tables together.

sql
SELECT * FROM table1  
UNION 
SELECT * FROM table2;
SELECT * FROM table1  
UNION 
SELECT * FROM table2;
Attention

SQL has strict rules for appending data — the schema's have to match:

  • Tables must have the same number of columns.
  • The columns must have the same data types in the same order as the first table.

WITH

Bind the output of a query to a variable (or alias) for joining calculations with other tables. Think of WITH as a being used to declare variables. A WITH statement is also referred to as a Common Table Expression (CTE).

sql
WITH previous_results AS (  
   SELECT ...  
   ...  
   ...  
   ...  
)  
SELECT *  
FROM previous_results  
JOIN customers  
  ON _____ = _____;
WITH previous_results AS (  
   SELECT ...  
   ...  
   ...  
   ...  
)  
SELECT *  
FROM previous_results  
JOIN customers  
  ON _____ = _____;
Do's and don'ts of WITH and CTEs

If possible, use joins or sub-queries instead of WITH. This will achieve better performance. See LC 185 Department Top Three Salaries for an example of how to replace a WITH statement with sub-queries.

  • Do use with complicated queries to improve readability and maintenance.
  • Do use when you need to re-use the query
  • Do use for debugging
  • Don't use when a standard JOIN or LEFT JOIN can achieve the same task (due to performance impact)
  • Don't use when simple sub-queries will suffice.

Declaring Variables

Use variables as follows, place them after the BEGIN keyword in the function.

sql
--First, declare your variable and its type
DECLARE M INT;

--Next, set the variable
SET M = N - 1;      --Option 2: Use the `SET` keyword
--First, declare your variable and its type
DECLARE M INT;

--Next, set the variable
SET M = N - 1;      --Option 2: Use the `SET` keyword