SQL Fundamentals

What is SQL?

SQL is a language used to access data stored in relational database.

Note

Some of the examples below use a general case, e.g., column_name and table_name, while others use specific examples, e.g., year and movies.

List of commands can be found here.

Databases that use SQL include, MySQL, SQLite, and PostgreSQL

When getting started, or for practice, you can checkout the sqlite3 Python library. Using this library, you can run SQL directly in application code using string literals.

The key data types to remember are INTEGER, REAL (float), VARCHAR (small-medium strings), TEXT (large strings), and DATE (string of form YYYY-MM-DD). It is also good to know NULL.

Basic CRUD Operations

Create

`CREATE TABLE`

Create a whole new table in the database, you probably shouldn't use this often unless starting a new database.

sql

CREATE TABLE table_name ( 
  column dtype, 
  ...
);

CREATE TABLE table_name ( 
  column dtype, 
  ...
);

`INSERT INTO`

Add a row to the table by specifying which columns to add and their values.

sql

INSERT INTO table_name (column_1, column_2, column_3)
VALUES (value_1, 'string_2', value_3);

INSERT INTO table_name (column_1, column_2, column_3)
VALUES (value_1, 'string_2', value_3);

Read

`SELECT`

Read a column, or set of columns, from the table.

sql

SELECT * FROM table_name;               -- Select all columns
SELECT column FROM table_name;     --Select one column
SELECT column1, column2 from table_name; --Select many columns

SELECT * FROM table_name;               -- Select all columns
SELECT column FROM table_name;     --Select one column
SELECT column1, column2 from table_name; --Select many columns

Update

`ALTER TABLE`

Adds a column to a table.

sql

ALTER TABLE table_name
ADD COLUMN column_name TEXT;

ALTER TABLE table_name
ADD COLUMN column_name TEXT;

`UPDATE ... SET`

Edit a row, or multiple rows, in a table.

sql

--e.g. Update a single row
UPDATE table_name
SET column_name = 'new_data'
WHERE row_name = id;

--e.g. Update a single row
UPDATE table_name
SET column_name = 'new_data'
WHERE row_name = id;

sql

--e.g. Conditionally swap an ENUM in all roles
UPDATE Salary
SET sex = CASE
    WHEN sex = "m" THEN "f"
    WHEN sex = "f" THEN "m"
	  END;

--e.g. Conditionally swap an ENUM in all roles
UPDATE Salary
SET sex = CASE
    WHEN sex = "m" THEN "f"
    WHEN sex = "f" THEN "m"
	  END;

Delete

`DELETE FROM`

Deletes a row, or multiple rows, from a table.

sql

DELETE FROM table_name
WHERE column_name IS NULL;

DELETE FROM table_name
WHERE column_name IS NULL;

Constraints

Add constraints to columns to enforce specific behaviour about those columns. Some important constraints are PRIMARY KEY, UNIQUE, NOT NULL, DEFAULT.

Modifiers & Filters

`AS`

Alias the result of a retrieved column.

sql

SELECT column AS "new_column_name" FROM table_name;

SELECT column AS "new_column_name" FROM table_name;

Attention

When aliasing something with AS, always put the alias in double quotes "<alias>", otherwise some SQL parsers may throw a runtime error.

`DISTINCT`

Filters all unique (de-duplicated) column entries.

sql

SELECT DISTINCT column FROM table_name;

SELECT DISTINCT column FROM table_name;

`WHERE`

Filters output based on the provided condition. WHERE supports the following operators =, !=, >, <, >=, <=.

sql

SELECT column1 FROM table_name WHERE column2 > n;

SELECT column1 FROM table_name WHERE column2 > n;

Attention

You cannot use aggregate functions in WHERE clauses. Use HAVING instead, upon grouping.

`WHERE ... LIKE`

Further filters outputs from a where clause using wildcards such as _ (single character) or % (zero or more characters).

sql

SELECT * FROM table_name WHERE column LIKE 'AB_DE'; --Selects rows matching `_` wildcard
SELECT * FROM table_name WHERE column LIKE '%aaa%'; --Selects rows containing `aaa` anywhere

SELECT * FROM table_name WHERE column LIKE 'AB_DE'; --Selects rows matching `_` wildcard
SELECT * FROM table_name WHERE column LIKE '%aaa%'; --Selects rows containing `aaa` anywhere

`WHERE ... BETWEEN`

Further filter outputs from a range of values.

sql

SELECT * FROM movies WHERE year BETWEEN 1970 AND 1979;

SELECT * FROM movies WHERE year BETWEEN 1970 AND 1979;

`WHERE ... IS NULL`

Use this or IS NOT NULL to query entries that are missing data.

sql

SELECT * FROM table_name where column IS NOT NULL; --Selects any non-NULL columns

SELECT * FROM table_name where column IS NOT NULL; --Selects any non-NULL columns

Tip

You can combine modifiers after a WHERE statement by using operators like AND or OR. Ensure that you specify the column and the glob pattern for each condition.

Example (see this problem)

sql

SELECT * FROM Patients 
WHERE 
  conditions LIKE 'DIAB1%' OR 
  conditions LIKE '% DIAB1%';

SELECT * FROM Patients 
WHERE 
  conditions LIKE 'DIAB1%' OR 
  conditions LIKE '% DIAB1%';

Example (see this problem)

sql

SELECT name, population, area FROM World
WHERE 
  population >= 25000000 OR
  area >= 3000000;

SELECT name, population, area FROM World
WHERE 
  population >= 25000000 OR
  area >= 3000000;

`OFFSET`

Return with an offset from the top index

sql

SELECT DISTINCT salary FROM Employee LIMIT 1 OFFSET 1 --Selects second highest salary

SELECT DISTINCT salary FROM Employee LIMIT 1 OFFSET 1 --Selects second highest salary

Sorting & Ordering

`ORDER BY`

Sorts the result by the column specified.

sql

SELECT column1 FROM table_name ORDER BY column2 DESC; --Select and order descending
SELECT column1 FROM table_name ORDER BY column2 ASC; --Select and order ascending

SELECT column1 FROM table_name ORDER BY column2 DESC; --Select and order descending
SELECT column1 FROM table_name ORDER BY column2 ASC; --Select and order ascending

`GROUP BY`

Returns the result in groupings. Typically used with aggregate functions. Should succeed WHERE (if used) and precede ORDER BY. You may group by multiple columns, such that GROUP BY X, Y will put all those with the same value for both X and Y in one group. See

sql

--Returns the total number of downloads per category
SELECT category, SUM(downloads) 
FROM fake_apps 
GROUP BY category;

--Returns the total number of downloads per category
SELECT category, SUM(downloads) 
FROM fake_apps 
GROUP BY category;

Attention

Whenever you use a GROUP BY clause you MUST also include an aggregate function in the SELECT statement.

`GROUP BY ... HAVING`

Similar to where, but filters groups instead of rows. All WHERE clauses work with HAVING.

sql

SELECT year,  
   genre,  
   COUNT(name)  
FROM movies  
GROUP BY 1, 2  
HAVING COUNT(name) > 10;

SELECT year,  
   genre,  
   COUNT(name)  
FROM movies  
GROUP BY 1, 2  
HAVING COUNT(name) > 10;

Limiting

`LIMIT`

Only returns the number of entries specified by the limit. LIMIT should always come last.

sql

SELECT * FROM movies LIMIT 10;

SELECT * FROM movies LIMIT 10;

Logic & Flow Control

`IF()`

Impose a conditional on a selection.

sql

SELECT
  IF(id="5", name, NULL) --If the ID is 5, then it will select the name, else NULL
FROM table_name;

SELECT
  IF(id="5", name, NULL) --If the ID is 5, then it will select the name, else NULL
FROM table_name;

`CASE`

One way to handle if-then logic in SQL, like a switch case.

sql

SELECT name, --This `,` is important!
  CASE
    WHEN genre = 'comedy' THEN 'Chill'
    WHEN genre = 'romance' THEN 'Chill'
    ELSE 'Intense'
  END AS 'Mood' --Good practice to rename the output column
FROM movies;

SELECT name, --This `,` is important!
  CASE
    WHEN genre = 'comedy' THEN 'Chill'
    WHEN genre = 'romance' THEN 'Chill'
    ELSE 'Intense'
  END AS 'Mood' --Good practice to rename the output column
FROM movies;

`IFNULL()`

Sets the row to another value if a null value is presently encountered.

sql

SELECT
	IFNULL(lastname, "Doe") --Sets the lastname to `Doe` if it's currently NULL
FROM table_name;

SELECT
	IFNULL(lastname, "Doe") --Sets the lastname to `Doe` if it's currently NULL
FROM table_name;

Aggregates & Calculations

Aggregate functions all use a similar syntax,

sql

SELECT FUNC(column_name) FROM table_name;
--For example
SELECT COUNT(column_name) FROM table_name; --Counts non-null entries in `column_name`
SELECT COUNT(*) FROM table_name;  --Counts all entries in table, including nulls

SELECT FUNC(column_name) FROM table_name;
--For example
SELECT COUNT(column_name) FROM table_name; --Counts non-null entries in `column_name`
SELECT COUNT(*) FROM table_name;  --Counts all entries in table, including nulls

Counting

You cannot count NULLs, they are omitted from counting calculations. This has an impact on how to solve LC 1581 Customer Who Visited but Did Not Make Any Transactions.

Common aggregate functions include

COUNT(col)
SUM(col)
MIN(col) / MAX(col)
AVG(col)
ROUND(col, n_decimals)

`CONCAT()`

Concatenates two or more strings.

sql

CONCAT(argument1, argument2, ...)

CONCAT(argument1, argument2, ...)

`UPPER()` and `LOWER()`

Converts a string into upper case or lower case lettering.

sql

UPPER(username)

UPPER(username)

`SUBSTRING()`

Used to slice strings into different sub-strings.

sql

--e.g. username = `qwerty`
SUBSTRING(username, 1, 3) --Returns from index 1 to index 3 = `qwe`
SUBSTRING(username, 3, LENGTH(username)) --Returns from index 3 to end of string = `erty`

--e.g. username = `qwerty`
SUBSTRING(username, 1, 3) --Returns from index 1 to index 3 = `qwe`
SUBSTRING(username, 3, LENGTH(username)) --Returns from index 3 to end of string = `erty`

`STRING_AGG()` and `GROUP_CONCAT()`

Used for collecting string entries into a single entry. See this thread on stack overflow. STRING_AGG() is typical usage, but GROUP_CONCAT() is used in MySQL. For example, the two below are equivalent.

sql

--PostgreSQL
SELECT product_id, STRING_AGG(DISTINCT product_names, ',' ORDER BY product_names ASC);
--MySQL
SELECT product_id, GROUP_CONCAT(DISTINCT product ORDER BY product ASC separator ',');

--PostgreSQL
SELECT product_id, STRING_AGG(DISTINCT product_names, ',' ORDER BY product_names ASC);
--MySQL
SELECT product_id, GROUP_CONCAT(DISTINCT product ORDER BY product ASC separator ',');

`RANK()` and `DENSE_RANK()`

Used to rank aggregate outputs.

sql

--In the query below, you may optionally add PARTITION BY to further group your rankings.
SELECT
  DENSE_RANK() OVER(ORDER BY score DESC) AS "rank"
FROM Scores;

--In the query below, you may optionally add PARTITION BY to further group your rankings.
SELECT
  DENSE_RANK() OVER(ORDER BY score DESC) AS "rank"
FROM Scores;

See LC 185 Department Top Three Salaries.

Date Functions

`DATE_ADD()`

Used to add a fixed interval to a date. Useful for checking for missing records in combination with LEAD() and LAG()

sql

DATE_ADD(column, INTERVAL 1 DAY)

DATE_ADD(column, INTERVAL 1 DAY)

Window Functions

See this cheat sheet for a good primer on windowing functions.

`LEAD()` and `LAG()`

Used to reference rows with a known offset to the current row.

sql

SELECT
  LEAD(column_name, offset) OVER(ORDER BY id) as "new_column_name"
FROM table_name;

SELECT
  LEAD(column_name, offset) OVER(ORDER BY id) as "new_column_name"
FROM table_name;

Multiple Tables

When you use an ID from one table to gather additional information from another table, this is called joining. Understanding how to join tables effectively is an important skill to have when working with SQL. Think of joining as SQL's analog to cross referencing two datasets. In order to create a join, you need to have two columns that reference the same underlying entity, such as a User ID.

Types of joins

Inner join: In this join, the rows of different tables will be combined if the join condition is true. However any fields that don't match the condition will be omitted in the returned value. In this way it's making the result set smaller, think of it like an inner product. Left join: Sometimes called an outer join. Here, any unmatched entries from the former table (FROM table1) will be kept, and unmatched entries from the latter table (LEFT JOIN table2) will be padded with NULL. Cross join: Joins all elements of one table with elements of the other, e.g., if you have 3 shirts in table1 and 2 pants in table2, cross joining these tables would return 6 entries for all the possible combinations.

Tip

You can filter on a JOIN by adding an AND after the ON keyword, for example, see the solution to LC 1158 Market Analysis I.

`JOIN ... ON`

Example of an inner join

sql

SELECT column_name(s)
FROM table_1
JOIN table_2
  ON table_1.column_name = table_2.column_name;

SELECT column_name(s)
FROM table_1
JOIN table_2
  ON table_1.column_name = table_2.column_name;

Example of an outer join

sql

SELECT column_name(s)
FROM table_1
LEFT JOIN table_2
  ON table_1.column_name = table_2.column_name;

SELECT column_name(s)
FROM table_1
LEFT JOIN table_2
  ON table_1.column_name = table_2.column_name;

Example of a cross join

sql

SELECT months.month, COUNT(*) as 'subscribers'
FROM newspaper
CROSS JOIN months
WHERE newspaper.start_month <= months.month AND newspaper.end_month >= months.month
GROUP BY months.month;

SELECT months.month, COUNT(*) as 'subscribers'
FROM newspaper
CROSS JOIN months
WHERE newspaper.start_month <= months.month AND newspaper.end_month >= months.month
GROUP BY months.month;

Primary and Foreign Keys

A primary key uniquely identifies each row in a table. When the primary key of one table is found in another table, it is called a foreign key and is used to reference rows in other tables. The most common types of joins will be joining a foreign key from one table with the primary key from another table.

`UNION`

Concatenates two tables together.

sql

SELECT * FROM table1  
UNION 
SELECT * FROM table2;

SELECT * FROM table1  
UNION 
SELECT * FROM table2;

Attention

SQL has strict rules for appending data — the schema's have to match:
Tables must have the same number of columns.
The columns must have the same data types in the same order as the first table.

`WITH`

Bind the output of a query to a variable (or alias) for joining calculations with other tables. Think of WITH as a being used to declare variables. A WITH statement is also referred to as a Common Table Expression (CTE).

sql

WITH previous_results AS (  
   SELECT ...  
   ...  
   ...  
   ...  
)  
SELECT *  
FROM previous_results  
JOIN customers  
  ON _____ = _____;

WITH previous_results AS (  
   SELECT ...  
   ...  
   ...  
   ...  
)  
SELECT *  
FROM previous_results  
JOIN customers  
  ON _____ = _____;

Do's and don'ts of WITH and CTEs

If possible, use joins or sub-queries instead of WITH. This will achieve better performance. See LC 185 Department Top Three Salaries for an example of how to replace a WITH statement with sub-queries.
Do use with complicated queries to improve readability and maintenance.
Do use when you need to re-use the query
Do use for debugging
Don't use when a standard JOIN or LEFT JOIN can achieve the same task (due to performance impact)
Don't use when simple sub-queries will suffice.

Declaring Variables

Use variables as follows, place them after the BEGIN keyword in the function.

sql

--First, declare your variable and its type
DECLARE M INT;

--Next, set the variable
SET M = N - 1;      --Option 2: Use the `SET` keyword

--First, declare your variable and its type
DECLARE M INT;

--Next, set the variable
SET M = N - 1;      --Option 2: Use the `SET` keyword

SQL Fundamentals ​

Basic CRUD Operations ​

Create ​

CREATE TABLE ​

INSERT INTO ​

Read ​

SELECT ​

Update ​

ALTER TABLE ​

UPDATE ... SET ​

Delete ​

DELETE FROM ​

Constraints ​

Modifiers & Filters ​

AS ​

DISTINCT ​

WHERE ​

WHERE ... LIKE ​

WHERE ... BETWEEN ​

WHERE ... IS NULL ​

OFFSET ​

Sorting & Ordering ​

ORDER BY ​

GROUP BY ​

GROUP BY ... HAVING ​

Limiting ​

LIMIT ​

Logic & Flow Control ​

IF() ​

CASE ​

IFNULL() ​

Aggregates & Calculations ​

CONCAT() ​

UPPER() and LOWER() ​

SUBSTRING() ​

STRING_AGG() and GROUP_CONCAT() ​

RANK() and DENSE_RANK() ​

Date Functions ​

DATE_ADD() ​

Window Functions ​

LEAD() and LAG() ​

Multiple Tables ​

JOIN ... ON ​

Primary and Foreign Keys ​

UNION ​

WITH ​

Declaring Variables ​

SQL Fundamentals

Basic CRUD Operations

Create

`CREATE TABLE`

`INSERT INTO`

Read

`SELECT`

Update

`ALTER TABLE`

`UPDATE ... SET`

Delete

`DELETE FROM`

Constraints

Modifiers & Filters

`AS`

`DISTINCT`

`WHERE`

`WHERE ... LIKE`

`WHERE ... BETWEEN`

`WHERE ... IS NULL`

`OFFSET`

Sorting & Ordering

`ORDER BY`

`GROUP BY`

`GROUP BY ... HAVING`

Limiting

`LIMIT`

Logic & Flow Control

`IF()`

`CASE`

`IFNULL()`

Aggregates & Calculations

`CONCAT()`

`UPPER()` and `LOWER()`

`SUBSTRING()`

`STRING_AGG()` and `GROUP_CONCAT()`

`RANK()` and `DENSE_RANK()`

Date Functions

`DATE_ADD()`

Window Functions

`LEAD()` and `LAG()`

Multiple Tables

`JOIN ... ON`

Primary and Foreign Keys

`UNION`

`WITH`

Declaring Variables