SQL Fundamentals
What is SQL?
SQL is a language used to access data stored in relational database.
Note
Some of the examples below use a general case, e.g.,
column_name
andtable_name
, while others use specific examples, e.g.,year
andmovies
.
List of commands can be found here.
Databases that use SQL include, MySQL, SQLite, and PostgreSQL
When getting started, or for practice, you can checkout the sqlite3 Python library. Using this library, you can run SQL directly in application code using string literals.
The key data types to remember are INTEGER
, REAL
(float), VARCHAR
(small-medium strings), TEXT
(large strings), and DATE
(string of form YYYY-MM-DD). It is also good to know NULL
.
Basic CRUD Operations
Create
CREATE TABLE
Create a whole new table in the database, you probably shouldn't use this often unless starting a new database.
CREATE TABLE table_name (
column dtype,
...
);
CREATE TABLE table_name (
column dtype,
...
);
INSERT INTO
Add a row to the table by specifying which columns to add and their values.
INSERT INTO table_name (column_1, column_2, column_3)
VALUES (value_1, 'string_2', value_3);
INSERT INTO table_name (column_1, column_2, column_3)
VALUES (value_1, 'string_2', value_3);
Read
SELECT
Read a column, or set of columns, from the table.
SELECT * FROM table_name; -- Select all columns
SELECT column FROM table_name; --Select one column
SELECT column1, column2 from table_name; --Select many columns
SELECT * FROM table_name; -- Select all columns
SELECT column FROM table_name; --Select one column
SELECT column1, column2 from table_name; --Select many columns
Update
ALTER TABLE
Adds a column to a table.
ALTER TABLE table_name
ADD COLUMN column_name TEXT;
ALTER TABLE table_name
ADD COLUMN column_name TEXT;
UPDATE ... SET
Edit a row, or multiple rows, in a table.
--e.g. Update a single row
UPDATE table_name
SET column_name = 'new_data'
WHERE row_name = id;
--e.g. Update a single row
UPDATE table_name
SET column_name = 'new_data'
WHERE row_name = id;
--e.g. Conditionally swap an ENUM in all roles
UPDATE Salary
SET sex = CASE
WHEN sex = "m" THEN "f"
WHEN sex = "f" THEN "m"
END;
--e.g. Conditionally swap an ENUM in all roles
UPDATE Salary
SET sex = CASE
WHEN sex = "m" THEN "f"
WHEN sex = "f" THEN "m"
END;
Delete
DELETE FROM
Deletes a row, or multiple rows, from a table.
DELETE FROM table_name
WHERE column_name IS NULL;
DELETE FROM table_name
WHERE column_name IS NULL;
Constraints
Add constraints to columns to enforce specific behaviour about those columns. Some important constraints are PRIMARY KEY
, UNIQUE
, NOT NULL
, DEFAULT
.
Modifiers & Filters
AS
Alias the result of a retrieved column.
SELECT column AS "new_column_name" FROM table_name;
SELECT column AS "new_column_name" FROM table_name;
Attention
When aliasing something with
AS
, always put the alias in double quotes"<alias>"
, otherwise some SQL parsers may throw a runtime error.
DISTINCT
Filters all unique (de-duplicated) column entries.
SELECT DISTINCT column FROM table_name;
SELECT DISTINCT column FROM table_name;
WHERE
Filters output based on the provided condition. WHERE
supports the following operators =, !=, >, <, >=, <=
.
SELECT column1 FROM table_name WHERE column2 > n;
SELECT column1 FROM table_name WHERE column2 > n;
Attention
You cannot use aggregate functions in
WHERE
clauses. UseHAVING
instead, upon grouping.
WHERE ... LIKE
Further filters outputs from a where clause using wildcards such as _
(single character) or %
(zero or more characters).
SELECT * FROM table_name WHERE column LIKE 'AB_DE'; --Selects rows matching `_` wildcard
SELECT * FROM table_name WHERE column LIKE '%aaa%'; --Selects rows containing `aaa` anywhere
SELECT * FROM table_name WHERE column LIKE 'AB_DE'; --Selects rows matching `_` wildcard
SELECT * FROM table_name WHERE column LIKE '%aaa%'; --Selects rows containing `aaa` anywhere
WHERE ... BETWEEN
Further filter outputs from a range of values.
SELECT * FROM movies WHERE year BETWEEN 1970 AND 1979;
SELECT * FROM movies WHERE year BETWEEN 1970 AND 1979;
WHERE ... IS NULL
Use this or IS NOT NULL
to query entries that are missing data.
SELECT * FROM table_name where column IS NOT NULL; --Selects any non-NULL columns
SELECT * FROM table_name where column IS NOT NULL; --Selects any non-NULL columns
Tip
You can combine modifiers after a
WHERE
statement by using operators likeAND
orOR
. Ensure that you specify the column and the glob pattern for each condition.Example (see this problem)
sqlSELECT * FROM Patients WHERE conditions LIKE 'DIAB1%' OR conditions LIKE '% DIAB1%';
SELECT * FROM Patients WHERE conditions LIKE 'DIAB1%' OR conditions LIKE '% DIAB1%';
Example (see this problem)
sqlSELECT name, population, area FROM World WHERE population >= 25000000 OR area >= 3000000;
SELECT name, population, area FROM World WHERE population >= 25000000 OR area >= 3000000;
OFFSET
Return with an offset from the top index
SELECT DISTINCT salary FROM Employee LIMIT 1 OFFSET 1 --Selects second highest salary
SELECT DISTINCT salary FROM Employee LIMIT 1 OFFSET 1 --Selects second highest salary
Sorting & Ordering
ORDER BY
Sorts the result by the column specified.
SELECT column1 FROM table_name ORDER BY column2 DESC; --Select and order descending
SELECT column1 FROM table_name ORDER BY column2 ASC; --Select and order ascending
SELECT column1 FROM table_name ORDER BY column2 DESC; --Select and order descending
SELECT column1 FROM table_name ORDER BY column2 ASC; --Select and order ascending
GROUP BY
Returns the result in groupings. Typically used with aggregate functions. Should succeed WHERE
(if used) and precede ORDER BY
. You may group by multiple columns, such that GROUP BY X, Y
will put all those with the same value for both X and Y in one group. See
--Returns the total number of downloads per category
SELECT category, SUM(downloads)
FROM fake_apps
GROUP BY category;
--Returns the total number of downloads per category
SELECT category, SUM(downloads)
FROM fake_apps
GROUP BY category;
Attention
Whenever you use a
GROUP BY
clause you MUST also include an aggregate function in theSELECT
statement.
GROUP BY ... HAVING
Similar to where, but filters groups instead of rows. All WHERE
clauses work with HAVING
.
SELECT year,
genre,
COUNT(name)
FROM movies
GROUP BY 1, 2
HAVING COUNT(name) > 10;
SELECT year,
genre,
COUNT(name)
FROM movies
GROUP BY 1, 2
HAVING COUNT(name) > 10;
Limiting
LIMIT
Only returns the number of entries specified by the limit. LIMIT
should always come last.
SELECT * FROM movies LIMIT 10;
SELECT * FROM movies LIMIT 10;
Logic & Flow Control
IF()
Impose a conditional on a selection.
SELECT
IF(id="5", name, NULL) --If the ID is 5, then it will select the name, else NULL
FROM table_name;
SELECT
IF(id="5", name, NULL) --If the ID is 5, then it will select the name, else NULL
FROM table_name;
CASE
One way to handle if-then logic in SQL, like a switch case.
SELECT name, --This `,` is important!
CASE
WHEN genre = 'comedy' THEN 'Chill'
WHEN genre = 'romance' THEN 'Chill'
ELSE 'Intense'
END AS 'Mood' --Good practice to rename the output column
FROM movies;
SELECT name, --This `,` is important!
CASE
WHEN genre = 'comedy' THEN 'Chill'
WHEN genre = 'romance' THEN 'Chill'
ELSE 'Intense'
END AS 'Mood' --Good practice to rename the output column
FROM movies;
IFNULL()
Sets the row to another value if a null value is presently encountered.
SELECT
IFNULL(lastname, "Doe") --Sets the lastname to `Doe` if it's currently NULL
FROM table_name;
SELECT
IFNULL(lastname, "Doe") --Sets the lastname to `Doe` if it's currently NULL
FROM table_name;
Aggregates & Calculations
Aggregate functions all use a similar syntax,
SELECT FUNC(column_name) FROM table_name;
--For example
SELECT COUNT(column_name) FROM table_name; --Counts non-null entries in `column_name`
SELECT COUNT(*) FROM table_name; --Counts all entries in table, including nulls
SELECT FUNC(column_name) FROM table_name;
--For example
SELECT COUNT(column_name) FROM table_name; --Counts non-null entries in `column_name`
SELECT COUNT(*) FROM table_name; --Counts all entries in table, including nulls
Counting
You cannot count
NULL
s, they are omitted from counting calculations. This has an impact on how to solve LC 1581 Customer Who Visited but Did Not Make Any Transactions.
Common aggregate functions include
COUNT(col)
SUM(col)
MIN(col) / MAX(col)
AVG(col)
ROUND(col, n_decimals)
CONCAT()
Concatenates two or more strings.
CONCAT(argument1, argument2, ...)
CONCAT(argument1, argument2, ...)
UPPER()
and LOWER()
Converts a string into upper case or lower case lettering.
UPPER(username)
UPPER(username)
SUBSTRING()
Used to slice strings into different sub-strings.
--e.g. username = `qwerty`
SUBSTRING(username, 1, 3) --Returns from index 1 to index 3 = `qwe`
SUBSTRING(username, 3, LENGTH(username)) --Returns from index 3 to end of string = `erty`
--e.g. username = `qwerty`
SUBSTRING(username, 1, 3) --Returns from index 1 to index 3 = `qwe`
SUBSTRING(username, 3, LENGTH(username)) --Returns from index 3 to end of string = `erty`
STRING_AGG()
and GROUP_CONCAT()
Used for collecting string entries into a single entry. See this thread on stack overflow. STRING_AGG()
is typical usage, but GROUP_CONCAT()
is used in MySQL. For example, the two below are equivalent.
--PostgreSQL
SELECT product_id, STRING_AGG(DISTINCT product_names, ',' ORDER BY product_names ASC);
--MySQL
SELECT product_id, GROUP_CONCAT(DISTINCT product ORDER BY product ASC separator ',');
--PostgreSQL
SELECT product_id, STRING_AGG(DISTINCT product_names, ',' ORDER BY product_names ASC);
--MySQL
SELECT product_id, GROUP_CONCAT(DISTINCT product ORDER BY product ASC separator ',');
RANK()
and DENSE_RANK()
Used to rank aggregate outputs.
--In the query below, you may optionally add PARTITION BY to further group your rankings.
SELECT
DENSE_RANK() OVER(ORDER BY score DESC) AS "rank"
FROM Scores;
--In the query below, you may optionally add PARTITION BY to further group your rankings.
SELECT
DENSE_RANK() OVER(ORDER BY score DESC) AS "rank"
FROM Scores;
See LC 185 Department Top Three Salaries.
Date Functions
DATE_ADD()
Used to add a fixed interval to a date. Useful for checking for missing records in combination with LEAD()
and LAG()
DATE_ADD(column, INTERVAL 1 DAY)
DATE_ADD(column, INTERVAL 1 DAY)
Window Functions
See this cheat sheet for a good primer on windowing functions.
LEAD()
and LAG()
Used to reference rows with a known offset to the current row.
SELECT
LEAD(column_name, offset) OVER(ORDER BY id) as "new_column_name"
FROM table_name;
SELECT
LEAD(column_name, offset) OVER(ORDER BY id) as "new_column_name"
FROM table_name;
Multiple Tables
When you use an ID from one table to gather additional information from another table, this is called joining. Understanding how to join tables effectively is an important skill to have when working with SQL. Think of joining as SQL's analog to cross referencing two datasets. In order to create a join, you need to have two columns that reference the same underlying entity, such as a User ID.
Types of joins
Inner join: In this join, the rows of different tables will be combined if the join condition is true. However any fields that don't match the condition will be omitted in the returned value. In this way it's making the result set smaller, think of it like an inner product. Left join: Sometimes called an outer join. Here, any unmatched entries from the former table (
FROM table1
) will be kept, and unmatched entries from the latter table (LEFT JOIN table2
) will be padded withNULL
. Cross join: Joins all elements of one table with elements of the other, e.g., if you have 3 shirts in table1 and 2 pants in table2, cross joining these tables would return 6 entries for all the possible combinations.
Tip
You can filter on a
JOIN
by adding anAND
after theON
keyword, for example, see the solution to LC 1158 Market Analysis I.
JOIN ... ON
Example of an inner join
SELECT column_name(s)
FROM table_1
JOIN table_2
ON table_1.column_name = table_2.column_name;
SELECT column_name(s)
FROM table_1
JOIN table_2
ON table_1.column_name = table_2.column_name;
Example of an outer join
SELECT column_name(s)
FROM table_1
LEFT JOIN table_2
ON table_1.column_name = table_2.column_name;
SELECT column_name(s)
FROM table_1
LEFT JOIN table_2
ON table_1.column_name = table_2.column_name;
Example of a cross join
SELECT months.month, COUNT(*) as 'subscribers'
FROM newspaper
CROSS JOIN months
WHERE newspaper.start_month <= months.month AND newspaper.end_month >= months.month
GROUP BY months.month;
SELECT months.month, COUNT(*) as 'subscribers'
FROM newspaper
CROSS JOIN months
WHERE newspaper.start_month <= months.month AND newspaper.end_month >= months.month
GROUP BY months.month;
Primary and Foreign Keys
A primary key uniquely identifies each row in a table. When the primary key of one table is found in another table, it is called a foreign key and is used to reference rows in other tables. The most common types of joins will be joining a foreign key from one table with the primary key from another table.
UNION
Concatenates two tables together.
SELECT * FROM table1
UNION
SELECT * FROM table2;
SELECT * FROM table1
UNION
SELECT * FROM table2;
Attention
SQL has strict rules for appending data — the schema's have to match:
- Tables must have the same number of columns.
- The columns must have the same data types in the same order as the first table.
WITH
Bind the output of a query to a variable (or alias) for joining calculations with other tables. Think of WITH
as a being used to declare variables. A WITH
statement is also referred to as a Common Table Expression (CTE).
WITH previous_results AS (
SELECT ...
...
...
...
)
SELECT *
FROM previous_results
JOIN customers
ON _____ = _____;
WITH previous_results AS (
SELECT ...
...
...
...
)
SELECT *
FROM previous_results
JOIN customers
ON _____ = _____;
Do's and don'ts of WITH
and CTEs
WITH
and CTEsIf possible, use joins or sub-queries instead of
WITH
. This will achieve better performance. See LC 185 Department Top Three Salaries for an example of how to replace aWITH
statement with sub-queries.
- Do use with complicated queries to improve readability and maintenance.
- Do use when you need to re-use the query
- Do use for debugging
- Don't use when a standard
JOIN
orLEFT JOIN
can achieve the same task (due to performance impact)- Don't use when simple sub-queries will suffice.
Declaring Variables
Use variables as follows, place them after the BEGIN
keyword in the function.
--First, declare your variable and its type
DECLARE M INT;
--Next, set the variable
SET M = N - 1; --Option 2: Use the `SET` keyword
--First, declare your variable and its type
DECLARE M INT;
--Next, set the variable
SET M = N - 1; --Option 2: Use the `SET` keyword