SQL Basics cheat sheet


Column 1 Structured Query Language. It is a standard language for accessing and manipulating databases. SQL can be used to insert, update, delete, calculate, or retrieve data from a database. 1. FROM/JOIN: Occur together. 2. WHERE: Filters rows based on conditions. 3. GROUP BY: Groups rows into summary rows based on specified columns. 4. HAVING: Filters grouped data. 5. SELECT 6. WINDOW FUNCTIONS: Does row-wise calculations. 7. DISTINCT: Removes duplicate rows if specified. 8. ORDER BY: Sorts the result set based on specified columns or expressions. 9. LIMIT: Restricts the number of rows returned. 10. OFFSET: Specifies how many rows to skip before starting to return rows. employee_info `\| id \| first_name \| last_name \| gender \| age \| \|--- \|------------\|------------ \|------- \|----- \| \| 1 \| Ella \| Hartwell \| F \| 45 \| \| 2 \| Nathan \| Caldwell \| M \| 21 \| \| 3 \| Lila \| Carmichael \| F \| 30 \|` employee_sat `\| id \| satisfaction \| performance \| salary \| dep \| startdate \| \|--- \|----------- \|------------ \|-------- \|--- -\| --------- \| \| 1 \| 0.48 \| 44 \| low \| 1 \| 2020/08/11 \| \| 2 \| 0.33 \| 79 \| medium \| 4 \| 2016/02/14 \| \| 3 \| 0.77 \| 83 \| high \| 3 \| 2024/01/15 \|` Numeric INT (INTEGER): Whole numbers (e.g., 1, 42, -99). BIGINT: Larger whole numbers when INT isn't enough. DECIMAL (NUMERIC): Fixed-point numbers, often used for currency (e.g., DECIMAL(10,2) for money). FLOAT: Approximate numeric data for scientific calculations. Character VARCHAR(n): Variable-length strings (e.g., VARCHAR(255) for names or descriptions). CHAR(n): Fixed-length strings (less common than VARCHAR but useful for uniform data like codes or identifiers). TEXT: Large variable-length text for comments or long descriptions. Date and Time DATE: Only the date (e.g., 2024-11-15). TIME: Only the time (e.g., 14:30:00). DATETIME: Both date and time (e.g., 2024-11-15 14:30:00). TIMESTAMP: Similar to DATETIME but often used for tracking changes with timezone awareness. Other BOOLEAN: Can store TRUE, FALSE, or NULL (represented as 1 or 0 in some systems). To temporarily change the data type (without altering the original table): `--- Option 1: CAST SELECT CAST(price AS DECIMAL(10, 2)) AS converted_price # - specifying two decimal points FROM products; --- Option 2: Using :: Syntax SELECT AVG(price::NUMERIC) AS average_price FROM products;` Convert to whole/discrete numbers -> value::INTEGER Convert to numeric -> value::NUMERIC Convert to decimal -> value::DECIMAL (specifying # of dps) Convert to big continuous -> value::DOUBLE PRECISION (returns 15-17 dps) Convert to text/categorical -> value::TEXT or value::VARCHAR Column 2 `--- 1. Get all the columns from a table. SELECT * FROM employee_info; --- 2. Get the first_name and last_name, ordered by id in ascending order. SELECT first_name, last_name FROM employee_info ORDER BY id ASC; --- DESC for descending --- 3. Get the first 5 rows from the employee_info table. SELECT * FROM employee_info LIMIT 5; --- 4. Get the first 5 rows from the employee_info table. SELECT DISTINCT last_name FROM employee_info;` Filtering NUMERIC columns = equal <> and != not equal < less than > greater than <= less than or equal to >= greater than or equal to BETWEENbetween two values - BETWEEN(value1, value2) INmatches specific values - IN (1, 9, 10). Can I use column aliases in the WHERE clause? NO - column aliases are defined in the SELECT clause, executed after the WHERE clause. `--- 1. WHERE: Get all the employees whose satisfaction level is greater than 0.50. SELECT * FROM employee_sat WHERE satisfaction > 0.50 --- 2. BETWEEN: Get all employees with a satisfaction score between 0.20 and 0.50. SELECT * FROM employee_sat WHERE satisfaction BETWEEN 0.20 AND 0.50; --- 3. MANY CONDITIONS: Get all employees with satisfaction scores greater than 0.50 and performance scores greater than 80. SELECT * FROM employee_sat WHERE satisfaction > 0.50 AND performance > 80;` Filtering TEXT columns = values that are a specific string LIKE values that contain a specific string NOT LIKE values that don't contain a specific string IN values in a specified set of strings - IN ('abc', 'def') Note that SQL uses SINGLE QUOTATION MARKS. '%low' means anything before the string low 'low%' means anything after the string low --- 1. WHERE: Get all the employees with a low salary. SELECT * FROM employee_sat WHERE salary = 'low'; --- 2. LIKE: Get the employees with the string "low" in their salary (e.g., low, low-medium, medium-low). SELECT * FROM employee_sat WHERE salary LIKE '%low%'; --- 3. NOT LIKE: Get all employees without the string 'low' in their salary. ELECT * FROM employee_sat WHERE salary NOT LIKE '%low%'; --- 4. IN: Get employees who have either low or high salaries. SELECT * FROM employee_sat WHERE salary IN ('low', 'high') --- 5. CASE-INSENSITIVE: Find all employees with a low salary (Low, low, Low-Medium). ELECT * FROM employee_sat WHERE salary ILIKE '%low%'; Column 3 `--- 1. Create Categories: Categorize turnover risk. SELECT id, satisfaction, performance, CASE WHEN satisfaction >= 0.8 AND performance >= 90 THEN 'Engaged' WHEN satisfaction < 0.5 OR performance < 60 THEN 'At Risk' ELSE 'Neutral' END AS engagement_status FROM employee_sat; --- 2. Replacing Values: Create labels for low, medium, high. SELECT id, CASE WHEN salary = 'low' THEN 'L' WHEN salary = 'medium' THEN 'M' WHEN salary = 'high' THEN 'H' ELSE 'Unknown' END AS salary_label FROM employee_sat;` Two options that will allow you to replace values WITHOUT changing the original table: Use the CASE WHEN clause or the REPLACE clause. `SELECT REPLACE(column_name, 'OldSubstring', 'NewSubstring') AS updated_column FROM table_name;` --- 1. Return the employees without a satisfaction score. SELECT * FROM employee_sat WHERE satisfaction IS NULL; --- 2. Return only employees who have a satisfaction score. SELECT * FROM employee_sat WHERE satisfaction IS NOT NULL; --- 3. Check which columns in a table have missing data. SELECT COUNT(CASE WHEN id IS NULL THEN 1 END) AS id_nulls, COUNT(CASE WHEN satisfaction IS NULL THEN 1 END) AS sat_nulls, COUNT(CASE WHEN performance IS NULL THEN 1 END) AS perf_nulls FROM employee_sat; --- 4. Replace NULLs with a specific value. SELECT id, CASE WHEN satisfaction IS NULL THEN 0 ELSE satisfaction END AS sat_imputed, CASE WHEN performance IS NULL THEN 'Unknown' ELSE performance END AS perf_imputed FROM employee_sat; --- 5. Impute NULLs in the satisfaction col with the col mean. -- first we create a CTE to hold the mean. WITH mean_satisfaction AS ( SELECT AVG(satisfaction) AS mean_value FROM employee_sat WHERE satisfaction IS NOT NULL ) -- then we apply this CTE SELECT id, COALESCE(satisfaction, mean_value) AS satisfaction_imputed, performance FROM employee_sat CROSS JOIN mean_satisfaction; --- we could also use a window function: SELECT id, COALESCE(satisfaction, AVG(satisfaction) OVER (PARTITION BY 1)) AS satisfaction_imputed, performance FROM employee_sat;

Column 1

Structured Query Language. It is a standard language for accessing and manipulating databases. SQL can be used to insert, update, delete, calculate, or retrieve data from a database.

1. FROM/JOIN: Occur together.

2. WHERE: Filters rows based on conditions.

3. GROUP BY: Groups rows into summary rows based on specified columns.

4. HAVING: Filters grouped data.

5. SELECT

6. WINDOW FUNCTIONS: Does row-wise calculations.

7. DISTINCT: Removes duplicate rows if specified.

8. ORDER BY: Sorts the result set based on specified columns or expressions.

9. LIMIT: Restricts the number of rows returned.

10. OFFSET: Specifies how many rows to skip before starting to return rows.

employee_info

| id 	| first_name   | last_name  	| gender | age   |
|--- |------------|------------	|------- |----- |
| 1  	| Ella       	          | Hartwell             | F             | 45     |
| 2  	| Nathan          | Caldwell            | M            | 21      |
| 3  	| Lila                  | Carmichael 	| F              | 30    |

employee_sat

| id 	| satisfaction | performance | salary      | dep | startdate |
|--- |----------- |------------  |--------	|--- -| --------- |
| 1  	| 0.48               | 44                     | low           | 1       | 2020/08/11 |
| 2  	| 0.33               | 79                      | medium | 4      | 2016/02/14 |
| 3  	| 0.77               | 83                      | high         | 3      | 2024/01/15 |

Numeric

INT (INTEGER): Whole numbers (e.g., 1, 42, -99).
BIGINT: Larger whole numbers when INT isn't enough.
DECIMAL (NUMERIC): Fixed-point numbers, often used for currency (e.g., DECIMAL(10,2) for money).
FLOAT: Approximate numeric data for scientific calculations.

Character

VARCHAR(n): Variable-length strings (e.g., VARCHAR(255) for names or descriptions).
CHAR(n): Fixed-length strings (less common than VARCHAR but useful for uniform data like codes or identifiers).
TEXT: Large variable-length text for comments or long descriptions.

Date and Time

DATE: Only the date (e.g., 2024-11-15).
TIME: Only the time (e.g., 14:30:00).
DATETIME: Both date and time (e.g., 2024-11-15 14:30:00).
TIMESTAMP: Similar to DATETIME but often used for tracking changes with timezone awareness.

Other

BOOLEAN: Can store TRUE, FALSE, or NULL (represented as 1 or 0 in some systems).

To temporarily change the data type (without altering the original table):

--- Option 1: CAST
SELECT CAST(price AS DECIMAL(10, 2)) AS converted_price # - specifying two decimal points
FROM products;

--- Option 2: Using :: Syntax 
SELECT AVG(price::NUMERIC) AS average_price
FROM products;

Convert to whole/discrete numbers -> value::INTEGER

Convert to numeric -> value::NUMERIC

Convert to decimal -> value::DECIMAL (specifying # of dps)

Convert to big continuous -> value::DOUBLE PRECISION (returns 15-17 dps)

Convert to text/categorical -> value::TEXT or value::VARCHAR

Column 2

--- 1. Get all the columns from a table.
SELECT *
FROM employee_info;

--- 2. Get the first_name and last_name, ordered by id in ascending order.
SELECT first_name, last_name
FROM employee_info
ORDER BY id ASC; --- DESC for descending

--- 3. Get the first 5 rows from the employee_info table.
SELECT *
FROM employee_info
LIMIT 5;

--- 4. Get the first 5 rows from the employee_info table.
SELECT DISTINCT last_name
FROM employee_info;

Filtering NUMERIC columns

= equal
<> and != not equal
< less than
> greater than
<= less than or equal to
>= greater than or equal to
BETWEENbetween two values - BETWEEN(value1, value2)
INmatches specific values - IN (1, 9, 10).

Can I use column aliases in the WHERE clause?

NO - column aliases are defined in the SELECT clause, executed after the WHERE clause.

--- 1. WHERE: Get all the employees whose satisfaction level is greater than 0.50.
SELECT *
FROM employee_sat
WHERE satisfaction > 0.50

--- 2. BETWEEN: Get all employees with a satisfaction score between 0.20 and 0.50.
SELECT *
FROM employee_sat
WHERE satisfaction BETWEEN 0.20 AND 0.50;

--- 3. MANY CONDITIONS: Get all employees with satisfaction scores greater than 0.50 and performance scores greater than 80.
SELECT *
FROM employee_sat
WHERE satisfaction > 0.50 AND performance > 80;

Filtering TEXT columns

= values that are a specific string
LIKE values that contain a specific string
NOT LIKE values that don't contain a specific string
IN values in a specified set of strings - IN ('abc', 'def')

Note that SQL uses SINGLE QUOTATION MARKS.

'%low' means anything before the string low
'low%' means anything after the string low

--- 1. WHERE: Get all the employees with a low salary.
SELECT *
FROM employee_sat
WHERE salary = 'low';

--- 2. LIKE: Get the employees with the string "low" in their salary (e.g., low, low-medium, medium-low).
SELECT *
FROM employee_sat
WHERE salary LIKE '%low%';

--- 3. NOT LIKE: Get all employees without the string 'low' in their salary.
ELECT *
FROM employee_sat
WHERE salary NOT LIKE '%low%';

--- 4. IN: Get employees who have either low or high salaries.
SELECT *
FROM employee_sat
WHERE salary IN ('low', 'high')

--- 5. CASE-INSENSITIVE: Find all employees with a low salary (Low, low, Low-Medium).
ELECT *
FROM employee_sat
WHERE salary ILIKE '%low%';

Column 3

--- 1. Create Categories: Categorize turnover risk. 
SELECT id, satisfaction, performance,
    CASE WHEN satisfaction >= 0.8 AND performance >= 90 THEN 'Engaged'
               WHEN satisfaction < 0.5 OR performance < 60 THEN 'At Risk'
               ELSE 'Neutral'  END AS engagement_status
FROM employee_sat;

--- 2. Replacing Values: Create labels for low, medium, high.
SELECT id,
    CASE WHEN salary = 'low' THEN 'L'
               WHEN salary = 'medium' THEN 'M'
               WHEN salary = 'high' THEN 'H'
        ELSE 'Unknown'   END AS salary_label
FROM employee_sat;

Two options that will allow you to replace values WITHOUT changing the original table: Use the CASE WHEN clause or the REPLACE clause.

SELECT REPLACE(column_name, 'OldSubstring', 'NewSubstring') AS updated_column
FROM table_name;

--- 1. Return the employees without a satisfaction score.
SELECT *
FROM employee_sat
WHERE satisfaction IS NULL;

--- 2. Return only employees who have a satisfaction score.
SELECT *
FROM employee_sat
WHERE satisfaction IS NOT NULL;

--- 3. Check which columns in a table have missing data.
SELECT 
    COUNT(CASE WHEN id IS NULL THEN 1 END) AS id_nulls,
    COUNT(CASE WHEN satisfaction IS NULL THEN 1 END) AS sat_nulls,
    COUNT(CASE WHEN performance IS NULL THEN 1 END) AS perf_nulls
FROM employee_sat;

--- 4.  Replace NULLs with a specific value.
SELECT id,
    CASE 
        WHEN satisfaction IS NULL THEN 0
        ELSE satisfaction
    END AS sat_imputed,
    CASE 
        WHEN performance IS NULL THEN 'Unknown'
        ELSE performance
    END AS perf_imputed
FROM employee_sat;

--- 5. Impute NULLs in the satisfaction col with the col mean.

-- first we create a CTE to hold the mean.
WITH mean_satisfaction AS (
    SELECT AVG(satisfaction) AS mean_value
    FROM employee_sat
    WHERE satisfaction IS NOT NULL
)

-- then we apply this CTE
SELECT id,
    COALESCE(satisfaction, mean_value) AS satisfaction_imputed,
    performance
FROM employee_sat
CROSS JOIN mean_satisfaction;

--- we could also use a window function:

SELECT id,
    COALESCE(satisfaction, AVG(satisfaction) OVER (PARTITION BY 1)) AS satisfaction_imputed,
    performance
FROM employee_sat;