SQL Basics
Column 1 Structured Query Language. It is a standard language for accessing and manipulating databases. SQL can be used to insert, update, delete, calculate, or retrieve data from a database. 1. FROM/JOIN: Occur together. 2. WHERE: Filters rows based on conditions. 3. GROUP BY: Groups rows into summary rows based on specified columns. 4. HAVING: Filters grouped data. 5. SELECT 6. WINDOW FUNCTIONS: Does row-wise calculations. 7. DISTINCT: Removes duplicate rows if specified. 8. ORDER BY: Sorts the result set based on specified columns or expressions. 9. LIMIT: Restricts the number of rows returned. 10. OFFSET: Specifies how many rows to skip before starting to return rows. employee_info | id | first_name | last_name | gender | age |
|--- |------------|------------ |------- |----- |
| 1 | Ella | Hartwell | F | 45 |
| 2 | Nathan | Caldwell | M | 21 |
| 3 | Lila | Carmichael | F | 30 | employee_sat | id | satisfaction | performance | salary | dep | startdate |
|--- |----------- |------------ |-------- |--- -| --------- |
| 1 | 0.48 | 44 | low | 1 | 2020/08/11 |
| 2 | 0.33 | 79 | medium | 4 | 2016/02/14 |
| 3 | 0.77 | 83 | high | 3 | 2024/01/15 | Numeric
Character
Date and Time
Other
To temporarily change the data type (without altering the original table): --- Option 1: CAST
SELECT CAST(price AS DECIMAL(10, 2)) AS converted_price # - specifying two decimal points
FROM products;
--- Option 2: Using :: Syntax
SELECT AVG(price::NUMERIC) AS average_price
FROM products;
Convert to whole/discrete numbers -> value::INTEGER Convert to numeric -> value::NUMERIC Convert to decimal -> value::DECIMAL (specifying # of dps) Convert to big continuous -> value::DOUBLE PRECISION (returns 15-17 dps) Convert to text/categorical -> value::TEXT or value::VARCHAR Column 2 --- 1. Get all the columns from a table.
SELECT *
FROM employee_info;
--- 2. Get the first_name and last_name, ordered by id in ascending order.
SELECT first_name, last_name
FROM employee_info
ORDER BY id ASC; --- DESC for descending
--- 3. Get the first 5 rows from the employee_info table.
SELECT *
FROM employee_info
LIMIT 5;
--- 4. Get the first 5 rows from the employee_info table.
SELECT DISTINCT last_name
FROM employee_info; Filtering NUMERIC columns
Can I use column aliases in the WHERE clause? NO - column aliases are defined in the SELECT clause, executed after the WHERE clause. --- 1. WHERE: Get all the employees whose satisfaction level is greater than 0.50.
SELECT *
FROM employee_sat
WHERE satisfaction > 0.50
--- 2. BETWEEN: Get all employees with a satisfaction score between 0.20 and 0.50.
SELECT *
FROM employee_sat
WHERE satisfaction BETWEEN 0.20 AND 0.50;
--- 3. MANY CONDITIONS: Get all employees with satisfaction scores greater than 0.50 and performance scores greater than 80.
SELECT *
FROM employee_sat
WHERE satisfaction > 0.50 AND performance > 80;
Filtering TEXT columns
Note that SQL uses SINGLE QUOTATION MARKS.
--- 1. WHERE: Get all the employees with a low salary.
SELECT *
FROM employee_sat
WHERE salary = 'low';
--- 2. LIKE: Get the employees with the string "low" in their salary (e.g., low, low-medium, medium-low).
SELECT *
FROM employee_sat
WHERE salary LIKE '%low%';
--- 3. NOT LIKE: Get all employees without the string 'low' in their salary.
ELECT *
FROM employee_sat
WHERE salary NOT LIKE '%low%';
--- 4. IN: Get employees who have either low or high salaries.
SELECT *
FROM employee_sat
WHERE salary IN ('low', 'high')
--- 5. CASE-INSENSITIVE: Find all employees with a low salary (Low, low, Low-Medium).
ELECT *
FROM employee_sat
WHERE salary ILIKE '%low%';
Column 3 --- 1. Create Categories: Categorize turnover risk.
SELECT id, satisfaction, performance,
CASE WHEN satisfaction >= 0.8 AND performance >= 90 THEN 'Engaged'
WHEN satisfaction < 0.5 OR performance < 60 THEN 'At Risk'
ELSE 'Neutral' END AS engagement_status
FROM employee_sat;
--- 2. Replacing Values: Create labels for low, medium, high.
SELECT id,
CASE WHEN salary = 'low' THEN 'L'
WHEN salary = 'medium' THEN 'M'
WHEN salary = 'high' THEN 'H'
ELSE 'Unknown' END AS salary_label
FROM employee_sat; Two options that will allow you to replace values WITHOUT changing the original table: Use the CASE WHEN clause or the REPLACE clause. SELECT REPLACE(column_name, 'OldSubstring', 'NewSubstring') AS updated_column
FROM table_name; --- 1. Return the employees without a satisfaction score.
SELECT *
FROM employee_sat
WHERE satisfaction IS NULL;
--- 2. Return only employees who have a satisfaction score.
SELECT *
FROM employee_sat
WHERE satisfaction IS NOT NULL;
--- 3. Check which columns in a table have missing data.
SELECT
COUNT(CASE WHEN id IS NULL THEN 1 END) AS id_nulls,
COUNT(CASE WHEN satisfaction IS NULL THEN 1 END) AS sat_nulls,
COUNT(CASE WHEN performance IS NULL THEN 1 END) AS perf_nulls
FROM employee_sat;
--- 4. Replace NULLs with a specific value.
SELECT id,
CASE
WHEN satisfaction IS NULL THEN 0
ELSE satisfaction
END AS sat_imputed,
CASE
WHEN performance IS NULL THEN 'Unknown'
ELSE performance
END AS perf_imputed
FROM employee_sat;
--- 5. Impute NULLs in the satisfaction col with the col mean.
-- first we create a CTE to hold the mean.
WITH mean_satisfaction AS (
SELECT AVG(satisfaction) AS mean_value
FROM employee_sat
WHERE satisfaction IS NOT NULL
)
-- then we apply this CTE
SELECT id,
COALESCE(satisfaction, mean_value) AS satisfaction_imputed,
performance
FROM employee_sat
CROSS JOIN mean_satisfaction;
--- we could also use a window function:
SELECT id,
COALESCE(satisfaction, AVG(satisfaction) OVER (PARTITION BY 1)) AS satisfaction_imputed,
performance
FROM employee_sat; |