Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 9 additions & 1 deletion 02_activities/assignments/Cohort_8/Assignment2.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,15 @@ The store wants to keep customer addresses. Propose two architectures for the CU
**HINT:** search type 1 vs type 2 slowly changing dimensions.

```
Your answer...
Architecture 1: Overwriting (Type 1)
Keeping the address columns directly in the CUSTOMER table. When a customer moves, to simply run an UPDATE command to replace the old address with the new one.
This is the simplest to maintain. However, comes with the cost of losing all history.

Architecture 2: Keeping the History (Type 2)
Creating a separate CUSTOMER_ADDRESS table. In addition to the address data, you adding additional date columns to flag current status.
When a customer moves: we can find find the old row and the new flagging variable will change to not current. and a new row with the new address with the flagging variable value as current.
This architecture requires more storage and complex queries. However, it preserves all the address history.

```

***
Expand Down
Binary file added 02_activities/assignments/Cohort_8/Prompt1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added 02_activities/assignments/Cohort_8/Prompt2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
157 changes: 147 additions & 10 deletions 02_activities/assignments/Cohort_8/assignment2.sql
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,13 @@
/* 1. Our favourite manager wants a detailed long list of products, but is afraid of tables!
We tell them, no problem! We can produce a list with all of the appropriate details.

Using the following syntax you create our super cool and not at all needy manager a list:
Using the following syntax you create our super cool and not at all needy manager a list:*/

SELECT
product_name || ', ' || product_size|| ' (' || product_qty_type || ')'
product_name || ', ' || COALESCE(product_size, '')|| ' (' || COALESCE(product_qty_type, '') || ')' AS product_details
FROM product


/*
But wait! The product table has some bad data (a few NULL values).
Find the NULLs and then using COALESCE, replace the NULL with a blank for the first column with
nulls, and 'unit' for the second column with nulls.
Expand All @@ -34,18 +34,44 @@ each new market date for each customer, or select only the unique market dates p
(without purchase details) and number those visits.
HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). */

SELECT
customer_id, market_date, product_id, quantity,
DENSE_RANK() OVER
(
PARTITION BY customer_id
ORDER BY market_date
) AS visit_number
FROM customer_purchases;



/* 2. Reverse the numbering of the query from a part so each customer’s most recent visit is labeled 1,
then write another query that uses this one as a subquery (or temp table) and filters the results to
only the customer’s most recent visit. */


SELECT *
FROM (
SELECT customer_id, market_date, product_id, quantity,
DENSE_RANK() OVER
(
PARTITION BY customer_id
ORDER BY market_date DESC
) AS visit_number
FROM customer_purchases) AS ranked_visits
WHERE visit_number = 1;



/* 3. Using a COUNT() window function, include a value along with each row of the
customer_purchases table that indicates how many different times that customer has purchased that product_id. */


SELECT customer_id, product_id, market_date, quantity,
COUNT(product_id) OVER
(
PARTITION BY customer_id, product_id
ORDER BY market_date, transaction_time
) AS product_purchase_count
FROM customer_purchases;

-- String manipulations
/* 1. Some product names in the product table have descriptions like "Jar" or "Organic".
Expand All @@ -59,10 +85,16 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for

Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */


SELECT product_name,
TRIM(SUBSTR(product_name, NULLIF(INSTR(product_name, '-'), 0)+1)) AS description
FROM product;

/* 2. Filter the query to show any product_size value that contain a number with REGEXP. */

SELECT product_id, product_name, product_size,
TRIM(SUBSTR(product_name, NULLIF(INSTR(product_name, '-'), 0)+1)) AS description
FROM product
WHERE product_size REGEXP '[0,9]';


-- UNION
Expand All @@ -75,7 +107,34 @@ HINT: There are a possibly a few ways to do this query, but if you're struggling
3) Query the second temp table twice, once for the best day, once for the worst day,
with a UNION binding them. */


-- Calculating total sales for each market date
WITH daily_sales AS (
SELECT market_date,
SUM(quantity * cost_to_customer_per_qty) AS total_sales
FROM customer_purchases
GROUP BY market_date
),
-- Ranking the days to find the highest sales and lowest sales
ranked_sales AS (
SELECT market_date, total_sales,
RANK() OVER (ORDER BY total_sales DESC) AS rank_best,
RANK() OVER (ORDER BY total_sales ASC) AS rank_worst
FROM daily_sales
)
-- Combining the best and worst days using UNION
SELECT market_date, total_sales,
'Highest Sales Day' AS sales_type
FROM ranked_sales
WHERE rank_best = 1

UNION

SELECT
market_date,
total_sales,
'Lowest Sales Day' AS sales_type
FROM ranked_sales
WHERE rank_worst = 1;


/* SECTION 3 */
Expand All @@ -91,6 +150,45 @@ Think a bit about the row counts: how many distinct vendors, product names are t
How many customers are there (y).
Before your final group by you should have the product of those two queries (x*y). */

WITH vendor_products AS (
-- To getting a unique list of products sold by each vendor along with their price
SELECT DISTINCT
v.vendor_name,
p.product_name,
vi.original_price
FROM vendor_inventory AS vi
JOIN vendor AS v ON vi.vendor_id = v.vendor_id
JOIN product AS p ON vi.product_id = p.product_id
),

all_customers AS (
-- To get a list of all unique customers
SELECT DISTINCT customer_id
FROM customer
),

hypothetical_sales AS (
-- Cross join every vendor product with every customer
SELECT
vp.vendor_name,
vp.product_name,
vp.original_price,
5 AS quantity_sold -- Assumption: 5 units sold per customer
FROM
vendor_products AS vp
CROSS JOIN
all_customers AS c
)

-- Finally aggregation: Calculate total revenue per product per vendor
SELECT
vendor_name,
product_name,
SUM(quantity_sold * original_price) AS total_hypothetical_revenue
FROM hypothetical_sales
GROUP BY
vendor_name, product_name
ORDER BY total_hypothetical_revenue DESC;


-- INSERT
Expand All @@ -99,19 +197,47 @@ This table will contain only products where the `product_qty_type = 'unit'`.
It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`.
Name the timestamp column `snapshot_timestamp`. */

CREATE TABLE product_units AS
SELECT *,
CURRENT_TIMESTAMP AS snapshot_timestamp
FROM product
WHERE product_qty_type = 'unit';


/*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp).
This can be any product you desire (e.g. add another record for Apple Pie). */

INSERT INTO product_units (
product_id,
product_name,
product_size,
product_category_id,
product_qty_type,
snapshot_timestamp
)
VALUES (
999,
'Apple Pie',
'10"',
3,
'unit',
CURRENT_TIMESTAMP
);


-- DELETE
/* 1. Delete the older record for the whatever product you added.

HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/


DELETE FROM product_units
WHERE
product_name = 'Apple Pie'
AND snapshot_timestamp < (
SELECT MAX(snapshot_timestamp)
FROM product_units
WHERE product_name = 'Apple Pie'
);

-- UPDATE
/* 1.We want to add the current_quantity to the product_units table.
Expand All @@ -130,6 +256,17 @@ Finally, make sure you have a WHERE statement to update the right row,
you'll need to use product_units.product_id to refer to the correct row within the product_units table.
When you have all of these components, you can run the update statement. */



ALTER TABLE product_units
ADD current_quantity INT;
UPDATE product_units
SET current_quantity = COALESCE(
(
SELECT quantity
FROM vendor_inventory vi
WHERE vi.product_id = product_units.product_id
ORDER BY market_date DESC
LIMIT 1
),
0
);

Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
/* MODULE 4 */
/* UNION */

/* 1. Emulate a FULL OUTER JOIN with a UNION */
DROP TABLE IF EXISTS temp.store1;
CREATE TEMP TABLE IF NOT EXISTS temp.store1
(
costume TEXT,
quantity INT
);

INSERT INTO temp.store1
VALUES("tiger",6),
("elephant",2),
("princess", 4);


DROP TABLE IF EXISTS temp.store2;
CREATE TEMP TABLE IF NOT EXISTS temp.store2
(
costume TEXT,
quantity INT
);

INSERT INTO temp.store2
VALUES("tiger",2),
("dancer",7),
("superhero", 5);

SELECT s1.costume, s1.quantity as store1_quantity, s2.quantity as store2_quantity, 'top query' as location
FROM store1 s1
LEFT JOIN store2 s2
ON s1.costume = s2.costume

UNION ALL -- allow for duplicates, because a FULL OUTER join would ALSO allow for duplicates

SELECT s2.costume, s1.quantity , s2.quantity, 'bottom query'
FROM store2 s2
LEFT JOIN store1 s1
ON s1.costume = s2.costume
WHERE s1.costume IS NULL




43 changes: 43 additions & 0 deletions 04_this_cohort/live_code/Cohort_8/module_4/INTERSECT_EXCEPT.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
/* MODULE 4 */
/* INTERSECT & EXCEPT */

/* 1. Find products that have been sold (e.g. are in customer purchases AND product) */
SELECT product_id
FROM product
INTERSECT
SELECT product_id
FROM customer_purchases;


/* 2. Find products that have NOT been sold (e.g. are NOT in customer purchases even though in product) */
SELECT product_name, x.product_id

FROM (
SELECT product_id
FROM product
EXCEPT
SELECT product_id
FROM customer_purchases
) x
INNER JOIN product p on x.product_id = p.product_id;

/* 3. Directions matter... if we switch the order here:
products that do not exist, because no products purchased are NOT in the product table (e.g. are NOT in product even though in customer purchases)*/

SELECT product_id
FROM customer_purchases
EXCEPT
SELECT product_id
FROM product;


/* 4. We can remake the intersect with a WHERE subquery for more details ... */

SELECT * FROM product
WHERE product_id IN (
SELECT product_id
FROM product
INTERSECT
SELECT product_id
FROM customer_purchases
)
Loading