13 changes: 11 additions & 2 deletions 02_activities/assignments/Assignment2.md
@@ -54,7 +54,9 @@ The store wants to keep customer addresses. Propose two architectures for the CU
**HINT:** search type 1 vs type 2 slowly changing dimensions.

```
Your answer...
The first possible architecture for keeping customer addresses retains a record of every change, and is considered Type 2. In this structure, each version of an item is stored as its own row with a unique identifier and the period during which it was valid (here, each row would hold a customer's address for the time it was in use). This lets the store keep a full history of changes and conduct analyses such as the relationship between customers' purchasing patterns and their residential addresses.

The other option, considered Type 1, simply overwrites old values with new values, so the database retains only the latest information. While this is a simple and straightforward way to cope with changes in data, it prevents us from doing any comparative or historical analysis (e.g. analyzing sales based on customers' previous addresses).
```
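The two designs can be sketched in SQL. This is a minimal illustration, not the store's actual schema: the `customer_address` table and the `valid_from`, `valid_to`, and `is_current` columns are assumed names.

```sql
-- Type 1: overwrite in place; only the latest address survives.
UPDATE customer
SET address = '456 New St'
WHERE customer_id = 1;

-- Type 2: close out the current row, then insert a new versioned row.
UPDATE customer_address
SET valid_to = CURRENT_DATE, is_current = 0
WHERE customer_id = 1 AND is_current = 1;

INSERT INTO customer_address (customer_id, address, valid_from, valid_to, is_current)
VALUES (1, '456 New St', CURRENT_DATE, NULL, 1);
```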

***
@@ -182,5 +184,12 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c


```
Your thoughts...
Humans still seem to be at the core of neural nets and machine learning, or at least that was the case when Boykis wrote this article in 2019. It is striking how much our world has changed in just six years, with the advent of ChatGPT and the incorporation of technology into almost every facet of our lives, and I have several, perhaps somewhat disconnected, thoughts about all of this.

Personally, it all seems quite bleak. To me, the human parts of what the author identified in the article have largely dissipated by now. Large language models carry many biases (gender, racial, and others), a known fact that many linguists have been researching. Since the models are trained on existing corpora, which by definition are never truly “complete” and never fully capture how the human population communicates, the use of technologies like ChatGPT and other human-robot interactions risks becoming a vicious cycle in which such biases are perpetuated indefinitely.

I am a linguist at heart and by training, and I care deeply about what people speak, how they speak, and why they speak the way they do. In the article, the researchers relied on linguists' efforts (e.g. WordNet and word-frequency counts) to develop a better understanding of machine learning. But in this day and age, I fear that the way humans use language has already changed with the advent of technology and computer-mediated communication. Experimental linguists already use tools like ChatGPT to generate “language stimuli” for their studies; others are leaving academia in droves to work on these large language models in tech, many of them making the off-the-cuff remark that they have “joined the dark side”. A recent graduate of our Linguistics program went to work for Meta soon after graduating, specifically on AI chatbots for the company's virtual-reality influencers (a string of words that would have seemed strange even in 2019?), and shared that “it honestly doesn't feel right”. Considering how young and old minds alike are already glued to their devices in 2025, I can hardly imagine how human language use will evolve in the time to come, especially as human-computer and human-human-via-computer interactions explode around us. I think we should all be wary of humankind's rapid embrace of technology in pursuit of efficiency and automation, because at the end of the day we will need to remember that we are capable of much more than machines can (or should) be.

[footnote: I may whine about humanity’s downfall in pursuit of technology, but since academia is also very bleak, who knows what might happen after this degree — a career in language tech perhaps?]

```
89 changes: 86 additions & 3 deletions 02_activities/assignments/assignment2.sql
@@ -20,6 +20,9 @@ The `||` values concatenate the columns into strings.
Edit the appropriate columns -- you're making two edits -- and the NULL rows will be fixed.
All the other rows will remain the same.) */

SELECT
	product_name || ', ' || COALESCE(product_size, ' ') || ' (' || COALESCE(product_qty_type, 'unit') || ')' AS list_of_products
FROM product;


--Windowed Functions
@@ -32,17 +35,38 @@ each new market date for each customer, or select only the unique market dates p
(without purchase details) and number those visits.
HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). */

SELECT *,
	ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_counter
FROM customer_purchases cp;
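/* The second approach the hint alludes to, sketched with DENSE_RANK():
   because purchase rows sharing a market_date receive the same rank, each
   distinct visit is numbered once even when it contains several purchases. */
SELECT *,
	DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_number
FROM customer_purchases;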


/* 2. Reverse the numbering of the query from a part so each customer’s most recent visit is labeled 1,
then write another query that uses this one as a subquery (or temp table) and filters the results to
only the customer’s most recent visit. */

/* Query to reverse numbering */

SELECT *,
	ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_counter
FROM customer_purchases cp;

/* Query to isolate customers' most recent visits */

WITH recent_visits AS (
	SELECT *,
		ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_counter
	FROM customer_purchases cp
)
SELECT * FROM recent_visits
WHERE visit_counter = 1;


/* 3. Using a COUNT() window function, include a value along with each row of the
customer_purchases table that indicates how many different times that customer has purchased that product_id. */

SELECT *,
	COUNT(*) OVER (PARTITION BY customer_id, product_id) AS customer_product_counter
FROM customer_purchases;


-- String manipulations
@@ -57,10 +81,25 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for

Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */

SELECT *,
	CASE
		WHEN INSTR(product_name, '-') > 0 THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
		ELSE NULL
	END AS description
FROM product;


/* 2. Filter the query to show any product_size value that contain a number with REGEXP. */

SELECT *,
	CASE
		WHEN INSTR(product_name, '-') > 0 THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
		ELSE NULL
	END AS description
FROM product
WHERE product_size REGEXP '[0-9]';
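/* Caveat: stock SQLite does not define the REGEXP operator unless a
   regexp() user function or extension is loaded, so the query above can
   raise an error. The built-in GLOB operator is one alternative for
   "contains a digit": */
SELECT *
FROM product
WHERE product_size GLOB '*[0-9]*';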


-- UNION
@@ -73,7 +112,18 @@ HINT: There are a possibly a few ways to do this query, but if you're struggling
3) Query the second temp table twice, once for the best day, once for the worst day,
with a UNION binding them. */

CREATE TEMPORARY TABLE temp.sales_bydate AS
SELECT market_date,
	SUM(quantity * cost_to_customer_per_qty) AS total_sales
FROM customer_purchases
GROUP BY market_date;

SELECT * FROM temp.sales_bydate
WHERE total_sales = (SELECT MIN(total_sales) FROM temp.sales_bydate)
UNION
SELECT * FROM temp.sales_bydate
WHERE total_sales = (SELECT MAX(total_sales) FROM temp.sales_bydate)
ORDER BY total_sales ASC;


/* SECTION 3 */
@@ -89,26 +139,38 @@ Think a bit about the row counts: how many distinct vendors, product names are t
How many customers are there (y).
Before your final group by you should have the product of those two queries (x*y). */


WITH vendor_projections AS (
	SELECT DISTINCT vendor_name, product_name, (5 * original_price) AS earnings_per_product, customer_id
	FROM vendor_inventory vi
	INNER JOIN vendor v
		ON v.vendor_id = vi.vendor_id
	INNER JOIN product p
		ON p.product_id = vi.product_id
	CROSS JOIN customer
)
SELECT vendor_name, product_name, SUM(earnings_per_product) AS projected_sales
FROM vendor_projections
GROUP BY vendor_name, product_name;

-- INSERT
/*1. Create a new table "product_units".
This table will contain only products where the `product_qty_type = 'unit'`.
It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`.
Name the timestamp column `snapshot_timestamp`. */


CREATE TABLE product_units AS
SELECT *, CURRENT_TIMESTAMP AS snapshot_timestamp
FROM product
WHERE product_qty_type = 'unit';

/*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp).
This can be any product you desire (e.g. add another record for Apple Pie). */


INSERT INTO product_units
VALUES (10, 'Eggs', '2 dozen', 6, 'unit', CURRENT_TIMESTAMP);

-- DELETE
/* 1. Delete the older record for the whatever product you added.

HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/

DELETE FROM product_units
WHERE product_name = 'Eggs' AND snapshot_timestamp < '2025-02-06 16:15:38';
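/* A less brittle variant (a sketch): instead of hardcoding a literal
   timestamp, delete whichever 'Eggs' row carries the older snapshot. */
DELETE FROM product_units
WHERE product_name = 'Eggs'
  AND snapshot_timestamp = (SELECT MIN(snapshot_timestamp)
                            FROM product_units
                            WHERE product_name = 'Eggs');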


-- UPDATE
@@ -128,6 +190,27 @@ Finally, make sure you have a WHERE statement to update the right row,
you'll need to use product_units.product_id to refer to the correct row within the product_units table.
When you have all of these components, you can run the update statement. */

ALTER TABLE product_units
ADD current_quantity INT;

-- Last-known quantity per product: for each product_id, keep the
-- vendor_inventory row with the latest market_date. The correlated
-- subquery ties the MAX(market_date) to the outer row's product_id.
CREATE TEMPORARY TABLE temp.recent_sales AS
SELECT vi.product_id, vi.quantity
FROM vendor_inventory vi
WHERE vi.market_date = (SELECT MAX(market_date)
                        FROM vendor_inventory vi2
                        WHERE vi2.product_id = vi.product_id);

-- Update every row of product_units from the temp table. The correlated
-- subquery (via product_units.product_id) picks the right row, and
-- COALESCE supplies 0 for any product with no vendor_inventory record.
UPDATE product_units
SET current_quantity = COALESCE(
	(SELECT rs.quantity
	 FROM temp.recent_sales rs
	 WHERE rs.product_id = product_units.product_id
	 LIMIT 1),
	0);
Binary file added 02_activities/assignments/images/bookstore1.png
Binary file added 02_activities/assignments/images/bookstore2.png