15 changes: 15 additions & 0 deletions 02_activities/assignments/DC_Cohort/Assignment2.md
@@ -57,6 +57,16 @@ The store wants to keep customer addresses. Propose two architectures for the CU

```
Your answer...
Prompt 3

The first possible architecture for the CUSTOMER_ADDRESS table would be a Type 1 Slowly Changing Dimension, which overwrites on change: each time a customer's address is
updated, the prior address is overwritten and lost. The main benefit of this option is that it saves storage space, and it is more efficient if prior addresses are not relevant to the business.

The second possible architecture would be a Type 2 Slowly Changing Dimension, which retains changes: a new record is created each time the address is updated, with
metadata (such as effective dates or a current-record flag) used to retain history and keep track of the current address. This option takes up more storage space, but allows for "time-based analytics", "auditability", and "trend analysis". Essentially,
this option would let the business retain more detailed records of its customers and perform data analysis on those records, rather than allowing them to be overwritten.

Information about type 1 and type 2 slowly changing dimensions accessed at https://coalesce.io/data-insights/type-1-vs-type-2-slowly-changing-dimensions/.
```
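
For illustration, a minimal Type 2 sketch in SQL (the table and column names here are hypothetical, not taken from the store's schema):

```
CREATE TABLE customer_address (
    customer_id  INTEGER,
    address      TEXT,
    valid_from   DATE,
    valid_to     DATE,               -- NULL while this row is the current address
    is_current   INTEGER DEFAULT 1
);

-- A Type 2 update closes out the old row instead of overwriting it...
UPDATE customer_address
SET valid_to = '2022-04-01', is_current = 0
WHERE customer_id = 1 AND is_current = 1;

-- ...and inserts a new row carrying the new address
INSERT INTO customer_address (customer_id, address, valid_from, is_current)
VALUES (1, '123 New Street', '2022-04-01', 1);
```

A Type 1 design would instead issue a single `UPDATE ... SET address = ...`, keeping one row per customer.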

***
@@ -192,4 +202,9 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c

```
Your thoughts...

The first repeated ethical issue that stood out to me in this article is the emphasis on unattributed labour. Be it virtual workers on Amazon's Mechanical Turk platform, graduate students, or the romantic partners of the involved researchers, there is a huge amount of human labour involved in creating training data sets that goes unacknowledged and uncredited. It particularly struck me that a select number of researchers have achieved great fame and success for coming up with the *process* of creating these training data sets, yet we will never know the names of the graduate students who relentlessly searched through newspapers to contribute to WordNet, or the Mechanical Turk workers who clicked through untold thousands of images. Even if they were paid (likely insubstantial) amounts of money to do so, it reveals an uncomfortable truth about the idealized process of research: highly educated researchers are just as capable of exploiting more vulnerable groups--graduate students, online gig workers--as, to take the article's example, the fast fashion industry is of exploiting underpaid labourers to hand-sew an endless stream of $5 garments.

The second repeated ethical issue I noted is the way in which these training data sets are presumed to be "objective"--for instance, the computer is able to accurately identify whether a photo contains a hotdog--and yet there is a twofold ethical dilemma in this presumption of objectivity. Firstly, these training data sets are being employed for some ends that are simply not objective, such as providing descriptions of human faces. Secondly, because such a huge number of unknown people contribute to these systems, it is impossible to know whether objectivity was even the primary aim for many of these contributors as they processed data, or what unconscious biases may have otherwise prevailed in their decision-making (and the need to revamp ImageNet suggests many such biases did affect the data). Clearly, an ethical approach to engaging with these technologies requires us to rethink any claim to objectivity, and really consider the human reasoning behind supposed artificial intelligence, just as the author of this article now sees the human effort behind every stitch in her clothing.

```
149 changes: 140 additions & 9 deletions 02_activities/assignments/DC_Cohort/assignment2.sql
@@ -23,7 +23,9 @@ Edit the appropriate columns -- you're making two edits -- and the NULL rows wil
All the other rows will remain the same. */
--QUERY 1


SELECT
    -- single quotes for the string literal; double-quoted "unit" would be parsed as an identifier
    product_name || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')' AS product_label
FROM product;


--END QUERY
@@ -40,8 +42,18 @@ each new market date for each customer, or select only the unique market dates p
HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK().
Filter the visits to dates before April 29, 2022. */
--QUERY 2
SELECT x.*
FROM (
    SELECT
        customer_id,
        market_date,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date ASC) AS num_of_visits
    FROM (
        -- unique market dates per customer, so multiple purchases on one date count as a single visit
        SELECT DISTINCT customer_id, market_date
        FROM customer_purchases
        WHERE market_date < '2022-04-29'
    )
) AS x;
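
-- A possible alternative, per the hint above: DENSE_RANK() over the raw purchase rows
-- gives purchases on the same market_date the same visit number, so no DISTINCT
-- subquery is needed inside the window. (Untested sketch, shown for comparison.)
SELECT DISTINCT
    customer_id,
    market_date,
    DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS num_of_visits
FROM customer_purchases
WHERE market_date < '2022-04-29';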


--END QUERY
@@ -53,7 +65,19 @@ only the customer’s most recent visit.
HINT: Do not use the previous visit dates filter. */
--QUERY 3

SELECT x.*
FROM (
    SELECT
        customer_id,
        market_date,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS most_recent_visit
    FROM customer_purchases
) AS x
WHERE most_recent_visit = 1;


--END QUERY
@@ -66,8 +90,23 @@ You can make this a running count by including an ORDER BY within the PARTITION
Filter the visits to dates before April 29, 2022. */
--QUERY 4



SELECT x.*
FROM (
    SELECT
        product_id,
        vendor_id,
        market_date,
        customer_id,
        quantity,
        cost_to_customer_per_qty,
        transaction_time,
        -- ORDER BY in the window makes this a running count per customer/product
        COUNT(product_id) OVER (
            PARTITION BY customer_id, product_id
            ORDER BY market_date, transaction_time
        ) AS num_of_purchases
    FROM customer_purchases
    -- filter before the window is applied, so the count only covers these dates
    WHERE market_date < '2022-04-29'
) AS x;

--END QUERY

@@ -85,7 +124,16 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for
Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */
--QUERY 5

SELECT
    product_name,
    CASE
        WHEN INSTR(product_name, '-') > 0
        -- TRIM removes both leading and trailing whitespace, as the prompt asks
        THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
        ELSE NULL
    END AS description
FROM product;


--END QUERY
@@ -94,7 +142,9 @@ Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR w
/* 2. Filter the query to show any product_size value that contain a number with REGEXP. */
--QUERY 6


SELECT *
FROM product
-- '[0-9]' is more portable than '\d' across REGEXP implementations
WHERE product_size REGEXP '[0-9]';
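
-- Note: REGEXP is not built into stock SQLite (DB Browser for SQLite supplies it).
-- If REGEXP is unavailable, the built-in GLOB operator is an alternative sketch:
SELECT *
FROM product
WHERE product_size GLOB '*[0-9]*';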


--END QUERY
@@ -111,8 +161,51 @@ HINT: There are possibly a few ways to do this query, but if you're struggling
with a UNION binding them. */
--QUERY 7

-- One temp table of daily sales totals: GROUP BY avoids the duplicate rows that a
-- windowed SUM over individual purchases would produce.
DROP TABLE IF EXISTS temp.sales_by_day;
CREATE TEMPORARY TABLE temp.sales_by_day AS

SELECT
    market_date,
    SUM(quantity * cost_to_customer_per_qty) AS sales
FROM customer_purchases
GROUP BY market_date;

-- Rank the days twice and UNION the best and worst, as the hint suggests.
SELECT market_date, sales, best_day AS day_rank

FROM (
    SELECT
        market_date,
        sales,
        RANK() OVER (ORDER BY sales DESC) AS best_day
    FROM sales_by_day
)
WHERE best_day = 1

UNION

SELECT market_date, sales, worst_day

FROM (
    SELECT
        market_date,
        sales,
        RANK() OVER (ORDER BY sales ASC) AS worst_day
    FROM sales_by_day
)
WHERE worst_day = 1;

--END QUERY

@@ -132,8 +225,26 @@ How many customers are there (y).
Before your final group by you should have the product of those two queries (x*y). */
--QUERY 8



SELECT
    v.vendor_name,
    p.product_name,
    SUM(5 * vi.original_price) AS total_sales
FROM (
SELECT DISTINCT
vendor_id,
product_id,
original_price
FROM vendor_inventory
) AS vi
CROSS JOIN (
SELECT DISTINCT customer_id
FROM customer
) AS c
JOIN vendor AS v
ON vi.vendor_id = v.vendor_id
JOIN product AS p
ON vi.product_id = p.product_id
GROUP BY v.vendor_name, p.product_name;
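
-- Sanity-check sketch: each product's total_sales above should equal
-- 5 * original_price * y, where y is the customer count returned here.
SELECT COUNT(DISTINCT customer_id) AS y
FROM customer;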

--END QUERY

@@ -145,8 +256,15 @@ It should use all of the columns from the product table, as well as a new column
Name the timestamp column `snapshot_timestamp`. */
--QUERY 9

DROP TABLE IF EXISTS temp.product_units;

CREATE TEMPORARY TABLE temp.product_units AS

SELECT
*,
CURRENT_TIMESTAMP AS snapshot_timestamp
FROM product
WHERE product_qty_type = 'unit';

--END QUERY

@@ -155,8 +273,11 @@
This can be any product you desire (e.g. add another record for Apple Pie). */
--QUERY 10

INSERT INTO temp.product_units
VALUES (24, 'Vegan Apple Hot Honey', 'small', 2, 'unit', CURRENT_TIMESTAMP);


SELECT *
FROM temp.product_units;

--END QUERY

@@ -167,7 +288,7 @@
HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
--QUERY 11


DELETE FROM temp.product_units
WHERE product_name = 'Vegan Apple Hot Honey';


--END QUERY
@@ -191,8 +312,18 @@ Finally, make sure you have a WHERE statement to update the right row,
When you have all of these components, you can run the update statement. */
--QUERY 12

ALTER TABLE temp.product_units
ADD current_quantity INT;

-- COALESCE wraps the whole subquery: if a product has no vendor_inventory rows,
-- the subquery yields NULL and the quantity falls back to 0.
UPDATE temp.product_units
SET current_quantity = COALESCE(
    (
        SELECT quantity
        FROM vendor_inventory
        WHERE product_id = product_units.product_id
        ORDER BY market_date DESC
        LIMIT 1
    ),
    0
);

--END QUERY

Expand Down