24 changes: 24 additions & 0 deletions 02_activities/assignments/DC_Cohort/Assignment2.md
@@ -47,8 +47,11 @@ There are several tools online you can use, I'd recommend [Draw.io](https://www.

**HINT:** You do not need to create any data for this prompt. This is a conceptual model only.

![ERD](./images/assignment_2-1.drawio.png)

#### Prompt 2
We want to create employee shifts, splitting up the day into morning and evening. Add this to the ERD.
![ERD](./images/assignment_2-2.drawio.png)

#### Prompt 3
The store wants to keep customer addresses. Propose two architectures for the CUSTOMER_ADDRESS table, one that will retain changes, and another that will overwrite. Which is type 1, which is type 2?
@@ -57,6 +60,11 @@ The store wants to keep customer addresses. Propose two architectures for the CU

```
Your answer...

The first architecture retains changes to customer addresses when customers change their address; this is Type 2. It would include: customer_address_id (a unique ID for each customer-address pairing), customer_id (the identifier for each customer), date_id (a reference to the date, to help distinguish recent from old addresses), and address (the actual address). Each address a customer has held would be a separate row, so customers would be duplicated in the table; the date and customer_address_id help identify the most recent location.

The second architecture overwrites the customer's address information each time it changes; this is Type 1. It would include: customer_id (a unique identifier, one-to-one with the customer table), date_id (the date of the last update), and address (the current known address). Each customer gets a single row in the table. The address could also be added to the customer table as a new "address" column, since each customer is unique there; every time the address changes, it would simply be updated in the customer table.

```
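The two architectures can be sketched as actual tables. A minimal runnable sketch in SQLite via Python's `sqlite3`; the table and column names and the sample addresses are assumptions for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Type 2: retain history, one row per customer/address pairing
cur.execute("""
CREATE TABLE customer_address_type2 (
    customer_address_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    date_id TEXT NOT NULL,   -- when this address became known
    address TEXT NOT NULL
)""")

# Type 1: overwrite in place, one row per customer
cur.execute("""
CREATE TABLE customer_address_type1 (
    customer_id INTEGER PRIMARY KEY,  -- one-to-one with the customer table
    date_id TEXT NOT NULL,            -- date of the last update
    address TEXT NOT NULL
)""")

# Customer 1 moves: Type 2 appends a row, Type 1 overwrites the existing row
cur.execute("INSERT INTO customer_address_type2 (customer_id, date_id, address) "
            "VALUES (1, '2022-01-01', '12 Old St')")
cur.execute("INSERT INTO customer_address_type2 (customer_id, date_id, address) "
            "VALUES (1, '2022-06-01', '34 New Ave')")

cur.execute("INSERT INTO customer_address_type1 VALUES (1, '2022-01-01', '12 Old St')")
cur.execute("""
INSERT INTO customer_address_type1 VALUES (1, '2022-06-01', '34 New Ave')
ON CONFLICT(customer_id) DO UPDATE SET date_id = excluded.date_id, address = excluded.address
""")

type2_rows = cur.execute(
    "SELECT COUNT(*) FROM customer_address_type2 WHERE customer_id = 1").fetchone()[0]
type1_rows = cur.execute(
    "SELECT address FROM customer_address_type1 WHERE customer_id = 1").fetchall()
print(type2_rows)  # Type 2 keeps both rows
print(type1_rows)  # Type 1 keeps only the latest address
```

The Type 2 table accumulates one row per address change, while the Type 1 upsert keeps exactly one row per customer.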

***
@@ -192,4 +200,20 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c

```
Your thoughts...

What stood out to me is that people were required to label the images/words, and the researchers always seemed to optimize for the most efficient (i.e. high return for low cost) option. This is a problem in that it can lead to a form of abuse: each party in the chain tries to extract more value than it pays, and that pressure gets pushed down to the bottom level (the government tries to extract value from the institution, the institution from the head scientist, the head scientist from the graduate student, and the scientists as a group from the data labellers).

If we consider the value of AI now and how much value has been extracted, the workers at the bottom doing the labelling get none of the return on their work. The scientists at least get professional recognition, but the workers get nothing beyond their initial payment.

I think this somewhat reflects the capitalistic culture and system the scientists were working within. The scientists are likely required to do as much research as possible with limited funds, which obviously affects how they perform the task. This is pervasive, as the workers are also operating in the same system and would like the highest return for the lowest time/effort.

The same attitude applies to how companies now view LLMs. LLMs are seen as able to produce a viable output with minimal cost and time. However, from my understanding, companies have not yet felt the true cost of these LLMs, since the LLM companies often offer a 'free' version and do not necessarily charge the full cost of use. This is a business strategy of giving a 'sample' or an introductory discount. Many of the LLM companies are not yet profitable and continually require more funding from investors until customers have replaced their workers with LLMs. Once the lowest-level workers are replaced, companies might see the true cost of LLM use, as they will have fewer options for workers.

Aside from the ethical issue of workers not getting value from their work, I would imagine the data collected would reflect the minimally correct answer. Since the workers who answered the most questions got the most payment, they would likely receive more questions over time and therefore have more influence over the data and how it is labelled. That might result in less robust labelling, where certain words or images were tagged not with the widest or most descriptive labels but with the most common one. Ethically, this might also narrow the labels to only the most common ones.

Using these workers also runs into the issue of representative sampling, which is common in psychology studies: the sample being used is not representative of the overall population. If we only used Princeton students from the 1960s as our initial labellers, their inherent biases would propagate through the future datasets built on that data. The data can be updated over time, but that is something researchers have to actively think about, and the initial LLMs did have biases that took time to remove.




```
163 changes: 151 additions & 12 deletions 02_activities/assignments/DC_Cohort/assignment2.sql
@@ -23,7 +23,12 @@ Edit the appropriate columns -- you're making two edits -- and the NULL rows wil
All the other rows will remain the same. */
--QUERY 1


SELECT
product_name || ', ' || -- keeps the product name the same
COALESCE(product_size, '') || -- replaces NULL in product_size with a blank ''
' (' || COALESCE(product_qty_type, 'unit') || ')' -- replaces NULL in product_qty_type with 'unit'
as Product_List -- changes name of list to Product_list
FROM product;


--END QUERY
@@ -42,7 +47,14 @@ Filter the visits to dates before April 29, 2022. */
--QUERY 2



SELECT * FROM (
SELECT
*,
DENSE_RANK() OVER(PARTITION BY customer_id ORDER BY market_date ASC)
AS customer_visits
FROM customer_purchases
WHERE market_date < '2022-04-29'
) AS x;

--END QUERY

@@ -53,8 +65,14 @@ only the customer’s most recent visit.
HINT: Do not use the previous visit dates filter. */
--QUERY 3



SELECT * FROM (
SELECT
*,
DENSE_RANK() OVER(PARTITION BY customer_id ORDER BY market_date DESC, transaction_time DESC)
AS customer_visits
FROM customer_purchases
) AS x
WHERE x.customer_visits = 1;

--END QUERY

@@ -66,7 +84,21 @@ You can make this a running count by including an ORDER BY within the PARTITION
Filter the visits to dates before April 29, 2022. */
--QUERY 4


SELECT
*
,COUNT(product_id) OVER(
PARTITION BY customer_id, product_id)
AS times_product_in_order
-- this shows the number of orders the product was included in, but not the actual amount purchased.
-- It does show the different prices the product_id was sold at
,SUM(quantity) OVER(
PARTITION BY customer_id, product_id)
AS total_quantity_of_product_purchased
-- this shows the actual amount purchased for each customer
-- it does not show whether the product had a different price each time, just the total quantity the customer purchased
FROM customer_purchases
WHERE market_date < '2022-04-29'
ORDER BY total_quantity_of_product_purchased DESC; -- shows us the customer who bought the most;


--END QUERY
@@ -85,7 +117,13 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for
Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */
--QUERY 5


SELECT product_name,
CASE
WHEN INSTR(product_name, '-') > 0 -- only performs the SUBSTR on names containing a hyphen
THEN RTRIM(LTRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))) -- extracts text after the first hyphen and trims whitespace
ELSE NULL -- fills in the rest as NULL
END AS description -- names the column description
FROM product;


--END QUERY
@@ -94,7 +132,15 @@ Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR w
/* 2. Filter the query to show any product_size value that contain a number with REGEXP. */
--QUERY 6

SELECT product_name,
CASE
WHEN INSTR(product_name, '-') > 0 -- only performs the SUBSTR on names containing a hyphen
THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1)) -- extracts text after the first hyphen and trims whitespace
ELSE NULL -- fills in the rest as NULL
END AS description -- names the column description
FROM product
WHERE product_size REGEXP '[0-9]'; -- '[0-9]' is more portable than '\d' across REGEXP implementations


--END QUERY
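A note on REGEXP: SQLite parses the operator but ships no implementation, so the query above raises "no such function: REGEXP" unless one is registered by the client. A minimal sketch of registering it from Python's `sqlite3`; the sample rows are made up for illustration:

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite rewrites `X REGEXP Y` as a call to regexp(Y, X),
# so the registered function takes (pattern, value).
conn.create_function(
    "regexp", 2,
    lambda pattern, value: value is not None and re.search(pattern, value) is not None,
)

cur = conn.cursor()
cur.execute("CREATE TABLE product (product_name TEXT, product_size TEXT)")
cur.executemany("INSERT INTO product VALUES (?, ?)",
                [("Habanero Peppers - Organic", "medium"),
                 ("Sweet Potatoes", "1 lb")])

rows = cur.execute(
    "SELECT product_name FROM product WHERE product_size REGEXP '[0-9]'").fetchall()
print(rows)  # only the row whose product_size contains a digit
```

GUI clients such as DB Browser for SQLite register their own REGEXP, which is why the query can work there but fail from a bare `sqlite3` connection.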
@@ -111,12 +157,56 @@ HINT: There are possibly a few ways to do this query, but if you're struggling
with a UNION binding them. */
--QUERY 7

--------- 1. Create first temp table with total sales and each day
--- remove table if it exists to start fresh
DROP TABLE IF EXISTS temp.new_customer_purchases;

--- create table
CREATE TABLE temp.new_customer_purchases AS

--- definition of table
SELECT
market_date,
SUM(quantity * cost_to_customer_per_qty) AS total_sales
FROM customer_purchases
GROUP BY market_date;

--------- 2. Query the table for the top and bottom days

DROP TABLE IF EXISTS temp.top_table;

--- create table
CREATE TABLE temp.top_table AS

--- query table for top
SELECT
market_date,
total_sales,
'best' AS day
FROM temp.new_customer_purchases
ORDER BY total_sales DESC
LIMIT 1;


--- query table for bottom
DROP TABLE IF EXISTS temp.bottom_table;

--- create table
CREATE TABLE temp.bottom_table AS
SELECT
market_date,
total_sales,
'worst' AS day
FROM temp.new_customer_purchases
ORDER BY total_sales ASC
LIMIT 1;


--- combine them together
SELECT *
FROM temp.top_table
UNION -- combines them
SELECT *
FROM temp.bottom_table;

--END QUERY

/* SECTION 3 */

@@ -132,21 +222,55 @@ How many customers are there (y).
Before your final group by you should have the product of those two queries (x*y). */
--QUERY 8


--- find total number of customers
DROP TABLE IF EXISTS temp.num_of_customers;

CREATE TABLE temp.num_of_customers AS
SELECT
count(customer_id) AS total_customer_count
FROM customer;
-- should return 26 customers

--- find total number of unique products per vendor
DROP TABLE IF EXISTS temp.unique_products_per_vendor;

CREATE TABLE temp.unique_products_per_vendor AS
SELECT DISTINCT
v.vendor_name,
p.product_name,
vi.original_price,
5 AS quantity_per_customer
FROM vendor_inventory AS vi
JOIN vendor AS v
ON vi.vendor_id = v.vendor_id
JOIN product AS p
ON vi.product_id = p.product_id
ORDER BY v.vendor_name, p.product_name; -- should return 3 vendors, with 8 items (3, 1, 4 distinct products each)

--- now cross join the customers
SELECT
up.*,
(up.original_price * up.quantity_per_customer * nc.total_customer_count) AS possible_revenue
FROM temp.unique_products_per_vendor AS up
CROSS JOIN temp.num_of_customers AS nc;


--END QUERY


-- INSERT
/*1. Create a new table "product_units".
This table will contain only products where the `product_qty_type = 'unit'`.
It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`.
Name the timestamp column `snapshot_timestamp`. */
--QUERY 9

DROP TABLE IF EXISTS product_units; -- drops table in case there is an error and I need to start again


CREATE TABLE product_units AS -- creates a table called product_units
SELECT *, -- selects all of the columns and adds the new column
CURRENT_TIMESTAMP AS snapshot_timestamp
FROM product -- uses product table as the base
WHERE product_qty_type = 'unit'; -- filters it to rows with product_qty_type of 'unit' (standard SQL uses '=' rather than '==')

--END QUERY

@@ -155,7 +279,8 @@ Name the timestamp column `snapshot_timestamp`. */
This can be any product you desire (e.g. add another record for Apple Pie). */
--QUERY 10


INSERT INTO product_units -- inserts into the product unit TABLE
VALUES(25,'Strawberry Rhubarb Pie','10"',3,'unit',CURRENT_TIMESTAMP); -- manually inputted values


--END QUERY
@@ -167,8 +292,11 @@ This can be any product you desire (e.g. add another record for Apple Pie). */
HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
--QUERY 11

-- SELECT * FROM product_units -- testing so that I can see what is going to be selected for deletion
DELETE FROM product_units
WHERE product_id = 25;


SELECT * FROM product_units; -- testing to see that it is gone

--END QUERY

@@ -191,8 +319,19 @@ Finally, make sure you have a WHERE statement to update the right row,
When you have all of these components, you can run the update statement. */
--QUERY 12

SELECT * FROM product_units; -- testing to see updates

ALTER TABLE product_units
ADD current_quantity INT;


UPDATE product_units
SET current_quantity = COALESCE(
	(SELECT vi.quantity -- last quantity (most recent)
	FROM vendor_inventory AS vi
	WHERE vi.product_id = product_units.product_id
	ORDER BY vi.market_date DESC
	LIMIT 1), 0); -- COALESCE wraps the subquery: with no matching inventory rows the subquery yields NULL, so a COALESCE inside it would never fire

--END QUERY
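The correlated-subquery UPDATE pattern used here can be isolated in a minimal runnable sketch (via Python's `sqlite3`, with made-up rows). It shows why COALESCE has to wrap the whole subquery: for a product with no inventory rows, the subquery itself yields NULL, so a COALESCE placed inside the subquery never fires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE product_units (product_id INTEGER, current_quantity INTEGER)")
cur.execute("CREATE TABLE vendor_inventory (product_id INTEGER, market_date TEXT, quantity INTEGER)")

cur.executemany("INSERT INTO product_units (product_id) VALUES (?)", [(1,), (2,)])
cur.executemany("INSERT INTO vendor_inventory VALUES (?, ?, ?)",
                [(1, "2022-04-01", 10),
                 (1, "2022-04-08", 4)])  # product 2 has no inventory rows at all

# The subquery is re-evaluated per row of product_units; ORDER BY ... LIMIT 1
# picks the most recent quantity, and the outer COALESCE supplies 0 when the
# subquery finds no row and returns NULL.
cur.execute("""
UPDATE product_units
SET current_quantity = COALESCE(
    (SELECT vi.quantity
     FROM vendor_inventory AS vi
     WHERE vi.product_id = product_units.product_id
     ORDER BY vi.market_date DESC
     LIMIT 1), 0)
""")

quantities = dict(cur.execute(
    "SELECT product_id, current_quantity FROM product_units").fetchall())
print(quantities)  # latest quantity for product 1, fallback 0 for product 2
```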

Binary file modified 05_src/sql/farmersmarket.db