2 changes: 1 addition & 1 deletion 02_activities/assignments/DC_Cohort/Assignment1.md
@@ -209,5 +209,5 @@ Consider, for example, concepts of fariness, inequality, social structures, marg


```
Your thoughts...
My day-to-day work involves data systems, primarily in the field of experimentally derived atmospheric observations. Although these datasets are intended to be entirely unbiased with respect to fairness, inequality, and social structures, there are still avenues for such concepts to affect the data and the downstream processes they inform. For example, global observations of atmospheric pollutants are often concentrated in regions of the world with high economic and political influence, such as North America and Europe. These also tend to be the regions least affected by the health impacts of prolonged air quality hazards, due in part to reductions in pollutants and to air quality warnings, both of which are informed by these observations. Conversely, developing nations, particularly in the Global South, are disproportionately affected by air quality risks while having fewer and less reliable air quality warnings, because of the limited observational datasets available. Similarly, and closer to home, the Canadian High Arctic is a region where the inhabitants (primarily Indigenous peoples) are affected by extreme weather events and, at the same time, by less reliable weather forecasts. This is almost entirely due to Canada's population being concentrated further south, which drives economic and technological progress there and leaves the higher-latitude regions to lag behind. Although this is simply a result of how resources are distributed, it nonetheless leaves a marginalized group in Canada negatively impacted. There are ongoing efforts in the Canadian atmospheric research community to address this issue, but it will likely persist for many years to come and become more significant under future climate change scenarios.
```
5 changes: 3 additions & 2 deletions 02_activities/assignments/DC_Cohort/Assignment2.md
@@ -56,7 +56,8 @@ The store wants to keep customer addresses. Propose two architectures for the CU
**HINT:** search type 1 vs type 2 slowly changing dimensions.

```
Your answer...
One possible architecture would be to maintain a table of all customers, adding a new row for each new customer and updating their information (including postal code) in place after each purchase. A second option would also have customers enter their postal code at every purchase, but would add a new row to the table for each customer purchase. In this design, the most recent row for a given customer should be used to look up their current information, as older rows may contain outdated information.
The first option would be classified as Type 1, since there is a single row for each customer and its information is overwritten every time they make a purchase. The second option is Type 2, since a new row is added for each purchase, so the table retains earlier rows alongside rows with potentially new information.
```
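
The two options described in the answer can be sketched in SQL roughly as follows. This is only an illustration in SQLite syntax: the table names `customer_address_type1` and `customer_address_type2`, all column names, and the sample values are hypothetical and do not come from the assignment database. The Type 2 table uses `valid_from` / `valid_to` dates to mark the current row, which is one common convention rather than the only way to do it.

```sql
-- Type 1 (hypothetical schema): one row per customer, overwritten in place.
CREATE TABLE customer_address_type1 (
    customer_id  INTEGER PRIMARY KEY,
    street       TEXT,
    postal_code  TEXT
);

INSERT INTO customer_address_type1 VALUES (1, '12 Elm St', 'M5V 1A1');

-- A move overwrites the old address; the previous value is lost.
UPDATE customer_address_type1
SET street = '98 Oak Ave', postal_code = 'M4C 2B2'
WHERE customer_id = 1;

-- Type 2 (hypothetical schema): a new row per change, history retained.
CREATE TABLE customer_address_type2 (
    address_key  INTEGER PRIMARY KEY,  -- surrogate key, one per row version
    customer_id  INTEGER NOT NULL,
    street       TEXT,
    postal_code  TEXT,
    valid_from   TEXT NOT NULL,        -- ISO-8601 date string
    valid_to     TEXT                  -- NULL marks the current row
);

INSERT INTO customer_address_type2
    (customer_id, street, postal_code, valid_from, valid_to)
VALUES (1, '12 Elm St', 'M5V 1A1', '2023-01-10', NULL);

-- A move closes the current row, then adds a new current row.
UPDATE customer_address_type2
SET valid_to = '2024-06-01'
WHERE customer_id = 1 AND valid_to IS NULL;

INSERT INTO customer_address_type2
    (customer_id, street, postal_code, valid_from, valid_to)
VALUES (1, '98 Oak Ave', 'M4C 2B2', '2024-06-01', NULL);

-- Current address per customer: only the open-ended row is returned.
SELECT customer_id, street, postal_code
FROM customer_address_type2
WHERE valid_to IS NULL;
```

In the Type 1 table the old address is gone after the `UPDATE`, while the Type 2 table still holds the earlier row with its `valid_to` date filled in.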

***
@@ -191,5 +192,5 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c


```
Your thoughts...
The concept that modern AI models can trace their origins back to underpaid, perhaps unethical work decades ago is both shocking and unsurprising. Over the course of any technological development, the current "state-of-the-art" systems owe their origins to the work of many in the past. This is simply the natural progression of technological development, and neural networks, LLMs, etc. are no different from any other invention. The concerning aspect here is how the developers go about improving their systems and who is impacted by their decisions. The fact that there are thousands of people out there being paid pennies to click images of dogs and cats so that some of the richest companies in the world can improve their bottom line is horrifying. As the author states, this is just another example of large companies exploiting workers for profit, just like in the fast fashion industry. Moving beyond the immediate impact on the workers in this situation, this system raises additional concerns about the resulting models and their output, since the workers' own biases are imparted to the training data. The classic example is Grok, which, when trained on unmoderated Twitter posts, produced output that was riddled with racist, sexist, and otherwise horrendous answers. When training data is either not moderated or is retrieved from a single source, it opens up the potential for the model to contain significant biases. In the case the author cites, human-defined tags on images can be open to biases depending on the cultural or historical viewpoints of the classifiers. For instance, individuals from regions with low cultural diversity may classify images of people dissimilar to themselves predominantly based on their appearance, even when more significant classifications are present. To them, however, the most significant feature of an individual may be their race.
In the end, LLMs affect people at all stages, from training to output, and the users of these systems must be aware of the impact and potential biases in their use.
```
81 changes: 70 additions & 11 deletions 02_activities/assignments/DC_Cohort/assignment1.sql
@@ -7,8 +7,7 @@
/* 1. Write a query that returns everything in the customer table. */
--QUERY 1



SELECT * FROM customer;

--END QUERY

@@ -17,8 +16,10 @@
sorted by customer_last_name, then customer_first_ name. */
--QUERY 2

SELECT * FROM customer


ORDER BY customer_last_name ASC, customer_first_name ASC
LIMIT 10;

--END QUERY

@@ -28,9 +29,10 @@ sorted by customer_last_name, then customer_first_ name. */
Limit to 25 rows of output. */
--QUERY 3




SELECT * FROM customer_purchases
WHERE product_id = 4
OR product_id = 9
LIMIT 25;
--END QUERY


@@ -43,8 +45,9 @@ Limit to 25 rows of output.
*/
--QUERY 4



SELECT *, (quantity*cost_to_customer_per_qty) AS price FROM customer_purchases
WHERE customer_id BETWEEN 8 AND 10
LIMIT 25;

--END QUERY

@@ -56,8 +59,14 @@ columns and add a column called prod_qty_type_condensed that displays the word
if the product_qty_type is “unit,” and otherwise displays the word “bulk.” */
--QUERY 5

SELECT product_id, product_name, --product_qty_type,
CASE
WHEN product_qty_type = 'unit'
THEN 'unit'
ELSE 'bulk'
END AS prod_qty_type_condensed


FROM product;

--END QUERY

@@ -67,7 +76,19 @@ add a column to the previous query called pepper_flag that outputs a 1 if the pr
contains the word “pepper” (regardless of capitalization), and otherwise outputs 0. */
--QUERY 6

SELECT product_id, product_name --product_qty_type,
,CASE
WHEN product_qty_type = 'unit'
THEN 'unit'
ELSE 'bulk'
END AS prod_qty_type_condensed

,CASE
WHEN product_name LIKE '%pepper%' THEN 1
ELSE 0
END AS pepper_flag

FROM product;


--END QUERY
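
-- Sketch (not part of the graded answer): SQLite's LIKE is case-insensitive
-- for ASCII characters by default, which is why the query above matches
-- "Pepper" as well as "pepper". That default can be switched off with
-- PRAGMA case_sensitive_like, and other databases behave differently, so a
-- variant that does not depend on it could lower-case the name first:
SELECT product_id, product_name
,CASE
WHEN LOWER(product_name) LIKE '%pepper%' THEN 1
ELSE 0
END AS pepper_flag
FROM product;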
@@ -79,9 +100,17 @@ vendor_id field they both have in common, and sorts the result by market_date, t
Limit to 24 rows of output. */
--QUERY 7

SELECT * FROM vendor v

INNER JOIN vendor_booth_assignments vba
ON v.vendor_id = vba.vendor_id

ORDER BY
vba.market_date ASC,
v.vendor_name ASC

LIMIT 24;

--END QUERY


@@ -93,8 +122,16 @@ Limit to 24 rows of output. */
at the farmer’s market by counting the vendor booth assignments per vendor_id. */
--QUERY 8



SELECT
vendor_id,
COUNT(*) AS booth_assignment_count -- one row per booth assignment, so this counts rentals, not vendors
FROM
vendor_booth_assignments
GROUP BY
vendor_id
ORDER BY
booth_assignment_count DESC;


--END QUERY

@@ -106,8 +143,23 @@ of customers for them to give stickers to, sorted by last name, then first name.
HINT: This query requires you to join two tables, use an aggregate function, and use the HAVING keyword. */
--QUERY 9

SELECT
customer.customer_id,
customer.customer_first_name,
customer.customer_last_name,
SUM(customer_purchases.quantity * customer_purchases.cost_to_customer_per_qty) AS total_spend

FROM customer_purchases

INNER JOIN
customer
ON customer_purchases.customer_id = customer.customer_id

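-- group by every non-aggregated column and repeat the aggregate in HAVING,
-- so the query does not rely on SQLite-specific leniency
GROUP BY
customer.customer_id,
customer.customer_first_name,
customer.customer_last_name
HAVING SUM(customer_purchases.quantity * customer_purchases.cost_to_customer_per_qty) > 2000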
ORDER BY
customer.customer_last_name,
customer.customer_first_name;

--END QUERY

@@ -125,7 +177,14 @@ VALUES(col1,col2,col3,col4,col5)
*/
--QUERY 10

-- if a table named temp.new_vendor exists, delete it, otherwise do NOTHING
DROP TABLE IF EXISTS temp.new_vendor;
CREATE TABLE temp.new_vendor AS
SELECT *
FROM vendor;

INSERT INTO temp.new_vendor
VALUES(10,'Thomass Superfood Store', 'Fresh Focused', 'Thomas', 'Rosenthal');


--END QUERY
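
-- Optional check (sketch, not part of the graded answer): in SQLite,
-- CREATE TABLE ... AS SELECT does not copy PRIMARY KEY or other constraints,
-- so re-running only the INSERT above would silently add a second row with
-- vendor_id 10. Counting rows per vendor_id makes such duplicates visible.
SELECT vendor_id, COUNT(*) AS row_count
FROM temp.new_vendor
GROUP BY vendor_id
HAVING COUNT(*) > 1; -- returns no rows when every vendor_id is unique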