3 changes: 2 additions & 1 deletion 02_activities/assignments/DC_Cohort/Assignment1.md
@@ -209,5 +209,6 @@ Consider, for example, concepts of fariness, inequality, social structures, marg


```
Your thoughts...
Many value systems are embedded in the databases and data systems people encounter in their day-to-day lives. These systems collect large amounts of information about people, including characteristics protected by the Charter of Rights and Freedoms: age, gender and sexuality, race and national and ethnic origin, religion, and mental or physical disability. A typical Canadian ID, like a driver's license, shows the holder's date of birth. This matters because it proves the person is old enough for age-restricted activities, starting with driving but extending to others such as entering bars or purchasing alcohol. When the government handles age data in this way, it is effective at protecting people from dangerous activities like underage drinking. The spillover can be problematic, though. Even though age is a protected characteristic, some people can still use it to discriminate. For example, a landlord may refuse to rent an apartment to anyone under a certain age. This is not inherently a problem with the value systems embedded in the database or data system itself, but how the people using these systems interpret the results can be a significant issue as well.
Social media platforms run databases and data systems that track your key information. Much of the initial information is what you provide when you sign up: your name, your age, and anything that can be inferred from a profile picture. As you start interacting, the recommendation algorithm adapts to what you like and the friends you add. This can lead people to new interests and communities. It can showcase inequalities in the world, like police brutality, or it can lead to manosphere creators. More subtle information can be inferred from these connections, which can result in you seeing content targeted at particular demographics, whether political, racial or religious. This can shape how people act, increasing the marginalization of certain groups or promoting ideas that increase inequality.
```
7 changes: 5 additions & 2 deletions 02_activities/assignments/DC_Cohort/Assignment2.md
@@ -56,7 +56,8 @@ The store wants to keep customer addresses. Propose two architectures for the CU
**HINT:** search type 1 vs type 2 slowly changing dimensions.

```
Your answer...
To overwrite the data, keep one row per customer and update the address value in place whenever the customer's address changes. No history is kept. This is a Type 1 SCD.
To retain the data, add a column called current_address. When a customer's address changes, set current_address to no on the existing row and insert a new row with the new address. Every previous address remains queryable. This is a Type 2 SCD.
```
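
The two designs described in the answer above could be sketched in SQLite roughly as follows. This is only an illustration under stated assumptions: the table names `customer_address_type1` and `customer_address_type2`, the `valid_from` and `valid_to` date columns, and the sample addresses are hypothetical and are not part of the assignment schema. Only the `current_address` flag comes from the answer itself.

```sql
-- Hypothetical tables for illustration only; names and sample data are assumptions.

-- Type 1 SCD: one row per customer, address overwritten in place.
CREATE TABLE customer_address_type1 (
    customer_id INTEGER PRIMARY KEY,
    address     TEXT NOT NULL
);

INSERT INTO customer_address_type1 (customer_id, address)
VALUES (1, '12 Old Street');

-- A move overwrites the previous value, so the old address is lost.
UPDATE customer_address_type1
SET address = '99 New Avenue'
WHERE customer_id = 1;

-- Type 2 SCD: one row per customer per address, flagged as current or not.
CREATE TABLE customer_address_type2 (
    customer_address_id INTEGER PRIMARY KEY,
    customer_id         INTEGER NOT NULL,
    address             TEXT NOT NULL,
    valid_from          TEXT NOT NULL,   -- assumed helper column (ISO date)
    valid_to            TEXT,            -- NULL while the row is current
    current_address     TEXT NOT NULL DEFAULT 'yes'
);

INSERT INTO customer_address_type2 (customer_id, address, valid_from)
VALUES (1, '12 Old Street', '2023-01-01');

-- A move first closes out the existing row...
UPDATE customer_address_type2
SET current_address = 'no',
    valid_to = '2024-06-01'
WHERE customer_id = 1
  AND current_address = 'yes';

-- ...and then adds a new current row, keeping the history.
INSERT INTO customer_address_type2 (customer_id, address, valid_from)
VALUES (1, '99 New Avenue', '2024-06-01');

-- Look up only the current address for each customer.
SELECT customer_id, address
FROM customer_address_type2
WHERE current_address = 'yes';
```

The Type 2 table needs its own surrogate key because `customer_id` is no longer unique once a customer has more than one address row.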

***
@@ -191,5 +192,7 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c


```
Your thoughts...
Many of the ethical issues central to the article come down to the fact that humans are inherently human, meaning that every experience that shapes them leaves a tiny bit of bias behind. This shapes their work, in this case, how they classify objects. Since LLMs are built on the back of this kind of labelled data, any model will inherit the bias of the people who worked on it. The article discusses how Dr. Li tried to reduce bias by including pictures with known labels in every dataset. This way, when the Turkers completed their assignments, she could clearly see how well they had done. She assumed that some workers would not label accurately and would instead select every image to finish the task faster, which would maximize their profits. She could filter those workers out based on the known images.
With modern LLMs like ChatGPT and image generation models like Sora, we are now in a world where the effects of bias are becoming significantly more prevalent. Early versions of these models showed heavy bias towards certain groups and ideologies, which developers have tried to fix. Fundamentally, this has been a flawed attempt. For example, with Sora, many images are drawn in an anime aesthetic even when not prompted, because of how common that art style is on the internet. Other tags used when prompting, such as a specific artist's style, can leave a watermark in the image similar to the artist's actual signature. This is because the training data is biased towards the most common style and towards common flourishes in an individual artist's catalogue. This shows the human element behind it all and why AI art can be seen as plagiarism, as it takes so literally from actual artists' work.
Even now, there is much discussion around the use of AI in video games. Many companies want to move forward with it for a variety of reasons, with most of the focus on saving costs. Many gamers are generally against the use of AI, especially as the video game industry has fired many developers to save costs. Most people accept that AI can be used in games to some degree, but few agree on where the line should be drawn. Many think that using it to design final assets like in-game art goes too far, as that takes away people's livelihoods, while using it to generate temporary assets can be acceptable. One game that uses AI in a mostly acceptable way is Arc Raiders. Initially, the studio hired voice actors to voice the characters in the game. These characters point out items and points of interest. As the game expands, new items and landmarks are added. Using AI, the characters can point out the new items and points of interest without the voice actors having to be rehired. This allows much swifter game production at a lower cost. Other games are being generated as AI slop, and game developers are now afraid to talk about their ideas for fear that someone will generate code and make the game first.
```
62 changes: 54 additions & 8 deletions 02_activities/assignments/DC_Cohort/assignment1.sql
@@ -6,7 +6,8 @@
--SELECT
/* 1. Write a query that returns everything in the customer table. */
--QUERY 1

SELECT *
FROM customer;



@@ -16,7 +17,10 @@
/* 2. Write a query that displays all of the columns and 10 rows from the customer table,
sorted by customer_last_name, then customer_first_name. */
--QUERY 2

SELECT *
FROM customer
ORDER BY customer_last_name, customer_first_name
LIMIT 10;



@@ -27,8 +31,10 @@ sorted by customer_last_name, then customer_first_name. */
/* 1. Write a query that returns all customer purchases of product IDs 4 and 9.
Limit to 25 rows of output. */
--QUERY 3


SELECT *
FROM customer_purchases
WHERE product_id IN (4, 9)
LIMIT 25;


--END QUERY
@@ -42,7 +48,10 @@ filtered by customer IDs between 8 and 10 (inclusive) using either:
Limit to 25 rows of output.
*/
--QUERY 4

SELECT *, quantity * cost_to_customer_per_qty AS price
FROM customer_purchases
WHERE customer_id BETWEEN 8 AND 10
LIMIT 25;



@@ -55,8 +64,13 @@ Using the product table, write a query that outputs the product_id and product_n
columns and add a column called prod_qty_type_condensed that displays the word “unit”
if the product_qty_type is “unit,” and otherwise displays the word “bulk.” */
--QUERY 5
SELECT product_id, product_name

-- String literals take single quotes; double quotes denote identifiers in SQL
,CASE WHEN product_qty_type = 'unit' THEN 'unit'
ELSE 'bulk'
END AS prod_qty_type_condensed

FROM product;


--END QUERY
@@ -66,6 +80,17 @@ if the product_qty_type is “unit,” and otherwise displays the word “bulk.
add a column to the previous query called pepper_flag that outputs a 1 if the product_name
contains the word “pepper” (regardless of capitalization), and otherwise outputs 0. */
--QUERY 6
SELECT product_id, product_name

,CASE WHEN product_qty_type = 'unit' THEN 'unit'
ELSE 'bulk'
END AS prod_qty_type_condensed

-- Lower-case the name so the full word 'pepper' matches regardless of capitalization
,CASE WHEN LOWER(product_name) LIKE '%pepper%' THEN 1
ELSE 0
END AS pepper_flag

FROM product;



@@ -78,9 +103,14 @@ contains the word “pepper” (regardless of capitalization), and otherwise out
vendor_id field they both have in common, and sorts the result by market_date, then vendor_name.
Limit to 24 rows of output. */
--QUERY 7
SELECT *

FROM vendor AS v
INNER JOIN vendor_booth_assignments AS vba
ON v.vendor_id = vba.vendor_id

ORDER BY vba.market_date, v.vendor_name
LIMIT 24;

--END QUERY

@@ -92,9 +122,11 @@ Limit to 24 rows of output. */
/* 1. Write a query that determines how many times each vendor has rented a booth
at the farmer’s market by counting the vendor booth assignments per vendor_id. */
--QUERY 8
SELECT vendor_id,
COUNT(market_date) AS booth_rental_count -- alias name is illustrative



FROM vendor_booth_assignments
GROUP BY vendor_id;

--END QUERY

@@ -106,6 +138,13 @@ of customers for them to give stickers to, sorted by last name, then first name.
HINT: This query requires you to join two tables, use an aggregate function, and use the HAVING keyword. */
--QUERY 9

SELECT c.customer_id, c.customer_last_name, c.customer_first_name
FROM customer_purchases AS cp
INNER JOIN customer AS c
ON c.customer_id = cp.customer_id
-- Group by every selected column so the query is also valid outside SQLite
GROUP BY c.customer_id, c.customer_last_name, c.customer_first_name
HAVING SUM(cp.quantity * cp.cost_to_customer_per_qty) > 2000
ORDER BY c.customer_last_name, c.customer_first_name;



@@ -124,7 +163,14 @@ When inserting the new vendor, you need to appropriately align the columns to be
VALUES(col1,col2,col3,col4,col5)
*/
--QUERY 10
DROP TABLE IF EXISTS temp.new_vendor;

CREATE TABLE temp.new_vendor AS
SELECT *
FROM vendor;

INSERT INTO temp.new_vendor
VALUES(10,'Thomass Superfood Store','Fresh Focused','Thomas','Rosenthal');


