Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 8 additions & 3 deletions 02_activities/assignments/Assignment1.md
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,10 @@ Please do not pick the exact same tables that I have already diagrammed. For exa
- <img src="./images/01_farmers_market_conceptual_model.png" width="600">
- The column names can be found in a few spots (DB Schema window in the bottom right, the Database Structure tab in the main window by expanding each table entry, at the top of the Browse Data tab in the main window)

#### Answer:

<img src = "./Assignment_1-1_Logical_Model.drawio.png">

***

## Section 2:
Expand Down Expand Up @@ -204,6 +208,7 @@ Link if you encounter a paywall: https://web.archive.org/web/20240422105834/http
Consider, for example, concepts of fariness, inequality, social structures, marginalization, intersection of technology and society, etc.


```
Your thoughts...
```
**ANSWER:**

In the article “When Databases Get to Define Family”, Rida Qadri describes how a database developer’s preconceptions can introduce bias into database design which ultimately leads to the exclusion of data points that do not align with the developer’s views. When these databases are linked with organizations that provide services to the human population, this inherent bias introduces administrative barriers that may prevent individuals—particularly those in minority and marginalized groups—from accessing the resources they need just because they do not adhere to what society considers the “norm”. Yet, over time, society adjusts their definition of the “norm” which must then be reflected in updates to various databases. For example, with the general acceptance of gender as being a spectrum rather than a binary trait, the Canadian government has introduced the “X” gender on passports for individuals that identify as neither “male” nor “female”. However, just because one organization adjusts their viewpoint does not mean that other organizations will follow suit. With the “X” gender identifier, the Canadian government also includes the caveat that this identifier may not be recognized by other countries or even travel companies. In other words, one database has been updated, but most databases have not and individuals within this gender group will still struggle to access resources because they do not adhere with the global “norm”. Additionally, even as organizations update their concept of the “norm” (or claim to), significant biases often remain in databases that have been grandfathered in or are simply not well identified. One important bias that is becoming increasingly dire with the national rise in homelessness is the requirement of a permanent address to obtain government identification. Here, the underlying bias is that “all Canadian citizens have a place to live” which seems laughably naïve following a short stroll through Toronto. Because homeless individuals do not have an address that they can provide for the government database, they cannot successfully apply for government-issued ID or have it mailed to them. Without this ID, these individuals cannot access the majority of government-funded resources including healthcare, education and job opportunities. Thus, because homeless individuals do not adhere to what the Canadian government views as a “normal housing situation”, their datapoints are excluded and their need for resources goes unfulfilled. There are workarounds that allow homeless individuals to obtain a government-issued ID, but these are complicated and require assistance from community organizations further highlighting a significant lack of flexibility in the government database. Interestingly, not all public databases crumble when faced with a lack of permanent address. As an avid reader, the first thing I did when I moved to Toronto was sign up for a library card at the Toronto Public Library (TPL). As part of the sign-up process, I noticed that the TPL has a special procedure for people without a permanent address that allows them to obtain a library card and access some if not all the library’s resources. Here, the library database has been designed with a sufficient amount of flexibility to account for the different housing situations seen in the city. Though scale and usage make the Canadian government and TPL databases difficult to compare, it does pose the question of whether there is a better way to incorporate homeless individuals into the government database in hopes of moving towards a more equitable quality of life for all Canadian citizens. As society continues to move online and humans are increasingly digitized into innumerable linked databases, the issues associated with database bias will likely compound with the repercussions of these biases being disproportionately associated with minority and marginalized groups that don’t fit the national or global “norm”. Even as viewpoints change to become more inclusive, adjustments to databases may be specific and piecemeal ultimately leading to incompatibilities between different databases and the continued loss of human datapoints to a sea of “NA”s. In order to better address the issues surrounding database bias, it would be necessary to carefully review and (re)design existing and new databases to incorporate greater (if not infinite) flexibility in data input to forecast the inestimable changes to the “norm” that society will see in the future.

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
100 changes: 92 additions & 8 deletions 02_activities/assignments/assignment1.sql
Original file line number Diff line number Diff line change
Expand Up @@ -5,21 +5,30 @@
--SELECT
/* 1. Write a query that returns everything in the customer table. */


SELECT *
FROM customer;

/* 2. Write a query that displays all of the columns and 10 rows from the cus- tomer table,
sorted by customer_last_name, then customer_first_ name. */


SELECT *
FROM customer
ORDER BY customer_last_name, customer_first_name
LIMIT 10;

--WHERE
/* 1. Write a query that returns all customer purchases of product IDs 4 and 9. */
-- option 1

SELECT *
FROM customer_purchases
WHERE product_id = '4' OR product_id = '9';

-- option 2


SELECT *
FROM customer_purchases
WHERE product_id IN ('4', '9');

/*2. Write a query that returns all customer purchases and a new calculated column 'price' (quantity * cost_to_customer_per_qty),
filtered by vendor IDs between 8 and 10 (inclusive) using either:
Expand All @@ -28,38 +37,71 @@ filtered by vendor IDs between 8 and 10 (inclusive) using either:
*/
-- option 1

SELECT *
,quantity*cost_to_customer_per_qty as price
FROM customer_purchases
WHERE vendor_id >= 8 AND vendor_id <= 10;

-- option 2


SELECT *
,quantity*cost_to_customer_per_qty as price
FROM customer_purchases
WHERE vendor_id BETWEEN 8 AND 10;

--CASE
/* 1. Products can be sold by the individual unit or by bulk measures like lbs. or oz.
Using the product table, write a query that outputs the product_id and product_name
columns and add a column called prod_qty_type_condensed that displays the word “unit”
if the product_qty_type is “unit,” and otherwise displays the word “bulk.” */


SELECT product_id, product_name
,CASE WHEN product_qty_type = "unit"
THEN "unit"
ELSE "bulk"
END as product_qty_type_condensed
FROM product;

/* 2. We want to flag all of the different types of pepper products that are sold at the market.
add a column to the previous query called pepper_flag that outputs a 1 if the product_name
contains the word “pepper” (regardless of capitalization), and otherwise outputs 0. */


SELECT product_id, product_name
,CASE WHEN product_qty_type = "unit"
THEN "unit"
ELSE "bulk"
END as product_qty_type_condensed

,CASE WHEN product_name LIKE '%pepper%'
THEN 1
ELSE 0
END as pepper_flag

FROM product;

--JOIN
/* 1. Write a query that INNER JOINs the vendor table to the vendor_booth_assignments table on the
vendor_id field they both have in common, and sorts the result by vendor_name, then market_date. */



SELECT *
FROM vendor
INNER JOIN vendor_booth_assignments
ON vendor.vendor_id = vendor_booth_assignments.vendor_id
ORDER BY vendor_name, market_date;

/* SECTION 3 */

-- AGGREGATE
/* 1. Write a query that determines how many times each vendor has rented a booth
at the farmer’s market by counting the vendor booth assignments per vendor_id. */

SELECT

vendor_id
,count(booth_number) as num_booth_assign

FROM vendor_booth_assignments
GROUP BY vendor_id;


/* 2. The Farmer’s Market Customer Appreciation Committee wants to give a bumper
Expand All @@ -68,6 +110,19 @@ of customers for them to give stickers to, sorted by last name, then first name.

HINT: This query requires you to join two tables, use an aggregate function, and use the HAVING keyword. */

SELECT
customer_last_name
, customer_first_name
--, SUM(ROUND(quantity*cost_to_customer_per_qty, 2)) as total_spent_across_markets

FROM customer_purchases cp
INNER JOIN customer c
ON cp.customer_id = c.customer_id
GROUP BY cp.customer_id
HAVING SUM(ROUND(quantity*cost_to_customer_per_qty,2)) > 2000.00
ORDER BY customer_last_name, customer_first_name;

--The total_spend_across_markets can be added as a column to results as desired.


--Temp Table
Expand All @@ -82,6 +137,20 @@ When inserting the new vendor, you need to appropriately align the columns to be
VALUES(col1,col2,col3,col4,col5)
*/

--Create a temp table

DROP TABLE IF EXISTS new_vendor;

CREATE TEMP TABLE new_vendor AS

SELECT *

FROM vendor;

--Add 10th vendor
INSERT INTO new_vendor

VALUES(10, 'Thomas''s Superfood Store', 'Fresh Focused', 'Thomas', 'Rosenthal');


-- Date
Expand All @@ -90,6 +159,13 @@ VALUES(col1,col2,col3,col4,col5)
HINT: you might need to search for strfrtime modifers sqlite on the web to know what the modifers for month
and year are! */

SELECT

customer_id
,strftime('%m', market_date) as market_month
,strftime('%Y', market_date) as market_year

FROM customer_purchases;


/* 2. Using the previous query as a base, determine how much money each customer spent in April 2022.
Expand All @@ -98,3 +174,11 @@ Remember that money spent is quantity*cost_to_customer_per_qty.
HINTS: you will need to AGGREGATE, GROUP BY, and filter...
but remember, STRFTIME returns a STRING for your WHERE statement!! */

SELECT

customer_id
, SUM(quantity * cost_to_customer_per_qty) as money_spent_in_April_2022

FROM customer_purchases
WHERE strftime('%m', market_date) = '04' AND strftime('%Y', market_date) = '2022'
GROUP BY customer_id;