Dynamics of Crime in Toronto: Socioeconomic and Environmental Drivers

📖 Project Overview

This repository hosts a Jupyter Notebook (DATA_604_L01_05_Final_Report.ipynb) that investigates the dynamics of crime across Toronto neighbourhoods. By integrating multiple datasets—police budgets, crime rates, household income, education levels, shelter occupancy, and bike rack locations—we explore how socioeconomic and environmental factors contribute to crime trends.

👥 Authors

Aaron Gelfand
David Griffin
Jackson Meier
Steen Rasmussen
Venkateshharan Balu Soundararajan

🔍 Guiding Questions

Police Budget & Crime Trends: How does the annual operating budget for police services relate to neighbourhood crime rates?
Household Income & Crime Rates: What is the association between mean household income and crime statistics?
Education Level & Crime: How do neighbourhood education attainment levels correlate with crime incidents?
Shelter Occupancy & Crime: Does shelter occupancy influence crime patterns at the neighbourhood level?
Bike Racks & Bike Thefts: How does the presence of bike parking infrastructure affect bike theft occurrences?

🗄️ Data Sources

Place the following CSV files in a data/ folder at the project root:

Police Budget: Gross Operating Budget.csv (converted to converted_budget.csv)
Crime Rates: neighbourhood-crime-rates.csv (converted to converted_crime.csv)
Income & Education: neighbourhood-profiles-2021-158-model.csv
Shelter Occupancy: Data_Shelter_Occupancy_Merged.csv
Geospatial Mapping: Address Points_Neighbourhoods.csv
Note: This file is very large (~1.8 GB). We recommend downloading only the neighbourhood subset or hosting it externally and loading via URL, rather than including it directly in the repo.
Bike Racks: Bicycle Parking Racks Data - 4326.csv (cleaned as Cleaned_Bicycle_Parking_Data.csv)

Additional intermediate CSVs generated by the notebook (e.g., crime_long2.csv, capacity_query.csv, bike_theft_area.csv) are stored in data/ after preprocessing.
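Before running the notebook, it can help to confirm that the raw inputs are in place. The following is a minimal sketch and not part of the notebook: `EXPECTED_FILES` and `missing_files` are illustrative names, and the filenames are copied from the list above on the assumption that they sit directly in data/.

```python
from pathlib import Path

# Raw input filenames as listed under Data Sources (assumed to live in data/).
EXPECTED_FILES = [
    "Gross Operating Budget.csv",
    "neighbourhood-crime-rates.csv",
    "neighbourhood-profiles-2021-158-model.csv",
    "Data_Shelter_Occupancy_Merged.csv",
    "Address Points_Neighbourhoods.csv",
    "Bicycle Parking Racks Data - 4326.csv",
]


def missing_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected filenames that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from data/:")
        for name in missing:
            print(f"  - {name}")
    else:
        print("All expected data files are present.")
```

Run it from the project root; an empty result means every listed file was found.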

📁 Uploading Data Files to Git

To include your data files in the repository and push them to GitHub:

Add files

git add data/Address\ Points_Neighbourhoods.csv \
        data/Bicycle\ Parking\ Racks\ Data\ -\ 4326.csv \
        data/Data_Shelter_Occupancy_Merged.csv \
        data/Gross\ Operating\ Budget.csv \
        data/neighbourhood-crime-rates.geojson \
        data/neighbourhood-profiles-2021-158-model.csv

Or to add all files in the data/ folder:

git add data/*

Commit changes

git commit -m "Add raw data files for analysis"

Push to remote
```
git push origin main
```

Handling Very Large Files

For files like Address Points_Neighbourhoods.csv (~1.8 GB), which exceed GitHub's 100 MB per-file limit for regular pushes, use one of these approaches:

a) Use Git Large File Storage (LFS)

git lfs install
git lfs track "data/Address Points_Neighbourhoods.csv"
git add .gitattributes
git add "data/Address Points_Neighbourhoods.csv"
git commit -m "Add Address Points file using Git LFS"
git push origin main

b) Host Externally and Download Programmatically

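import requests

# Placeholder URL: replace with the location where the file is actually hosted.
url = "https://your-bucket.s3.amazonaws.com/Address%20Points_Neighbourhoods.csv"

# Stream the response so the ~1.8 GB file is never held in memory all at once.
with requests.get(url, stream=True, timeout=60) as response:
    response.raise_for_status()  # stop early on HTTP errors (404, 403, ...)
    with open("data/Address Points_Neighbourhoods.csv", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)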
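
To decide which files need one of the approaches above, a short script can list everything in data/ that is too large for a regular push. This is a sketch rather than part of the project: `oversized_files` and `GITHUB_LIMIT_BYTES` are illustrative names, and the threshold reflects GitHub's 100 MB per-file limit for non-LFS pushes.

```python
from pathlib import Path

# GitHub blocks regular (non-LFS) pushes that contain a file above 100 MB.
GITHUB_LIMIT_BYTES = 100 * 1024 * 1024


def oversized_files(data_dir="data", limit=GITHUB_LIMIT_BYTES):
    """Return (path, size_in_bytes) for each file under data_dir above limit."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return [
        (path, path.stat().st_size)
        for path in sorted(root.rglob("*"))
        if path.is_file() and path.stat().st_size > limit
    ]


if __name__ == "__main__":
    for path, size in oversized_files():
        print(f"{path}: {size / 1024**2:.0f} MB (track with LFS or host externally)")
```

Any file it reports should be tracked with Git LFS or hosted externally before running git add data/*.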

⚙️ Dependencies

Create a Python (>=3.8) virtual environment and install required packages:

pip install -r requirements.txt

requirements.txt:

pandas
numpy
sqlalchemy
mysql-connector-python
matplotlib
seaborn
statsmodels
scipy
geopandas
shapely
folium
plotly

💾 Database Setup

The notebook uploads cleaned tables into a MySQL database. Update the connection string in the first code cell:

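from sqlalchemy import create_engine

# Replace each <placeholder> with your own MySQL connection details.
engine = create_engine(
    "mysql+mysqlconnector://<user>:<password>@<host>:<port>/<database>"
)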
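
To keep credentials out of the notebook, the connection string could be assembled from environment variables instead of being typed into the cell. The sketch below is one possible approach, not the notebook's own code: the helper `build_mysql_url` and the variable names `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, and `DB_NAME` are hypothetical.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(env=os.environ):
    """Assemble a mysql+mysqlconnector URL from environment variables.

    The variable names used here are hypothetical; the notebook does not
    define them.
    """
    user = quote_plus(env["DB_USER"])
    # Percent-encode the password so characters such as @ or / stay valid in a URL.
    password = quote_plus(env["DB_PASSWORD"])
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "3306")  # MySQL's default port
    database = env["DB_NAME"]
    return f"mysql+mysqlconnector://{user}:{password}@{host}:{port}/{database}"
```

The result can then be passed to create_engine(build_mysql_url()) in place of the hard-coded string.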

🚀 Usage

Clone the repository:

git clone https://github.com/<username>/<repo-name>.git
cd <repo-name>

Place all raw and cleaned CSVs into data/.
Activate your virtual environment and install dependencies.

Launch the notebook in Jupyter:

jupyter notebook DATA_604_L01_05_Final_Report.ipynb

Execute cells sequentially to reproduce data cleaning, analysis, and visualizations.

📑 Notebook Structure

Setup & Imports: Load Python libraries and configure database engine.
Data Cleaning & Loading: Read raw CSVs, clean variables, and push to MySQL.
Exploratory Analysis: Visualize individual relationships for each guiding question.
Geospatial Mapping: Create neighbourhood maps with crime and infrastructure overlays.
Statistical Modeling: Fit regression and correlation models to quantify associations.
Discussion & Conclusions: Summarize key findings and policy implications.
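
As a dependency-free illustration of what the correlation part of the Statistical Modeling step measures, the sketch below computes a Pearson correlation coefficient by hand. It is not the notebook's code: `pearson_r` is an illustrative helper, and the two input lists are made-up placeholder values rather than figures from the Toronto datasets.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up placeholder values, not drawn from the Toronto datasets.
income = [1.0, 2.0, 3.0, 4.0]
crime_rate = [8.0, 6.0, 4.0, 2.0]
print(pearson_r(income, crime_rate))
```

A value near -1 or 1 indicates a strong linear association and a value near 0 a weak one; none of these values indicates causation.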

🤝 Contributing

Contributions and improvements are welcome:

Fork this repository.
Create a branch: git checkout -b feature-name.
Commit your changes: git commit -m "Add feature".
Push: git push origin feature-name.
Open a Pull Request for review.

📜 License

This project is licensed under the MIT License. See LICENSE for details.

Prepared by DATA 604 Group L01-05
