This repository hosts a Jupyter Notebook (DATA_604_L01_05_Final_Report.ipynb) that investigates the dynamics of crime across Toronto neighbourhoods. By integrating multiple datasets—police budgets, crime rates, household income, education levels, shelter occupancy, and bike rack locations—we explore how socioeconomic and environmental factors contribute to crime trends.
- Aaron Gelfand
- David Griffin
- Jackson Meier
- Steen Rasmussen
- Venkateshharan Balu Soundararajan
- Police Budget & Crime Trends: How does the annual operating budget for police services relate to neighbourhood crime rates?
- Household Income & Crime Rates: What is the association between mean household income and crime statistics?
- Education Level & Crime: How do neighbourhood education attainment levels correlate with crime incidents?
- Shelter Occupancy & Crime: Does shelter occupancy influence crime patterns at the neighbourhood level?
- Bike Racks & Bike Thefts: How does the presence of bike parking infrastructure affect bike theft occurrences?
Place the following CSV files in a data/ folder at the project root:
- Police Budget: Gross Operating Budget.csv (converted to converted_budget.csv)
- Crime Rates: neighbourhood-crime-rates.csv (converted to converted_crime.csv)
- Income & Education: neighbourhood-profiles-2021-158-model.csv
- Shelter Occupancy: Data_Shelter_Occupancy_Merged.csv
- Geospatial Mapping: Address Points_Neighbourhoods.csv
  Note: This file is very large (~1.8 GB). We recommend downloading only the neighbourhood subset or hosting it externally and loading via URL, rather than including it directly in the repo.
- Bike Racks: Bicycle Parking Racks Data - 4326.csv (cleaned as Cleaned_Bicycle_Parking_Data.csv)
Additional intermediate CSVs generated by the notebook (e.g., crime_long2.csv, capacity_query.csv, bike_theft_area.csv) are stored in data/ after preprocessing.
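Before running the notebook, it can help to confirm that the raw files listed above are actually in data/. The following is a minimal sketch, not part of the notebook: the helper name `missing_data_files` is hypothetical, and the file names are copied from the list above on the assumption that your copies keep those names.

```python
from pathlib import Path

# Raw file names as listed in this README; adjust if your copies are named differently.
EXPECTED_FILES = [
    "Gross Operating Budget.csv",
    "neighbourhood-crime-rates.csv",
    "neighbourhood-profiles-2021-158-model.csv",
    "Data_Shelter_Occupancy_Merged.csv",
    "Address Points_Neighbourhoods.csv",
    "Bicycle Parking Racks Data - 4326.csv",
]


def missing_data_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:")
        for name in missing:
            print(f"  - {name}")
    else:
        print("All expected raw data files are present.")
```

Run it from the project root so that the relative data/ path resolves correctly.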
To include your data files in the repository and push them to remote GitHub:
- Add files:
  git add data/Address\ Points_Neighbourhoods.csv \
    data/Bicycle\ Parking\ Racks\ Data\ -\ 4326.csv \
    data/Data_Shelter_Occupancy_Merged.csv \
    data/Gross\ Operating\ Budget.csv \
    data/neighbourhood-crime-rates.geojson \
    data/neighbourhood-profiles-2021-158-model.csv
  Or, to add all files in the data/ folder:
  git add data/*
- Commit changes:
  git commit -m "Add raw data files for analysis"
- Push to remote:
  git push origin main
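GitHub rejects pushes that contain files larger than 100 MiB unless they are tracked with Git LFS, so it is worth checking file sizes before committing. This is a rough sketch of such a check; the function name `oversized_files` is an assumption for illustration and is not part of the project.

```python
from pathlib import Path

# GitHub blocks regular (non-LFS) files larger than 100 MiB.
GITHUB_LIMIT_BYTES = 100 * 1024 * 1024


def oversized_files(data_dir="data", limit=GITHUB_LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for files under data_dir larger than limit."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    sizes = [(p, p.stat().st_size) for p in root.rglob("*") if p.is_file()]
    too_big = [(p, size) for p, size in sizes if size > limit]
    # Largest files first, since those are the most urgent to handle.
    return sorted(too_big, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in oversized_files():
        print(f"{path}: {size / 1024 ** 2:.0f} MiB exceeds the 100 MiB limit")
```

Any file it reports should be handled separately rather than added with a plain git add.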
For files like Address Points_Neighbourhoods.csv (~1.8 GB), consider one of these approaches:
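- Track the file with Git LFS:
  git lfs install
  git lfs track "data/Address Points_Neighbourhoods.csv"
  git add .gitattributes
  git add "data/Address Points_Neighbourhoods.csv"
  git commit -m "Add Address Points file using Git LFS"
  git push origin main
- Host the file externally and download it with Python:
  import requests
  # Placeholder URL: replace it with the location where the file is actually hosted.
  url = "https://your-bucket.s3.amazonaws.com/Address%20Points_Neighbourhoods.csv"
  response = requests.get(url, stream=True)
  response.raise_for_status()  # stop on an HTTP error instead of saving an error page
  with open("data/Address Points_Neighbourhoods.csv", "wb") as f:
      # Stream in 8 KB chunks so the ~1.8 GB file is never held fully in memory.
      for chunk in response.iter_content(chunk_size=8192):
          f.write(chunk)
Create a Python (>=3.8) virtual environment and install required packages:
pip install -r requirements.txt
requirements.txt: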
pandas
numpy
sqlalchemy
mysql-connector-python
matplotlib
seaborn
statsmodels
scipy
geopandas
shapely
folium
plotly
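After installing, a quick way to confirm the environment is complete is to check that each package can be located by the import system. This sketch is an illustration only: the helper names are hypothetical, and the mapping assumes that every package is imported under its own name except mysql-connector-python, which is imported as `mysql.connector`.

```python
import importlib.util

# requirements.txt entry -> module name used in import statements.
REQUIRED_MODULES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "sqlalchemy": "sqlalchemy",
    "mysql-connector-python": "mysql.connector",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "statsmodels": "statsmodels",
    "scipy": "scipy",
    "geopandas": "geopandas",
    "shapely": "shapely",
    "folium": "folium",
    "plotly": "plotly",
}


def is_importable(module_name):
    """Return True if the import system can locate module_name."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package (e.g. "mysql") is itself missing.
        return False


def missing_packages(required=REQUIRED_MODULES):
    """Return the requirement names whose modules cannot be found."""
    return [pkg for pkg, module in required.items() if not is_importable(module)]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```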
The notebook uploads cleaned tables into a MySQL database. Update the connection string in the first code cell:
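from sqlalchemy import create_engine

# Replace each <placeholder> with your own MySQL connection details.
engine = create_engine(
    "mysql+mysqlconnector://<user>:<password>@<host>:<port>/<database>"
)
- Clone the repository: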
  git clone https://github.com/<username>/<repo-name>.git
  cd <repo-name>
- Place all raw and cleaned CSVs into data/.
- Activate your virtual environment and install dependencies.
- Open Jupyter Notebook and run:
  jupyter notebook DATA_604_L01_05_Final_Report.ipynb
- Execute cells sequentially to reproduce data cleaning, analysis, and visualizations.
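Because the notebook's first code cell needs a MySQL connection string, one option is to assemble it from environment variables rather than typing credentials into the notebook. The sketch below is an assumption about how that could be done, not the notebook's actual code. The `DB_*` variable names and both function names are hypothetical. Credentials are percent-encoded with `quote_plus` so that characters such as `@` or `/` in a password do not break the URL.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, port, database):
    """Assemble a mysql+mysqlconnector URL, percent-encoding the credentials."""
    return (
        f"mysql+mysqlconnector://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def url_from_env(env=os.environ):
    """Read connection details from (hypothetical) DB_* environment variables."""
    return build_mysql_url(
        user=env["DB_USER"],
        password=env["DB_PASSWORD"],
        host=env.get("DB_HOST", "localhost"),
        port=env.get("DB_PORT", "3306"),  # 3306 is MySQL's default port
        database=env["DB_NAME"],
    )
```

With the variables set, the first cell could then call `create_engine(url_from_env())` instead of embedding the credentials directly.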
- Setup & Imports: Load Python libraries and configure database engine.
- Data Cleaning & Loading: Read raw CSVs, clean variables, and push to MySQL.
- Exploratory Analysis: Visualize individual relationships for each guiding question.
- Geospatial Mapping: Create neighbourhood maps with crime and infrastructure overlays.
- Statistical Modeling: Fit regression and correlation models to quantify associations.
- Discussion & Conclusions: Summarize key findings and policy implications.
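To make the Statistical Modeling step concrete, here is a dependency-free sketch of the Pearson correlation coefficient, the kind of measure used to quantify a linear association between two neighbourhood-level variables. It is not the notebook's code, which may rely on scipy or statsmodels instead. The function name `pearson_r` is hypothetical, and the numbers are invented toy values, not drawn from the Toronto datasets.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented toy values for illustration only (not real neighbourhood data).
income = [40.0, 55.0, 70.0, 85.0, 100.0]
crime_rate = [9.1, 7.4, 6.2, 4.8, 3.5]

if __name__ == "__main__":
    print(f"r = {pearson_r(income, crime_rate):.3f}")
```

A value near -1 or 1 indicates a strong linear association, while a value near 0 indicates a weak one. Correlation alone does not establish that one variable causes the other.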
Contributions and improvements are welcome:
- Fork this repository.
- Create a branch: git checkout -b feature-name
- Commit your changes: git commit -m "Add feature"
- Push: git push origin feature-name
- Open a Pull Request for review.
This project is licensed under the MIT License. See LICENSE for details.
Prepared by DATA 604 Group L01-05