venkateshsoundar/toronto-crime-drivers

Dynamics of Crime in Toronto: Socioeconomic and Environmental Drivers

Data Source: City of Toronto Open Data
License: MIT

📖 Project Overview

This repository hosts a Jupyter Notebook (DATA_604_L01_05_Final_Report.ipynb) that investigates the dynamics of crime across Toronto neighbourhoods. By integrating multiple datasets—police budgets, crime rates, household income, education levels, shelter occupancy, and bike rack locations—we explore how socioeconomic and environmental factors contribute to crime trends.

👥 Authors

  • Aaron Gelfand
  • David Griffin
  • Jackson Meier
  • Steen Rasmussen
  • Venkateshharan Balu Soundararajan

🔍 Guiding Questions

  1. Police Budget & Crime Trends: How does the annual operating budget for police services relate to neighbourhood crime rates?
  2. Household Income & Crime Rates: What is the association between mean household income and crime statistics?
  3. Education Level & Crime: How do neighbourhood education attainment levels correlate with crime incidents?
  4. Shelter Occupancy & Crime: Does shelter occupancy influence crime patterns at the neighbourhood level?
  5. Bike Racks & Bike Thefts: How does the presence of bike parking infrastructure affect bike theft occurrences?

🗄️ Data Sources

Place the following CSV files in a data/ folder at the project root:

  • Police Budget: Gross Operating Budget.csv (converted to converted_budget.csv)
  • Crime Rates: neighbourhood-crime-rates.csv (converted to converted_crime.csv)
  • Income & Education: neighbourhood-profiles-2021-158-model.csv
  • Shelter Occupancy: Data_Shelter_Occupancy_Merged.csv
  • Geospatial Mapping: Address Points_Neighbourhoods.csv
    Note: This file is very large (~1.8 GB). We recommend downloading only the neighbourhood subset or hosting it externally and loading via URL, rather than including it directly in the repo.
  • Bike Racks: Bicycle Parking Racks Data - 4326.csv (cleaned as Cleaned_Bicycle_Parking_Data.csv)

Additional intermediate CSVs generated by the notebook (e.g., crime_long2.csv, capacity_query.csv, bike_theft_area.csv) are stored in data/ after preprocessing.
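As suggested in the note above, the large Address Points file can be reduced to a neighbourhood subset without ever holding the whole ~1.8 GB in memory, by reading it in chunks with pandas. A minimal sketch — the column name `NEIGHBOURHOOD_NAME`, the neighbourhood values, and the in-memory sample standing in for the real file are all illustrative assumptions:

```python
import io
import pandas as pd

# Illustrative stand-in for data/Address Points_Neighbourhoods.csv;
# in the real workflow, pass the file path to read_csv instead.
raw = io.StringIO(
    "ADDRESS,NEIGHBOURHOOD_NAME\n"
    "1 Yonge St,Waterfront Communities\n"
    "100 Queen St W,Bay Street Corridor\n"
    "200 Queen St W,Bay Street Corridor\n"
)

wanted = {"Bay Street Corridor"}  # neighbourhoods of interest (assumed)

# chunksize makes read_csv yield DataFrames piecewise, so only one
# chunk of the large file is in memory at a time
subset = pd.concat(
    chunk[chunk["NEIGHBOURHOOD_NAME"].isin(wanted)]
    for chunk in pd.read_csv(raw, chunksize=2)
)
print(len(subset))
```

The filtered `subset` can then be written once to a small CSV in data/ and used in place of the full file.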

📁 Uploading Data Files to Git

To include your data files in the repository and push them to GitHub (note that GitHub rejects individual files over 100 MB pushed through plain Git; see Handling Very Large Files below for the Address Points file):

  1. Add files

    git add data/Address\ Points_Neighbourhoods.csv \
            data/Bicycle\ Parking\ Racks\ Data\ -\ 4326.csv \
            data/Data_Shelter_Occupancy_Merged.csv \
            data/Gross\ Operating\ Budget.csv \
            data/neighbourhood-crime-rates.csv \
            data/neighbourhood-profiles-2021-158-model.csv

    Or to add all files in the data/ folder:

    git add data/*
  2. Commit changes

    git commit -m "Add raw data files for analysis"
  3. Push to remote

    git push origin main

Handling Very Large Files

For files like Address Points_Neighbourhoods.csv (~1.8 GB), consider one of these approaches:

a) Use Git Large File Storage (LFS)

git lfs install
git lfs track "data/Address Points_Neighbourhoods.csv"
git add .gitattributes
git add "data/Address Points_Neighbourhoods.csv"
git commit -m "Add Address Points file using Git LFS"
git push origin main

b) Host Externally and Download Programmatically

import requests

url = "https://your-bucket.s3.amazonaws.com/Address%20Points_Neighbourhoods.csv"
response = requests.get(url, stream=True)
response.raise_for_status()  # fail fast on a bad URL or missing permissions
with open("data/Address Points_Neighbourhoods.csv", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)

⚙️ Dependencies

Create and activate a Python (>=3.8) virtual environment, then install the required packages:

python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -r requirements.txt

requirements.txt:

pandas
numpy
sqlalchemy
mysql-connector-python
matplotlib
seaborn
statsmodels
scipy
geopandas
shapely
folium
plotly

💾 Database Setup

The notebook uploads cleaned tables into a MySQL database. Update the connection string in the first code cell:

from sqlalchemy import create_engine

engine = create_engine(
    "mysql+mysqlconnector://<user>:<password>@<host>:<port>/<database>"
)
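With the engine configured, cleaned tables are uploaded via pandas' `to_sql`. A minimal sketch of that round trip — it substitutes an in-memory SQLite engine so it runs without a MySQL server, and the table contents are hypothetical, not figures from the report:

```python
import pandas as pd
from sqlalchemy import create_engine

# SQLite stand-in; swap in the MySQL connection string configured above
engine = create_engine("sqlite:///:memory:")

# Hypothetical cleaned table; the notebook uses the converted CSVs
crime = pd.DataFrame(
    {"neighbourhood": ["Bay Street Corridor", "Annex"],
     "assaults_2021": [412, 161]}
)

# if_exists="replace" keeps re-runs of the notebook idempotent
crime.to_sql("converted_crime", engine, if_exists="replace", index=False)

# Read the table back to confirm the upload
roundtrip = pd.read_sql("SELECT * FROM converted_crime", engine)
print(len(roundtrip))
```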

🚀 Usage

  1. Clone the repository:
    git clone https://github.com/<username>/<repo-name>.git
    cd <repo-name>
  2. Place all raw and cleaned CSVs into data/.
  3. Activate your virtual environment and install dependencies.
  4. Open Jupyter Notebook and run:
    jupyter notebook DATA_604_L01_05_Final_Report.ipynb
  5. Execute cells sequentially to reproduce data cleaning, analysis, and visualizations.

📑 Notebook Structure

  1. Setup & Imports: Load Python libraries and configure database engine.
  2. Data Cleaning & Loading: Read raw CSVs, clean variables, and push to MySQL.
  3. Exploratory Analysis: Visualize individual relationships for each guiding question.
  4. Geospatial Mapping: Create neighbourhood maps with crime and infrastructure overlays.
  5. Statistical Modeling: Fit regression and correlation models to quantify associations.
  6. Discussion & Conclusions: Summarize key findings and policy implications.
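The statistical-modeling step (item 5 above) boils down to correlation coefficients and least-squares fits. A sketch of that pattern with NumPy on synthetic data — the variable names, coefficients, and noise level are invented for illustration and are not results from the report:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: mean household income (in $1000s) and a crime
# rate with a built-in negative association plus noise
income = rng.uniform(40, 150, size=200)
crime_rate = 80 - 0.3 * income + rng.normal(0, 5, size=200)

# Pearson correlation between the two variables
r = np.corrcoef(income, crime_rate)[0, 1]

# Slope and intercept of the least-squares line crime_rate ~ income
slope, intercept = np.polyfit(income, crime_rate, 1)
print(round(r, 2), round(slope, 2))
```

In the notebook the same idea is applied per guiding question, with statsmodels supplying standard errors and p-values on top of the point estimates.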

🤝 Contributing

Contributions and improvements are welcome:

  1. Fork this repository.
  2. Create a branch: git checkout -b feature-name.
  3. Commit your changes: git commit -m "Add feature".
  4. Push: git push origin feature-name.
  5. Open a Pull Request for review.

📜 License

This project is licensed under the MIT License. See LICENSE for details.


Prepared by DATA 604 Group L01-05
