This is an end-to-end project. It gets data from the City of Denver Open Data Catalog, does some processing of the data, and uploads the result to Kaggle as the Denver Traffic Accidents dataset.
Note: The City of Denver now hosts its data on an ESRI geodata server with an EmberJS front end. Unfortunately, this means I need to figure out Ember's dynamic web pages in order to get around ESRI's default 2000-record limit. This doesn't affect the data currently on Kaggle, only the process of updating it.
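One possible way around the record limit, sketched here as an assumption rather than as what this repo does: if the dataset is also exposed through a standard ArcGIS REST feature layer `query` endpoint, the `resultOffset` and `resultRecordCount` parameters can page through it in chunks. The endpoint URL is left to the caller because I have not confirmed it, the function names are hypothetical, and the sketch uses the standard library's `urllib` instead of `requests` to stay dependency-free.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(query_url, offset, page_size):
    """Fetch one page of features from an ArcGIS REST `query` endpoint."""
    params = urllib.parse.urlencode({
        "where": "1=1",                  # no filter: ask for every record
        "outFields": "*",                # all columns
        "f": "json",                     # JSON response
        "resultOffset": offset,          # where this page starts
        "resultRecordCount": page_size,  # how many records to return
    })
    with urllib.request.urlopen(f"{query_url}?{params}") as resp:
        return json.load(resp)


def fetch_all(query_url, page_size=2000, get_page=fetch_page):
    """Collect every record by advancing the offset until a page comes back empty."""
    records = []
    offset = 0
    while True:
        page = get_page(query_url, offset, page_size)
        features = page.get("features", [])
        if not features:
            break
        # Each feature carries its column values under "attributes".
        records.extend(feature["attributes"] for feature in features)
        offset += len(features)
    return records
```

Stopping on an empty page costs one extra request, but it avoids relying on the server returning exactly `page_size` records per call. `get_page` is injectable so the loop can be exercised without touching the network.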
Project layout:
- src folder
  - callreq.py - downloads a csv of all data with requests
  - pipeline.py - transforms data
- notebooks folder with the Denver_Traffic_Accidents_All_In_One and Denver_Traffic_Accidents_Data_Display notebooks
- images folder with images for both Kaggle and GitHub
- tests folder
According to the CDC, traffic accidents cost Colorado $943M in 2018 alone, and that number doesn't seem to include property loss. In 2022, an article in Denverite noted there were just over 3,100 wrecks on streets with a speed limit of 25 mph. Of those 3,100+ accidents, 84 resulted in a fatality. As a result, the city approved the "20 is plenty" ordinance, cutting the speed limit to 20 mph. Both sources are good reads and might be a starting point for additional visualizations.
Kaggle wants an EDA notebook included with a dataset submission before it will give the dataset a perfect rating. I find this annoying because what gets passed off as EDA is often just the output of a generic, minimal script (head(), shape, describe(), etc.). I try to put more thought into it, but I don't live in Denver and haven't been back since attending a data science program there. So I don't have much at stake, but I also don't have the local knowledge that could be the basis of deeper insights. I went with questions I wanted answered, more as a way of displaying the data than of telling a story with it.
Still, there are a few takeaways.
- While there are some common accident sites (shown here with a heatmap), the entire city is fair game.
- Covid substantially reduced the number of accidents.
- Ruling out non-causes such as "No Apparent", "Other" and "Pending Investigation and/or Court Hearing", we can see that aggressive and distracted driving are major issues. Notably, cell phones account for a smaller share than one might expect.
- DUI/DWAI/DUID-related accidents match our intuition: they happen more often in the evening hours.
- There is a slight degree of seasonality. This is also seen in the monthly chart above, which aggregates the 10 years.
- Kaggle: Dataset and accompanying notebook uploaded with infrequent updates
- Acquiring update information: 😖
- GitHub: Expanding and refactoring
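
For readers who want to reproduce the contributing-factor and seasonality takeaways, here is a minimal sketch using only the standard library. It is not the code from the notebooks. The column names `contributing_factor` and `occurrence_date` and the date format are assumptions made for illustration, so adjust them to match the actual csv headers.

```python
from collections import Counter
from datetime import datetime

# Labels treated as non-causes in the takeaways.
NON_CAUSES = {"No Apparent", "Other", "Pending Investigation and/or Court Hearing"}


def top_causes(rows, column="contributing_factor", n=5):
    """Rank contributing factors after dropping blanks and non-causes."""
    counts = Counter(
        row[column].strip()
        for row in rows
        if row.get(column) and row[column].strip() not in NON_CAUSES
    )
    return counts.most_common(n)


def accidents_by_month(rows, column="occurrence_date", fmt="%Y-%m-%d %H:%M:%S"):
    """Count accidents per calendar month, pooling all years together."""
    counts = Counter(datetime.strptime(row[column], fmt).month for row in rows)
    # Index 0 is January, index 11 is December.
    return [counts.get(month, 0) for month in range(1, 13)]
```

Both functions accept any iterable of dict-like rows, so the output of `csv.DictReader` can be passed in after being read into a list.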




