
rarora04/datathon


datathon

Inspiration

We wanted to get into data science because we see the field as exciting, growing, and relevant. Data science is a great way to tackle real-world challenges at the intersection of computer science and statistics.

What it does

Our project visualizes two aspects of adverse outcomes in the CAERS (CFSAN Adverse Event Reporting System) data, namely their frequency and their severity, to identify the most harmful products and product categories, both overall and across different demographic groups.
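As a rough sketch of how frequency and severity could be tabulated per product and per demographic group, the snippet below uses a pandas `groupby`. The column names (`product`, `gender`, `outcomes`), the sample rows, and the numeric severity weights are all hypothetical illustrations; they are not the real CAERS headers or the scoring this project actually used.

```python
import pandas as pd

# Hypothetical severity weights keyed on outcome labels (lowercased). These are
# an illustrative assumption, not an official CAERS severity scale.
SEVERITY_WEIGHTS = {
    "death": 5,
    "life threatening": 4,
    "hospitalization": 3,
    "visited emergency room": 2,
    "visited a health care provider": 1,
}


def severity(outcomes: str) -> int:
    """Score a report by the most severe outcome it mentions (0 if none match)."""
    text = outcomes.lower()
    return max((w for label, w in SEVERITY_WEIGHTS.items() if label in text), default=0)


def summarize(reports: pd.DataFrame, by: list[str]) -> pd.DataFrame:
    """Count reports (frequency) and average their severity for each group."""
    scored = reports.assign(severity=reports["outcomes"].map(severity))
    return (
        scored.groupby(by)
        .agg(frequency=("severity", "size"), mean_severity=("severity", "mean"))
        .sort_values(["frequency", "mean_severity"], ascending=False)
        .reset_index()
    )


# Made-up toy rows, only to exercise the functions.
reports = pd.DataFrame({
    "product": ["Product A", "Product A", "Product B", "Product C"],
    "gender": ["F", "M", "F", "F"],
    "outcomes": [
        "Hospitalization, Other Serious",
        "Visited Emergency Room",
        "Death",
        "Other Outcome",
    ],
})

by_product = summarize(reports, ["product"])           # overall ranking
by_demo = summarize(reports, ["gender", "product"])    # ranking within demographics
print(by_product)
print(by_demo)
```

Sorting by frequency first and mean severity second is one possible ranking choice; a plot of the two columns side by side keeps both dimensions visible instead of collapsing them into one score.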

How we built it

We used a Jupyter notebook to import, tidy, transform, visualize, and communicate the data. We relied on NumPy, pandas, regexlib, Matplotlib, and seaborn to make this process more efficient.
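The import, tidy, and transform steps can be sketched in pandas roughly as follows. This is a minimal illustration under stated assumptions: the inline CSV, its simplified column names, and the age-bin edges are hypothetical stand-ins, not the real CAERS export or the notebook's actual code.

```python
import io

import pandas as pd

# Inline CSV standing in for a CAERS export. Column names and rows are
# hypothetical simplifications, not the real file's headers or data.
RAW_CSV = """report_id,date,product,category,age,gender
1001,2016-01-04,Vitamin D3,Vitamins,45,F
1001,2016-01-04,Vitamin D3,Vitamins,45,F
1002,2016-02-11, Energy Drink ,Beverages,,M
1003,2016-03-20,Energy Drink,Beverages,19,Not Reported
"""


def load_and_tidy(csv_text: str) -> pd.DataFrame:
    """Import the raw CSV and apply a few basic tidying steps."""
    df = (
        pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
        .drop_duplicates()            # drop exact duplicate rows
        .reset_index(drop=True)
    )
    df["product"] = df["product"].str.strip()  # trim stray whitespace
    # Treat anything other than F/M as missing for demographic breakdowns.
    df["gender"] = df["gender"].where(df["gender"].isin(["F", "M"]))
    # Bin ages into groups; these bin edges are an illustrative choice.
    df["age_group"] = pd.cut(
        df["age"],
        bins=[0, 18, 35, 50, 65, 120],
        labels=["0-17", "18-34", "35-49", "50-64", "65+"],
        right=False,
    )
    return df


tidy = load_and_tidy(RAW_CSV)
# Transform: report counts per category, most frequent first.
counts = tidy.groupby("category").size().sort_values(ascending=False)
print(tidy)
print(counts)
```

The resulting `counts` Series (or a per-`age_group` variant) is the kind of table that would then be handed to seaborn or Matplotlib for plotting.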

Challenges we ran into

We were unable to fully consolidate the data because some product names still had formatting inconsistencies that we could not account for (e.g., "vitamind3" vs. "Vitamin D3"). Using regex, we made partial progress, but we continued to run into errors.
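One way to attack variants like "vitamind3" vs. "Vitamin D3" is to build a normalized matching key with a regex and then fall back to fuzzy matching for small misspellings. The sketch below uses Python's standard `re` and `difflib` modules; `difflib` fuzzy matching is a swapped-in technique not mentioned in this project, and the function names and the 0.85 cutoff are hypothetical choices.

```python
import re
from collections import defaultdict
from difflib import get_close_matches


def normalize_name(name: str) -> str:
    """Build a matching key: lowercase and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def group_variants(names: list[str]) -> dict[str, list[str]]:
    """Group raw product names that share the same normalized key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return dict(groups)


def match_to_known(name: str, known_keys: list[str], cutoff: float = 0.85) -> str:
    """Map a name to the closest known key, to catch small misspellings.

    Falls back to the name's own normalized key when nothing is close enough.
    """
    key = normalize_name(name)
    matches = get_close_matches(key, known_keys, n=1, cutoff=cutoff)
    return matches[0] if matches else key


raw = ["vitamind3", "Vitamin D3", "VITAMIN D-3", "Fish Oil"]
groups = group_variants(raw)
print(groups)
print(match_to_known("Vitamn D3", list(groups)))
```

Stripping all punctuation and spacing is aggressive: it can merge names that should stay distinct, and the fuzzy cutoff trades missed matches against false merges, so any grouping like this would still need a manual spot check.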

Accomplishments that we're proud of

No one on our team had any prior experience with data science, and we are proud that we were able to draw useful insights in such a short period of time. We are glad we were able to make strides in this area of computing, and we are excited to explore it further in the future.

What we learned

How to clean data effectively, use pandas and regex, visualize with seaborn and Matplotlib, and draw conclusions that support informed decisions.

What's next for CAERSDataAnalysis

We want to keep cleaning the data, analyze more demographics, explore unique intersections within the dataset, and join this dataset with others (e.g., income data or geospatial data) to find further relationships.
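Joining with an external dataset would likely be a pandas `merge` on some shared key. The sketch below assumes both tables carry a common region code; the `region` and `income_index` columns and their values are hypothetical placeholders, and whether the CAERS data exposes a usable geographic field would need to be checked first.

```python
import pandas as pd


def join_external(reports: pd.DataFrame, external: pd.DataFrame, key: str):
    """Left-join external attributes onto reports and count unmatched rows.

    validate="many_to_one" raises if the external table repeats a key, which
    would otherwise silently duplicate report rows.
    """
    merged = reports.merge(
        external, on=key, how="left", validate="many_to_one", indicator=True
    )
    unmatched = int((merged["_merge"] == "left_only").sum())
    return merged.drop(columns="_merge"), unmatched


# Placeholder tables: "region" and "income_index" are hypothetical columns,
# and the values are made up purely to exercise the join.
reports = pd.DataFrame({"report_id": [1, 2, 3, 4], "region": ["R1", "R1", "R2", "R9"]})
external = pd.DataFrame({"region": ["R1", "R2", "R3"], "income_index": [1.0, 0.8, 1.2]})

merged, unmatched = join_external(reports, external, "region")
# Report counts per region alongside the joined attribute.
per_region = merged.groupby("region", as_index=False).agg(
    reports=("report_id", "size"), income_index=("income_index", "first")
)
print(merged)
print(f"{unmatched} report(s) had no matching region")
print(per_region)
```

A left join keeps every report even when the external table has no match, and the unmatched count makes any coverage gap visible before drawing conclusions from the joined columns.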
