Inspiration
We wanted to get into data science because we see the field as exciting, expanding, and relevant. It is a great way to tackle real-world challenges at the intersection of computer science and statistics.
What it does
Our project visualizes two aspects of adverse outcomes in the CAERS data, frequency and severity, to determine the most harmful products and categories, both overall and across two demographics: sex and age. We narrowed the analysis from product categories down to specific product brands to identify which items require heavier FDA regulation.
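A minimal sketch of how frequency and severity could be summarized per category and demographic with pandas. The column names (`category`, `sex`, `outcome`), the toy rows, and the severity weights are all assumptions made for illustration. They are not the actual CAERS schema or the scoring we used.

```python
import pandas as pd

# Toy rows standing in for CAERS reports; column names and values are hypothetical.
reports = pd.DataFrame({
    "category": ["Vitamins", "Vitamins", "Cosmetics", "Cosmetics", "Vitamins"],
    "sex": ["F", "M", "F", "F", "F"],
    "outcome": ["Hospitalization", "Non-serious", "Non-serious", "Death", "Hospitalization"],
})

# Hypothetical severity weights; a real analysis may score outcomes differently.
SEVERITY = {"Non-serious": 1, "Hospitalization": 3, "Death": 5}
reports["severity"] = reports["outcome"].map(SEVERITY)


def summarize(df, by):
    """Return the report count (frequency) and mean severity for each group."""
    return (
        df.groupby(by)["severity"]
        .agg(frequency="count", mean_severity="mean")
        .sort_values("frequency", ascending=False)
    )


overall = summarize(reports, "category")          # one row per category
by_sex = summarize(reports, ["category", "sex"])  # one row per category and sex
print(overall)
print(by_sex)
```

Keeping frequency and severity in separate columns makes it possible to spot categories that are reported rarely but tend to have serious outcomes.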
How we built it
We used Jupyter Notebook to import, clean, transform, and visualize the data and to communicate our findings. We took advantage of numpy, pandas, regular expressions, matplotlib, and seaborn to go about this process more effectively.
Challenges we ran into
We were unable to fully consolidate the data because there were still formatting inconsistencies in the product names that we could not account for (e.g., vitamind3 vs. Vitamin D3). We made partial progress with regex, but ultimately we continued to run into errors.
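One way the spacing and capitalization mismatch could be handled is to reduce each name to a comparison key before grouping. This is only a sketch under that assumption, not the code from our notebook, and the helper name `normalize_product_name` is hypothetical.

```python
import re


def normalize_product_name(name: str) -> str:
    """Build a comparison key: lowercase the name and keep only letters and digits."""
    lowered = name.lower()
    # Drop spaces, hyphens, and punctuation so spacing variants map to the same key.
    return re.sub(r"[^a-z0-9]+", "", lowered)


for raw in ["vitamind3", "Vitamin D3", "VITAMIN D-3"]:
    print(raw, "->", normalize_product_name(raw))
```

A key like this only merges variants that differ in case, spacing, or punctuation. Misspellings and abbreviations would still need separate handling, such as a manual mapping table.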
Accomplishments that we're proud of
No one on our team had any experience with data science, and we are proud that we were able to draw such valuable insights in such a short period of time. We are glad we were able to make strides in this area of computing, and we are excited to try more of it in the future.
What we learned
We learned how to clean data effectively, use pandas and regex, visualize with seaborn and matplotlib, and draw conclusions from an unfamiliar dataset to make informed decisions.
What's next for CAERSDataAnalysis
We want to work towards further cleaning and condensing the data, analyzing more demographics, exploring unique intersections within the dataset, and joining this dataset with another one (e.g., income data, geospatial data) to find other relationships.