CAERSDataAnalysis

Figures: Frequency of Harmful Product Categories; Severity Score Table; Frequency of Harmful Cosmetic Products; Frequency of Harmful Vitamin/Mineral/Protein/Other Diet Products; Cosmetics Frequency Table; Vitamin Frequency Table; Severity Scores Across Each Category

Inspiration

We wanted to get into data science because we see the field as exciting, expanding, and relevant. Data science is a great way to tackle real-world challenges at the intersection of computer science and statistics.

What it does

Our project visualizes two aspects of adverse outcomes in the CAERS data, frequency and severity, to determine the most harmful products and categories, both overall and with respect to two demographics: sex and age. We narrowed the results down by product category and by specific product brand to identify which items may require heavier FDA regulation.
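To illustrate the two measures described above, here is a minimal sketch of how frequency and severity could be computed per category and per sex with pandas. The column names (`category`, `sex`, `outcome`), the toy rows, and the `SEVERITY` weights are all hypothetical; they stand in for the real CAERS fields and for our severity score table, which are not reproduced here.

```python
import pandas as pd

# Toy rows standing in for CAERS reports; column names and values are hypothetical.
reports = pd.DataFrame({
    "category": ["Cosmetics", "Cosmetics", "Vitamins", "Vitamins", "Vitamins"],
    "sex": ["F", "M", "F", "F", "M"],
    "outcome": ["Hospitalization", "Other", "Death", "Other", "Hospitalization"],
})

# Hypothetical severity weights (an assumption, not the project's actual table).
SEVERITY = {"Other": 1, "Hospitalization": 3, "Death": 5}
reports["severity"] = reports["outcome"].map(SEVERITY)

# Frequency: number of reports per category.
frequency = reports["category"].value_counts()

# Severity: mean score per category, then broken down by sex.
mean_severity = reports.groupby("category")["severity"].mean()
by_sex = reports.groupby(["category", "sex"])["severity"].mean().unstack()

print(frequency)
print(mean_severity)
print(by_sex)
```

Keeping frequency and severity as separate tables makes it possible to spot categories that are reported often but mildly, or rarely but severely.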

How we built it

We used a Jupyter notebook to import, clean, transform, and visualize the data and to communicate our findings. We relied on NumPy, pandas, regular expressions, Matplotlib, and seaborn to carry out this process more effectively.

Challenges we ran into

We were unable to fully consolidate the data, since the product names still contained formatting inconsistencies that we could not account for (e.g., vitamind3 vs. Vitamin D3). We made partial progress using regex, but ultimately we continued to run into errors.
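One way to approach the kind of mismatch described above is to build a normalized key for each product name and group on that key. The sketch below is only an illustration, not the code we ran: `normalize_name` is a hypothetical helper, and the sample names are made up around the vitamind3 example.

```python
import re

def normalize_name(name: str) -> str:
    """Collapse case, punctuation, and spacing so variant spellings compare equal."""
    lowered = name.lower()
    # Drop everything except lowercase letters and digits (this removes spaces too).
    return re.sub(r"[^a-z0-9]", "", lowered)

# Made-up sample names for illustration.
names = ["vitamind3", "Vitamin D3", "VITAMIN D-3", "Vitamin C"]

# Group the original spellings under their normalized key.
groups = {}
for n in names:
    groups.setdefault(normalize_name(n), []).append(n)

print(groups)
```

This only handles differences in case, spacing, and punctuation. Misspellings, brand prefixes, and dosage suffixes would still produce separate keys, which matches the kind of leftover errors we ran into.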

Accomplishments that we're proud of

No one on our team had any experience with data science, and we are proud of being able to draw such valuable insights in such a short period of time. We are glad we were able to make strides in this area of computing, and we are excited to try more of it in the future.

What we learned

We learned how to clean data effectively, use pandas and regex, visualize with seaborn and matplotlib, and draw conclusions from an unfamiliar dataset to make informed decisions.

What's next for CAERSDataAnalysis

We want to keep cleaning and condensing the data, analyze more demographics, explore unique intersections within the dataset, and join it with another dataset (e.g., income data or geospatial data) to find other relationships.
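As a rough sketch of what such a join might look like, the example below merges report rows with an income table using pandas. Everything here is an assumption: the shared `region` key, the `report_id` and `median_income` columns, and the values are hypothetical, and we have not yet checked which field the real data could be joined on.

```python
import pandas as pd

# Hypothetical report rows and a hypothetical income table sharing a "region" key.
reports = pd.DataFrame({"report_id": [1, 2, 3], "region": ["A", "B", "C"]})
income = pd.DataFrame({"region": ["A", "B"], "median_income": [52000, 61000]})

# A left join keeps every report; indicator=True adds a "_merge" column
# showing whether each row found a match in the income table.
merged = reports.merge(income, on="region", how="left", indicator=True)

# Reports with no matching income row, worth inspecting before any analysis.
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(unmatched)
```

A left join with the indicator column makes unmatched reports visible instead of silently dropping them, which matters when the join key is as messy as our product names were.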