We wanted to get into data science because we see it as an exciting, growing, and relevant field. It offers a way to tackle real-world challenges at the intersection of computer science and statistics.
Our project visualizes two aspects of adverse outcomes in the FDA's CAERS (CFSAN Adverse Event Reporting System) data: their frequency and their severity. We use these to identify the most harmful products and product categories, both overall and across different demographic groups.
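To make "frequency" and "severity" concrete, here is a minimal pandas sketch, not the project's actual code. It assumes frequency is the number of reports per category and severity is the share of those reports with a serious outcome. The column names (`category`, `outcome`), the outcome labels, and the `SEVERE_OUTCOMES` set are hypothetical stand-ins, not real CAERS field names or values.

```python
import pandas as pd

# Hypothetical toy reports; real CAERS column names and outcome labels differ.
reports = pd.DataFrame({
    "category": ["Vitamins", "Vitamins", "Vitamins", "Cosmetics", "Cosmetics", "Bakery"],
    "outcome": ["Hospitalization", "Death", "Non-serious",
                "Non-serious", "Hospitalization", "Non-serious"],
})

# Assumed definition of a "severe" outcome for this sketch only.
SEVERE_OUTCOMES = {"Death", "Hospitalization"}
reports["severe"] = reports["outcome"].isin(SEVERE_OUTCOMES)

# Frequency = number of reports; severity = fraction of reports flagged severe.
summary = (
    reports.groupby("category")
    .agg(frequency=("outcome", "count"), severe_share=("severe", "mean"))
    .sort_values("frequency", ascending=False)
)
print(summary)
```

Adding a demographic column (for example, gender or an age bracket) to the `groupby` keys would give the same two measures for each subgroup.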
We used a Jupyter notebook to import, tidy, transform, visualize, and communicate the data, relying on numpy, pandas, regular expressions, matplotlib, and seaborn to make the process more efficient.
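As an illustration of the visualization step, the sketch below uses seaborn's `countplot` to count reports per product category, split by a demographic column. It is not the team's notebook code; the toy data and the column names `category` and `gender` are hypothetical, and the Agg backend is chosen only so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data; column names are assumptions, not CAERS field names.
reports = pd.DataFrame({
    "category": ["Vitamins", "Vitamins", "Cosmetics", "Cosmetics", "Bakery", "Vitamins"],
    "gender": ["Female", "Male", "Female", "Female", "Male", "Female"],
})

fig, ax = plt.subplots(figsize=(6, 3))
# One horizontal bar per (category, gender) pair, counting the reports in each.
sns.countplot(data=reports, y="category", hue="gender", ax=ax)
ax.set_xlabel("Number of reports")
fig.tight_layout()

# Save to an in-memory PNG instead of showing the figure interactively.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

In a notebook, the figure would normally just be displayed inline rather than saved to a buffer.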
We were unable to fully consolidate the data because some product names had inconsistent formatting that we could not account for (e.g., "vitamind3" vs. "Vitamin D3"). Regular expressions got us partway there, but we kept running into errors.
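One common way to handle variants like these is to normalize names before grouping them. The minimal sketch below is not the team's code. It assumes the variants differ only in case, spacing, and punctuation. The helper `normalize_name` and the sample names are hypothetical.

```python
import pandas as pd


def normalize_name(series: pd.Series) -> pd.Series:
    """Lowercase names and drop every character that is not an ASCII letter or digit."""
    return series.str.lower().str.replace(r"[^a-z0-9]", "", regex=True)


# Hypothetical product-name variants.
names = pd.Series(["Vitamin D3", "vitamind3", "VITAMIN-D3 ", "Fish Oil"])
normalized = normalize_name(names)
print(normalized.value_counts())
```

This merges the three "Vitamin D3" spellings into one key. It will not catch misspellings or reordered words, and it also strips non-ASCII letters, so remaining mismatches would need a different technique such as fuzzy string matching.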
No one on our team had prior experience with data science, so we are proud to have drawn valuable insights in such a short time. We are glad we were able to make strides in this area of computing, and we are excited to explore it further.
We learned how to clean data effectively, use pandas and regular expressions, visualize data with seaborn and matplotlib, and draw conclusions that support informed decisions.
Next, we want to clean the data further, analyze more demographics, explore unique intersections within the dataset, and join it with other datasets (e.g., income or geospatial data) to find additional relationships.