
hdanish/Data-Science

Data Science

This repository features a number of examples of work I did while pursuing a Master of Information and Data Science at UC Berkeley. I completed the MIDS degree in August 2016.

Information about the final Capstone Project for the degree is available at https://www.ischool.berkeley.edu/projects/2016/brand-buzz-reddit. This was a group project completed with my colleagues Vincent Chio, Filip Krunic and Ritesh Soni. Our final deliverable, which is linked from that page, can also be viewed directly at http://team-gilded-dashboard.herokuapp.com/

The contents of this repo are divided as follows:

  • Audio Quality Experiment: My colleague Jasen Jones and I designed an experiment to test whether audio quality affects a listener’s enjoyment of music. Included are a link to our final report, the data we collected, and some of the R code I wrote for the analysis.
  • Machine Learning Examples: These examples are presented as Python Jupyter notebooks and make use of re (the regex module), numpy, matplotlib and scikit-learn.
  • Streaming Tweet Processing: Having set up a Spark cluster on SoftLayer, I use Scala and Spark Streaming to build a system that reports popular Twitter topics and users.
  • Click-Through Rate (CTR) Prediction: Starting from a Jupyter notebook template with Python and Spark, I use the Criteo Labs dataset from a Kaggle competition to featurize categorical data with one-hot encoding (OHE) and predict CTR.
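
As a rough illustration of the one-hot encoding idea mentioned in the CTR item, here is a minimal pure-Python sketch. It is not the code from the notebook: the helper names (`build_ohe_dict`, `one_hot_encode`, `to_dense`) and the toy rows are hypothetical, and it assumes each categorical feature is identified by its column position.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# A feature is identified by (column position, category value).
Feature = Tuple[int, Hashable]


def build_ohe_dict(rows: Sequence[Sequence[Hashable]]) -> Dict[Feature, int]:
    """Assign one output index to every distinct (column, value) pair."""
    distinct = {(col, val) for row in rows for col, val in enumerate(row)}
    # Sort so the mapping is deterministic across runs.
    ordered = sorted(distinct, key=lambda f: (f[0], str(f[1])))
    return {feature: index for index, feature in enumerate(ordered)}


def one_hot_encode(row: Sequence[Hashable], ohe_dict: Dict[Feature, int]) -> List[int]:
    """Return the sorted indices of the active (value 1) positions.

    Values never seen while building the dictionary are skipped.
    """
    return sorted(
        ohe_dict[(col, val)] for col, val in enumerate(row) if (col, val) in ohe_dict
    )


def to_dense(indices: Sequence[int], size: int) -> List[int]:
    """Expand the sparse index list into a dense 0/1 vector."""
    vector = [0] * size
    for index in indices:
        vector[index] = 1
    return vector


# Toy data (hypothetical): column 0 is an animal, column 1 is a colour.
training_rows = [("mouse", "black"), ("cat", "tabby"), ("bear", "black")]
ohe_dict = build_ohe_dict(training_rows)

sparse = one_hot_encode(("cat", "black"), ohe_dict)
dense = to_dense(sparse, len(ohe_dict))
```

Only the active indices are stored per row, which keeps the representation compact when a dataset has many distinct categories; the dense form is shown purely for readability.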

About

A collection of data-science-related code and examples using technologies such as R, Python, scikit-learn, Spark and Scala
