Binary Classification with Apache Spark / HDFS

Data source

The goal of the competition is to predict which parts will fail quality control.

My goal is to use the Hadoop ecosystem to handle a large dataset and to build a machine learning pipeline.

`munge`:

Aggregate columns using RDD transformations.
Create a column that indicates which of those column aggregations are outliers.
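The two `munge` steps can be sketched as plain Python functions of the kind you would hand to `rdd.map`. This is a hypothetical illustration, not the repository's code. It assumes the aggregations are a per-row mean, min and max over the numeric feature columns (missing values skipped). It also assumes "outlier" means an absolute z-score above a threshold (3.0 by default), computed against each aggregation column's mean and population standard deviation. The names `aggregate_row`, `mean_and_stdev`, `outlier_flags`, `munge` and `AGG_NAMES` are invented for this sketch.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical choice of per-row aggregations (not taken from the repo).
AGG_NAMES = ("mean", "min", "max")


def aggregate_row(row: Sequence[Optional[float]]) -> Tuple[float, float, float]:
    """Collapse one row's feature columns into (mean, min, max), skipping None.

    A plain function, so on a cluster it could be passed to rdd.map(aggregate_row).
    """
    values = [v for v in row if v is not None]
    if not values:  # row with no observed values
        return (math.nan, math.nan, math.nan)
    return (sum(values) / len(values), min(values), max(values))


def mean_and_stdev(values: List[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of the non-NaN values."""
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return (math.nan, math.nan)
    mean = sum(finite) / len(finite)
    var = sum((v - mean) ** 2 for v in finite) / len(finite)
    return (mean, math.sqrt(var))


def outlier_flags(row_aggs: Sequence[float],
                  stats: Sequence[Tuple[float, float]],
                  threshold: float = 3.0) -> Tuple[int, ...]:
    """One 0/1 flag per aggregation: 1 if its |z-score| exceeds the threshold."""
    flags = []
    for value, (mean, std) in zip(row_aggs, stats):
        # Missing aggregates and zero/undefined spread are never flagged.
        if math.isnan(value) or math.isnan(std) or std == 0.0:
            flags.append(0)
        else:
            flags.append(int(abs(value - mean) / std > threshold))
    return tuple(flags)


def munge(rows, threshold: float = 3.0):
    """Return (aggregates, outlier-indicator tuple) for every row."""
    aggs = [aggregate_row(r) for r in rows]
    # One (mean, stdev) pair per aggregation column.
    stats = [mean_and_stdev([a[i] for a in aggs]) for i in range(len(AGG_NAMES))]
    return [(a, outlier_flags(a, stats, threshold)) for a in aggs]
```

On Spark, the list comprehensions would become `rdd.map(aggregate_row)`. The per-column statistics could come from `rdd.stats()`, whose `StatCounter` exposes `mean()` and `stdev()` (population standard deviation); NaNs would need to be filtered out first, because `StatCounter` does not skip them. A z-score rule is only one possible outlier test. With very few rows it flags nothing, because the largest attainable population z-score among n values is sqrt(n - 1).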

`fit_predict`:

Model the data with Spark's machine learning library (MLlib).
Predict on the test data.
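Spark's ML classifiers follow a fit-then-predict pattern: an estimator is fit on labelled training data, and the resulting model scores the test data. The sketch below mirrors that pattern with a small batch-gradient-descent logistic regression in pure Python, so it runs without a cluster. It is a stand-in, not Spark code. Logistic regression itself is an assumption, since the README does not say which model is used, and `LogisticRegressionSketch`, `lr` and `epochs` are hypothetical names.

```python
import math
from typing import List, Sequence


class LogisticRegressionSketch:
    """Minimal batch-gradient-descent logistic regression (hypothetical stand-in)."""

    def __init__(self, lr: float = 0.1, epochs: int = 500):
        self.lr = lr
        self.epochs = epochs
        self.weights: List[float] = []
        self.bias = 0.0

    @staticmethod
    def _sigmoid(z: float) -> float:
        # Numerically stable logistic function for large |z|.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def predict_proba_one(self, x: Sequence[float]) -> float:
        """Probability that a single feature vector belongs to class 1."""
        z = self.bias + sum(w * v for w, v in zip(self.weights, x))
        return self._sigmoid(z)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "LogisticRegressionSketch":
        """Minimise mean log loss by full-batch gradient descent."""
        n, d = len(X), len(X[0])
        self.weights = [0.0] * d
        self.bias = 0.0
        for _ in range(self.epochs):
            grad_w = [0.0] * d
            grad_b = 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba_one(xi) - yi  # d(log loss)/d(logit)
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            self.weights = [w - self.lr * g / n for w, g in zip(self.weights, grad_w)]
            self.bias -= self.lr * grad_b / n
        return self

    def predict(self, X: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
        """Hard 0/1 predictions for the test rows."""
        return [int(self.predict_proba_one(x) >= threshold) for x in X]
```

With real Spark, the corresponding calls would be `VectorAssembler(inputCols=[...], outputCol="features")` to pack the feature columns into one vector column, then `LogisticRegression(featuresCol="features", labelCol="label").fit(train_df)` and `model.transform(test_df)`. The `transform` call adds `prediction` and `probability` columns. The column names `features` and `label` are assumptions here.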

`munge_fit_predict`:

Run this as-is to run the pipeline on the toy example dataset.
