This is nocoder101's ML 101 repo. It includes book examples and sample code about ML that I've run. I'm also learning Hadoop and related platforms and technologies such as Apache Spark and Kafka. The related books are listed below:
- Hands-On Machine Learning with Scikit-Learn & TensorFlow by Aurélien Géron (O'Reilly Media, Inc., 2017)
- The Data Science Handbook by Field Cady (John Wiley & Sons, Inc., 2017)
- Real-World Machine Learning by Henrik Brink, Joseph W. Richards, and Mark Fetherolf (Manning, 2016)
- (reserved)
- Advanced Analytics with Spark by Sanford Ryza, Uri Laserson, Sean Owen, and Joshua Wills (O'Reilly Media, Inc., 2017, 2nd edition)
- Scikit-learn (ML in Python)
- Intro
  - http://scikit-learn.org/
  - Classification, Regression, Clustering, Dimensionality Reduction, Model Selection, Preprocessing
  - BSD license: open source, commercially usable
- Categories
- Classification: identifying which category an object belongs to
  - http://scikit-learn.org/stable/supervised_learning.html#supervised-learning
  - Applications: spam detection, image recognition
  - SVM (Support Vector Machine)
  - nearest neighbors
  - random forest
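To make the three classifiers listed above concrete, here is a minimal sketch that fits each one on the same train/test split. The iris dataset, the 70/30 split, and the hyperparameters are my own illustrative assumptions, not values taken from the books.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Assumption: iris (150 samples, 4 features, 3 classes) stands in for a real task.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Hyperparameters are illustrative, not tuned.
classifiers = {
    "SVM": SVC(kernel="rbf", gamma="scale"),
    "nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    # score() returns mean accuracy on the held-out split.
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

All three estimators share the same `fit` / `score` interface, which is why they can be swapped inside one loop.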
- Regression: predicting a continuous-valued attribute associated with an object
  - http://scikit-learn.org/stable/supervised_learning.html#supervised-learning
  - Applications: drug response, stock prices
  - SVR (Support Vector Regression)
  - ridge regression
  - lasso
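The regressors listed above can be compared the same way. This is only a sketch: the data is synthetic (generated with `make_regression`, not real drug-response or stock data), and `C` and `alpha` are arbitrary values I picked for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Assumption: a synthetic, roughly linear problem with Gaussian noise.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Regularization strengths are illustrative, not tuned.
regressors = {
    "SVR": SVR(kernel="linear", C=100.0),
    "ridge regression": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}

r2 = {}
for name, reg in regressors.items():
    reg.fit(X_train, y_train)
    # For regressors, score() returns the R^2 coefficient of determination.
    r2[name] = reg.score(X_test, y_test)
    print(f"{name}: R^2 = {r2[name]:.3f}")
```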
- Clustering: automatic grouping of similar objects into sets
  - http://scikit-learn.org/stable/modules/clustering.html#clustering
  - Applications: customer segmentation, grouping experiment outcomes
  - k-means
  - spectral clustering
  - mean-shift
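A small sketch of the three clustering algorithms above, run on the same points. The blob centers and spread are made up so that the clusters are easy to separate; real data will rarely be this clean. The adjusted Rand index is used only because the synthetic data comes with true labels.

```python
from sklearn.cluster import KMeans, MeanShift, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Assumption: three well-separated 2-D blobs with known labels.
X, y_true = make_blobs(
    n_samples=300,
    centers=[[-3, -3], [0, 3], [3, -3]],
    cluster_std=0.6,
    random_state=0,
)

models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "spectral clustering": SpectralClustering(n_clusters=3, random_state=0),
    # MeanShift estimates its own bandwidth and number of clusters.
    "mean-shift": MeanShift(),
}

ari = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # Adjusted Rand index: 1.0 means the grouping matches the true labels.
    ari[name] = adjusted_rand_score(y_true, labels)
    print(f"{name}: ARI = {ari[name]:.3f}")
```

Note that k-means and spectral clustering need the number of clusters up front, while mean-shift infers it from the data.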
- Dimensionality Reduction: reducing the number of random variables to consider
  - http://scikit-learn.org/stable/modules/decomposition.html#decompositions
  - Applications: visualization, increased efficiency
  - PCA
  - feature selection
  - non-negative matrix factorization
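The three approaches above all shrink the feature matrix, but in different ways. The sketch below reduces the four iris features to two with each one; the dataset, the choice of `SelectKBest` with `f_classif` as the feature-selection method, and `n_components=2` are my assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import NMF, PCA
from sklearn.feature_selection import SelectKBest, f_classif

# Assumption: iris, whose features are all non-negative (required by NMF).
X, y = load_iris(return_X_y=True)

# PCA: project onto the 2 directions of largest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Feature selection: keep the 2 original columns with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=2)
X_sel = selector.fit_transform(X, y)

# NMF: factor X into two non-negative matrices; keep the 2-column factor.
nmf = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0)
X_nmf = nmf.fit_transform(X)

print("PCA explained variance:", pca.explained_variance_ratio_.sum())
print("shapes:", X_pca.shape, X_sel.shape, X_nmf.shape)
```

PCA and NMF build new features from combinations of the originals, whereas feature selection keeps a subset of the original columns unchanged.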
- Model Selection: comparing, validating, and choosing parameters and models
  - http://scikit-learn.org/stable/model_selection.html#model-selection
  - Applications: improved accuracy via parameter tuning
  - grid search
  - cross validation
  - metrics
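Grid search, cross validation, and metrics usually appear together, so here is one sketch that uses all three. The parameter grid, the 5-fold setting, and the iris dataset are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Cross validation: 5 accuracy scores for an untuned SVC, one per fold.
cv_scores = cross_val_score(SVC(), X_train, y_train, cv=5)
print("baseline CV accuracy:", cv_scores.mean())

# Grid search: try every (C, gamma) pair, each scored with 5-fold CV.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)

# Metrics: evaluate the refitted best model on data it has never seen.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("test accuracy:", test_accuracy)
```

The test split is held out from the search on purpose, so the final accuracy is not inflated by the tuning itself.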
- Preprocessing: feature extraction and normalization
  - Applications: transforming input data such as text for use with ML algorithms
  - preprocessing
  - feature extraction
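As a rough illustration of the two items above, the sketch below normalizes a small numeric matrix with `StandardScaler` and turns three short texts into word counts with `CountVectorizer`. The numbers and sentences are made-up toy data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# Normalization: rescale each column to zero mean and unit variance.
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])  # toy values
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled)

# Feature extraction: convert raw text into a document-term count matrix.
docs = ["free money now", "meeting at noon", "free meeting"]  # toy documents
vectorizer = CountVectorizer()
X_text = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

# vocabulary_ maps each token to its column index; sorting gives column order.
vocab = sorted(vectorizer.vocabulary_)
print(vocab)
print(X_text.toarray())
```

Both transformers follow the same `fit_transform` pattern as the estimators, so they can be chained in front of a model.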
- Classification: identifying which category an object belongs to
- Regression
- TensorFlow
  - https://www.tensorflow.org/
  - An open source machine learning framework for everyone