- S1 = 60 minutes, before lunch
- S2 = 75 minutes, before lunch
- S3, S4 = 90 minutes each, after lunch
- Who we are
- What we believe
- Data Science use cases at GOJEK
- How to spot DS opportunities?
- AI, Big Data, and Machine Learning
  - How are these three related?
- Machine Learning:
  - What are the different classes of ML models?
- Data Science Flow
- The three DS cores (Biz, Math, Coding)
- DS problem formulation (north star metric to solve: BCR)
- ML 101: Linear regression (demo in a Jupyter notebook)
  - Parameter estimation:
    - Loss minimization (intuition)
    - Bias-variance trade-off
  - Model evaluation:
    - A brief overview of model evaluation
- Intro to Case Studies (15 mins)
- Intro to Pandas (notebook, 45 mins)
  - Definitions
  - Basic functions
  - Merging dataframes
- Intro to Data Visualization (notebook, 45 mins)
  - Introducing matplotlib and seaborn with some basic charts
  - More advanced charts (interactive charts, Sankey flow diagrams, etc.)
- Pandas Exercises
- Viz Exercises
*Use Google Colab
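The "Merging dataframes" topic from the Pandas intro can be previewed with a small sketch. The two tables, their column names, and their values are hypothetical, made up only to show how the `how` argument of `DataFrame.merge` changes which rows survive a join.

```python
import pandas as pd

# Hypothetical toy tables, for illustration only
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "driver_id": [10, 11, 10, 12],
    "fare": [15000, 22000, 9000, 30000],
})
drivers = pd.DataFrame({
    "driver_id": [10, 11, 13],
    "city": ["Jakarta", "Bandung", "Surabaya"],
})

# Inner join: keep only orders whose driver_id exists in both tables
inner = orders.merge(drivers, on="driver_id", how="inner")

# Left join: keep every order; unmatched rows get NaN in the drivers columns.
# indicator=True adds a "_merge" column ("both" or "left_only" here).
left = orders.merge(drivers, on="driver_id", how="left", indicator=True)

print(inner)
print(left)
```

With `how="inner"` the order for driver 12 is dropped; with `how="left"` it is kept, and its `_merge` value of `left_only` is a quick way to spot unmatched keys.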
Keywords: Problem Definition, Correlation, IID, Practice
Gradient Descent
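Gradient descent for linear regression can be sketched as repeated steps against the gradient of the mean squared error. This is a minimal NumPy sketch that assumes a single feature, synthetic data generated from y = 3x + 2, and an illustrative learning rate and iteration count. The notebook used in class may be set up differently.

```python
import numpy as np

def fit_linear_regression_gd(x, y, lr=0.1, n_iters=2000):
    """Fit y ~ w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        error = (w * x + b) - y                 # residuals of the current fit
        grad_w = (2.0 / n) * np.dot(error, x)   # d(MSE)/dw
        grad_b = (2.0 / n) * error.sum()        # d(MSE)/db
        w -= lr * grad_w                        # step against the gradient
        b -= lr * grad_b
    return w, b

# Synthetic data (assumption): y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=200)
y = 3 * x + 2 + rng.normal(0, 0.1, size=200)

w, b = fit_linear_regression_gd(x, y)
print(f"w = {w:.2f}, b = {b:.2f}")
```

On this data the result should agree closely with the closed-form least-squares fit from `np.polyfit(x, y, 1)`, which makes a handy sanity check.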
Keywords: Problem Definition, Information Theory, Entropy, Type of Classification, Problem, Practice (add
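Entropy, the information-theory quantity named in these keywords, measures how mixed a set of class labels is. One common use is scoring decision-tree splits by information gain. Below is a minimal pure-Python sketch that assumes entropy is measured in bits (log base 2). The function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of the class distribution in `labels`."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    # Sum of p * log2(1/p); a pure set (p = 1) contributes exactly 0
    return sum(p * math.log2(1 / p) for p in probs)

def information_gain(parent, children):
    """Drop in entropy when `parent` is split into the subsets in `children`."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

print(entropy(["yes", "no", "yes", "no"]))    # 1.0 (50/50 split, maximal for 2 classes)
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 (pure set)
print(information_gain(["yes", "yes", "no", "no"],
                       [["yes", "yes"], ["no", "no"]]))  # 1.0 (perfect split)
```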
- 40 mins - Feature Engineering & Feature Selection
- 50 mins - Ensemble learning (bagging & boosting), hands-on
Keywords: Averaging Models, One-Hot Encoding, Bootstrap, Bagging, Boosting
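Bagging combines the Bootstrap and Averaging keywords: fit one model per bootstrap resample of the training set, then average their predictions. The sketch below assumes NumPy, a deliberately high-variance base model (a degree-5 polynomial), and synthetic sine data. All of these choices, including the function name `bagged_poly_predict`, are illustrative rather than taken from the course material.

```python
import numpy as np

def bagged_poly_predict(x_train, y_train, x_test, degree=5, n_models=50, seed=0):
    """Bagging: fit one polynomial per bootstrap resample, then average predictions."""
    rng = np.random.default_rng(seed)
    n = len(x_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)        # bootstrap: n draws with replacement
        coeffs = np.polyfit(x_train[idx], y_train[idx], deg=degree)
        preds.append(np.polyval(coeffs, x_test))
    return np.mean(preds, axis=0)               # averaging reduces variance

# Synthetic data (assumption): noisy samples of sin(2*pi*x)
rng = np.random.default_rng(42)
x_train = rng.uniform(0, 1, size=40)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, size=40)
x_test = np.linspace(0.1, 0.9, 50)

y_bagged = bagged_poly_predict(x_train, y_train, x_test)
```

Boosting, the other keyword, instead fits models sequentially, with each one focusing on the errors of the previous ones. It mainly targets bias, whereas bagging mainly targets variance.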
Keywords: Cost Function, Bias-Variance Trade-off, Cross-Validation, F1 Score, Recall (and other metrics)
- Regression: use a new dataset with non-polynomial linear regression, plus a plot
- Classification: use a new imbalanced dataset to discuss other classification metrics (F1 score, recall, etc.), also using the medium dataset
- Ensemble: hands-on boosting (AdaBoost) on the medium dataset (bias-variance trade-off)
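On an imbalanced dataset, accuracy can look good even when the model misses the minority class. This is why F1 score and recall are worth discussing. Below is a minimal pure-Python sketch using made-up labels (95 negatives, 5 positives) and a hypothetical model's predictions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the `positive` class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up imbalanced labels: 95 negatives followed by 5 positives
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100
# Hypothetical model: 93 TN, 2 FP, 3 TP, 2 FN
some_model = [0] * 93 + [1] * 2 + [1] * 3 + [0] * 2

print(accuracy(y_true, always_negative), precision_recall_f1(y_true, always_negative))
# 0.95 (0.0, 0.0, 0.0)
print(accuracy(y_true, some_model), precision_recall_f1(y_true, some_model))
```

The always-negative baseline reaches 95% accuracy but has zero recall and zero F1. That gap is exactly what these metrics are meant to expose.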
- 10 mins - Theory
- 50 mins - Guided K-Means algorithm creation (hands-on exercise)
- 60 mins - Applying K-Means to an image (hands-on exercise)
- 10 mins - Other unsupervised learning algorithms theory
- 5 mins - Homework explanation (wine data)
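The guided K-Means exercise can be summarized by Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. Below is a minimal NumPy sketch. It assumes random initialization from the data points; the exercise may use a different scheme, such as k-means++. The helper names are illustrative.

```python
import numpy as np

def assign_clusters(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialization (assumption): k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        labels = assign_clusters(X, centroids)
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assign_clusters(X, centroids)

# Synthetic data (assumption): two well-separated 2-D blobs around (0, 0) and (5, 5)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, size=(50, 2)), rng.normal(5, 0.2, size=(50, 2))])
centroids, labels = kmeans(X, k=2)
```

For the image exercise, one plausible approach is to reshape an H × W × 3 image into an (H·W, 3) array of RGB pixels and run `kmeans` on it. You can then replace each pixel with its centroid color to get a k-color version of the image.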
- 10 mins - Why NNs, and what they are used for
- 10 mins - What computation graphs are, and computing derivatives over computation graphs
- 30 mins: Hands-on: Implement logistic regression in NN style
- 10 mins: NN representation
- 5 mins: Activation functions
- 20 mins: Gradient descent and Backpropagation
- 10 mins: continue Gradient descent and Backpropagation
- 10 mins: Initialization of weights
- 30 mins: Hands-on: Implement a shallow NN using Python and NumPy
- 10 mins: Homework explanation; building blocks of DNNs; forward and backward propagation for DNNs
- 5 mins: Softmax regression
- 30 mins: Hands-on: Softmax regression using Keras
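Implementing logistic regression "in NN style" means treating it as a single sigmoid neuron trained with vectorized forward propagation, backpropagation, and gradient descent. Below is a minimal NumPy sketch. It assumes one column per example (X has shape (n_features, m)) and uses synthetic, overlapping two-class data. The function names and hyperparameters are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.5, n_iters=1000):
    """Single sigmoid neuron. X: (n_features, m), y: (1, m) with 0/1 labels."""
    n_features, m = X.shape
    w = np.zeros((n_features, 1))
    b = 0.0
    costs = []
    for _ in range(n_iters):
        # Forward propagation: linear step, then sigmoid activation
        A = sigmoid(w.T @ X + b)                    # shape (1, m)
        costs.append(-np.mean(y * np.log(A) + (1 - y) * np.log(1 - A)))
        # Backward propagation: gradients of the cross-entropy cost
        dZ = A - y                                  # dCost/dZ, shape (1, m)
        dw = (X @ dZ.T) / m                         # shape (n_features, 1)
        db = np.mean(dZ)
        # Gradient-descent update
        w -= lr * dw
        b -= lr * db
    return w, b, costs

def predict(w, b, X):
    return (sigmoid(w.T @ X + b) > 0.5).astype(int)

# Synthetic, overlapping two-class data (assumption), one column per example
rng = np.random.default_rng(0)
m = 200
X = np.hstack([rng.normal(-1, 1, size=(2, m // 2)), rng.normal(1, 1, size=(2, m // 2))])
y = np.hstack([np.zeros((1, m // 2)), np.ones((1, m // 2))])

w, b, costs = train_logistic_regression(X, y)
train_accuracy = np.mean(predict(w, b, X) == y)
```

Adding a hidden layer (for example, with tanh or ReLU activations) turns the same forward/backward loop into the shallow NN from the later hands-on session.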
- S1 - Intro to the Kaggle problem; participants form groups and get started
- S2 - Kaggle-ing
- S4 - Presentations from the top 5 teams
Some modules require you to install this environment locally; for the rest, we will use Google Colab.
For Mac:
brew install pyenv
For Ubuntu, clone the pyenv repository:
cd
git clone https://github.com/pyenv/pyenv.git ~/.pyenv
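Then, set up your environment variables by appending the following lines to your shell startup file (replace <.bash_profile/.zshrc> with .bash_profile or .zshrc, depending on your shell):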
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/<.bash_profile/.zshrc>
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/<.bash_profile/.zshrc>
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> ~/<.bash_profile/.zshrc>
On Ubuntu, use ~/.bashrc instead of ~/.bash_profile.
Lastly, restart your shell so the path changes take effect.
exec "$SHELL"
- Refer to the pyenv documentation (https://github.com/pyenv/pyenv) for more details
Install the dependencies (you must be connected to the Gojek Integration VPN to install packages from http://artifactory-gojek.golabs.io):
make env