This repository outlines how to train a Random Forest regressor using PySpark's MLlib.
The basic steps are as follows:
- Install Spark on Google Colab and load a dataset in PySpark
- Describe and clean your dataset
- Create a Random Forest pipeline to predict car prices
- Create a cross-validator for hyperparameter tuning
- Train your model and predict car prices on the test set
- Evaluate your model’s performance via several metrics
The Coursera project is available at: https://www.coursera.org/projects/spark-machine-learning-pipeline-python