An end-to-end data science project, completed as the capstone project for the IBM Data Science Professional Certificate program (course 10), including code notebooks, a dashboard application, and a summary presentation.
- SpaceX's ability to reuse the 1st stage of its Falcon 9 rocket lowers the launch price by ~70% (~$100M per launch).
- Predicting whether the 1st stage will land lets us estimate the launch cost.
- Our goal is to implement a workflow that predicts the 1st-stage landing outcome.
- Which factors affect 1st-stage landing outcome and in what way?
- What is the rate of successful landings over time?
- Which learning algorithm performs best in this problem?
- Data Collection via REST-API (notebook) and web scraping (notebook).
- Data Wrangling (notebook).
- Exploratory Data Analysis (EDA) via data visualization (notebook) and SQL (notebook).
- Interactive Map using Folium (notebook).
- Dashboard Building using Plotly Dash (script).
- Predictive Analysis (Classification) (notebook).
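
As a rough illustration of the kind of aggregation the EDA step involves, the sketch below computes the landing success rate per year in plain Python. The records and the `success_rate_by_year` helper are hypothetical and are not taken from the project notebooks, which may well use pandas or SQL instead.

```python
from collections import defaultdict

# Hypothetical records: (launch year, landing outcome), 1 = landed, 0 = failed.
# These values are invented for illustration and are not the project's data.
launches = [
    (2015, 0), (2015, 1),
    (2016, 1), (2016, 0), (2016, 1), (2016, 1),
    (2017, 1), (2017, 1), (2017, 1), (2017, 0),
]


def success_rate_by_year(records):
    """Return {year: fraction of successful landings}, ordered by year."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for year, outcome in records:
        totals[year] += 1
        successes[year] += outcome
    return {year: successes[year] / totals[year] for year in sorted(totals)}


rates = success_rate_by_year(launches)
for year, rate in rates.items():
    print(f"{year}: {rate:.0%}")
```

The same grouping generalizes to other categorical features, such as launch site or orbit type, by swapping the key used for counting.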
For a detailed account of the methodology, see the summary presentation.
See the summary presentation.
- Not all data is relevant for the problem – only some features affect success rate.
- Launches with large payloads generally have higher success rates.
- ES-L1, SSO, HEO, GEO, and VLEO orbits all have very high success rates.
- The overall success rate increases clearly over time.
- The KSC LC-39A launch site has the highest success rate.
- Launch sites are located in proximity to the coast and equator.
- All models performed similarly on this problem, with the Decision Tree model generalizing slightly better.
- Collection of more data is needed to evaluate model generalizability to unseen data.
- Additional feature engineering may improve model efficiency and performance.
- Ensemble methods such as Random Forest and boosting were not used, but they are highly likely to improve model performance.
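
One way the ensemble idea in the last point might be explored is sketched below with scikit-learn. It compares a single decision tree against a random forest and gradient boosting using 5-fold cross-validation. The data is a synthetic stand-in generated with `make_classification`, and the hyperparameters are library defaults, so the scores say nothing about the actual launch dataset; this is an assumption-laden starting point, not the project's method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the launch feature matrix and binary landing outcome.
# Assumption: a small dataset; the real features would come from the wrangling step.
X, y = make_classification(
    n_samples=100, n_features=10, n_informative=5, random_state=0
)

# Baseline tree plus two ensemble candidates, all with default hyperparameters.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean accuracy over 5 stratified folds for each model.
scores = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: {scores[name]:.3f}")
```

Cross-validation is used here rather than a single train/test split because, on a small dataset, one split gives a noisy estimate of how well a model generalizes.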