OrHostezky/winning-space-race-with-data-science

Winning Space Race with Data Science

An end-to-end data science project, completed as the capstone project for the IBM Data Science Professional Certificate program (course 10), including code notebooks, a dashboard application, and a summary presentation.

Introduction

Context:

  • SpaceX brings an innovative ability to reuse the 1st stage of its Falcon 9 rocket, which lowers the launch price by ~70% (a saving of roughly $100M per launch).
  • Determining the 1st-stage landing outcome therefore enables us to estimate the launch cost.
  • Our goal is to implement a workflow that predicts the 1st-stage landing outcome.

Key Questions:

  • Which factors affect 1st-stage landing outcome and in what way?
  • What is the rate of successful landings over time?
  • Which learning algorithm performs best for this problem?

Methodology

  • Data Collection via a REST API (notebook) and web scraping (notebook).
  • Data Wrangling (notebook).
  • Exploratory Data Analysis (EDA) via data visualization (notebook) and SQL (notebook).
  • Interactive Map using Folium (notebook).
  • Dashboard Building using Plotly Dash (script).
  • Predictive Analysis (Classification) (notebook).
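The Data Collection and Data Wrangling steps above can be sketched as follows: each launch record is reduced to a landing-outcome string, which is then mapped to a binary class label for classification. This is a minimal sketch, not the project's notebook code. It assumes records shaped like those of the public SpaceX API v4 (a `cores` list whose entries carry `landing_success` and `landing_type`), inlines a small hypothetical sample so that it runs offline, and assumes a labeling rule in which the outcomes listed in `BAD_OUTCOMES` count as failed landings.

```python
import json

# Hypothetical sample shaped like SpaceX API v4 launch records (assumption);
# only the fields used below are included. These are NOT real launch records.
SAMPLE = json.loads("""
[
  {"flight_number": 1, "cores": [{"landing_success": null, "landing_type": null}]},
  {"flight_number": 2, "cores": [{"landing_success": false, "landing_type": "Ocean"}]},
  {"flight_number": 3, "cores": [{"landing_success": true, "landing_type": "ASDS"}]},
  {"flight_number": 4, "cores": [{"landing_success": true, "landing_type": "RTLS"}]}
]
""")

# Assumed labeling rule: these outcome strings count as unsuccessful landings.
BAD_OUTCOMES = {"False ASDS", "False Ocean", "False RTLS", "None ASDS", "None None"}


def outcome_label(core: dict) -> str:
    """Combine landing success and landing type into one string, e.g. 'True ASDS'."""
    return f"{core['landing_success']} {core['landing_type']}"


def to_class(outcome: str) -> int:
    """Map an outcome string to a binary class: 1 = landed, 0 = did not land."""
    return 0 if outcome in BAD_OUTCOMES else 1


rows = []
for launch in SAMPLE:
    # Keep only the first core; the sketch assumes single-core (Falcon 9) launches.
    core = launch["cores"][0]
    outcome = outcome_label(core)
    rows.append({"flight_number": launch["flight_number"],
                 "outcome": outcome,
                 "class": to_class(outcome)})

for row in rows:
    print(row)
```

In practice the records would be fetched over HTTP (for example with `urllib.request.urlopen` on `https://api.spacexdata.com/v4/launches/past`, assuming that endpoint) and parsed with `json.load` instead of using the inline sample. JSON `null` becomes Python `None`, which is why a launch with no landing attempt yields the outcome `"None None"`.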

For a detailed account of the methodology, see the summary presentation.

Results

See the summary presentation.

Conclusions

  • Not all data is relevant to the problem: only some features affect the success rate.
  • Launches with large payloads generally have higher success rates.
  • ES-L1, SSO, HEO, GEO, and VLEO orbits all have very high success rates.
  • The overall success rate shows a clear increasing trend over time.
  • KSC LC-39A launch site has the highest success rate.
  • Launch sites are located in proximity to the coast and equator.
  • All models performed comparably well, yet the Decision Tree model generalized slightly better for this problem.
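Success-rate findings like the ones above (by year and by launch site) come down to simple grouped aggregations. The sketch below shows how such a query might look in SQL, using Python's built-in `sqlite3` with an in-memory table. The table name, columns, site labels, and rows are all hypothetical; they are not the project's data and do not reproduce its results.

```python
import sqlite3

# Hypothetical rows (year, launch site, landed as 0/1) for illustration only.
LAUNCHES = [
    (2013, "Site A", 0),
    (2014, "Site A", 0),
    (2016, "Site A", 1),
    (2017, "Site B", 1),
    (2017, "Site C", 0),
    (2018, "Site B", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (year INTEGER, site TEXT, landed INTEGER)")
conn.executemany("INSERT INTO launches VALUES (?, ?, ?)", LAUNCHES)

# Success rate per year: since landed is 0/1, AVG(landed) is the fraction landed.
by_year = conn.execute(
    "SELECT year, AVG(landed) FROM launches GROUP BY year ORDER BY year"
).fetchall()

# Success rate per launch site, highest first.
by_site = conn.execute(
    "SELECT site, AVG(landed) AS rate FROM launches GROUP BY site ORDER BY rate DESC"
).fetchall()

conn.close()
print(by_year)
print(by_site)
```

Storing the outcome as a 0/1 integer keeps the queries short: the mean of the column is directly the success rate for each group, so no separate count of successes and totals is needed.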

Limitations and Future Work:

  • More data is needed to evaluate how well the models generalize to unseen data.
  • Additional feature engineering may improve model efficiency and performance.
  • Ensemble methods such as Random Forest and boosting were not used, though it is highly likely that they could improve model performance.
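Random Forest and boosting are too involved for a short sketch, but the basic ensemble idea, combining the predictions of several models, can be illustrated with hard (majority) voting. Note that this is a simpler, swapped-in technique rather than Random Forest or boosting. The stand-in models and their thresholds on two features (payload mass in kg and flight number) are made up for illustration; they are hypothetical and are not the project's trained classifiers.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier here is any function mapping a feature vector to a 0/1 label.
Classifier = Callable[[Sequence[float]], int]


def hard_vote(models: Sequence[Classifier], x: Sequence[float]) -> int:
    """Return the majority label across models; ties go to the smallest label."""
    votes = Counter(model(x) for model in models)
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)


# Hypothetical stand-in models with made-up thresholds (not trained models).
def heavy_payload(x: Sequence[float]) -> int:
    return int(x[0] > 5000)  # x[0]: payload mass in kg (assumed feature)


def late_flight(x: Sequence[float]) -> int:
    return int(x[1] > 20)  # x[1]: flight number (assumed feature)


def always_land(x: Sequence[float]) -> int:
    return 1  # trivial baseline that always predicts a successful landing


models = [heavy_payload, late_flight, always_land]
print(hard_vote(models, [6000, 10]))  # votes 1, 0, 1 -> 1
print(hard_vote(models, [3000, 5]))   # votes 0, 0, 1 -> 0
```

With scikit-learn, `VotingClassifier(voting="hard")`, `RandomForestClassifier`, and `GradientBoostingClassifier` would be natural starting points for trying ensembles on the real features.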
