Marveric-18/SpecexInsights

Welcome to SpaceX Insights

SpaceX Insights is a project built on SpaceX data: an ETL pipeline extracts data about launches, rockets and payloads and transforms it to surface useful insights.

Prerequisites

Setup Check

  • Make sure Node.js is installed: in your terminal, run node -v
  • Make sure the MongoDB service is running on port 27017 (if it uses a different port, change the database connection string in spacex-backend/src/db/index.js)
  • Make sure no other service is using ports 3000 and 8082 on your local machine

How to Run

Start Backend Service

  • Open the SpacexInsight project in the code editor of your choice
  • Open a terminal and go to the spacex-backend directory: cd spacex-backend
  • Install dependencies: npm install
  • Run npm run start. If you run into any issues, revisit the Setup Check section

Start Frontend Service

  • Open another terminal and go to the spacex-insights directory: cd spacex-insights
  • Install dependencies: npm install
  • Run npm run start. If you run into any issues, revisit the Setup Check section

How it works

The project is divided into three parts:

  • ETL Pipeline
  • Serving data through Backend Service
  • Showing Insights through Frontend Service

ETL Pipeline

To build the ETL pipeline I used an ES6 class with one method for each ETL stage (Extract, Transform and Load).
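
A minimal sketch of how such a class can be laid out. The class name EtlPipeline, the source and sink objects, and the mapped fields are hypothetical; the repository's actual class may be organized differently. Each stage stores its result in an instance field, in the same spirit as the extract stage keeping its data in class variables.

```javascript
// Hypothetical ETL pipeline class with one method per stage.
// source.fetchAll() must resolve to { launches, rockets, payloads } (raw API data).
// sink.replaceAll(data) must clear the old data and store the new data.
class EtlPipeline {
  constructor(source, sink) {
    this.source = source;
    this.sink = sink;
    this.raw = null; // filled by extract()
    this.transformed = null; // filled by transform()
  }

  async extract() {
    this.raw = await this.source.fetchAll();
  }

  transform() {
    const { launches, rockets, payloads } = this.raw;
    this.transformed = {
      launches: launches.map((l) => ({ launch_id: l.id, rocket_id: l.rocket, name: l.name })),
      rockets: rockets.map((r) => ({ rocket_id: r.id, name: r.name })),
      payloads: payloads.map((p) => ({ payload_id: p.id, launch_id: p.launch, name: p.name })),
    };
  }

  async load() {
    await this.sink.replaceAll(this.transformed);
  }

  // Runs the three stages in order and returns the data that was loaded.
  async run() {
    await this.extract();
    this.transform();
    await this.load();
    return this.transformed;
  }
}
```

Injecting the source and sink keeps each stage testable without the network or a running database.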

Extract

  • To extract data we use the SpaceX GraphQL server and the SpaceX REST API (v4).
  • The REST API is used only because the current version of the SpaceX GraphQL server has issues fetching payload data and some launch data.
  • Three datasets are fetched: rocket data with a GraphQL query, and launch and payload data with v4 API requests.
  • The extract stage simply fetches the data from both sources and stores it in class variables.
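
A sketch of the extract stage under stated assumptions. The v4 endpoints https://api.spacexdata.com/v4/launches and https://api.spacexdata.com/v4/payloads belong to the public SpaceX REST API. The GraphQL endpoint URL is passed in rather than hard-coded, and the rockets query with its field names (id, name, cost_per_launch) is an assumption to adjust to the GraphQL server in use. It relies on the global fetch of Node.js 18+, and accepts a replacement fetch function so it can be tested offline.

```javascript
const SPACEX_REST_V4 = "https://api.spacexdata.com/v4";

// Assumed GraphQL query; field names depend on the server's schema.
const ROCKETS_QUERY = "{ rockets { id name cost_per_launch } }";

class SpaceXExtractor {
  constructor(graphqlUrl, fetchImpl = globalThis.fetch) {
    this.graphqlUrl = graphqlUrl;
    // Wrap so fetch is never invoked with the extractor as `this`.
    this.fetch = (url, init) => fetchImpl(url, init);
    this.rockets = [];
    this.launches = [];
    this.payloads = [];
  }

  async getJson(url, init) {
    const res = await this.fetch(url, init);
    if (!res.ok) {
      throw new Error(`Request to ${url} failed with HTTP ${res.status}`);
    }
    return res.json();
  }

  // Rocket data via a GraphQL POST request.
  async fetchRockets() {
    const body = await this.getJson(this.graphqlUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query: ROCKETS_QUERY }),
    });
    if (body.errors) throw new Error(`GraphQL error: ${body.errors[0].message}`);
    return body.data.rockets;
  }

  // Fetch all three datasets in parallel and keep them in instance fields.
  async extract() {
    const [rockets, launches, payloads] = await Promise.all([
      this.fetchRockets(),
      this.getJson(`${SPACEX_REST_V4}/launches`),
      this.getJson(`${SPACEX_REST_V4}/payloads`),
    ]);
    this.rockets = rockets;
    this.launches = launches;
    this.payloads = payloads;
  }
}
```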

Transform

  • For each fetched dataset (launches, rockets and payloads) I wrote transformation functions (mappers) that convert it into a more readable and efficient format.
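
The mappers could look like the following sketch. Input field names (id, rocket, date_utc, success, launch, orbit, mass_kg, cost_per_launch) follow the public SpaceX v4 API; which fields the project actually keeps, and every output field besides the ids from the schema, are assumptions.

```javascript
// Hypothetical mappers from raw SpaceX v4 records to the stored schema.

function mapLaunch(raw) {
  const date = new Date(raw.date_utc);
  return {
    launch_id: raw.id,
    rocket_id: raw.rocket, // v4 launches reference their rocket by id
    name: raw.name,
    year: date.getUTCFullYear(),
    month: date.getUTCMonth() + 1, // getUTCMonth() is 0-based
    success: raw.success ?? null, // null for upcoming launches
  };
}

function mapRocket(raw) {
  return {
    rocket_id: raw.id,
    name: raw.name,
    cost_per_launch: raw.cost_per_launch ?? null,
  };
}

function mapPayload(raw) {
  return {
    payload_id: raw.id,
    launch_id: raw.launch, // v4 payloads reference their launch by id
    name: raw.name,
    orbit: raw.orbit ?? "Unknown",
    mass_kg: raw.mass_kg ?? 0, // some payloads have no recorded mass
  };
}

function transformAll({ launches, rockets, payloads }) {
  return {
    launches: launches.map(mapLaunch),
    rockets: rockets.map(mapRocket),
    payloads: payloads.map(mapPayload),
  };
}
```

Precomputing year and month here keeps the later "frequency by year/month" queries simple.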

Load

  • Clear all collections in the database
  • Load all transformed data into the database
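
A sketch of these two steps, assuming db behaves like a MongoDB Node.js driver Db object (db.collection(name) returning a collection with deleteMany() and insertMany()). The collection names are assumptions. If the project uses Mongoose models instead, Model.deleteMany() and Model.insertMany() play the same roles.

```javascript
// Hypothetical Load stage: clear each collection, then insert the transformed docs.
// The collection names are assumptions.
const COLLECTIONS = ["rockets", "launches", "payloads"];

async function loadAll(db, transformed) {
  for (const name of COLLECTIONS) {
    const docs = transformed[name] ?? [];
    const collection = db.collection(name);
    await collection.deleteMany({}); // 1. clear the old data
    if (docs.length > 0) {
      // 2. insert the new data; the MongoDB driver rejects an empty batch
      await collection.insertMany(docs);
    }
  }
}
```

Between the delete and the insert a collection is briefly empty, so requests served during a reload may see partial data.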

Database Schema

To link launch, rocket and payload data efficiently, I used the following schema:

Launch {
    launch_id : PK of Launch Data,
    rocket_id : Reference to rocket,
    ...Rest of Launch Data
}

Rocket {
    rocket_id : PK of Rocket Data,
    ...Rest of Rocket Data
}

Payload {
    payload_id : PK of Payload Data,
    launch_id : Reference to Launch,
    ...Rest of Payload Data
}

These references connect all the records, so related data can be queried together.
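
With these references, a payload can be traced to its launch and from there to its rocket. The sketch below performs that join in memory with Map lookups over the transformed records (field names as in the schema, other fields omitted). The same join inside MongoDB could use the $lookup aggregation stage, shown as a pipeline constant; the collection names launches and rockets are assumptions.

```javascript
// Join each payload to its launch and rocket using the schema's reference ids.
function joinPayloads(launches, rockets, payloads) {
  const launchById = new Map(launches.map((l) => [l.launch_id, l]));
  const rocketById = new Map(rockets.map((r) => [r.rocket_id, r]));
  return payloads.map((p) => {
    const launch = launchById.get(p.launch_id) ?? null;
    const rocket = launch ? rocketById.get(launch.rocket_id) ?? null : null;
    return { ...p, launch, rocket };
  });
}

// Equivalent MongoDB aggregation pipeline, run on the payloads collection.
const payloadJoinPipeline = [
  { $lookup: { from: "launches", localField: "launch_id", foreignField: "launch_id", as: "launch" } },
  { $unwind: { path: "$launch", preserveNullAndEmptyArrays: true } },
  { $lookup: { from: "rockets", localField: "launch.rocket_id", foreignField: "rocket_id", as: "rocket" } },
  { $unwind: { path: "$rocket", preserveNullAndEmptyArrays: true } },
];
```

preserveNullAndEmptyArrays keeps payloads whose launch or rocket is missing, mirroring the null fallbacks in joinPayloads.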

spacex-backend

REST API Setup

GET Methods

  • /insights/reload-data: Reloads the data by running the ETL pipeline, which clears the old data and fetches the latest version
  • /insights/payload-statistics: Fetches statistics on the payloads sent to space
  • /insights/launch-frequency-by-year: Fetches launch frequency by year, with success and failure counts for each year
  • /insights/launch-frequency-by-month: Fetches launch frequency by month
  • /insights/rocket-payload-efficiency: Fetches insights on each rocket's payload efficiency and the money spent
  • /insights/orbitwise-risk: Fetches orbit-wise data with success rate and total payload weight
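
Two of these insights can be sketched as plain aggregations over the transformed records. This is an assumed implementation, not necessarily how the backend computes them: launches whose success is null (upcoming) are skipped, and orbit risk is measured per payload as the share of payloads in an orbit whose launch succeeded. Input records are assumed to look like { launch_id, year, success } and { launch_id, orbit, mass_kg }.

```javascript
const isKnown = (success) => success === true || success === false;

// Success and failure counts per year, sorted by year.
function launchFrequencyByYear(launches) {
  const byYear = new Map();
  for (const { year, success } of launches) {
    if (!isKnown(success)) continue; // upcoming or unknown outcome
    const row = byYear.get(year) ?? { year, total: 0, success: 0, failure: 0 };
    row.total += 1;
    if (success) row.success += 1;
    else row.failure += 1;
    byYear.set(year, row);
  }
  return [...byYear.values()].sort((a, b) => a.year - b.year);
}

// Per orbit: payload count, share of payloads whose launch succeeded, total mass.
function orbitwiseRisk(payloads, launches) {
  const outcome = new Map(launches.map((l) => [l.launch_id, l.success]));
  const byOrbit = new Map();
  for (const { launch_id, orbit, mass_kg } of payloads) {
    const success = outcome.get(launch_id);
    if (!isKnown(success)) continue; // unknown launch or upcoming launch
    const row = byOrbit.get(orbit) ?? { orbit, payloads: 0, successful: 0, total_mass_kg: 0 };
    row.payloads += 1;
    if (success) row.successful += 1;
    row.total_mass_kg += mass_kg ?? 0;
    byOrbit.set(orbit, row);
  }
  return [...byOrbit.values()].map((row) => ({
    ...row,
    success_rate: row.successful / row.payloads,
  }));
}
```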

spacex-insights

  • Built with React, using React Charts to visualize the insights.
  • Please check out this video to see how it works.
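
On the frontend, each chart needs data from one backend endpoint. The sketch below assumes the backend is reachable at http://localhost:8082 (the README only says ports 3000 and 8082 are used; 3000 is the usual React dev-server port) and that launch-frequency-by-year returns rows like { year, success, failure }. Both the base URL and the response shape are assumptions, and toYearSeries produces a generic { label, data: [{ x, y }] } structure rather than any specific chart library's format.

```javascript
// Assumption: backend base URL.
const API_BASE = "http://localhost:8082";

// Fetches one insight endpoint, e.g. fetchInsight("launch-frequency-by-year").
async function fetchInsight(name, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(`${API_BASE}/insights/${name}`);
  if (!res.ok) {
    throw new Error(`GET /insights/${name} failed with HTTP ${res.status}`);
  }
  return res.json();
}

// Assumed rows: [{ year, success, failure }, ...].
// Returns one series per outcome, each a list of { x, y } points.
function toYearSeries(rows) {
  return [
    { label: "Success", data: rows.map((r) => ({ x: String(r.year), y: r.success })) },
    { label: "Failure", data: rows.map((r) => ({ x: String(r.year), y: r.failure })) },
  ];
}
```

In a React component these would typically be called from a useEffect hook, storing the series in state before rendering the chart.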

Further Improvements

  • Show extra statistics in tabular format.
  • Apply a rate limiter to the extract stage of the ETL pipeline, allowing only one call at a time.
  • Cache backend extract results for 24 hours, since SpaceX data does not change often.
  • Track and log ETL job status.
  • Add a cron job to refresh/reload the data on a regular schedule.
  • Handle connection issues, timeouts and delays while fetching data.
  • Add more insightful graphs, such as one for the orbitwise-risk data.
  • Cache API responses on the frontend to avoid unnecessary re-fetching.
  • Store the relations between launches, rockets and payloads in a separate table so related data can be fetched easily.
  • Add more code comments.
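
Two of these items, the one-call-at-a-time rate limiter and the 24-hour cache, could be sketched as follows. These are hypothetical helpers, not code from the repository. The limiter chains tasks on a promise so each one starts only after the previous one settles; the cache stores a timestamp with each value and treats entries older than the TTL as missing. The clock is injectable so expiry can be tested.

```javascript
// Runs async tasks one at a time, in submission order.
function createSerialLimiter() {
  let tail = Promise.resolve();
  return function schedule(task) {
    const result = tail.then(() => task());
    tail = result.catch(() => {}); // a failed task must not block later ones
    return result;
  };
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Key/value cache whose entries expire ttlMs after they were stored.
class TtlCache {
  constructor(ttlMs = DAY_MS, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // expired
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, storedAt: this.now() });
  }
}

// Serve from the cache when fresh; otherwise load through the limiter and cache it.
async function cachedFetch(cache, schedule, key, loader) {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await schedule(loader);
  cache.set(key, value);
  return value;
}
```

Two concurrent misses for the same key will both call the loader (one after the other); caching the in-flight promise instead of the resolved value would avoid that.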

About

A project consuming SpaceX data through an ETL pipeline
