SpaceX Insight is a project built on SpaceX data: an ETL pipeline extracts data about launches, rockets, and payloads and transforms it to surface useful insights.
- Node.js Version (18 or above) Installation Guide
- MongoDB Service Installation Guide
- Make sure Node.js is installed: in your terminal, run node -v
- Make sure the MongoDB service is running on port 27017 (if it uses a different port, change the database link inside spacex-backend/src/db/index.js)
- Make sure no service is running on ports 3000 and 8082 on your local machine
- First, clone the SpecexInsights repo to your local computer using git clone https://github.com/Marveric-18/SpecexInsights.git
- Open the SpacexInsight project in your code editor of choice
- Open a terminal and go to the spacex-backend directory using cd spacex-backend
- Install dependencies using the command npm install
- Run the command npm run start. If any issue occurs, revisit the Setup checks
- Open another terminal and go to the spacex-insights directory using cd spacex-insights
- Install dependencies using the command npm install
- Run the command npm run start. If any issue occurs, revisit the Setup checks
The project is divided into three parts:
- ETL Pipeline
- Serving data through Backend Service
- Showing Insights through Frontend Service
To build the ETL pipeline, I used an ES6 class with a method for each stage of ETL (Extract, Transform, and Load).
- To extract data, the pipeline uses the SpaceX-GraphQL server and the SpaceX API (v4).
- The v4 API is used only because the current version of the SpaceX-GraphQL server has issues fetching payload data and some of the launch data.
- I fetch 3 datasets: rocket data through a GraphQL query, and launch and payload data through v4 API requests.
- The data fetched from both sources is saved in class variables.
- For all fetched launch, rocket, and payload data, transformation functions (mappers) convert it to a more readable and efficient format.
- Clean all collections in the database
- Load all transformed data into the database
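The three stages above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names EtlPipeline, MemoryStore, and mapLaunch are hypothetical, the data sources are injected as plain async functions instead of real GraphQL or v4 API calls, and the raw launch fields (id, name, rocket, success, date_utc) are an assumption about the shape of the v4 API response.

```javascript
// Hypothetical sketch of an ES6 ETL class; none of these names come from the project.
class EtlPipeline {
  // sources: { datasetName: async () => rawRecords[] }
  // store: any object exposing async clear() and async insertMany(name, docs)
  constructor(sources, store) {
    this.sources = sources;
    this.store = store;
    this.raw = {};          // filled by extract()
    this.transformed = {};  // filled by transform()
  }

  // Extract: fetch every dataset and keep the raw result on the instance.
  async extract() {
    for (const [name, fetchRecords] of Object.entries(this.sources)) {
      this.raw[name] = await fetchRecords();
    }
  }

  // Transform: run each raw record through the mapper for its dataset.
  transform(mappers) {
    for (const [name, records] of Object.entries(this.raw)) {
      this.transformed[name] = records.map(mappers[name]);
    }
  }

  // Load: clean the old collections first, then insert the fresh data.
  async load() {
    await this.store.clear();
    for (const [name, docs] of Object.entries(this.transformed)) {
      await this.store.insertMany(name, docs);
    }
  }

  async run(mappers) {
    await this.extract();
    this.transform(mappers);
    await this.load();
  }
}

// Assumed mapper: renames raw fields to the schema's launch_id / rocket_id keys.
function mapLaunch(raw) {
  return {
    launch_id: raw.id,
    rocket_id: raw.rocket,
    name: raw.name,
    success: raw.success,
    date_utc: raw.date_utc,
  };
}

// In-memory stand-in for the database, used only to keep the sketch runnable.
class MemoryStore {
  constructor() {
    this.collections = {};
  }
  async clear() {
    this.collections = {};
  }
  async insertMany(name, docs) {
    this.collections[name] = (this.collections[name] || []).concat(docs);
  }
}
```

Calling new EtlPipeline(sources, store).run({ launches: mapLaunch }) runs the stages in order. Because load() clears the store before inserting, running the pipeline twice does not duplicate records.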
To connect launch, rocket, and payload data efficiently, I used the following schema:
Launch {
launch_id : PK of Launch Data,
rocket_id : Reference to rocket,
...Rest of Launch Data
}
Rocket {
rocket_id : PK of Rocket Data,
...Rest of Rocket Data
}
Payload {
payload_id : PK of Payload Data,
launch_id : Reference to Launch,
...Rest of Payload Data
}
This ensures that all the records are connected and can be queried together.
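As an illustration of how these references tie the records together, the sketch below joins the three datasets on plain arrays. The function name joinLaunches is hypothetical, and only the key fields from the schema (launch_id, rocket_id, payload_id) are assumed. In MongoDB the same join could be expressed with the $lookup aggregation stage; the array version is used here only because it runs without a database.

```javascript
// Hypothetical helper: attaches the referenced rocket and payloads to each launch.
function joinLaunches(launches, rockets, payloads) {
  // Index rockets by primary key for constant-time lookup.
  const rocketById = new Map(rockets.map((r) => [r.rocket_id, r]));

  // Group payloads by the launch they reference.
  const payloadsByLaunch = new Map();
  for (const payload of payloads) {
    const list = payloadsByLaunch.get(payload.launch_id) || [];
    list.push(payload);
    payloadsByLaunch.set(payload.launch_id, list);
  }

  return launches.map((launch) => ({
    ...launch,
    rocket: rocketById.get(launch.rocket_id) || null,       // null if the reference is dangling
    payloads: payloadsByLaunch.get(launch.launch_id) || [], // empty if the launch has no payloads
  }));
}
```

Keeping rocket_id on the launch and launch_id on the payload means each relation is stored once, on the "many" side, so no record has to hold a growing list of references.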
- /insights/reload-data: Reloads the data by running the ETL pipeline, which cleans the older data and fetches a fresh copy
- /insights/payload-statistics: Fetches statistics on payloads sent to outer space
- /insights/launch-frequency-by-year: Fetches launch frequency by year, with success and failure counts for each year
- /insights/launch-frequency-by-month: Fetches launch frequency by month
- /insights/rocket-payload-efficiency: Fetches insights on rocket and payload efficiency and revenue spent
- /insights/orbitwise-risk: Fetches orbit-wise data with success rate and total payload weight
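To give a sense of the kind of aggregation behind an endpoint such as /insights/launch-frequency-by-year, here is a minimal sketch over an in-memory array. It is not the backend's actual implementation: the function name launchFrequencyByYear, the output field names, and the input fields date_utc and success are assumptions made for illustration.

```javascript
// Hypothetical aggregation: counts launches per year, split by outcome.
function launchFrequencyByYear(launches) {
  const byYear = new Map();
  for (const launch of launches) {
    const year = new Date(launch.date_utc).getUTCFullYear();
    const entry = byYear.get(year) || { year, total: 0, success: 0, failure: 0 };
    entry.total += 1;
    // A launch whose outcome is unknown (success is null) only counts toward the total.
    if (launch.success === true) entry.success += 1;
    else if (launch.success === false) entry.failure += 1;
    byYear.set(year, entry);
  }
  // Return the years in ascending order so a chart can plot them directly.
  return [...byYear.values()].sort((a, b) => a.year - b.year);
}
```

Using the UTC year avoids a launch near midnight on December 31 being counted in a different year depending on the server's time zone.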
- React with React Charts to show insights
- Please check out this video to see how it works.
- Show extra statistics in tabular format.
- Apply a rate limiter to the extract stage of the ETL pipeline, allowing only 1 call at a time.
- Cache the backend extract results for 24 hours, since SpaceX data doesn't change that often.
- Track and log ETL job status
- Cron job to fetch and refresh/reload data in a timely manner.
- Handle connection issues while fetching data, handle timeout and delay.
- Increase insightful data by adding more graphs like orbitwise-risk data.
- Cache API responses on the frontend to avoid unnecessary re-fetching.
- Store the relation of Launch, Rocket, and Payload into a separate table to easily fetch data.
- Add more comments
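Two of the ideas above, allowing only one extract call at a time and caching the result for 24 hours, could be combined in a small wrapper like the one below. This is only a sketch of one possible approach, not something the project currently contains: the name cachedSingleFlight and its parameters are hypothetical, and the clock is injectable purely to make the behavior easy to verify.

```javascript
// Hypothetical wrapper: caches fetchFn's result for ttlMs and shares one in-flight call.
function cachedSingleFlight(fetchFn, ttlMs, now = Date.now) {
  let cached = null;   // { value, expiresAt } once a fetch has succeeded
  let inFlight = null; // promise of the fetch currently running, if any

  return async function get() {
    // Serve from the cache while the entry is still fresh.
    if (cached && now() < cached.expiresAt) return cached.value;
    // If a fetch is already running, wait for it instead of starting another.
    if (inFlight) return inFlight;

    inFlight = (async () => {
      try {
        const value = await fetchFn();
        cached = { value, expiresAt: now() + ttlMs };
        return value;
      } finally {
        inFlight = null; // allow a new fetch after success or failure
      }
    })();
    return inFlight;
  };
}

// Example lifetime matching the 24 hours mentioned in the list above.
const ONE_DAY_MS = 24 * 60 * 60 * 1000;
```

A failed fetch is not cached, so the next caller retries instead of being served an error for the whole 24 hours.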