This project builds an ETL (Extract, Transform, Load) pipeline using the Spotify API and AWS cloud services. The pipeline extracts playlist data, transforms it into a structured format, and loads it into an AWS data store for analysis.
- Extract: Fetches data from the Spotify API.
- Transform: Cleans, processes, and structures the data.
- Load: Stores processed data in AWS S3, followed by schema inference and querying.
The dataset is sourced from the Spotify API and contains information on:
- Music artists
- Albums
- Songs
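As a rough illustration of how one playlist response could be split into these three entities, here is a minimal sketch. It assumes the payload follows the Spotify Web API playlist-items shape (`items[].track`, with nested `album` and `artists`). The function name `split_playlist_items`, the output column names, and the `SAMPLE` payload are hypothetical and not taken from this repository's scripts.

```python
def split_playlist_items(payload):
    """Split a playlist-items response into album, artist, and song rows."""
    albums, artists, songs = {}, {}, {}
    for item in payload.get("items", []):
        track = item.get("track")
        if not track:
            # Unavailable tracks can come back as null; skip them.
            continue
        album = track["album"]
        # Keying by id deduplicates albums and artists shared by several tracks.
        albums[album["id"]] = {
            "album_id": album["id"],
            "album_name": album["name"],
            "release_date": album.get("release_date"),
            "total_tracks": album.get("total_tracks"),
        }
        for artist in track["artists"]:
            artists[artist["id"]] = {
                "artist_id": artist["id"],
                "artist_name": artist["name"],
            }
        songs[track["id"]] = {
            "song_id": track["id"],
            "song_name": track["name"],
            "duration_ms": track.get("duration_ms"),
            "popularity": track.get("popularity"),
            "added_at": item.get("added_at"),
            "album_id": album["id"],
            # Assumption: the first listed artist is treated as the primary one.
            "artist_id": track["artists"][0]["id"] if track["artists"] else None,
        }
    return list(albums.values()), list(artists.values()), list(songs.values())


# Hypothetical, hand-written payload used only to exercise the function.
SAMPLE = {
    "items": [
        {"added_at": "2024-01-01T00:00:00Z",
         "track": {"id": "s1", "name": "Song One", "duration_ms": 1000, "popularity": 50,
                   "album": {"id": "al1", "name": "Album", "release_date": "2020-01-01",
                             "total_tracks": 2},
                   "artists": [{"id": "a1", "name": "Artist A"}]}},
        {"added_at": "2024-01-02T00:00:00Z",
         "track": {"id": "s2", "name": "Song Two", "duration_ms": 2000, "popularity": 60,
                   "album": {"id": "al1", "name": "Album", "release_date": "2020-01-01",
                             "total_tracks": 2},
                   "artists": [{"id": "a1", "name": "Artist A"},
                               {"id": "a2", "name": "Artist B"}]}},
        {"added_at": "2024-01-03T00:00:00Z", "track": None},
    ]
}

albums, artists, songs = split_playlist_items(SAMPLE)
print(len(albums), len(artists), len(songs))
```

Each returned list of dictionaries can be passed to `pandas.DataFrame(...)` if tabular output is wanted before writing to S3.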
- Amazon S3 - Stores raw and transformed data.
- AWS Lambda - Extracts and transforms data automatically.
- Amazon CloudWatch - Monitors and triggers data extraction every hour.
- AWS Glue Crawler - Infers schema for structured storage.
- AWS Glue Data Catalog - Stores the table metadata (schemas) inferred by the crawler.
- Amazon Athena - Enables SQL-based queries on stored data.
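Once the crawler has registered tables in the catalog, a query could be submitted to Athena along these lines. This is a sketch only: it assumes a boto3 Athena client is passed in, and the helper name `run_athena_query`, the database name, the table name, and the results location in the usage note are placeholders, not values from this project.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query and poll until it reaches a final state."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        # Still QUEUED or RUNNING: wait before asking again.
        time.sleep(poll_seconds)
```

With real credentials this might be called as `run_athena_query(boto3.client("athena"), "SELECT * FROM songs LIMIT 10", "spotify_db", "s3://<your-bucket>/athena-results/")`, where `songs`, `spotify_db`, and the bucket path are stand-ins for whatever the crawler and your account actually use.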
pip install pandas numpy spotipy
Extract Data from API → Lambda Trigger (every 1 hour) → Run Extract Code
→ Store Raw Data → Trigger → Transform Data → Load It → Query Using Athena
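The "Store Raw Data" step in the flow above could look roughly like the following. It is a sketch under assumptions: the S3 client is passed in (for example `boto3.client("s3")` inside the Lambda function), and the `RAW_PREFIX` key prefix, the file naming scheme, and the helper names are hypothetical rather than taken from this repository.

```python
import json
from datetime import datetime, timezone

# Hypothetical prefix for raw files that are waiting to be transformed.
RAW_PREFIX = "raw_data/to_processed/"


def raw_object_key(now):
    """Build a timestamped key so each hourly run writes a distinct object."""
    return f"{RAW_PREFIX}spotify_raw_{now.strftime('%Y%m%dT%H%M%SZ')}.json"


def store_raw(s3, bucket, payload, now=None):
    """Serialize the API response as JSON and write it to S3; return the key."""
    now = now or datetime.now(timezone.utc)
    key = raw_object_key(now)
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(payload))
    return key
```

Writing each run to its own key under one prefix lets an S3 event notification on that prefix act as the trigger for the transform step.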
git clone https://github.com/maimran786/Spotify-ETL-Pipeline.git
cd Spotify-ETL-Pipeline
python scripts/spotify_api_data_extract.py
python scripts/spotify_transformation_load_function.py
