Spotify-ETL-Pipeline

Introduction

This project builds an ETL (Extract, Transform, Load) pipeline using the Spotify API and AWS cloud services. The pipeline extracts playlist data, transforms it into a structured format, and loads it into an AWS data store for analysis.

Architecture

ETL Pipeline Flow

Extract: Fetches data from the Spotify API.
Transform: Cleans, processes, and structures the data.
Load: Stores the processed data in Amazon S3, followed by schema inference and querying.

About Dataset/API

The dataset is sourced from the Spotify API and contains information on:

Music artists
Albums
Songs
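As a rough illustration of how one playlist response could be split into these three entities, here is a minimal sketch. It assumes the raw data has the shape of Spotify's playlist-items response (an "items" list whose entries carry a "track" with nested "album" and "artists"). The function name flatten_playlist, the output column names, and the sample payload are hypothetical and are not taken from this repository.

```python
def flatten_playlist(payload):
    """Split a playlist-items payload into de-duplicated artist, album, and song rows."""
    artists, albums, songs = {}, {}, {}
    for item in payload.get("items", []):
        track = item.get("track")
        if not track:  # removed or unavailable tracks can come back as null
            continue
        album = track["album"]
        albums[album["id"]] = {
            "album_id": album["id"],
            "album_name": album["name"],
            "release_date": album.get("release_date"),
        }
        for artist in track["artists"]:
            artists[artist["id"]] = {
                "artist_id": artist["id"],
                "artist_name": artist["name"],
            }
        songs[track["id"]] = {
            "song_id": track["id"],
            "song_name": track["name"],
            "duration_ms": track.get("duration_ms"),
            "album_id": album["id"],
            # Assumption: the first listed artist is treated as the primary one.
            "artist_id": track["artists"][0]["id"] if track["artists"] else None,
            "added_at": item.get("added_at"),
        }
    return list(artists.values()), list(albums.values()), list(songs.values())


# Hypothetical sample payload, trimmed to the fields used above.
sample_payload = {
    "items": [
        {
            "added_at": "2024-01-01T00:00:00Z",
            "track": {
                "id": "t1",
                "name": "Song One",
                "duration_ms": 180000,
                "album": {"id": "al1", "name": "Album One", "release_date": "2020-05-01"},
                "artists": [{"id": "ar1", "name": "Artist A"}, {"id": "ar2", "name": "Artist B"}],
            },
        },
        {
            "added_at": "2024-01-02T00:00:00Z",
            "track": {
                "id": "t2",
                "name": "Song Two",
                "duration_ms": 200000,
                "album": {"id": "al1", "name": "Album One", "release_date": "2020-05-01"},
                "artists": [{"id": "ar1", "name": "Artist A"}],
            },
        },
        {"added_at": "2024-01-03T00:00:00Z", "track": None},
    ]
}

artists, albums, songs = flatten_playlist(sample_payload)
```

Each returned list of dictionaries can be passed directly to pandas.DataFrame(...) if a tabular form is needed before writing to S3.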

Services Used

Amazon S3 - Stores raw and transformed data.
AWS Lambda - Extracts and transforms data automatically.
Amazon CloudWatch - Monitors the pipeline and triggers data extraction every hour.
AWS Glue Crawler - Infers schema for structured storage.
AWS Glue Data Catalog - Stores metadata for better organization.
Amazon Athena - Enables SQL-based queries on stored data.

Install Dependencies

pip install pandas
pip install numpy
pip install spotipy

Project Execution Flow

Extract Data from API → Lambda Trigger (every 1 hour) → Run Extract Code
→ Store Raw Data → Trigger → Transform Data → Load It → Query Using Athena
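The "Run Extract Code → Store Raw Data" step of this flow might look roughly like the sketch below. To keep it runnable without AWS or Spotify access, the API call and the S3 write are passed in as callables. In a real Lambda function these would most likely be spotipy's playlist_items method and the boto3 S3 client's put_object method. The key prefix, the file naming scheme, and the function names are assumptions for illustration only.

```python
import json
from datetime import datetime, timezone

# Hypothetical S3 prefix for files that still need to be transformed.
RAW_PREFIX = "raw_data/to_process/"


def raw_key(now):
    """Build a timestamped object key so hourly runs never overwrite each other."""
    return f"{RAW_PREFIX}spotify_raw_{now:%Y%m%dT%H%M%SZ}.json"


def run_extract(fetch_items, put_object, bucket, playlist_id, now=None):
    """Fetch one playlist and store the untouched JSON response as a raw object."""
    now = now or datetime.now(timezone.utc)
    payload = fetch_items(playlist_id)
    key = raw_key(now)
    # Keyword names match boto3's S3 put_object(Bucket=..., Key=..., Body=...).
    put_object(Bucket=bucket, Key=key, Body=json.dumps(payload))
    return key


# Usage with stand-in callables instead of real Spotify and S3 clients.
stored = {}


def fake_fetch(playlist_id):
    return {"playlist": playlist_id, "items": []}


def fake_put_object(Bucket, Key, Body):
    stored[(Bucket, Key)] = Body


example_key = run_extract(
    fake_fetch,
    fake_put_object,
    bucket="example-bucket",
    playlist_id="example-playlist-id",
    now=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
)
```

Writing each run to its own timestamped key keeps the raw layer append-only, which makes it straightforward for a later trigger to pick up only new objects for transformation.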

How to Run the Project Locally

Clone the repository (bash):

git clone https://github.com/maimran786/Spotify-ETL-Pipeline.git
cd Spotify-ETL-Pipeline

Set up environment variables (AWS & Spotify credentials).
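Before running the scripts, it can help to check that the credentials are actually set. The sketch below assumes the conventional variable names: spotipy reads SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET, and boto3 reads the standard AWS_* variables. Whether this project's scripts rely on exactly these names is an assumption, and the helper missing_vars is hypothetical.

```python
import os

# Assumed variable names; adjust if the project's scripts expect different ones.
REQUIRED_VARS = (
    "SPOTIPY_CLIENT_ID",
    "SPOTIPY_CLIENT_SECRET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def missing_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```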

Run Extraction Script

Run Transformation & Load Script

Query Data in Athena (AWS Console)
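Queries can also be submitted programmatically instead of through the console. The sketch below only builds the request parameters. The table name songs, its columns, the database name, and the results location are hypothetical and depend on what the Glue Crawler actually infers from the transformed data.

```python
def athena_query_request(database, output_location, limit=10):
    """Build keyword arguments for Athena's StartQueryExecution call."""
    query = (
        "SELECT song_name, duration_ms "
        "FROM songs "  # hypothetical table created by the crawler
        "ORDER BY duration_ms DESC "
        f"LIMIT {int(limit)}"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical database name and results bucket.
request = athena_query_request(
    database="spotify_db",
    output_location="s3://example-bucket/athena-results/",
    limit=5,
)
```

With boto3 installed and credentials configured, the request could be sent with boto3.client("athena").start_query_execution(**request).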
