This project implements an ETL (Extract, Transform, Load) pipeline using Azure Data Factory (ADF) to process Nintendo Switch game data sourced from Kaggle. The pipeline:
- Extracts raw game data from Azure Blob Storage (CSV).
- Transforms the data using Mapping Data Flows (cleaning, filtering, aggregating, sorting).
- Loads the processed data back into Azure Blob Storage as two outputs: an aggregate of game counts by release year, and a cleaned dataset with an added year column.
✔ Data Ingestion – Pulls raw Nintendo Switch game data from Kaggle into Azure Blob Storage.
✔ Data Transformation – Cleans and structures data (e.g., extracts release year, filters nulls, aggregates by genre/year).
✔ Automated Workflow – Scheduled ADF pipeline for incremental updates.
✔ Output Storage – Stores processed data in Parquet/CSV for analytics.
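The transformation step can be sketched in pandas. The column names below (`title`, `genre`, `release_date`, `user_score`) are assumptions standing in for the Kaggle dataset's actual schema; in the pipeline itself, the equivalent logic runs inside a Mapping Data Flow:

```python
import io

import pandas as pd

# Hypothetical sample standing in for the Kaggle CSV (column names assumed).
raw_csv = """title,genre,release_date,user_score
Breath of the Wild,Adventure,2017-03-03,9.5
Mario Kart 8 Deluxe,Racing,2017-04-28,9.1
Unknown Game,Puzzle,,7.0
Splatoon 2,Shooter,2017-07-21,8.3
"""

df = pd.read_csv(io.StringIO(raw_csv))

# Filter nulls: drop rows with no release date.
df = df.dropna(subset=["release_date"])

# Extract the release year into a new column.
df["year"] = pd.to_datetime(df["release_date"]).dt.year

# Aggregate: count games per genre and year.
by_genre_year = (
    df.groupby(["genre", "year"])
      .size()
      .reset_index(name="game_count")
)
print(by_genre_year)
```

The same drop-nulls / derive-column / group-and-count pattern maps one-to-one onto the Filter, Derived Column, and Aggregate transformations in a Mapping Data Flow.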
| Component | Purpose |
|---|---|
| Copy Activity | Ingests raw data from source into Azure Blob Storage |
| Mapping Data Flow | Performs data transformations (cleaning, filtering, aggregation) |
| Blob Storage | Stores both raw and processed data in a structured format |
| Trigger | Schedules pipeline execution (e.g., weekly updates) |
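The scheduled Trigger in the table above could be defined with ADF trigger JSON roughly like the following sketch; the names `WeeklyTrigger` and `NintendoSwitchETL`, and the start time, are placeholders, not values from this project:

```json
{
  "name": "WeeklyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Week",
        "interval": 1,
        "startTime": "2025-01-06T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "NintendoSwitchETL",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```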
- Ingest Raw Data: a Copy Activity pulls the raw CSV from Kaggle (or a pre-uploaded copy in Blob Storage).
- Transform & Clean: a Mapping Data Flow extracts the release year from the date column and sorts the records by User Score.
- Aggregate by Year: groups the data by release year and counts the games per year.
- Save Output: writes the processed datasets back to Blob Storage.
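The steps above can be sketched end to end in pandas. Column names and file names here are illustrative assumptions; in ADF, the two `to_csv` calls correspond to sink datasets in Blob Storage:

```python
import pandas as pd

# Assumed column names standing in for the ingested Kaggle data.
games = pd.DataFrame({
    "title": ["Game A", "Game B", "Game C"],
    "release_date": ["2018-05-01", "2017-03-03", "2018-11-20"],
    "user_score": [8.2, 9.5, 7.8],
})

# Transform & Clean: extract the year, then sort by User Score (highest first).
games["year"] = pd.to_datetime(games["release_date"]).dt.year
games = games.sort_values("user_score", ascending=False)

# Aggregate by Year: count games per release year.
games_per_year = games.groupby("year").size().reset_index(name="game_count")

# Save Output: write both result sets (hypothetical file names).
games.to_csv("games_with_year.csv", index=False)
games_per_year.to_csv("games_per_year.csv", index=False)
```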