Skip to content

arodriguezrivero/sql-data-warehouse-project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

51 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🚀 Modern Data Warehouse Implementation: AeroCycle Global

SQL Architecture License

📌 Project Overview

This project demonstrates the end-to-end design and implementation of a Modern Data Warehouse for AeroCycle Global, a hypothetical bicycle manufacturer. The goal was to consolidate fragmented data from CRM and ERP systems into a unified analytical hub using the Medallion Architecture.

To see the Project Plan click here

🏗️ Data Architecture

The pipeline follows the Medallion framework to ensure data quality and lineage:

  1. Bronze Layer: Raw data ingestion from CRM (Sales, Products, Customers) and ERP (Locations, Categories) sources.
  2. Silver Layer: Data cleansing, deduplication, and schema standardization.
  3. Gold Layer: Business-ready dimensional model (Star Schema) optimized for BI and Reporting.
Architecture-DWH drawio

🛠️ Key Technical Features

1. Robust Naming Conventions

To maintain a production-grade environment, I implemented a strict naming policy:

  • Snake_case for all objects.
  • Layer Prefixes: crm_, erp_ for Bronze/Silver; dim_, fact_, agg_ for Gold.
  • Surrogate Keys: All dimensions use a _key suffix for high-performance joins.
  • Metadata Tracking: Every record includes dwh_load_date for auditability.

2. ETL & Orchestration

The data flow is managed via modular Stored Procedures (load_bronze, load_silver), allowing for:

  • Incremental or full refresh strategies.
  • Standardized error handling and logging.
  • Transformation logic (Mapping, Casting, and Formatting).

3. Data Catalog (Gold Layer)

The business layer is fully documented to empower analysts:

  • dim_customers: Enriched customer demographics and geography.
  • dim_products: Full product lifecycle and categorization.
  • fact_sales: Granular transactional data linked to dimensions.

📈 Advanced Analytics & Business Intelligence

The Gold Layer is powered by a suite of advanced SQL scripts designed to extract actionable KPIs. This goes beyond basic CRUD operations, implementing sophisticated analytical patterns:

1. Performance Analytics (YoY / MoM)

Using Window Functions (LAG, LEAD), the system compares current sales against previous periods to detect growth or decline at the product and category levels.

  • Logic: current_sales - LAG(current_sales) OVER (PARTITION BY product_name ORDER BY order_year)

2. Customer Segmentation Logic

A dynamic segmentation engine classifies customers based on their behavioral data:

  • VIP: >12 months history AND >€5,000 spend.
  • Regular: >12 months history AND ≤€5,000 spend.
  • New: <12 months history.

3. Cumulative & Part-to-Whole Analysis

  • Running Totals: Track sales velocity over time.
  • Percentage of Total: Identifies high-impact categories (e.g., identifying "Bikes" as the primary revenue driver).

4. Comprehensive Reporting Views

I created consolidated views for Customers and Products that aggregate:

  • Recency: Months since the last transaction.
  • Lifespan: Total duration of the customer relationship.
  • Average Monthly Spend/Revenue: Normalized performance metrics.

📂 Project Structure

├── advanced/
│   └──advanced_data_analysis.sql
├── datasets/
│   └──  source_crm/          
│   └── source_erp/          
├── docs/
│   └──  Architecture-DWH.drawio.pdf
│   └── Data Flow Diagram.pdf
│   └── Data Model Star Schema.pdf
│   └── Integration_model.pdf
│   └── data_catalog.md
│   └── rules_naming_conventions.md
├── scripts/
│   ├── bronze/
|       └── ddl_bronze.sql
|       └── proc_load_bronze.sql
│   ├── gold/
|       └── ddl_gold.sql
│   ├── silver/
|       └── ddl_silver.sql
|       └── proc_load_silver.sql
|   └── init_database.sql
├── tests/
│   └──crm_cust_info__silver_transform.sql
│   └── crm_prd_info__silver_transform.sql
│   └── crm_sales_detail__silver_transform.sql
│   └── erp_cust_az12__silver_transform.sql
│   └── erp_loc_a101__silver_transform.sql
│   └── erp_px_cat_g1v2__silver_transform.sql
│   └── test_gold_layer.sql

🚀 How to Run

Clone this repository.

  1. Execute the init_database.sql script to set up the environment.

  2. Run the Stored Procedures in sequence: Bronze -> Silver -> Gold.

👩‍💻 About Me

I am a Senior Data Engineer focused on building scalable, well-documented data infrastructures. My goal is to transform messy data into strategic business assets.

Click here to see my LinkedIn Profile.

Click here to see my Upwork Profile.

About

Buiding a modern data warehouse using SQL Server, including ETL processes, data modeling, and analytics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages