This project demonstrates the end-to-end design and implementation of a Modern Data Warehouse for AeroCycle Global, a hypothetical bicycle manufacturer. The goal was to consolidate fragmented data from CRM and ERP systems into a unified analytical hub using the Medallion Architecture.
To see the Project Plan click here
The pipeline follows the Medallion framework to ensure data quality and lineage:
- Bronze Layer: Raw data ingestion from CRM (Sales, Products, Customers) and ERP (Locations, Categories) sources.
- Silver Layer: Data cleansing, deduplication, and schema standardization.
- Gold Layer: Business-ready dimensional model (Star Schema) optimized for BI and Reporting.
To maintain a production-grade environment, I implemented a strict naming policy:
- Snake_case for all objects.
- Layer Prefixes:
crm_,erp_for Bronze/Silver;dim_,fact_,agg_for Gold. - Surrogate Keys: All dimensions use a
_keysuffix for high-performance joins. - Metadata Tracking: Every record includes
dwh_load_datefor auditability.
The data flow is managed via modular Stored Procedures (load_bronze, load_silver), allowing for:
- Incremental or full refresh strategies.
- Standardized error handling and logging.
- Transformation logic (Mapping, Casting, and Formatting).
The business layer is fully documented to empower analysts:
dim_customers: Enriched customer demographics and geography.dim_products: Full product lifecycle and categorization.fact_sales: Granular transactional data linked to dimensions.
The Gold Layer is powered by a suite of advanced SQL scripts designed to extract actionable KPIs. This goes beyond basic CRUD operations, implementing sophisticated analytical patterns:
Using Window Functions (LAG, LEAD), the system compares current sales against previous periods to detect growth or decline at the product and category levels.
- Logic:
current_sales - LAG(current_sales) OVER (PARTITION BY product_name ORDER BY order_year)
A dynamic segmentation engine classifies customers based on their behavioral data:
- VIP: >12 months history AND >€5,000 spend.
- Regular: >12 months history AND ≤€5,000 spend.
- New: <12 months history.
- Running Totals: Track sales velocity over time.
- Percentage of Total: Identifies high-impact categories (e.g., identifying "Bikes" as the primary revenue driver).
I created consolidated views for Customers and Products that aggregate:
- Recency: Months since the last transaction.
- Lifespan: Total duration of the customer relationship.
- Average Monthly Spend/Revenue: Normalized performance metrics.
├── advanced/
│ └──advanced_data_analysis.sql
├── datasets/
│ └── source_crm/
│ └── source_erp/
├── docs/
│ └── Architecture-DWH.drawio.pdf
│ └── Data Flow Diagram.pdf
│ └── Data Model Star Schema.pdf
│ └── Integration_model.pdf
│ └── data_catalog.md
│ └── rules_naming_conventions.md
├── scripts/
│ ├── bronze/
| └── ddl_bronze.sql
| └── proc_load_bronze.sql
│ ├── gold/
| └── ddl_gold.sql
│ ├── silver/
| └── ddl_silver.sql
| └── proc_load_silver.sql
| └── init_database.sql
├── tests/
│ └──crm_cust_info__silver_transform.sql
│ └── crm_prd_info__silver_transform.sql
│ └── crm_sales_detail__silver_transform.sql
│ └── erp_cust_az12__silver_transform.sql
│ └── erp_loc_a101__silver_transform.sql
│ └── erp_px_cat_g1v2__silver_transform.sql
│ └── test_gold_layer.sql
Clone this repository.
-
Execute the init_database.sql script to set up the environment.
-
Run the Stored Procedures in sequence: Bronze -> Silver -> Gold.
I am a Senior Data Engineer focused on building scalable, well-documented data infrastructures. My goal is to transform messy data into strategic business assets.
Click here to see my LinkedIn Profile.
Click here to see my Upwork Profile.