Skip to content

Latest commit

 

History

History
143 lines (116 loc) · 8.24 KB

File metadata and controls

143 lines (116 loc) · 8.24 KB

SQL Analytics Portfolio

A collection of advanced SQL queries solving real business problems against a retail & e-commerce dataset. Each query demonstrates a specific analytical technique — from revenue cohort analysis to customer lifetime value modeling — with clear business context, approach, and interpretation.

About This Project

This portfolio demonstrates production-level SQL skills used in finance analytics, operational reporting, and cloud data environments. Every query is written to be readable, well-commented, and representative of the kind of ad-hoc and recurring analysis expected of a Data Analyst at a Fortune 500 company.

Dataset: Brazilian E-Commerce Public Dataset by Olist — 100K+ orders from 2016–2018 across multiple sellers, products, and geographies.

Why this dataset: It mirrors real enterprise data — normalized across multiple tables (orders, items, payments, customers, sellers, products, reviews), includes timestamps for time-series analysis, and has enough volume and messiness to require real analytical thinking.

Skills Demonstrated

Technique Queries
Window Functions (ROW_NUMBER, RANK, LAG, LEAD, NTILE) Q1, Q3, Q5, Q8
Common Table Expressions (CTEs) All queries
Cohort Analysis Q2
Funnel / Conversion Analysis Q6
Year-over-Year & Period Comparisons Q3
Customer Segmentation (RFM) Q5
Revenue & Margin Analysis Q1, Q3, Q7
Self-Joins Q4
Running Totals & Moving Averages Q8
Data Quality Validation Q10
Seller / Vendor Performance Scoring Q9
Customer Lifetime Value (CLV) Q7

Repository Structure

sql-analytics-portfolio/
├── README.md
├── queries/
│   ├── q01_revenue_kpi_dashboard.sql
│   ├── q02_monthly_customer_cohort.sql
│   ├── q03_yoy_revenue_growth.sql
│   ├── q04_repeat_purchase_analysis.sql
│   ├── q05_rfm_customer_segmentation.sql
│   ├── q06_order_funnel_analysis.sql
│   ├── q07_customer_lifetime_value.sql
│   ├── q08_rolling_metrics_trends.sql
│   ├── q09_seller_performance_scorecard.sql
│   └── q10_data_quality_audit.sql
├── data/
│   └── schema_diagram.md
├── results/
│   └── sample_outputs.md
└── docs/
    └── setup_guide.md

Schema Overview

┌──────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  customers   │     │     orders       │     │   order_items    │
│──────────────│     │──────────────────│     │──────────────────│
│ customer_id  │────▶│ order_id         │◀────│ order_id         │
│ city         │     │ customer_id      │     │ product_id       │
│ state        │     │ order_status     │     │ seller_id        │
│ zip_code     │     │ purchase_date    │     │ price            │
└──────────────┘     │ approved_date    │     │ freight_value    │
                     │ delivered_date   │     └──────────────────┘
                     │ estimated_date   │              │
                     └──────────────────┘              │
                            │                          ▼
                     ┌──────────────────┐     ┌──────────────────┐
                     │  order_payments  │     │    products      │
                     │──────────────────│     │──────────────────│
                     │ order_id         │     │ product_id       │
                     │ payment_type     │     │ category_name    │
                     │ payment_value    │     │ weight_g         │
                     └──────────────────┘     │ length_cm        │
                            │                 └──────────────────┘
                     ┌──────────────────┐              │
                     │  order_reviews   │     ┌──────────────────┐
                     │──────────────────│     │    sellers       │
                     │ order_id         │     │──────────────────│
                     │ review_score     │     │ seller_id        │
                     │ review_date      │     │ city             │
                     └──────────────────┘     │ state            │
                                              └──────────────────┘

How to Run

These queries are written in standard SQL compatible with:

  • PostgreSQL (primary target)
  • Snowflake (minor syntax adjustments noted in comments)
  • BigQuery (minor syntax adjustments noted in comments)
  • DuckDB (fully compatible — great for local testing)

See docs/setup_guide.md for step-by-step instructions on loading the dataset locally with DuckDB or PostgreSQL.

Query Index

Q1 — Executive Revenue KPI Dashboard

Business Question: What are the key revenue metrics (total revenue, AOV, orders, revenue per customer) by month, and how do they trend? Techniques: CTEs, aggregation, window functions (LAG for MoM change)

Q2 — Monthly Customer Cohort Retention

Business Question: Of customers acquired in each month, what percentage return to purchase in subsequent months? Techniques: Cohort assignment, self-join, conditional aggregation, retention curve

Q3 — Year-over-Year Revenue Growth by Category

Business Question: Which product categories are growing or declining year-over-year, and by how much? Techniques: CTEs, LAG window function, YoY percentage calculation, ranking

Q4 — Repeat Purchase & Cross-Sell Analysis

Business Question: What percentage of customers make repeat purchases, and which product categories are most commonly bought together? Techniques: Self-join, DISTINCT counting, cross-tabulation

Q5 — RFM Customer Segmentation

Business Question: How can we segment customers into actionable groups based on recency, frequency, and monetary value? Techniques: NTILE window function, CASE-based segmentation, multi-CTE pipeline

Q6 — Order Funnel Conversion Analysis

Business Question: Where are orders dropping off in the fulfillment pipeline (approved → shipped → delivered)? Techniques: Funnel stage calculation, date diffs, conditional aggregation, conversion rates

Q7 — Customer Lifetime Value (CLV) Estimation

Business Question: What is the estimated lifetime value of customers by acquisition cohort and segment? Techniques: Cohort analysis, revenue per customer over time, cumulative sums

Q8 — Rolling Revenue & Moving Average Trends

Business Question: What does the 7-day and 30-day moving average revenue look like, and where are anomalies? Techniques: Window functions (AVG OVER with ROWS BETWEEN), running totals, trend detection

Q9 — Seller Performance Scorecard

Business Question: Which sellers are top performers across revenue, delivery speed, and customer satisfaction? Techniques: Multi-metric scoring, NTILE ranking, composite score calculation

Q10 — Data Quality Audit

Business Question: How clean is this dataset? What are the null rates, orphan records, and logical inconsistencies? Techniques: NULL analysis, referential integrity checks, date logic validation, UNION ALL reporting

Author

Mayur Gudala — Data Analyst | MS Business Analytics (4.0 GPA), Concordia University Wisconsin