Stack: Python · SQL · MySQL · Power BI · pandas · SQLAlchemy · ETL
Author: Mason Phung | Analysis Report | BI Report
AtliQ, a B2B hardware & peripherals manufacturer operating across India, reported declining sales with no clear visibility into root causes. The Sales Director lacked a reliable way to track where revenue was being lost across regions, products, and customers.
Goal: Identify the structural bottlenecks driving the decline and deliver a live dashboard so management can act without relying on manual reporting.
| Metric | Finding |
|---|---|
| Unprofitable transactions | 46% of all orders yield negative margins |
| Revenue concentration | 9 of 38 customers drive 80% of total revenue |
| Single-customer dependency | Cus006 alone accounts for 42% of all revenue |
| Central Zone YoY drop | -23.69% H1 2019 → H1 2020 |
| North Zone cash loss | -INR 14.1M in H1 2020 |
| Root cause (Central Zone) | 2 churned accounts =50% of the region's revenue gap |
| Customer retention | 100% retention — but order volumes are shrinking |
| Action | Target | Timeline |
|---|---|---|
| Deploy live Power BI dashboard | Cut management decision time by50% | 3–6 months |
| VIP account program for top-9 customers | Increase key account revenue by10% | 12 months |
| Migrate low-revenue accounts to low-touch model | Reduce operational costs by40% | 12 months |
A class-based Python ETL that runs end-to-end with a single command:
- Extract — pulls 4 relational tables from MySQL via SQLAlchemy
- Transform — normalizes USD→INR currency, removes invalid records, denormalizes into a master analytics table
- Load — writes the cleaned table back to MySQL
- Structured logging to file, environment-variable-based credentials (no hardcoded passwords)
Step by step data cleaning (manual version) can be found in notebooks/p1_init_data_cleaning.ipynb.
Advanced SQL executed in-notebook via ipython-sql:
- Window functions (
FIRST_VALUE,PARTITION BY,LAG) for cohort and MoM trend analysis - CTEs for multi-step product/customer segmentation
- 3-month rolling average + momentum delta to detect the 2019 early warning signal
- Pareto 80/20 customer concentration analysis
- YTD regional variance drill-down to isolate the Central Zone collapse
Interactive dashboard connected live to MySQL, replacing manual Excel reporting. Tracks revenue trends, regional performance, and customer concentration in real time. Provides AtLiQ Sales team a real-time KPI monitor tool and overcomes the former slow manual decision-making/
project-atliq/
├── src/
│ ├── ETLpipeline.py # Class-based ETL pipeline
│ └── createMySQLEngine.py # SQLAlchemy engine factory (env-var credentials)
├── notebooks/
│ ├── p0_data_overview.ipynb # ERD review & raw data profiling
│ ├── p1_init_data_cleaning.ipynb # SQL-based data cleaning
│ └── p2_sales_data_analysis.ipynb # Full diagnostic SQL analysis
├── powerbi/
│ ├── atliq_dashboard.pbix # PowerBI dashboard
├── dump/
│ └── db_dump.sql # Source database dump
├── .env.example # Credential template
├── requirements.txt
└── README.md
1. Clone and install dependencies
git clone <repo-url>
cd project-atliq
pip install -r requirements.txt2. Configure credentials
cp .env.example .env
# Edit .env with your MySQL credentials3. Load the database
mysql -u root -p < dump/db_dump.sql4. Run the ETL pipeline
cd src
python ETLpipeline.py5. Open notebooks in order: p0 → p1 → p2
| Category | Tool |
|---|---|
| Language | Python 3, SQL |
| Database | MySQL (local), DBeaver |
| Python libraries | pandas, SQLAlchemy, python-dotenv, ipython-sql |
| Analysis | Advanced SQL — CTEs, window functions, self-joins |
| Visualization | Power BI, Tableau |
| IDE | VS Code + Jupyter |
Problem statement and dataset from codebasics. Analysis, ETL design, and dashboard by Mason Phung.
Last updated: 2026/04 (ver 3.2 — refactored data cleaning to SQL, introduced class-based ETL pipeline with structured logging and secure credential management)
