SQL Server ETL Pipeline & Automated Reporting Dashboard

Stack: T-SQL · SQL Server / SSMS · Python · Excel (PowerQuery / PivotTables) · Power BI

What This Project Does

This project builds a complete data pipeline starting from raw CSV files and ending with automated reports. Here's the short version:

Raw CSVs (accounts, transactions, fees) are ingested by a Python ETL script
The ETL cleanses the data (removes duplicates, handles nulls, validates values)
Clean data is loaded into a SQL Server database with enforced schema constraints
SQL stored procedures and views calculate monthly KPIs (balances, variances, fees)
Excel connects via PowerQuery for live-refresh pivot reports
Power BI provides a visual dashboard for trend and variance analysis
A Python script auto-generates a PowerPoint summary, cutting manual reporting by ~80%

Project Structure

SQL-Server-ETL-Pipeline/
│
├── data/
│   ├── raw/                    # Original CSV files (source of truth)
│   │   ├── accounts.csv        # 38 rows (incl. duplicates + nulls for testing)
│   │   ├── transactions.csv    # 185 rows, Jan–Dec 2024
│   │   └── fees.csv            # 132 rows, monthly and ad-hoc fees
│   └── processed/              # Cleaned CSVs (generated, not in repo)
│
├── sql/
│   ├── 01_create_database.sql  # Creates FinancialReportingDB
│   ├── 02_create_tables.sql    # All tables with constraints + FK relationships
│   ├── 03_stored_procedures.sql # Monthly KPI rollup procedures
│   ├── 04_views.sql            # Reporting views (used by Excel + Power BI)
│   └── 05_sample_queries.sql   # Validation and spot-check queries
│
├── python/
│   ├── etl_pipeline.py         # Main ETL script: CSV → SQL Server
│   ├── generate_report.py      # Auto-generates monthly PowerPoint report
│   └── requirements.txt        # Python dependencies
│
├── docs/
│   ├── data_dictionary.md      # Every table, column, constraint documented
│   ├── data_flow.md            # End-to-end data flow and design decisions
│   ├── excel_powerquery_setup.md # How to connect Excel + M code for queries
│   └── powerbi_setup.md        # Power BI dashboard setup + DAX measures
│
├── output/                     # Generated PowerPoint reports (not committed)
└── README.md

Technologies Used

Technology	Purpose
SQL Server Express / SSMS	Database engine, query development
T-SQL	DDL (tables, constraints), stored procedures, views
Python 3.11	ETL pipeline, PowerPoint generation
pandas	Data cleansing and transformation
pyodbc	SQL Server connectivity from Python
python-pptx	Automated PowerPoint report generation
Excel (PowerQuery)	Live-refresh pivot reports connected to SQL Server
Power BI Desktop	Interactive dashboard for trend/variance visualization

Setup Instructions

Prerequisites

SQL Server Express (free): download from Microsoft
SQL Server Management Studio (SSMS)
Python 3.10+
ODBC Driver 17 for SQL Server
Microsoft Excel 2016+
Power BI Desktop (free from Microsoft Store)

1. Set Up the Database

Open SSMS, connect to localhost\SQLEXPRESS, and run the SQL scripts in order:

-- In SSMS, open and run each script:
sql/01_create_database.sql    -- creates the database
sql/02_create_tables.sql      -- creates tables with constraints
sql/03_stored_procedures.sql  -- creates KPI stored procedures
sql/04_views.sql              -- creates reporting views

2. Install Python Dependencies

pip install -r python/requirements.txt

3. Run the ETL Pipeline

python python/etl_pipeline.py

Expected output:

===================================================
  ETL Pipeline - FinancialReportingDB
  Started: 2025-02-XX XX:XX:XX
===================================================

Connecting to SQL Server...
Connected!

--- Loading Accounts ---
  Raw rows: 38
  Exact duplicate rows removed: 5
  Duplicate account_ids removed (kept first): 0
  Rows skipped (missing required fields): 3
  Clean rows ready to insert: 30
  Result: 30 inserted, 3 skipped (cleansing), 0 failed (DB error)

--- Loading Transactions ---
  ...

--- Loading Fees ---
  ...

After the ETL runs, you can verify with:

-- In SSMS:
EXEC dbo.usp_GetMonthlyKPISummary @Year = 2024, @Month = 12;
SELECT * FROM dbo.vw_ETLRunHistory;

4. Generate the PowerPoint Report

python python/generate_report.py

Output will be saved to output/monthly_report_2024_12.pptx.

To change the report period, edit these lines in generate_report.py:

REPORT_YEAR  = 2024
REPORT_MONTH = 12

5. Connect Excel and Power BI

See the docs folder:

Key Design Decisions

Why store procedures over inline SQL in Python? I wanted to keep business logic in the database layer so both Excel and Python report scripts produce identical results. If the KPI formula changes, I update one stored procedure and both reports automatically use the new logic.

Why allow nullable balance_after in Transactions? The source data doesn't always include this field. Rather than reject those rows, I accept the transactions and note that balance can be reconstructed as a running total from the amount column if needed.

Why row-by-row insert instead of bulk insert? For a project this size (< 200 rows) it doesn't matter. In a real production system I'd use BULK INSERT or the pandas to_sql() method with a SQLAlchemy engine for much better performance.

Sample Queries

-- Monthly KPI summary for a specific month
EXEC dbo.usp_GetMonthlyKPISummary @Year = 2024, @Month = 6;

-- Month-over-month balance variance for 2024
EXEC dbo.usp_GetBalanceVariance @Year = 2024;

-- Accounts currently in overdraft
SELECT * FROM dbo.vw_AccountBalanceSnapshot WHERE is_overdraft = 1;

-- Total fee revenue by type (2024)
SELECT fee_type, SUM(collected_amount) AS total_collected
FROM dbo.vw_MonthlyFeeSummary
WHERE fee_year = 2024
GROUP BY fee_type
ORDER BY total_collected DESC;

Data Quality Notes

The raw CSV data intentionally includes some quality issues to demonstrate the ETL cleansing process:

Issue	Count	How Handled
Duplicate account rows	5	Removed by ETL
Accounts missing required fields	3	Excluded, logged
Accounts with null optional fields	2	Defaulted to 0
Overdraft accounts (negative balance)	2	Valid — allowed by schema

All cleansing decisions are logged to dbo.ETL_Log and documented in docs/data_flow.md.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
data/raw		data/raw
docs		docs
python		python
sql		sql
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SQL Server ETL Pipeline & Automated Reporting Dashboard

What This Project Does

Project Structure

Technologies Used

Setup Instructions

Prerequisites

1. Set Up the Database

2. Install Python Dependencies

3. Run the ETL Pipeline

4. Generate the PowerPoint Report

5. Connect Excel and Power BI

Key Design Decisions

Sample Queries

Data Quality Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SQL Server ETL Pipeline & Automated Reporting Dashboard

What This Project Does

Project Structure

Technologies Used

Setup Instructions

Prerequisites

1. Set Up the Database

2. Install Python Dependencies

3. Run the ETL Pipeline

4. Generate the PowerPoint Report

5. Connect Excel and Power BI

Key Design Decisions

Sample Queries

Data Quality Notes

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages