This project focuses on improving the reliability of business reporting by performing data profiling, validation, and KPI standardization on a real-world transactional dataset. The objective is to identify missing values, inconsistencies, and reporting gaps that can lead to misleading business insights.
- Profile structured transactional data to assess completeness and structure
- Identify missing values and data inconsistencies
- Validate data using business rules
- Detect reporting gaps caused by invalid data
- Standardize KPI definitions for consistent performance tracking
The dataset contains retail transaction records with the following key fields:
- InvoiceNo
- StockCode
- Description
- Quantity
- InvoiceDate
- UnitPrice
- CustomerID
- Country
The data includes real-world quality issues such as missing values, returns, and cancelled transactions.
Initial profiling was conducted to understand data structure, data types, and missing values. Key findings included missing customer identifiers and incomplete product descriptions, highlighting the need for validation before reporting.
Business rules were applied to validate records:
- Quantity must be greater than zero
- UnitPrice must be non-negative
- Invoices starting with "C" indicate cancellations and were excluded
Invalid records were identified as major contributors to reporting inaccuracies.
Revenue was first calculated without validation, resulting in misleading metrics due to the inclusion of returns and cancelled transactions. This demonstrated a reporting gap where business performance was understated.
KPIs were recalculated using only validated sales records:
- Valid Sales: Quantity > 0, UnitPrice β₯ 0, non-cancelled invoices
- Revenue: Calculated from validated transactions only
- Active Customers: Unique customers with valid purchases
This ensured reliable and consistent business reporting.
- Identified data quality issues impacting reporting accuracy
- Quantified the impact of invalid data on revenue metrics
- Improved reliability of business reports through KPI standardization
- Enabled insight-driven decision making
- Python (Pandas)
- Google Colab
- Exploratory Data Analysis
- Business Rule Validation
- KPI Definition & Standardization
This project demonstrates the importance of data profiling and validation before business reporting. By standardizing data definitions and KPIs, reporting accuracy and consistency were significantly improved.