Data Warehouse for Pharmaceutical Quality Management System Analytics using Star Schema
Problem β’ Solution β’ Architecture β’ Installation β’ Use Cases β’ Contact
A Star Schema data warehouse designed for analytical decision support in pharmaceutical Quality Management System (QMS), implemented in PostgreSQL following Kimball dimensional modeling principles.
Pharmaceutical QMS analytics require traceability of events like deviations and NCRs across dimensions such as time, product, and process. This warehouse enables trend analysis and pattern detection without operational workflow enforcement.
Disclaimer: The following metrics are based on industry benchmarks and case studies from pharmaceutical QMS implementations. Actual results vary significantly based on:
- Current quality system maturity
- Data completeness and accuracy
- Organizational commitment to data-driven decisions
| Metric | Industry Average | With OLAP System* | Improvement |
|---|---|---|---|
| Deviation Detection | 2-4 weeks | 2-5 days | ~70-90% faster |
*Results from published case studies in pharmaceutical quality management
What This Means:
- Not a guarantee, but a realistic target
- Requires clean data and proper implementation
- ROI typically realized over 6-12 months
-
Delayed Issue Detection
Quality events like equipment drifts or process failures often remain unnoticed for months, leading to batch rejections. -
Fragmented Data Analysis
Manual aggregation from scattered sources delays root cause identification and trend spotting. -
Limited Traceability
Historical changes in products or processes are not easily tracked, complicating compliance audits. -
Data-Driven Decisions
Management lacks reliable metrics for resource allocation and risk prioritization.
A OLAP-focused warehouse that:
β
Supports early pattern detection through dimensional analysis.
β
Enables historical traceability with SCD Type 2.
β
Provides reusable dimensions for cross-event analysis.
β
Scales for educational to production use without redesign.
βββββββββββββββββββββββ
β dim_time β
β (Calendar Days) β
β βͺ id_time (PK) β
β βͺ full_date (UK) β
β βͺ year, quarter β
β βͺ month, week β
ββββββββββββ¬βββββββββββ
β
β FK: id_time
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β FACT: fact_deviation β
β (Quality Deviation Events) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Primary Key: id_deviation (surrogate key) β
β Business Key: deviation_number (DEV-YYYY-NNN) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Foreign Keys (6): β
β βͺ id_time β dim_time β
β βͺ id_product β dim_product β
β βͺ id_equipment β dim_equipment (NULL allowed) β
β βͺ id_organization β dim_organization β
β βͺ id_process_step β dim_process_step β
β βͺ id_root_cause β dim_root_cause (NULL allowed) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Measures (Aggregatable): β
β βͺ affected_quantity (units impacted) β
β βͺ rejected_quantity (units scrapped) β
β βͺ rework_quantity (units reworked) β
β βͺ financial_impact (USD) β
β βͺ downtime_minutes (production stoppage) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Attributes (Descriptive): β
β βͺ severity_level (Critical/Major/Minor) β
β βͺ detection_method (where caught) β
β βͺ status (Open/Under Investigation/Closed) β
β βͺ is_reportable (FDA/EMA regulatory flag) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β β β β β
β β β β β
ββββββββββ ββββββββ ββββββββ ββββββββ ββββββββ
β β β β β
βΌ βΌ βΌ βΌ βΌ
βββββββββββ βββββββββββ βββββββββββ βββββββββββ βββββββββββ
β dim_ β β dim_ β β dim_ β β dim_ β β dim_ β
β product β βequipmentβ β org β β process β β root β
β β β β β β β step β β cause β
βββββββββββ€ βββββββββββ€ βββββββββββ€ βββββββββββ€ βββββββββββ€
βProduct β βEquipmentβ βOrg Unitsβ βProcess β β6M Cause β
βCatalog β βInventoryβ βHierarchyβ βWorkflow β βTaxonomy β
β β β β β β β β β β
ββͺ SKU β ββͺ Calibr.β ββͺ Sites β ββͺ Critic.β ββͺ Man β
ββͺ Formulaβ ββͺ Status β ββͺ Depts β ββͺ Valid. β ββͺ Machineβ
ββͺ Versionβ ββͺ Type β ββͺ Parent β ββͺ Order β ββͺ Method β
βββββββββββ βββββββββββ βββββββββββ βββββββββββ βββββββββββ
SCD Type 2 SCD Type 2 SCD Type 2 SCD Type 2 SCD Type 2
(History) (History) (History) (History) (History)
Schema Characteristics:
- Design Pattern: Star Schema (Kimball)
- Fact Grain: One row = One deviation event
- Dimension Type: SCD Type 2 (full history tracking)
- Relationships: 6 many-to-one from fact to dimensions
- NULL FK: Equipment and Root Cause (business logic allows)
| Table | Purpose | Key Features |
|---|---|---|
| dim_time | Calendar context | Day/Week/Month/Quarter/Year/Fiscal periods |
| dim_organization | Org hierarchy | Sites, departments, parent-child structure |
| dim_product | Product catalog | Formulations, regulatory status, versions |
| dim_equipment | Equipment inventory | Calibration tracking, operational status |
| dim_process_step | Process definitions | Criticality levels, validation requirements |
| dim_root_cause | 6M taxonomy | Man/Machine/Material/Method/Measurement/Environment |
fact_deviation: One row per quality deviation event
Measures:
affected_quantity,rejected_quantity,rework_quantityfinancial_impact(USD)downtime_minutes
Attributes:
severity_level: Critical / Major / Minordetection_method: In_Process / Final_Inspection / Customer_Complaint / Stability_Testing / Audit_Findingstatus: Open / Under_Investigation / CAPA_Required / Closedis_reportable: FDA/EMA regulatory flag
Business Logic Constraints:
-- Quantity validation
CONSTRAINT chk_quantities CHECK (
rejected_quantity + rework_quantity <= affected_quantity
)
-- Date sequence validation
CONSTRAINT chk_dates_sequence CHECK (
reported_date >= deviation_date AND
investigation_completed_date >= reported_date AND
closed_date >= reported_date
)
-- Status-date correlation
CONSTRAINT chk_closed_logic CHECK (
(status = 'Closed' AND closed_date IS NOT NULL) OR
(status != 'Closed' AND closed_date IS NULL)
)Indexes:
- 6 Foreign key indexes (fast joins)
- 4 Filter indexes (severity, status, batch)
- 2 Composite indexes (common queries)
21 CFR Part 11 Ready:
- Complete audit trail via SCD Type 2
- Historical snapshots preserved
- Traceability to source systems
ICH E2A Compliant:
is_reportableflag for adverse events- Severity classification
- Timeline tracking
- PostgreSQL 12+
- Git
-
Clone Repository
git clone https://github.com/hadiii1i/pharma-qms-olap cd pharma-qms-olap -
Create Database
createdb pharma_qms
-
Create Schema
psql -d pharma_qms -c "CREATE SCHEMA qms;" -
Run DDL Script
psql -d pharma_qms -f schemas/qms_complete_schema.sql
-
Verify
psql -d pharma_qms -c "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'qms';"Expected: 7 tables
python scripts/generate_dim_time.py --start-year 2020 --end-year 2025 --output data/seed/dim_time.sql
psql -d pharma_qms -f data/seed/dim_time.sqlSolution: Check PostgreSQL status and start if needed.
Solution: Drop and recreate schema if exists.
Solution: Load dimensions before facts.
Solution: Use EXPLAIN ANALYZE and update statistics.
Focus on technical contributions:
- Sample data generators
- Query library
- ETL scripts
- Documentation
Process: Fork, branch, commit, PR.
MIT License - See LICENSE file.
Hadi Yabari
Data Architect | QMS Analytics Specialist
GitHub Discussions | Report Bug | Request Feature
- 6 dimensions with SCD Type 2
- 1 fact table with constraints
- Indexes and documentation
- Additional fact tables (NCR, CAPA, audit)
- Python ETL framework
- Analytical query library
Focused on analytical correctness and traceability ππ