Skip to content

hadiii1i/QualityLabs-Analytics_DB

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

14 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🏭 Pharmaceutical QMS OLAP Analytics

PostgreSQL Star Schema License

Data Warehouse for Pharmaceutical Quality Management System Analytics using Star Schema

Problem β€’ Solution β€’ Architecture β€’ Installation β€’ Use Cases β€’ Contact


πŸ“Š Project Overview

A Star Schema data warehouse designed for analytical decision support in pharmaceutical Quality Management System (QMS), implemented in PostgreSQL following Kimball dimensional modeling principles.

What This Addresses

Pharmaceutical QMS analytics require traceability of events like deviations and NCRs across dimensions such as time, product, and process. This warehouse enables trend analysis and pattern detection without operational workflow enforcement.

πŸ“ˆ Potential Business Impact

Disclaimer: The following metrics are based on industry benchmarks and case studies from pharmaceutical QMS implementations. Actual results vary significantly based on:

  • Current quality system maturity
  • Data completeness and accuracy
  • Organizational commitment to data-driven decisions

Industry Benchmarks

Metric Industry Average With OLAP System* Improvement
Deviation Detection 2-4 weeks 2-5 days ~70-90% faster

*Results from published case studies in pharmaceutical quality management

What This Means:

  • Not a guarantee, but a realistic target
  • Requires clean data and proper implementation
  • ROI typically realized over 6-12 months

🎯 The Problem

Core Challenges

  1. Delayed Issue Detection
    Quality events like equipment drifts or process failures often remain unnoticed for months, leading to batch rejections.

  2. Fragmented Data Analysis
    Manual aggregation from scattered sources delays root cause identification and trend spotting.

  3. Limited Traceability
    Historical changes in products or processes are not easily tracked, complicating compliance audits.

  4. Data-Driven Decisions
    Management lacks reliable metrics for resource allocation and risk prioritization.


βœ… The Solution

Key Features

A OLAP-focused warehouse that:

βœ… Supports early pattern detection through dimensional analysis.
βœ… Enables historical traceability with SCD Type 2.
βœ… Provides reusable dimensions for cross-event analysis.
βœ… Scales for educational to production use without redesign.


πŸ—οΈ Architecture

Star Schema Structure

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚     dim_time        β”‚
β”‚   (Calendar Days)   β”‚
β”‚  β–ͺ id_time (PK)     β”‚
β”‚  β–ͺ full_date (UK)   β”‚
β”‚  β–ͺ year, quarter    β”‚
β”‚  β–ͺ month, week      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚
           β”‚ FK: id_time
           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    FACT: fact_deviation                      β”‚
β”‚                  (Quality Deviation Events)                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Primary Key:   id_deviation (surrogate key)                 β”‚
β”‚  Business Key:  deviation_number (DEV-YYYY-NNN)              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Foreign Keys (6):                                           β”‚
β”‚    β–ͺ id_time              β†’ dim_time                         β”‚
β”‚    β–ͺ id_product           β†’ dim_product                      β”‚
β”‚    β–ͺ id_equipment         β†’ dim_equipment (NULL allowed)     β”‚
β”‚    β–ͺ id_organization      β†’ dim_organization                 β”‚
β”‚    β–ͺ id_process_step      β†’ dim_process_step                 β”‚
β”‚    β–ͺ id_root_cause        β†’ dim_root_cause (NULL allowed)    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Measures (Aggregatable):                                    β”‚
β”‚    β–ͺ affected_quantity    (units impacted)                   β”‚
β”‚    β–ͺ rejected_quantity    (units scrapped)                   β”‚
β”‚    β–ͺ rework_quantity      (units reworked)                   β”‚
β”‚    β–ͺ financial_impact     (USD)                              β”‚
β”‚    β–ͺ downtime_minutes     (production stoppage)              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Attributes (Descriptive):                                   β”‚
β”‚    β–ͺ severity_level       (Critical/Major/Minor)             β”‚
β”‚    β–ͺ detection_method     (where caught)                     β”‚
β”‚    β–ͺ status               (Open/Under Investigation/Closed)  β”‚
β”‚    β–ͺ is_reportable        (FDA/EMA regulatory flag)          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚           β”‚           β”‚           β”‚           β”‚
           β”‚           β”‚           β”‚           β”‚           β”‚
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”Œβ”€β”€β”€β”€β”€β”€β”˜    β”Œβ”€β”€β”€β”€β”€β”€β”˜    β”Œβ”€β”€β”€β”€β”€β”€β”˜    β”Œβ”€β”€β”€β”€β”€β”€β”˜
  β”‚             β”‚           β”‚           β”‚           β”‚
  β–Ό             β–Ό           β–Ό           β–Ό           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  dim_   β”‚ β”‚  dim_   β”‚ β”‚  dim_   β”‚ β”‚  dim_   β”‚ β”‚  dim_   β”‚
β”‚ product β”‚ β”‚equipmentβ”‚ β”‚  org    β”‚ β”‚ process β”‚ β”‚  root   β”‚
β”‚         β”‚ β”‚         β”‚ β”‚         β”‚ β”‚  step   β”‚ β”‚  cause  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚Product  β”‚ β”‚Equipmentβ”‚ β”‚Org Unitsβ”‚ β”‚Process  β”‚ β”‚6M Cause β”‚
β”‚Catalog  β”‚ β”‚Inventoryβ”‚ β”‚Hierarchyβ”‚ β”‚Workflow β”‚ β”‚Taxonomy β”‚
β”‚         β”‚ β”‚         β”‚ β”‚         β”‚ β”‚         β”‚ β”‚         β”‚
β”‚β–ͺ SKU    β”‚ β”‚β–ͺ Calibr.β”‚ β”‚β–ͺ Sites  β”‚ β”‚β–ͺ Critic.β”‚ β”‚β–ͺ Man    β”‚
β”‚β–ͺ Formulaβ”‚ β”‚β–ͺ Status β”‚ β”‚β–ͺ Depts  β”‚ β”‚β–ͺ Valid. β”‚ β”‚β–ͺ Machineβ”‚
β”‚β–ͺ Versionβ”‚ β”‚β–ͺ Type   β”‚ β”‚β–ͺ Parent β”‚ β”‚β–ͺ Order  β”‚ β”‚β–ͺ Method β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 SCD Type 2   SCD Type 2   SCD Type 2   SCD Type 2   SCD Type 2
 (History)    (History)    (History)    (History)    (History)

Schema Characteristics:

  • Design Pattern: Star Schema (Kimball)
  • Fact Grain: One row = One deviation event
  • Dimension Type: SCD Type 2 (full history tracking)
  • Relationships: 6 many-to-one from fact to dimensions
  • NULL FK: Equipment and Root Cause (business logic allows)

Database Structure

Dimension Tables (SCD Type 2)

Table Purpose Key Features
dim_time Calendar context Day/Week/Month/Quarter/Year/Fiscal periods
dim_organization Org hierarchy Sites, departments, parent-child structure
dim_product Product catalog Formulations, regulatory status, versions
dim_equipment Equipment inventory Calibration tracking, operational status
dim_process_step Process definitions Criticality levels, validation requirements
dim_root_cause 6M taxonomy Man/Machine/Material/Method/Measurement/Environment

Fact Table (Transaction Grain)

fact_deviation: One row per quality deviation event

Measures:

  • affected_quantity, rejected_quantity, rework_quantity
  • financial_impact (USD)
  • downtime_minutes

Attributes:

  • severity_level: Critical / Major / Minor
  • detection_method: In_Process / Final_Inspection / Customer_Complaint / Stability_Testing / Audit_Finding
  • status: Open / Under_Investigation / CAPA_Required / Closed
  • is_reportable: FDA/EMA regulatory flag

πŸ”§ Technical Specifications

Schema Highlights

Data Integrity

Business Logic Constraints:

-- Quantity validation
CONSTRAINT chk_quantities CHECK (
    rejected_quantity + rework_quantity <= affected_quantity
)

-- Date sequence validation  
CONSTRAINT chk_dates_sequence CHECK (
    reported_date >= deviation_date AND
    investigation_completed_date >= reported_date AND
    closed_date >= reported_date
)

-- Status-date correlation
CONSTRAINT chk_closed_logic CHECK (
    (status = 'Closed' AND closed_date IS NOT NULL) OR
    (status != 'Closed' AND closed_date IS NULL)
)

Performance Optimized

Indexes:

  • 6 Foreign key indexes (fast joins)
  • 4 Filter indexes (severity, status, batch)
  • 2 Composite indexes (common queries)

Regulatory Compliance

21 CFR Part 11 Ready:

  • Complete audit trail via SCD Type 2
  • Historical snapshots preserved
  • Traceability to source systems

ICH E2A Compliant:

  • is_reportable flag for adverse events
  • Severity classification
  • Timeline tracking

πŸš€ Quick Start

Prerequisites

  • PostgreSQL 12+
  • Git

Installation

  1. Clone Repository

    git clone https://github.com/hadiii1i/pharma-qms-olap
    cd pharma-qms-olap
  2. Create Database

    createdb pharma_qms
  3. Create Schema

    psql -d pharma_qms -c "CREATE SCHEMA qms;"
  4. Run DDL Script

    psql -d pharma_qms -f schemas/qms_complete_schema.sql
  5. Verify

    psql -d pharma_qms -c "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'qms';"

    Expected: 7 tables

Load Sample Data (Optional)

python scripts/generate_dim_time.py --start-year 2020 --end-year 2025 --output data/seed/dim_time.sql
psql -d pharma_qms -f data/seed/dim_time.sql

❓ Common Issues & Solutions

Connection Issues

Solution: Check PostgreSQL status and start if needed.

Schema Errors

Solution: Drop and recreate schema if exists.

FK Violations

Solution: Load dimensions before facts.

Query Performance

Solution: Use EXPLAIN ANALYZE and update statistics.


🀝 Contributing

Focus on technical contributions:

  • Sample data generators
  • Query library
  • ETL scripts
  • Documentation

Process: Fork, branch, commit, PR.


πŸ“„ License

MIT License - See LICENSE file.


πŸ“ž Contact

Hadi Yabari
Data Architect | QMS Analytics Specialist

LinkedIn | Email | GitHub

GitHub Discussions | Report Bug | Request Feature


πŸ”„ Roadmap

v1.0 - Core Schema (Complete)

  • 6 dimensions with SCD Type 2
  • 1 fact table with constraints
  • Indexes and documentation

v1.1 - Extended Facts

  • Additional fact tables (NCR, CAPA, audit)

v2.0 - ETL & Queries

  • Python ETL framework
  • Analytical query library

🌟 Star History

Star History Chart

Focused on analytical correctness and traceability πŸ­πŸ’Š

⬆️ Back to Top

About

🏭 Turn scattered pharma quality data into actionable insights | Prevent batch rejections | Automate compliance reporting | Open-source OLAP solution that saves millions | Built by QMS data professionals

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors