analondhe/cancer_insights_analytics

Cancer Insights Analytics - dbt Project

A dbt project that analyzes cancer patient data, using DuckDB as the data warehouse, to identify severity patterns and rank risk factors.

The transformed dataset addresses:

  • Which demographics have higher cancer severity rates?
  • What factors correlate most with cancer severity?

Data Source

Project Structure

├── seeds/
│   └── cancer_patient_data_sets.csv
├── models/
│   ├── staging/
│   │   └── stg_patients.sql
│   ├── intermediate/
│   │   ├── int_patient_lifestyle.sql
│   │   └── int_patient_non_lifestyle.sql
│   └── marts/
│       ├── severity_by_gender.sql
│       ├── severity_by_age_group.sql
│       └── risk_factor_analysis.sql
└── macros/
    └── age_bucket.sql
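The contents of macros/age_bucket.sql are not shown in this README. As a rough sketch of what an age-bucketing macro typically does, the Python helper below renders a SQL CASE expression for a given column, much as a Jinja macro renders SQL at compile time, and runs it against an in-memory SQLite table so it can be tried without DuckDB. The cut-offs (30, 45, 60), the bucket labels, and the function names are assumptions for illustration, not the project's actual values.

```python
import sqlite3


def age_bucket_sql(column: str) -> str:
    """Render a CASE expression that maps an age column to a bucket label.

    The cut-offs and labels are hypothetical; the real macro may differ.
    """
    return (
        "CASE "
        f"WHEN {column} < 30 THEN 'under_30' "
        f"WHEN {column} < 45 THEN '30_44' "
        f"WHEN {column} < 60 THEN '45_59' "
        "ELSE '60_plus' END"
    )


def bucket_ages(ages):
    """Load ages into a throwaway table and return (age, age_group) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (age INTEGER)")
    conn.executemany("INSERT INTO patients VALUES (?)", [(a,) for a in ages])
    query = (
        f"SELECT age, {age_bucket_sql('age')} AS age_group "
        "FROM patients ORDER BY age"
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for age, group in bucket_ages([25, 37, 52, 68]):
        print(age, group)
```

Keeping the bucketing in one reusable expression means any model that groups by age can share the same boundaries.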

Data Flow

seed → staging → intermediate → marts
                      │
         ┌───────────┴───────────┐
         │                       │
   int_patient            int_patient
   _lifestyle             _non_lifestyle
         │                       │
         └───────────┬───────────┘
                     │
    ┌────────────────┼────────────────┐
    │                │                │
severity_by    severity_by     risk_factor
_gender        _age_group      _analysis

Layer Descriptions

Layer          Purpose
Staging        Clean and rename columns
Intermediate   Split into lifestyle vs non-lifestyle factors
Marts          Business-ready aggregations and rankings
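To make the three layers concrete, here is a minimal sketch that mimics them with SQLite views in Python. It is not the project's SQL: the raw column names ("Patient Id", "Alcohol use", "Genetic Risk", "Level"), the choice of which factor counts as lifestyle, and the severity_counts view are all assumptions made for illustration. Only the view names stg_patients, int_patient_lifestyle, and int_patient_non_lifestyle come from the project structure.

```python
import sqlite3


def build_layers(rows):
    """Create staging, intermediate, and mart-style views over raw rows.

    rows: iterable of (patient_id, alcohol_use, genetic_risk, level) tuples.
    All column names here are hypothetical stand-ins for the seed file.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE raw_patients ('
        '"Patient Id" TEXT, "Alcohol use" INTEGER, '
        '"Genetic Risk" INTEGER, "Level" TEXT)'
    )
    conn.executemany("INSERT INTO raw_patients VALUES (?, ?, ?, ?)", rows)
    conn.executescript(
        """
        -- Staging: clean values and rename columns to snake_case.
        CREATE VIEW stg_patients AS
        SELECT "Patient Id"         AS patient_id,
               "Alcohol use"        AS alcohol_use,
               "Genetic Risk"       AS genetic_risk,
               lower(trim("Level")) AS severity
        FROM raw_patients;

        -- Intermediate: split lifestyle from non-lifestyle factors.
        CREATE VIEW int_patient_lifestyle AS
        SELECT patient_id, severity, alcohol_use FROM stg_patients;

        CREATE VIEW int_patient_non_lifestyle AS
        SELECT patient_id, severity, genetic_risk FROM stg_patients;

        -- Mart-style aggregation (hypothetical): patients per severity level.
        CREATE VIEW severity_counts AS
        SELECT severity, count(*) AS patients
        FROM int_patient_lifestyle
        GROUP BY severity;
        """
    )
    return conn


if __name__ == "__main__":
    demo = build_layers([("P1", 7, 3, " High "), ("P2", 2, 5, "Low")])
    print(demo.execute("SELECT * FROM severity_counts").fetchall())
```

In dbt each view would live in its own model file and refer to its upstream model with ref(), which is how dbt derives the dependency graph.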

Mart Models

Model                   Description
severity_by_gender      Severity distribution by gender
severity_by_age_group   Severity distribution by age group
risk_factor_analysis    Ranks factors by correlation with severity
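The README does not say how risk_factor_analysis measures correlation. One plausible reading, sketched below in plain Python, is to compute the Pearson correlation between each factor and a numeric severity score, then sort by absolute value. The function names, the neutral factor names, the numeric severity coding, and the toy values are assumptions for illustration only and are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_factors(factors, severity):
    """Return (factor_name, correlation) pairs, strongest first.

    factors: dict mapping a factor name to its per-patient values.
    severity: per-patient numeric severity (hypothetical coding, e.g. 1 to 3).
    """
    scored = [(name, pearson(values, severity)) for name, values in factors.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Toy values only; they do not come from the cancer patient dataset.
    toy_severity = [1, 2, 3, 3]
    toy_factors = {
        "factor_a": [1, 2, 3, 3],
        "factor_b": [2, 2, 2, 2],
        "factor_c": [3, 1, 2, 1],
    }
    for name, r in rank_factors(toy_factors, toy_severity):
        print(f"{name}: {r:.3f}")
```

Sorting by absolute value keeps strongly negative correlations near the top, which matters if some factors are protective rather than harmful.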

Key Findings

  • Alcohol use has the strongest correlation with severity
  • Obesity ranks second
  • Gender 1 has a 42% high-severity rate, versus 28% for Gender 2
  • Severity increases with age

Setup

# Install dbt with DuckDB
pip install dbt-duckdb

# Run the project
dbt seed
dbt run
dbt test

# Generate docs
dbt docs generate
dbt docs serve

Tech Stack

  • Transformation: dbt Core
  • Warehouse: DuckDB
  • Language: SQL + Jinja

About

This is a dbt project for a lung cancer dataset that transforms raw data into analysis-ready datasets.
