# Cancer Insights Analytics

A dbt project that analyzes cancer patient data, using DuckDB as the data warehouse, to identify severity patterns and rank risk factors.
The transformed dataset addresses two questions:
- Which demographics have higher cancer severity rates?
- What factors correlate most with cancer severity?

- Source: Kaggle - Cancer Patients and Air Pollution
- Records: 1,000 patients
- Features: Demographics, lifestyle factors, environmental factors, severity level

├── seeds/
│   └── cancer_patient_data_sets.csv
├── models/
│   ├── staging/
│   │   └── stg_patients.sql
│   ├── intermediate/
│   │   ├── int_patient_lifestyle.sql
│   │   └── int_patient_non_lifestyle.sql
│   └── marts/
│       ├── severity_by_gender.sql
│       ├── severity_by_age_group.sql
│       └── risk_factor_analysis.sql
└── macros/
    └── age_bucket.sql
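
The tree lists `macros/age_bucket.sql` without showing its contents. A minimal sketch of what such a macro might look like, assuming it takes a column name and expands to a `CASE` expression. The bucket boundaries and labels here are illustrative guesses, not the project's actual values:

```sql
{# macros/age_bucket.sql (hypothetical sketch; boundaries and labels are assumptions) #}
{% macro age_bucket(column_name) %}
    case
        when {{ column_name }} is null then 'Unknown'
        when {{ column_name }} < 30 then 'Under 30'
        when {{ column_name }} < 45 then '30-44'
        when {{ column_name }} < 60 then '45-59'
        else '60+'
    end
{% endmacro %}
```

A model such as `severity_by_age_group` could then call `{{ age_bucket('age') }}` in its `select` list, assuming the staged age column is named `age`.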

            seed → staging → intermediate → marts
                      │
          ┌───────────┴───────────┐
          │                       │
     int_patient             int_patient
     _lifestyle            _non_lifestyle
          │                       │
          └───────────┬───────────┘
                      │
     ┌────────────────┼────────────────┐
     │                │                │
severity_by      severity_by      risk_factor
  _gender        _age_group        _analysis

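
The split shown in the diagram can be sketched as one model per factor group, each selecting from `stg_patients`. The column names below (`patient_id`, `severity_level`, `alcohol_use`, `obesity`, `smoking`, `balanced_diet`) are assumptions about what staging exposes, not confirmed project columns:

```sql
-- models/intermediate/int_patient_lifestyle.sql (hypothetical sketch)
-- Keeps the patient key and severity so marts can aggregate or join later.
select
    patient_id,
    severity_level,
    alcohol_use,
    obesity,
    smoking,
    balanced_diet
from {{ ref('stg_patients') }}
```

Under the same assumptions, `int_patient_non_lifestyle` would follow the same pattern with the demographic and environmental columns instead.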
| Layer | Purpose |
|---|---|
| Staging | Clean and rename columns |
| Intermediate | Split into lifestyle vs non-lifestyle factors |
| Marts | Business-ready aggregations and rankings |
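
As an illustration of the staging row in the table above, here is a rough sketch of a clean-and-rename model over the seed. The raw header names (`"Patient Id"`, `"Alcohol use"`, `"Level"`, and so on) are assumed from the Kaggle file and may not match the seed exactly:

```sql
-- models/staging/stg_patients.sql (hypothetical sketch; raw column names are assumptions)
select
    "Patient Id"              as patient_id,
    cast("Age" as integer)    as age,
    cast("Gender" as integer) as gender,
    "Alcohol use"             as alcohol_use,
    "Obesity"                 as obesity,
    lower(trim("Level"))      as severity_level  -- normalize label casing and whitespace
from {{ ref('cancer_patient_data_sets') }}
```

Referencing the seed through `ref()` rather than a hard-coded table name lets dbt order `dbt seed` output ahead of this model in the lineage graph.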

| Model | Description |
|---|---|
| `severity_by_gender` | Severity distribution by gender |
| `severity_by_age_group` | Severity distribution by age group |
| `risk_factor_analysis` | Ranks factors by correlation with severity |

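
A severity distribution like the ones listed above could be computed with a grouped count plus a window total. This is a sketch only: it assumes `gender` and `severity_level` are available on `int_patient_non_lifestyle`, and the output column names are made up for illustration:

```sql
-- models/marts/severity_by_gender.sql (hypothetical sketch)
select
    gender,
    severity_level,
    count(*) as patient_count,
    -- share of each severity level within its gender group
    round(100.0 * count(*) / sum(count(*)) over (partition by gender), 1) as pct_of_gender
from {{ ref('int_patient_non_lifestyle') }}
group by gender, severity_level
order by gender, severity_level
```

The same shape would work for `severity_by_age_group` by grouping on a bucketed age column instead of `gender`.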
- Alcohol use has the strongest correlation with severity
- Obesity ranks second
- 42% of Gender 1 patients are high severity, versus 28% of Gender 2 patients
- Severity increases with age
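
The README does not say which correlation measure produces the ranking. One plausible approach, sketched here, maps severity to an ordinal score and applies DuckDB's `corr()` aggregate (Pearson) to each factor. The score mapping, the column names, and the choice of three factors are all assumptions:

```sql
-- Hypothetical sketch of the ranking idea behind risk_factor_analysis
with scored as (
    select
        -- assumed ordinal encoding of the severity label
        case lower(severity_level)
            when 'low' then 1
            when 'medium' then 2
            when 'high' then 3
        end as severity_score,
        alcohol_use,
        obesity,
        smoking
    from {{ ref('int_patient_lifestyle') }}
),

correlations as (
    select 'alcohol_use' as factor, corr(severity_score, alcohol_use) as r from scored
    union all
    select 'obesity', corr(severity_score, obesity) from scored
    union all
    select 'smoking', corr(severity_score, smoking) from scored
)

select
    factor,
    round(r, 3) as pearson_r,
    rank() over (order by abs(r) desc) as factor_rank
from correlations
order by factor_rank
```

Ranking on the absolute value keeps strongly negative correlations from being pushed to the bottom of the list.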
# Install dbt with DuckDB
pip install dbt-duckdb
# Run the project
dbt seed
dbt run
dbt test
# Generate docs
dbt docs generate
dbt docs serve

- Transformation: dbt Core
- Warehouse: DuckDB
- Language: SQL + Jinja
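
The README does not list the project's tests. As one illustration of what `dbt test` could check, a singular test is a SQL file under `tests/` that fails when its query returns any rows. The file name, column names, and allowed values below are assumptions:

```sql
-- tests/assert_valid_severity_level.sql (hypothetical singular test)
-- dbt treats every row returned by this query as a failure.
select
    patient_id,
    severity_level
from {{ ref('stg_patients') }}
where severity_level is null
   or lower(severity_level) not in ('low', 'medium', 'high')
```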