Public Health Mortality Modeling for Policy Analysis

TL;DR

This project analyzes county-level public health data to understand what drives premature mortality and how those signals can inform policy decisions.
Across models, five factors consistently showed the strongest association with higher mortality rates:

Limited access to primary care and higher preventable hospital stays
Socioeconomic disadvantage (income, employment, education proxies)
Population-level chronic health risks
Behavioral risk factors tied to long-term disease burden
Structural and environmental conditions affecting health outcomes

The goal is not prediction for its own sake, but identifying which levers matter most.

Problem Context

Public health policy decisions are often made without clear visibility into which county-level factors most strongly influence mortality outcomes.
This project asks a focused question: given observable public health indicators, which factors are most predictive of premature death, and how stable are those relationships across models?

The intent is to support prioritization and policy planning, not to replace expert judgment.

Key Findings

Feature importance analysis showed that premature mortality is most strongly associated with structural and access-related conditions, rather than with any single behavioral or clinical metric.
Healthcare access and socioeconomic factors consistently ranked highest in model influence, while purely clinical indicators explained only part of the variation.

This suggests that effective mortality reduction requires upstream interventions, not only medical treatment.

Dataset

This analysis uses a County Health Rankings (CHR)–style analytic dataset provided as a CSV file.
In the notebook, the dataset is referenced as:

analytic_data2025_v2.csv

The CHR analytic codebook explains variable naming conventions (e.g., v###_rawvalue), along with numerator, denominator, and confidence interval fields. This documentation is essential for interpreting feature meaning and avoiding misuse of derived metrics.
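
As an illustration of the naming convention above, the following minimal sketch keeps only the raw-value columns from a header row. The regular expression assumes the `v###_rawvalue` pattern is exactly "v", three digits, then `_rawvalue`. The example header, including the exact numerator, denominator, and confidence-interval suffixes, is hypothetical and should be checked against the codebook.

```python
import re

# Assumed pattern: "v" + three digits + "_rawvalue" (verify against the codebook).
RAW_VALUE_PATTERN = re.compile(r"^v\d{3}_rawvalue$")


def raw_value_columns(column_names):
    """Return only the column names that match the raw-value pattern."""
    return [name for name in column_names if RAW_VALUE_PATTERN.match(name)]


# Hypothetical header row, for illustration only.
example_header = [
    "county",
    "v001_rawvalue",
    "v001_numerator",
    "v001_denominator",
    "v001_cilow",
    "v001_cihigh",
    "v002_rawvalue",
]

selected = raw_value_columns(example_header)
print(selected)
```

Filtering by pattern rather than by position keeps numerator, denominator, and interval fields out of the feature set, where they could otherwise leak information about the raw value they were used to compute.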

High-Level Method

The notebook follows a standard analytics workflow:

1. Load the analytic dataset and select the target variable
    - Target: Premature Death (raw value)
2. Clean and preprocess features
    - Remove non-informative identifiers
    - Handle missing values and normalize inputs as needed
3. Split data into training and test sets
4. Benchmark multiple models
    - Linear Regression (interpretable baseline)
    - KNN Regression (local similarity baseline)
    - Random Forest Regression (captures non-linear interactions)
5. Evaluate using variance explained (R²) and error magnitude (MSE)

This structure allows results to be compared across modeling assumptions while keeping the analysis grounded in policy interpretation.
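
The two evaluation metrics named in the workflow can be sketched in plain Python as follows. This is not the notebook's code, which may rely on a library implementation. The numbers are made up for illustration, and the function names are chosen here for readability.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of variance in y_true explained by y_pred: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only, not results from the dataset.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]

mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)

# A model that always predicts the mean of y_true scores R² = 0 by construction.
baseline_pred = [sum(y_true) / len(y_true)] * len(y_true)
baseline_r2 = r_squared(y_true, baseline_pred)

print(mse, r2, baseline_r2)
```

Reporting both is useful because R² is unitless and comparable across models, while MSE stays in the squared units of the target and shows how large the errors actually are.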

Overall Results

Several regression models were benchmarked to test robustness of findings across modeling assumptions.
While ensemble methods captured non-linear relationships more effectively than linear baselines, the relative importance of the top contributing factors remained consistent.
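
One simple way to check this kind of consistency is to compare the top-ranked features from two models. The sketch below assumes each model's feature importances are already available as a name-to-score mapping (for a linear model, for example, the absolute value of a standardized coefficient). The feature names and scores are hypothetical and are not outputs of the notebook.

```python
def top_k(importances, k):
    """Names of the k features with the largest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


def top_k_overlap(importances_a, importances_b, k):
    """Fraction of top-k features shared by two models (1.0 = identical sets)."""
    shared = set(top_k(importances_a, k)) & set(top_k(importances_b, k))
    return len(shared) / k


# Hypothetical importance scores, for illustration only.
linear_scores = {
    "primary_care_access": 0.30,
    "income": 0.25,
    "smoking": 0.20,
    "air_quality": 0.15,
    "obesity": 0.10,
}
forest_scores = {
    "primary_care_access": 0.35,
    "obesity": 0.25,
    "income": 0.20,
    "smoking": 0.15,
    "air_quality": 0.05,
}

overlap = top_k_overlap(linear_scores, forest_scores, k=3)
print(top_k(linear_scores, 3), top_k(forest_scores, 3), overlap)
```

Set overlap ignores the order within the top k. A rank correlation would be a stricter check if the exact ordering matters for the policy argument.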

Model accuracy metrics are discussed in the notebook, but the primary value of this analysis is explanatory rather than predictive.

How to Use

This repository is designed to be exploratory and reproducible.

Review the notebook to understand feature selection, modeling choices, and interpretation logic
Run the notebook end-to-end to reproduce preprocessing, modeling, and evaluation
Use feature importance outputs to reason about policy-relevant drivers of mortality
Extend the analysis with interpretability tools or geographic stratification as needed

