MedGraph

Gallery images: Graph Neural Network; Synthetic Data Patient 300 Graph; Planning Whiteboard

Inspiration

MedGraph was conceived to address a gap in healthcare analytics: the need for privacy-preserving techniques that predict disease progression in individual patients. Traditional methods often treat patient encounters as isolated events and miss the interplay between encounters, conditions, and medications. Inspired by advances in graph neural networks and the promise of federated learning, we set out to build a system that keeps each hospital's data containerized and separate, and uses it to construct interconnected patient graphs that capture how health evolves over time, all while sensitive data stays secure. Training patient GNN models at the hospital level then lets us combine them into a global model that captures the interplay among conditions, medications, and observations across all patients.

What It Does

MedGraph uses Synthea to generate synthetic health data for diverse regions and preprocesses it to build a detailed graph for each patient. The synthetic data serves purely as a proof of concept. In each graph, nodes represent encounters enriched with observation feature vectors, while edges capture temporal relationships between encounters and link conditions and medications based on overlapping timelines. We then train a Graph Attention Network (GATNet) that focuses on START and END edges, the markers that indicate when a condition begins or ends relative to a patient's encounters. The model uses these signals to predict which conditions are likely to resolve or emerge at the next visit. Our federated learning setup lets hospitals train models locally and share only encrypted model updates with a global server, so raw patient data is never centralized.
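To make the graph structure concrete, here is a minimal sketch of how one patient's graph could be assembled. It is not the project's actual preprocessing code: the `Encounter` and `Condition` classes, the `NEXT` edge label, and the rule that a START or END edge attaches to the first encounter on or after the condition's onset or stop date are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Encounter:
    id: str
    day: date
    observations: List[float]  # observation feature vector for this visit


@dataclass
class Condition:
    name: str
    start: date
    stop: Optional[date]  # None means the condition is still active


Edge = Tuple[str, str, str]  # (source, target, edge_type)


def build_patient_graph(
    encounters: List[Encounter], conditions: List[Condition]
) -> Tuple[List[Encounter], List[Edge]]:
    """Return the encounters in time order plus typed edges for one patient."""
    ordered = sorted(encounters, key=lambda e: e.day)
    edges: List[Edge] = []

    # Temporal edges between consecutive encounters (hypothetical "NEXT" label).
    for prev, nxt in zip(ordered, ordered[1:]):
        edges.append((prev.id, nxt.id, "NEXT"))

    for cond in conditions:
        # Assumption: START attaches to the first encounter on or after onset.
        first = next((e for e in ordered if e.day >= cond.start), None)
        if first is not None:
            edges.append((cond.name, first.id, "START"))
        # Assumption: END attaches to the first encounter on or after the stop date.
        if cond.stop is not None:
            after = next((e for e in ordered if e.day >= cond.stop), None)
            if after is not None:
                edges.append((cond.name, after.id, "END"))
    return ordered, edges


# Made-up example patient; the values are illustrative only.
visits = [
    Encounter("e2", date(2020, 3, 5), [118.0, 76.0]),
    Encounter("e1", date(2020, 1, 10), [130.0, 85.0]),
    Encounter("e3", date(2020, 6, 20), [121.0, 79.0]),
]
problems = [
    Condition("hypertension", date(2020, 1, 10), None),
    Condition("sinusitis", date(2020, 3, 1), date(2020, 4, 15)),
]
ordered, edges = build_patient_graph(visits, problems)
for edge in edges:
    print(edge)
```

In this toy patient, sinusitis gets a START edge into the second visit and an END edge into the third, which is the kind of transition the model is meant to attend to.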

How We Built It

We built MedGraph as a multi-step pipeline that begins with synthetic data generation via Synthea, giving us a diverse and realistic dataset. Custom Python scripts clean and preprocess the data into a standardized format, which we then use to construct patient-specific graphs that capture both temporal and overlapping relationships. The GATNet model, implemented with PyTorch Geometric, uses an attention mechanism designed to focus on the START and END edges, highlighting key transitions in patient health. Federated learning is handled with frameworks such as PySyft, so each hospital trains its local model and securely contributes encrypted updates to a central aggregation server. This approach is designed to improve prediction accuracy while keeping raw patient data inside each hospital.
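One way to picture attention that favors START and END edges is to add a per-edge-type bias to each edge's attention logit before the softmax. The sketch below does this in plain Python rather than PyTorch Geometric. The bias values, the `EDGE_TYPE_BIAS` table, and the function names are hypothetical, and the writeup does not say that the project's GATNet works this way.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-edge-type bias; in a trained model these would be learned.
EDGE_TYPE_BIAS: Dict[str, float] = {"NEXT": 0.0, "START": 1.0, "END": 1.0}


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def attention_weights(raw_scores: List[Tuple[str, float]]) -> List[float]:
    """Softmax over one node's incoming edges after adding an edge-type bias.

    Each (edge_type, score) pair stands in for the unnormalized GAT score
    of one incoming edge, before the LeakyReLU is applied.
    """
    logits = [leaky_relu(s) + EDGE_TYPE_BIAS[t] for t, s in raw_scores]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend(weights: List[float], neighbor_features: List[List[float]]) -> List[float]:
    """Attention-weighted sum of the neighbors' feature vectors."""
    dim = len(neighbor_features[0])
    return [
        sum(w * feats[k] for w, feats in zip(weights, neighbor_features))
        for k in range(dim)
    ]


# Three incoming edges with identical raw scores but different types.
incoming = [("NEXT", 0.5), ("START", 0.5), ("END", 0.5)]
weights = attention_weights(incoming)
out = attend(weights, [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(weights, out)
```

With equal raw scores, the START and END edges receive more weight than the NEXT edge purely because of the bias, so their messages dominate the aggregated feature vector.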

Why This Fits Into Data Science

MedGraph embodies the core principles of the data science track by transforming complex synthetic healthcare data into actionable insights. It converts raw electronic health records into structured patient graphs, revealing temporal patterns and relationships that would otherwise remain hidden. By integrating advanced machine learning techniques, in particular graph neural networks with specialized attention mechanisms, the project both surfaces trends in patient data and supports predictive analysis. The use of privacy-preserving federated learning also means that sensitive data is analyzed securely, a key consideration in modern data science. This combination of data engineering, visualization, and predictive modeling makes MedGraph a data science solution at its core.

Challenges We Ran Into

Developing MedGraph presented several challenges:

Accurately reflecting the complexities of real-world EHRs with synthetic data required extensive cleaning and validation.
Designing a graph structure capable of capturing nuanced temporal relationships—especially with overlapping timelines—demanded careful thought and iterative tuning.
Fine-tuning the GATNet's attention mechanism on START and END edges involved rigorous experimentation to balance predictive performance and interpretability.
Implementing a robust federated learning framework that aggregates encrypted weights seamlessly, without compromising efficiency or data security, was a significant technical challenge.
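The aggregation challenge in the last item can be illustrated with a toy example. The sketch below uses pairwise additive masking as a stand-in for encryption: each pair of hospitals shares a random mask that one adds and the other subtracts, so the server only sees masked updates, yet the masks cancel in the sum. This is a swapped-in technique chosen for illustration, not PySyft's API and not necessarily what the project used. The function names and the single shared seed are assumptions.

```python
import random
from typing import Dict, List


def pairwise_masks(hospital_ids: List[str], dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """Build one mask per hospital so that all masks sum to zero.

    A single seed is used here for simplicity; a real system would derive
    each shared mask from a key agreed between the two hospitals involved.
    """
    rng = random.Random(seed)
    masks = {h: [0.0] * dim for h in hospital_ids}
    for i, a in enumerate(hospital_ids):
        for b in hospital_ids[i + 1:]:
            shared = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for k in range(dim):
                masks[a][k] += shared[k]  # hospital a adds the shared mask
                masks[b][k] -= shared[k]  # hospital b subtracts it
    return masks


def mask_update(weights: List[float], mask: List[float]) -> List[float]:
    """What a hospital sends to the server instead of its raw weights."""
    return [w + m for w, m in zip(weights, mask)]


def average_updates(masked_updates: List[List[float]]) -> List[float]:
    """Server side: average the masked updates; the masks cancel in the sum."""
    n = len(masked_updates)
    dim = len(masked_updates[0])
    return [sum(u[k] for u in masked_updates) / n for k in range(dim)]


# Made-up local model weights for three hospitals.
local = {"A": [0.2, 1.0], "B": [0.4, 0.0], "C": [0.6, 2.0]}
masks = pairwise_masks(list(local), dim=2)
masked = {h: mask_update(w, masks[h]) for h, w in local.items()}
global_weights = average_updates(list(masked.values()))
print(global_weights)  # approximately [0.4, 1.0], up to floating-point error
```

The sketch averages hospitals equally. Weighting by local sample count, as federated averaging usually does, would require each hospital to scale its update before masking.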

Accomplishments That We're Proud Of

We are proud to have built a system that:

Enhances the accuracy of disease progression predictions while maintaining uncompromising patient privacy.
Demonstrates how synthetic data, advanced graph neural network techniques, and federated learning can work in concert to provide actionable clinical insights.
Applies START and END edge attention within a GATNet model, coupled with a secure federated aggregation pipeline, showing that it is possible to innovate at the intersection of healthcare, AI, and cybersecurity.

What We Learned

Throughout the development of MedGraph, we:

Deepened our understanding of capturing temporal dynamics in patient data and the intricacies of working with synthetic yet realistic EHR models.
Honed our skills in building and optimizing graph neural networks, particularly in leveraging attention mechanisms for critical event detection.
Learned valuable lessons in implementing federated learning systems that protect data privacy while enabling collaborative model improvement across institutions.
Recognized the importance of iterative testing, rigorous evaluation, and interdisciplinary collaboration to balance innovation with security.

What's Next for MedGraph

Looking forward, we plan to:

Further enrich our patient graphs by integrating more granular data sources, such as lab results, clinical notes, and imaging data, to capture even finer details of a patient’s health trajectory.
Refine the GATNet's attention mechanism for better sensitivity and specificity in prediction.
Expand our federated learning network to include more hospitals and clinical partners, ultimately rolling out MedGraph as a real-time decision support tool that empowers clinicians with predictive insights while adhering to the highest standards of data privacy and security.
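For reference, the sensitivity and specificity mentioned above can be computed from binary predictions as follows. This is a minimal sketch: the labels are made up for illustration and are not project results, and the convention that 1 means "condition emerges at the next visit" is an assumption.

```python
from typing import List, Tuple


def sensitivity_specificity(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return sensitivity, specificity


# Illustrative labels only: 1 = condition emerges at the next visit.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.75 0.75
```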