(Our project is under the Maker Track, in contention for the Best Healthcare overlay)
Inspiration
Our project emerged from the need to bridge the gap between state-of-the-art medical AI research and its practical deployment in low-resource clinical environments. By leveraging advanced vision-language models, our goal was to create a system that transforms standard medical scans into high-quality diagnostic outputs. Driven by the idea of "code to connect," we set out to connect disparate technologies and datasets into a unified, scalable solution for medical imaging, empowering clinicians with a tool that enhances, converts, and interprets medical scans and makes high-end diagnostic capabilities accessible and affordable.

We also wanted hands-on experience implementing a vision-language large language model end to end: model integration, image and text preprocessing, inference handling, and result display in a web-based interface.

Finally, to meet strict healthcare data privacy requirements, we chose not to use API-only, black-box commercial models such as OpenAI's GPT. HIPAA compliance demands full control over data flow and processing, so we relied on open-source, self-hosted models such as HealthGPT. This let us maintain transparency, security, and auditability for patient-related image data.
What It Does
Clarimed is a unified medical imaging assistant that leverages HealthGPT-M3, a medical large vision-language model, to:
- Image Enhancement: Improve the resolution and clarity of low-quality scans.
- Modality Conversion: Convert images from one modality to another (e.g., CT-to-MRI and MRI-to-CT), simulating advanced imaging modalities.
- Diagnostic Reporting: Generate detailed, interpretable diagnostic reports from standard medical images.
By unifying comprehension (text-based diagnosis) and generation (image synthesis) within a single autoregressive framework, Clarimed provides clinicians with actionable insights and high-quality visual outputs from standard imaging devices.
How We Built It
Backend API with FastAPI:
We developed the backend using FastAPI, loading HealthGPT-M3 locally. The API dynamically switches between comprehension and generation modes using lightweight H-LoRA adapters. The server handles all image preprocessing (resizing, normalization, and tokenization) and model inference, returning either JSON-formatted diagnostic reports or static image URLs.

Frontend Integration with Next.js:
Our Next.js frontend, built with React and Ant Design, provides an intuitive interface where clinicians can upload scans, select processing tasks, and view results side-by-side. The frontend communicates with the FastAPI backend via RESTful HTTP requests.

Model Optimization & Weight Management:
By employing Heterogeneous Low-Rank Adaptation (H-LoRA), we decoupled the knowledge required for comprehension and generation, enabling dynamic weight switching without reloading the entire model. This design minimizes latency and supports multiple inference tasks with a unified model.
Challenges We Ran Into
Model Latency and Local Deployment:
Due to budget constraints, we had to run the model locally on our own hardware rather than deploying it on a scalable cloud platform like AWS. Each test, and sometimes even a single inference, took upwards of 30 minutes, significantly slowing our development and testing cycles. Deploying on AWS or another cloud service would offer faster inference and better scalability, which would be ideal for real-world use.

Image Conversion and Data Type Issues:
We encountered numerous issues with image conversion. Medical images come in various formats and resolutions, and correctly preprocessing image dimensions and data types for the model was a significant challenge. Mismatched dimensions, incorrect normalization, and type discrepancies between CPU and GPU tensors led to frequent runtime errors; we had to add extensive validation checks and explicit device transfers to resolve them.

Seamless Integration Across Components:
Integrating the complex HealthGPT inference pipeline with our web API and frontend required careful abstraction. The challenge was to preserve the original model's logic while adapting it to a production-ready API architecture. Managing asynchronous API calls, dynamic weight switching, and error handling without compromising performance proved to be a non-trivial engineering task.
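To give a flavor of the defensive preprocessing these errors forced on us, here is a simplified, hypothetical helper. It uses NumPy as a stand-in; the real pipeline operates on PyTorch tensors, handles DICOM and RGB inputs, and also moves data to the model's device.

```python
import numpy as np

def preprocess_scan(img: np.ndarray, size: int = 256) -> np.ndarray:
    """Coerce an arbitrary grayscale scan into the (1, 1, H, W) float32
    layout a model typically expects. Hypothetical helper illustrating
    the shape, dtype, and normalization checks we had to add."""
    if img.ndim == 3 and img.shape[-1] == 1:       # (H, W, 1) -> (H, W)
        img = img[..., 0]
    if img.ndim != 2:
        raise ValueError(f"expected a 2-D scan, got shape {img.shape}")
    img = img.astype(np.float32)                   # unify dtype early
    lo, hi = float(img.min()), float(img.max())
    if hi > lo:                                    # scale intensities to [0, 1]
        img = (img - lo) / (hi - lo)
    # Nearest-neighbor resize to a fixed spatial size (the real code used
    # proper image transforms; this avoids mismatched-dimension errors).
    ys = np.arange(size) * img.shape[0] // size
    xs = np.arange(size) * img.shape[1] // size
    img = img[np.ix_(ys, xs)]
    return img[None, None, :, :]                   # (1, 1, size, size)
```

Centralizing these checks in one function meant shape and dtype bugs surfaced as clear `ValueError`s at the API boundary instead of cryptic runtime errors deep inside the model.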
Accomplishments That We're Proud Of
Unified, Scalable Pipeline:
We developed an end-to-end system that unifies medical image comprehension and generation within a single, adaptable pipeline. This solution connects advanced AI with practical diagnostic tools, providing high-quality outputs on standard hardware.

Efficient API-Frontend Integration:
Our solution demonstrates how minimal modifications to existing open-source code can create a robust, interactive platform that seamlessly connects the backend AI with a modern web interface.

Real-World Impact:
By leveraging efficient parameter-adaptation techniques (H-LoRA), our system delivers advanced diagnostic capabilities to settings where high-end imaging equipment is unavailable, potentially improving patient outcomes in resource-constrained environments.
What We Learned
Bridging Research and Production:
We discovered that successfully integrating research-level AI into production applications requires a deep understanding of both system design and model internals. Abstracting complex model logic into modular, API-accessible components was critical to creating a deployable solution.

Optimizing State and Device Management:
Fine-tuning state management in React and ensuring that all model computations occur on the correct device (GPU vs. CPU) were vital lessons that helped us overcome performance bottlenecks and runtime errors.

Dynamic Adaptation Techniques:
Implementing dynamic weight switching with H-LoRA adapters demonstrated the importance of flexible model design. This approach enabled the same base model to handle diverse tasks with minimal overhead, a key insight for future multi-modal AI systems.
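One concrete habit we took away from the device-management lesson: resolve the model's device once, then move every incoming tensor to it before inference. A minimal sketch, where `to_model_device` is a hypothetical helper and not part of HealthGPT:

```python
import torch

def to_model_device(batch: dict, model: torch.nn.Module) -> dict:
    """Move every tensor in a request batch onto the device the model
    lives on, leaving non-tensor entries (IDs, metadata) untouched."""
    device = next(model.parameters()).device
    return {k: v.to(device) if torch.is_tensor(v) else v
            for k, v in batch.items()}
```

Calling a helper like this at the top of every inference path removes the CPU/GPU tensor mismatches that otherwise surface as runtime errors mid-forward-pass.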
What's Next for Clarimed
- Enhance the Frontend:
Integrate advanced image editing tools (e.g., brightness and contrast adjustments, cropping) to give clinicians real-time manipulation of uploaded scans.
- Expand Functionalities:
Incorporate additional diagnostic features, such as automated segmentation and treatment suggestions, and support more modality configurations to build a more comprehensive clinical support system.