CarnegieJ/onc-clinical-intel-agent

🏥 Oncology Clinical Intelligence Agent Backend (Demo/Prototype)

Important

🛑 STOP & READ: Before running the demo notebooks, configure your demo or development environment (Windows, Linux, or macOS). Setup tips are in the guide.

👉 Read the full GUIDE.md

Author: [Carnegie Johnson/IAYF Consulting]
License: MIT (Attribution Required)

🎯 Executive Summary

This repository demonstrates the Backend Architecture for a secure Clinical Decision Support Agent. Designed for high-compliance healthcare environments, it integrates Microsoft Fabric (OneLake) for data storage with Azure AI for secure inference.

Unlike standard chatbots, this architecture prioritizes:

  1. Data Lineage: Traceable ETL pipelines from public sources (cBioPortal) to Silver Delta Tables.
  2. Clinical Validity: Statistical checks for "Artificial Capping" and outliers before inference.
  3. Safety First: A dedicated Middleware layer that sanitizes PII (Personally Identifiable Information) and PHI (Protected Health Information) before a prompt reaches the LLM.
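The "Clinical Validity" checks can be illustrated with a small sketch. It assumes that "Artificial Capping" means an unusually large share of records sitting exactly at the maximum recorded value (for example, ages truncated at a ceiling), and it uses Tukey's IQR rule for outliers. The function names and sample values are hypothetical and are not taken from the repository.

```python
from statistics import quantiles


def capping_share(values):
    """Fraction of observations that sit exactly at the maximum value."""
    top = max(values)
    return sum(1 for v in values if v == top) / len(values)


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical ages in which everything above 89 was recorded as 89.
ages = [34, 47, 52, 58, 61, 63, 66, 70, 74, 89, 89, 89]
print("share at ceiling:", capping_share(ages))
print("outliers:", iqr_outliers([50, 52, 53, 55, 56, 58, 60, 140]))
```

A high share at the ceiling suggests the column was capped upstream, so the top value should be read as "at least this much" rather than as an exact measurement.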

🏗️ Architecture

Architecture Insight: The Medallion Model

A widely used pattern in data engineering is the Medallion Architecture:

  • 🥉 Bronze Layer (Raw): Raw .tar.gz files sitting in a folder. They are "messy" and hard to query.
  • 🥈 Silver Layer (Clean): Files with cleaned headers, organized into Delta Tables (high-performance SQL tables).
  • 🥇 Gold Layer (Curated): Aggregated data ready for dashboards and AI agents.
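As a rough illustration of the Bronze-to-Silver "clean the headers" step, the sketch below normalizes raw column names into lowercase snake_case, which avoids the spaces and parentheses that Delta tables typically reject in column names. The repository does this work in PySpark notebooks; this plain-Python helper and its sample headers are hypothetical stand-ins.

```python
import re


def clean_header(name):
    """Turn a raw column header into a lowercase snake_case identifier."""
    name = name.strip().lstrip("#").strip()
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)  # collapse any run of other characters
    return name.strip("_").lower()


# Hypothetical raw headers, as they might appear in a Bronze-layer file.
raw_headers = ["#Patient Identifier", "Overall Survival (Months)", " Diagnosis Age "]
silver_headers = [clean_header(h) for h in raw_headers]
print(silver_headers)
```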

The pipeline itself is organized into four layers:

  1. Ingestion Layer (Phase 1): Python scripts fetch raw .tar.gz archives from cBioPortal and stream them into Microsoft OneLake.
     (Figures: Data summary, Data distribution)
  2. Processing Layer (Fabric): PySpark notebooks transform raw files into queryable Delta Tables (Silver Layer).
     (Figure: Delta Table Lakehouse)
  3. Analysis Layer (Phase 2): Local Python (VS Code) connects via ODBC/SQL to validate data distributions.
     (Figure: Age vs Survival)
  4. Safety Layer (Phase 3): A Hybrid Guardrails system uses Regex + Azure Content Safety to block toxic or PII-laden prompts.
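The Safety Layer's two-stage idea can be sketched as follows. This is a minimal illustration, not the repository's middleware: the regex patterns are deliberately narrow examples, and `is_harmful` is a hypothetical callable standing in for a call to Azure Content Safety, so the sketch runs without any cloud dependency.

```python
import re

# Illustrative patterns only; a real deployment would need a far broader set.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(prompt):
    """Replace each regex match with a [LABEL] placeholder; return text and labels hit."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        prompt, count = pattern.subn(f"[{label}]", prompt)
        if count:
            hits.append(label)
    return prompt, hits


def guard(prompt, is_harmful=lambda text: False):
    """Stage 1: regex redaction. Stage 2: a pluggable content classifier.

    `is_harmful` is a hypothetical stand-in for an Azure Content Safety call.
    """
    clean, hits = redact(prompt)
    if is_harmful(clean):
        return {"allowed": False, "prompt": None, "redacted": hits}
    return {"allowed": True, "prompt": clean, "redacted": hits}


print(guard("Patient SSN 123-45-6789, email jane.doe@example.org"))
```

Running the cheap regex pass first means obvious identifiers are stripped locally before any text is sent to an external service for classification.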


⚙️ Configuration & Setup

Prerequisites:

  • Azure account
  • Microsoft Fabric workspace
  • Python 3.10+
  • Visual Studio Code (Jupyter notebooks)
  • Azure CLI: Run az login to authenticate.
  • ODBC Driver: You MUST install the ODBC Driver 18 for SQL Server for Phase 2 to work.
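A quick preflight check can confirm two of these prerequisites before opening the notebooks. The sketch below is an assumption about how such a check might look, not a script shipped with the repository. It relies on `pyodbc.drivers()` to list installed ODBC drivers and returns an empty list if `pyodbc` is not installed.

```python
import sys

REQUIRED_DRIVER = "ODBC Driver 18 for SQL Server"


def python_ok(version=sys.version_info):
    """True when the interpreter is Python 3.10 or newer."""
    return tuple(version[:2]) >= (3, 10)


def installed_drivers():
    """List ODBC driver names via pyodbc, or an empty list if pyodbc is missing."""
    try:
        import pyodbc
    except ImportError:
        return []
    return pyodbc.drivers()


def has_required_driver(installed, required=REQUIRED_DRIVER):
    """True when the required driver name appears in the installed list."""
    return required in installed


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("ODBC Driver 18:", has_required_driver(installed_drivers()))
```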

1. Clone & Environment

To get started quickly, we provide a template for your environment variables.

  1. Clone the repo:

    git clone https://github.com/CarnegieJ/onc-clinical-intel-agent.git
    cd onc-clinical-intel-agent
  2. Configure Secrets:

    • Locate the file named .env.template in the root directory.
    • Copy it to .env (or rename it).
    • Open it and fill in your specific Azure values (Workspace ID, Connection Strings, etc.).
    # Example Command (Terminal)
    cp .env.template .env
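After filling in `.env`, it can help to confirm that no value was left blank. The sketch below parses simple `KEY=VALUE` lines using only the standard library. The key names in the usage example are hypothetical placeholders, since the actual names are defined in `.env.template`.

```python
from pathlib import Path


def read_env(path=".env"):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or still empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(read_env(), ["WORKSPACE_ID", "SQL_CONNECTION_STRING"])` would return the names that still need values, with both key names being illustrative only.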

About

Backend architecture for a secure Clinical Decision Support Agent. Features interactive data validation (Jupyter/EDA), automated ETL (cBioPortal → Microsoft Fabric OneLake), Hybrid RAG logic, and custom PII/Safety middleware using Azure AI Foundry.
