Implementation of our AAAI (Main Track) Paper: PatientVLM Meets DocVLM: Pre-Consultation Dialogue Between Vision-Language Models for Efficient Diagnosis

A novel framework that simulates pre-consultation dialogues between two Vision-Language Models (VLMs) - PatientVLM and DocVLM - to enhance medical diagnosis efficiency. This work was presented at AAAI 2026.

Overview

This repository implements a Pre-Consultation Dialogue Framework (PCDF) where:

PatientVLM: Simulates patient perspective, describing symptoms, patient history, and medical images
DocVLM: Acts as a clinical expert, asking diagnostic questions and making assessments

The dialogue-based approach improves diagnostic accuracy by leveraging the complementary strengths of different VLMs.

Installation Guide

1. Clone the Repository

git clone https://github.com/vl2g/pcdf.git
cd pcdf

2. Setup Conda Environment and Install Dependencies

conda create --name pcdf python=3.11.14 -y
conda activate pcdf
pip install -r requirements.txt

Data Preparation

Prepare MedMNIST Dataset

Process the MedMNIST dataset for your desired medical specialty:

python scripts/process_medmnist.py --dataset dermamnist

Supported Datasets:

dermamnist - Dermatology images
pathmnist - Pathology images
retinamnist - Retinal fundus images
pneumoniamnist - Chest X-rays (Radiology/Pneumonia)

Pre-Consultation Dialogue Generation

Generate dialogues between PatientVLM and DocVLM using the PCDF framework.

Default Command

export PYTHONPATH="${PYTHONPATH}:$(pwd)"

python scripts/run_pcdf.py \
    --speciality Dermatology \
    --doc_vlm Qwen25VL \
    --patient_vlm mPLUGOwl3 \
    --config dermamnist \
    --split train

Arguments

Argument	Description	Default	Other Options
`--speciality`	Medical specialty domain	`Dermatology`	`Pathology`, `Ophthalmology`, `Radiology`
`--doc_vlm`	Vision-Language Model for doctor role	`Qwen25VL`	`Gemma3`, `MedGemma`, `InternVL3`, `mPLUGOwl3`
`--patient_vlm`	Vision-Language Model for patient role	`mPLUGOwl3`	`Gemma3`, `MedGemma`, `InternVL3`, `Qwen25VL`
`--config`	Dataset configuration file	`dermamnist`	`pathmnist`, `retinamnist`, `pneumoniamnist`
`--split`	Dataset split to process	`train`	`test`, `val`

Supported Vision-Language Models

Gemma3: Google's Gemma 3 vision-language model
MedGemma: Medical domain-adapted version of Gemma
InternVL3: InternVL version 3 multimodal model
Qwen25VL: Qwen 2.5 Vision-Language model
mPLUGOwl3: Alibaba's mPLUG-Owl 3 model

Specialty-Dataset Mapping

Specialty	Recommended Config	Description
`Dermatology`	`dermamnist`	Skin lesion classification
`Pathology`	`pathmnist`	Histopathology image analysis
`Ophthalmology`	`retinamnist`	Retinal fundus imaging
`Radiology`	`pneumoniamnist`	Pneumonia detection from chest X-rays

DocVLM Inference and Evaluation

After finetuning the DocVLM on pre-consultation dialogues, evaluate the DocVLM's diagnostic performance.

1. Setup Inference Environment

Create a separate environment for inference (requires Python 3.10):

conda create --name docvlm python=3.10.19 -y
conda activate docvlm
pip install -r inference/requirements.txt

2. Run DocVLM Inference

Execute inference using the trained DocVLM:

python inference/qwen25vl_inference.py

This script processes the generated PCDF dialogues and produces diagnostic predictions.

3. Evaluate Results

python inference/evaluate.py --json_path results/pcdf_dermamnist.json

Arguments:

--json_path: Path to the inference results JSON file

Citation

If you find this work useful in your research, please consider citing:

@inproceedings{lokesh2026patientvlm,
  title     = {PatientVLM Meets DocVLM: Pre-Consultation Dialogue Between Vision-Language Models for Efficient Diagnosis},
  author    = {Lokesh, K and Penamakuri, Abhirama Subramanyam and Agarwal, Uday and Challa, Apoorva and Gowda, Shreya K and Gupta, Somesh and Mishra, Anand},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2026},
  publisher = {AAAI Press}
}

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data/dermamnist		data/dermamnist
experiments		experiments
inference		inference
pcdf		pcdf
scripts		scripts
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Implementation of our AAAI (Main Track) Paper: PatientVLM Meets DocVLM: Pre-Consultation Dialogue Between Vision-Language Models for Efficient Diagnosis

Table of Contents

Overview

Installation Guide

1. Clone the Repository

2. Setup Conda Environment and Install Dependencies

Data Preparation

Prepare MedMNIST Dataset

Pre-Consultation Dialogue Generation

Default Command

Arguments

Supported Vision-Language Models

Specialty-Dataset Mapping

DocVLM Inference and Evaluation

1. Setup Inference Environment

2. Run DocVLM Inference

3. Evaluate Results

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Implementation of our AAAI (Main Track) Paper: PatientVLM Meets DocVLM: Pre-Consultation Dialogue Between Vision-Language Models for Efficient Diagnosis

Table of Contents

Overview

Installation Guide

1. Clone the Repository

2. Setup Conda Environment and Install Dependencies

Data Preparation

Prepare MedMNIST Dataset

Pre-Consultation Dialogue Generation

Default Command

Arguments

Supported Vision-Language Models

Specialty-Dataset Mapping

DocVLM Inference and Evaluation

1. Setup Inference Environment

2. Run DocVLM Inference

3. Evaluate Results

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages