GSEM is a graph-based experience memory framework for clinical reasoning agents. It extracts reusable experiences from reasoning trajectories, organizes them into a dual-layer memory graph, retrieves applicable experiences for new cases, and continuously calibrates memory quality and inter-experience relations through online feedback.
Large language model agents can benefit from reusing prior decision experience, but flat memory banks often store experiences as isolated records. This makes it difficult to:
- verify whether a retrieved experience is truly applicable under the current clinical conditions;
- model how multiple experiences should be jointly used;
- continuously refine memory reliability after deployment.
GSEM addresses these challenges with a three-stage framework:
- Memory Construction: extract structured experiences from successful and failed reasoning trajectories, validate their initial reliability, and build a dual-layer memory graph.
- Memory Retrieval: perform hybrid seed recall and graph-based multi-seed traversal to retrieve boundary-aware, composition-compatible experiences.
- Memory Evolution: update node quality and edge weights using task feedback, enabling the memory graph to self-evolve over time.
Key features:
- Dual-layer memory graph
  - The entity layer models the internal decision structure of each experience.
  - The experience layer models relations across experiences.
- Experience types
  - Indication: reusable successful decision knowledge.
  - Contraindication: reusable failure-derived knowledge that highlights what should be avoided.
- Applicability-aware retrieval
  - Combines entity-based recall, embedding-based recall, reranking, and graph traversal.
- Online self-evolution
  - Calibrates experience quality and relation weights without rewriting experience content.
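The dual-layer organization can be pictured as a small in-memory data structure. The sketch below is illustrative only and is not the repository's implementation: the names `ExperienceType`, `Experience`, `MemoryGraph`, `triples`, and `quality`, the 0.5 default quality, and the choice to encode an experience's internal decision structure as (head, relation, tail) triples are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ExperienceType(Enum):
    INDICATION = "indication"              # knowledge distilled from successful decisions
    CONTRAINDICATION = "contraindication"  # failure-derived knowledge: what to avoid


@dataclass
class Experience:
    exp_id: str
    exp_type: ExperienceType
    content: str                           # experience text; not rewritten during evolution
    triples: list[tuple[str, str, str]]    # assumed entity-layer form: (head, relation, tail)
    quality: float = 0.5                   # calibrated reliability score (assumed default)

    @property
    def entities(self) -> set[str]:
        """Entity-layer nodes touched by this experience."""
        return {h for h, _, _ in self.triples} | {t for _, _, t in self.triples}


@dataclass
class MemoryGraph:
    experiences: dict[str, Experience] = field(default_factory=dict)
    # Entity index: entity name -> ids of experiences that contain it
    entity_index: dict[str, set[str]] = field(default_factory=dict)
    # Experience layer: undirected weighted edges keyed by a sorted id pair
    edges: dict[tuple[str, str], float] = field(default_factory=dict)

    def add_experience(self, exp: Experience) -> None:
        self.experiences[exp.exp_id] = exp
        for ent in exp.entities:
            self.entity_index.setdefault(ent, set()).add(exp.exp_id)

    def set_edge(self, a: str, b: str, weight: float) -> None:
        # Sorting the pair makes (a, b) and (b, a) refer to the same undirected edge
        self.edges[(min(a, b), max(a, b))] = weight

    def neighbors(self, exp_id: str) -> dict[str, float]:
        """Experience-layer neighbors of one node with their edge weights."""
        out: dict[str, float] = {}
        for (a, b), w in self.edges.items():
            if a == exp_id:
                out[b] = w
            elif b == exp_id:
                out[a] = w
        return out
```

Keeping the entity layer inside each experience and the relations in a separate edge map mirrors the split between per-experience structure and cross-experience relations described above.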
Repository structure:
GSEM/
├── main_phase1.py          # Phase 1 entry: experience extraction
├── main_phase2.py          # Phase 2 entry: graph construction
├── main_phase3.py          # Phase 3 entry: online evolution
├── experiences.jsonl       # Extracted experience data
├── requirements.txt
├── .env.example
├── src/
│   ├── shared/             # Shared modules (config, logger, utils)
│   ├── phase1/             # Phase 1: experience extraction pipeline
│   │   ├── pipeline.py
│   │   ├── prompts.py
│   │   ├── prompt_provider.py
│   │   ├── stages/         # Rollout, normalization, deduplication, ERV, etc.
│   │   └── agents/         # ReAct agent
│   ├── phase2/             # Phase 2: graph construction
│   │   └── graph/          # Entity extraction, normalization, similarity scoring, export
│   └── phase3/             # Phase 3: online evolution
│       ├── ttl/            # Online evolution pipeline and reasoning agent
│       └── retrieval/      # Graph-based experience retrieval
├── data/                   # Intermediate and processed data
└── evaluation/
    └── medrb/
        └── data/           # Evaluation test splits
Phase 1 (experience extraction): runs a multi-stage pipeline (rollout → normalization → deduplication → ERV) to extract structured experiences from agent trajectories.
python main_phase1.py
Phase 2 (graph construction): builds the dual-layer memory graph by extracting entities, computing similarity signals, and exporting the graph structure.
python main_phase2.py
Phase 3 (online evolution): runs the online TTL pipeline, in which the agent retrieves relevant experiences from the graph, solves new cases, and incrementally updates memory quality and relation weights.
python main_phase3.py
Installation:
pip install -r requirements.txt
cp .env.example .env
# Edit .env and fill in your API keys
Memory Construction:
- Sample multiple reasoning trajectories for each case.
- Distill successful trajectories into Indication experiences.
- Distill failure-success divergences into Contraindication experiences.
- Run Experience Reliability Validation (ERV) to initialize experience quality.
- Construct the dual-layer memory graph.
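A minimal sketch of the extraction flow, assuming each rollout is an ordered list of step strings with a success flag. Everything here is hypothetical: `summarize` stands in for the LLM distillation prompt, the failure-success divergence is approximated by the first step where a failed rollout differs from a successful one, deduplication is reduced to exact matching, and `erv_initial_quality` substitutes a smoothed validation success rate for the actual ERV procedure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Trajectory:
    case_id: str
    steps: list[str]  # ordered reasoning/action steps from one rollout
    success: bool     # whether the rollout reached the correct answer


def first_divergence(fail: Trajectory, ok: Trajectory) -> int:
    """Index of the first step where a failed rollout departs from a successful one."""
    for i, (a, b) in enumerate(zip(fail.steps, ok.steps)):
        if a != b:
            return i
    return min(len(fail.steps), len(ok.steps))


def extract_experiences(
    rollouts: list[Trajectory],
    summarize: Callable[[str], str],  # hypothetical stand-in for an LLM distillation call
) -> list[tuple[str, str]]:
    """Return deduplicated (experience_type, text) pairs for one case's rollouts."""
    successes = [t for t in rollouts if t.success]
    failures = [t for t in rollouts if not t.success]
    out: list[tuple[str, str]] = []
    # Indication: distill each successful trajectory
    for ok in successes:
        out.append(("indication", summarize(" -> ".join(ok.steps))))
    # Contraindication: contrast each failure with a successful reference at the divergence point
    if successes:
        ref = successes[0]
        for bad in failures:
            i = first_divergence(bad, ref)
            wrong = bad.steps[i] if i < len(bad.steps) else "<end of trajectory>"
            right = ref.steps[i] if i < len(ref.steps) else "<end of trajectory>"
            out.append(("contraindication", summarize(f"avoid: {wrong}; instead: {right}")))
    return list(dict.fromkeys(out))  # naive exact-match deduplication, order preserved


def erv_initial_quality(validation_outcomes: list[bool], prior: float = 0.5, strength: float = 2.0) -> float:
    """Assumed ERV proxy: success rate of validation runs using the experience, smoothed toward `prior`."""
    wins = sum(validation_outcomes)
    return (wins + prior * strength) / (len(validation_outcomes) + strength)
```

The smoothing keeps an experience validated on only a handful of runs from starting at an extreme quality of 0 or 1.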
Memory Retrieval:
- Use entity-based recall to match decision-relevant conditions.
- Use embedding-based recall to capture semantic similarity.
- Merge and rerank retrieved candidates.
- Start from multiple seeds and perform graph traversal to collect compatible experiences.
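The retrieval steps can be sketched under simplifying assumptions: entity recall is scored by Jaccard overlap of entity sets, embedding recall by cosine similarity on precomputed vectors, a quality-weighted score blend stands in for the reranker, and boundary/compatibility checks during traversal are reduced to an edge-weight threshold plus a hop limit. The function names and the parameters `alpha`, `min_weight`, and `max_hops` are hypothetical.

```python
from __future__ import annotations

import math
from collections import deque


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_seeds(
    query_entities: set[str],
    query_vec: list[float],
    exp_entities: dict[str, set[str]],
    exp_vecs: dict[str, list[float]],
    quality: dict[str, float],
    k: int = 3,
    alpha: float = 0.5,  # hypothetical weight of entity recall vs. embedding recall
) -> list[str]:
    """Merge entity- and embedding-based recall, then rerank into the top-k seeds."""
    scores: dict[str, float] = {}
    for eid, ents in exp_entities.items():
        union = query_entities | ents
        ent_score = len(query_entities & ents) / len(union) if union else 0.0  # Jaccard
        emb_score = cosine(query_vec, exp_vecs[eid])
        if ent_score > 0 or emb_score > 0:  # recalled by at least one channel
            # Simple rerank stand-in: blended recall score weighted by node quality
            scores[eid] = (alpha * ent_score + (1 - alpha) * emb_score) * quality.get(eid, 0.5)
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]


def multi_seed_traverse(
    seeds: list[str],
    adjacency: dict[str, dict[str, float]],
    min_weight: float = 0.5,  # hypothetical compatibility threshold on edge weights
    max_hops: int = 2,
) -> list[str]:
    """Breadth-first expansion from all seeds at once along sufficiently strong edges."""
    seen = set(seeds)
    order = list(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, hops = frontier.popleft()
        if hops >= max_hops:
            continue
        for nb, w in sorted(adjacency.get(node, {}).items()):
            if w >= min_weight and nb not in seen:
                seen.add(nb)
                order.append(nb)
                frontier.append((nb, hops + 1))
    return order
```

Expanding from all seeds in a single breadth-first pass lets experiences reachable from different seeds join one result set, while the weight threshold keeps weakly related experiences out.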
Memory Evolution:
- Collect task feedback after each case.
- Update node quality scores for activated experiences.
- Update edge weights for co-activated experience pairs.
- Optionally insert newly extracted experiences into the graph.
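One way to realize feedback-driven calibration is an exponential moving average toward a binary task reward, applied to every activated experience and to every co-activated pair. This is an assumed update rule for illustration, not necessarily the one GSEM uses; `update_memory`, `lr`, and `init_edge` are hypothetical. Only scores change; experience text is never touched.

```python
from __future__ import annotations

from itertools import combinations


def update_memory(
    quality: dict[str, float],
    edge_weight: dict[tuple[str, str], float],
    activated: list[str],
    success: bool,
    lr: float = 0.1,         # hypothetical learning rate
    init_edge: float = 0.5,  # hypothetical weight for a pair seen for the first time
) -> None:
    """Nudge node quality and co-activation edge weights toward the task reward, in place."""
    reward = 1.0 if success else 0.0
    ids = sorted(set(activated))  # each activated experience is updated once per case
    for eid in ids:
        q = quality.get(eid, 0.5)
        quality[eid] = q + lr * (reward - q)
    # Undirected edges keyed by sorted id pairs, one update per co-activated pair
    for a, b in combinations(ids, 2):
        w = edge_weight.get((a, b), init_edge)
        edge_weight[(a, b)] = w + lr * (reward - w)
```

Because each update is a convex step toward a reward in {0, 1}, scores that start in [0, 1] stay there for any 0 < lr ≤ 1.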
Title: GSEM: Graph-based Self-Evolving Memory for Experience-Augmented Clinical Reasoning
If you use this repository, please cite the corresponding paper.
@article{han2026gsem,
  title={GSEM: Graph-based Self-Evolving Memory for Experience-Augmented Clinical Reasoning},
  author={Han, Xiao and Fan, Yuzheng and Zhao, Sendong and Wang, Haochun and Qin, Bing},
  journal={arXiv preprint arXiv},
  year={2026}
}
Notes:
- This repository currently focuses on the code framework for the three stages of GSEM.
- Dataset preparation, environment variables, and model backends should be configured according to your local setup.
- The framework is designed for research use.
