Official code release for the ICLR 2026 paper: **Automatic Image-Level Morphological Trait Annotation for Organismal Images**.

- 🌐 Website: [osu-nlp-group.github.io/sae-trait-annotation](https://osu-nlp-group.github.io/sae-trait-annotation)
- 🤗 Dataset: osunlp/bioscan-traits
This repository provides an end-to-end pipeline to:

- preprocess BIOSCAN-5M into an `ImageFolder` layout,
- train a Sparse Autoencoder (SAE) on DINOv2 activations,
- identify species-level prominent latents, and
- generate natural-language morphological trait annotations using MLLMs (Qwen2.5-VL).
The SAE training/inference stack is adapted from the public SAEV repository and is included here for convenience.
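The training commands below configure the SAE with `--sae.d-vit 768` and `--sae.exp-factor 32`. For orientation, here is a minimal NumPy sketch of one common SAE formulation (ReLU encoder, linear decoder); the weight names, initialization, and bias choices are illustrative assumptions, not the vendored saev implementation.

```python
import numpy as np

# Shapes follow the README's config: d-vit 768, expansion factor 32.
rng = np.random.default_rng(0)
d_vit, exp_factor = 768, 32
d_sae = d_vit * exp_factor  # 24_576 latents

# Illustrative parameters (real init/training lives in saev/).
W_enc = rng.standard_normal((d_vit, d_sae), dtype=np.float32) * 0.01
b_enc = np.zeros(d_sae, dtype=np.float32)
W_dec = rng.standard_normal((d_sae, d_vit), dtype=np.float32) * 0.01
b_dec = np.zeros(d_vit, dtype=np.float32)

def sae_forward(x):
    """Encode ViT activations into non-negative sparse latents, then reconstruct."""
    z = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU keeps latents >= 0
    x_hat = z @ W_dec + b_dec
    return z, x_hat

x = rng.standard_normal((4, d_vit), dtype=np.float32)  # a batch of 4 patch activations
z, x_hat = sae_forward(x)
print(z.shape, x_hat.shape)  # (4, 24576) (4, 768)
```

The "prominent latent" identification downstream then looks at which of these latent dimensions fire strongly for a given species' patches.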
## Repository Structure

```
.
|-- preprocess_bioscan.py
|-- create_trait_dataset_mllm_sae.py
|-- create_trait_dataset_mllm.py
|-- saev/                    # SAEV codebase (vendored)
`-- utils/
    |-- create_train_json.py
    `-- convert_trait_wds.py
```
## Setup

This project uses Python 3.11 and uv.

```sh
pip install uv
```

Dependencies are installed automatically when running commands via `uv run`.
## Download BIOSCAN-5M

```sh
pip install bioscan-dataset
python - <<'PY'
from bioscan_dataset import BIOSCAN5M
_ = BIOSCAN5M("~/Datasets/bioscan-5m", download=True)
PY
```

The `create_trait_dataset_*` scripts expect a `train/` subdirectory under `--data-dir`.
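The preprocessing step below writes this `train/` directory in a torchvision-style `ImageFolder` layout, i.e. presumably one subdirectory per species class. A small sketch of that expected layout; the species and file names here are placeholders, not real BIOSCAN-5M entries.

```python
import tempfile
from pathlib import Path

# Build a toy --data-dir with the train/<class>/<image> structure the
# generation scripts expect. Real images come from preprocess_bioscan.py.
root = Path(tempfile.mkdtemp())
for species in ["Apis_mellifera", "Bombus_terrestris"]:  # placeholder names
    d = root / "train" / species
    d.mkdir(parents=True)
    (d / "sample_0.jpg").touch()

species_dirs = sorted(p.name for p in (root / "train").iterdir() if p.is_dir())
print(species_dirs)  # ['Apis_mellifera', 'Bombus_terrestris']
```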
## Preprocess BIOSCAN-5M

```sh
python preprocess_bioscan.py \
    --csv-file /path/to/bioscan-5m/metadata.csv \
    --image-dir /path/to/bioscan-5m/images \
    --out-dir /path/to/processed_bioscan/train
```

## Extract DINOv2 Activations

```sh
uv run python -m saev activations \
    --vit-family dinov2 \
    --vit-ckpt dinov2_vitb14 \
    --vit-batch-size 1024 \
    --d-vit 768 \
    --n-patches-per-img 256 \
    --vit-layers -2 \
    --dump-to /path/to/activations \
    --n-patches-per-shard 2_4000_000 \
    data:image-folder-dataset \
    --data.root /path/to/processed_bioscan/train
```

## Train the SAE

```sh
uv run python -m saev train \
    --data.shard-root /path/to/activations \
    --data.layer -2 \
    --data.patches patches \
    --data.scale-mean False \
    --data.scale-norm False \
    --sae.d-vit 768 \
    --sae.exp-factor 32 \
    --ckpt-path /path/to/sae_ckpt \
    --lr 1e-3 > LOG.txt 2>&1
```

## Serve the MLLM

```sh
vllm serve Qwen/Qwen2.5-VL-72B-Instruct \
    --tensor-parallel-size 4 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.9 \
    --host 0.0.0.0 \
    --port <PORT>
```

## Generate Trait Annotations (MLLM + SAE)

Single-image prompts (`--n-img-input 1`):

```sh
uv run python -u create_trait_dataset_mllm_sae.py \
    --data-dir /path/to/processed_bioscan \
    --sae-ckpt-path /path/to/sae_ckpt/sae.pt \
    --thresh 0.9 \
    --trait-thresh 3e-3 \
    --out-dir /path/to/output_mllm_sae \
    --serve-choice qwen_72b \
    --api-url http://0.0.0.0:<PORT>/v1/chat/completions > LOG.txt 2>&1
```

Multi-image prompts (`--n-img-input 3`):

```sh
uv run python -u create_trait_dataset_mllm_sae.py \
    --data-dir /path/to/processed_bioscan \
    --sae-ckpt-path /path/to/sae_ckpt/sae.pt \
    --thresh 0.9 \
    --trait-thresh 3e-3 \
    --out-dir /path/to/output_mllm_sae \
    --serve-choice qwen_72b \
    --api-url http://0.0.0.0:<PORT>/v1/chat/completions \
    --n-img-input 3 > LOG.txt 2>&1
```

## Generate Trait Annotations (MLLM only)

Single-image prompts:

```sh
uv run python -u create_trait_dataset_mllm.py \
    --data-dir /path/to/processed_bioscan \
    --sae-ckpt-path /path/to/sae_ckpt/sae.pt \
    --thresh 0.9 \
    --trait-thresh 3e-3 \
    --out-dir /path/to/output_mllm \
    --serve-choice qwen_72b \
    --api-url http://0.0.0.0:<PORT>/v1/chat/completions \
    --n-img-input 1 > LOG.txt 2>&1
```

Multi-image prompts:

```sh
uv run python -u create_trait_dataset_mllm.py \
    --data-dir /path/to/processed_bioscan \
    --sae-ckpt-path /path/to/sae_ckpt/sae.pt \
    --thresh 0.9 \
    --trait-thresh 3e-3 \
    --out-dir /path/to/output_mllm \
    --serve-choice qwen_72b \
    --api-url http://0.0.0.0:<PORT>/v1/chat/completions \
    --n-img-input 3 > LOG.txt 2>&1
```

## Outputs

In `--out-dir`, key artifacts include:

- `latent_to_patch_map.json` (MLLM+SAE pipeline),
- `species_latents_prominent/latent_response.jsonl` (model responses),
- per-species annotated patch visualizations under `species_latents_prominent/<species_name>/`.
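A hypothetical sketch of loading these two artifacts; the record fields and mapping shape shown here are illustrative guesses for demonstration, not a documented schema (inspect the actual files for the real one).

```python
import json
import tempfile
from pathlib import Path

# Stand-in output directory with toy versions of the two artifacts.
out_dir = Path(tempfile.mkdtemp())
(out_dir / "species_latents_prominent").mkdir()

# Guessed shape: latent id -> list of (image, patch index) hits.
(out_dir / "latent_to_patch_map.json").write_text(
    json.dumps({"1234": [["img_0001.jpg", 42]]})
)
# Guessed shape: one JSON object per MLLM response.
with open(out_dir / "species_latents_prominent" / "latent_response.jsonl", "w") as f:
    f.write(json.dumps({"species": "Apis_mellifera", "latent": 1234,
                        "response": "dark, rounded wing tip"}) + "\n")

latent_map = json.loads((out_dir / "latent_to_patch_map.json").read_text())
responses = [
    json.loads(line)
    for line in (out_dir / "species_latents_prominent"
                 / "latent_response.jsonl").read_text().splitlines()
]
print(len(latent_map), responses[0]["species"])  # 1 Apis_mellifera
```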
Use `--debug` and `--n-debug-ex` in the generation scripts for small-scale dry runs.
## Downstream Classifier Training

For downstream classifier training, we build on BioCLIP. Preprocessing helpers in this repo:

- `utils/create_train_json.py`: build train JSONs from CSV metadata.
- `utils/convert_trait_wds.py`: convert trait annotations to WebDataset format.
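WebDataset stores samples as tar shards in which files sharing a basename key (e.g. `000000.jpg` and `000000.json`) form one sample. A minimal stdlib sketch of that convention; the keys and trait fields are illustrative, and the actual output schema of `utils/convert_trait_wds.py` may differ.

```python
import io
import json
import tarfile
import tempfile
from pathlib import Path

# Write a toy shard: each sample contributes files under a shared key.
shard = Path(tempfile.mkdtemp()) / "traits-000000.tar"
with tarfile.open(shard, "w") as tar:
    for key, traits in [("000000", {"wing": "transparent"}),   # placeholder traits
                        ("000001", {"body": "elongated"})]:
        payload = json.dumps(traits).encode()
        info = tarfile.TarInfo(name=f"{key}.json")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# Read it back; WebDataset loaders group members by key prefix.
with tarfile.open(shard) as tar:
    names = sorted(tar.getnames())
print(names)  # ['000000.json', '000001.json']
```

In a real shard each key would also carry the image file (e.g. `000000.jpg`) alongside its trait JSON.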
## Citation

If you use this repository, please cite the paper:

```bibtex
@inproceedings{pahuja2026automatic,
  title={Automatic Image-Level Morphological Trait Annotation for Organismal Images},
  author={Vardaan Pahuja and Samuel Stevens and Alyson East and Sydne Record and Yu Su},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=oFRbiaib5Q}
}
```