While retrieval-augmented generation (RAG) excels at direct knowledge retrieval, it struggles with complex queries that require abstract or multi-step reasoning. To close this gap, we developed DIVER, a retrieval pipeline designed specifically for these reasoning-intensive tasks. DIVER integrates four stages: document preprocessing, iterative LLM-driven query expansion, a specialized retriever fine-tuned on complex synthetic data, and a novel reranker that combines listwise and pointwise scoring. On the BRIGHT benchmark, DIVER sets a new state of the art, significantly outperforming other reasoning-aware models (NDCG 45.8). These results highlight the effectiveness of incorporating deep reasoning into retrieval for solving complex real-world problems. See the DIVER paper for details.
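The four stages above compose into a single retrieval flow. The sketch below is purely illustrative: every function is a hypothetical stand-in (the real components are an LLM query expander, the fine-tuned dense retriever, and the listwise/pointwise reranker), not DIVER's actual API.

```python
# Illustrative sketch of the four DIVER stages. All functions are
# hypothetical stand-ins, not DIVER's actual implementation.

def preprocess(corpus):
    # Stage 1: document preprocessing (cleaning; real pipelines also chunk).
    return [doc.strip() for doc in corpus]

def expand_query(query, rounds=2):
    # Stage 2: iterative LLM-driven query expansion, stubbed by appending
    # a placeholder per round instead of calling an LLM.
    for _ in range(rounds):
        query += " [expanded]"
    return query

def retrieve(query, docs, k=2):
    # Stage 3: the fine-tuned dense retriever, stubbed with term overlap.
    scored = sorted(docs, key=lambda d: -len(set(query.split()) & set(d.split())))
    return scored[:k]

def rerank(query, docs):
    # Stage 4: listwise+pointwise reranking, stubbed as a pointwise re-sort.
    return sorted(docs, key=lambda d: -len(set(query.split()) & set(d.split())))

docs = preprocess(["  gravity pulls objects  ", "  capitals of countries  "])
query = expand_query("what is gravity")
results = rerank(query, retrieve(query, docs))
print(results[0])
```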
- [2025-11-20] 🚀 We released our reranking models Diver-GroupRank-7B and Diver-GroupRank-32B; the inference code is available at ./Retriever/rerank_groupwise.py. With test-time augmentation, our GroupRank-32B model reaches a score of 46.8 on BRIGHT; see the paper for details.
- [2025-10-20] 🚀 We released the DIVER-Retriever-4B-1020 model on ModelScope and Hugging Face, scoring 31.9 on the BRIGHT benchmark.
- [2025-10-14] 🚀 We released the DIVER-Retriever-1.7B model on ModelScope and Hugging Face, scoring 27.3 on the BRIGHT benchmark.
- [2025-09-12] 🚀 We released the Gemini-based listwise reranking code, available at ./Retriever/rerank_listwise.py, which scores 43.9 on BRIGHT.
- [2025-09-05] 🚀 We released the DIVER-Retriever-0.6B model on ModelScope and Hugging Face, scoring 25.2 on the BRIGHT benchmark.
- [2025-08-28] 🚀 We released the DIVER-Retriever-4B model on ModelScope.
- [2025-08-24] 🏆 We updated DIVER V2, further improving its score on the BRIGHT leaderboard to 45.8.
- [2025-08-18] 🚀 We open-sourced the full DIVER codebase, including inference and training.
- ⬜ Open-source DIVER-VL-Embedding and DIVER-VL-Reranker: release source code and models
- ✅ Open-source DIVER-Reranker: source code and models released
You can download the models listed in the table below and choose the parameter scale that fits your scenario. If you are in mainland China, we also provide the models on ModelScope.cn for faster downloads.
Performance comparison of DIVER against other baselines on the BRIGHT leaderboard. The best result on each dataset is highlighted in bold.
| Method | Avg. | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rank-R1-14B | 20.5 | 31.2 | 38.5 | 21.2 | 26.4 | 22.6 | 18.9 | 27.5 | 9.2 | 20.2 | 9.7 | 11.9 | 9.2 |
| Qwen1.5-7B with InteRank-3B | 27.4 | 51.2 | 51.4 | 22.4 | 31.9 | 17.3 | 26.6 | 22.4 | 24.5 | 23.1 | 13.5 | 19.3 | 25.5 |
| GPT4 with Rank1-32B | 29.4 | 49.7 | 35.8 | 22.0 | 37.5 | 22.5 | 21.7 | 35.0 | 18.8 | 32.5 | 10.8 | 22.9 | 43.7 |
| ReasonIR with QwenRerank | 36.9 | 58.2 | 53.2 | 32.0 | 43.6 | 28.8 | 37.6 | 36.0 | 33.2 | 34.8 | 7.9 | 32.6 | 45.0 |
| ReasonIR with Rank-R1-32B | 38.8 | 59.5 | 55.1 | 37.9 | 52.7 | 30.0 | 39.3 | 45.1 | 32.1 | 17.1 | 10.7 | 40.4 | 45.6 |
| RaDeR with QwenRerank | 39.2 | 58.0 | 59.2 | 33.0 | 49.4 | 31.8 | 39.0 | 36.4 | 33.5 | 33.3 | 10.8 | 34.2 | 51.6 |
| XRR2 | 40.3 | 63.1 | 55.4 | 38.5 | 52.9 | 37.1 | 38.2 | 44.6 | 21.9 | 35.0 | 15.7 | 34.4 | 46.2 |
| ReasonRank | 40.8 | 62.7 | 55.5 | 36.7 | 54.6 | 35.7 | 38.0 | 44.8 | 29.5 | 25.6 | 14.4 | 42.0 | 50.1 |
| DIVER | 41.6 | 62.2 | 58.7 | 34.4 | 52.9 | 35.6 | 36.5 | 42.9 | **38.9** | 25.4 | 18.3 | 40.0 | 53.1 |
| BGE Reasoner | 45.2 | 66.5 | **63.7** | 39.4 | 50.3 | 37.0 | 42.9 | 43.7 | 35.1 | **44.3** | 17.2 | 44.2 | **58.5** |
| DIVER V2 | **45.8** | **68.0** | 62.5 | **42.0** | **58.2** | **41.5** | **44.3** | **49.2** | 34.8 | 32.9 | **19.1** | **44.3** | 52.6 |
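The scores in these tables are NDCG@10 values scaled to 0–100, the metric reported by the BRIGHT benchmark. As a refresher, here is a minimal NDCG@k computation with binary relevance labels (a sketch, not the official evaluation code):

```python
import math

# Minimal NDCG@k with binary relevance labels; illustrative only,
# not the official BRIGHT evaluation code.

def dcg(rels):
    # Discounted cumulative gain over a ranked list of relevance labels.
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels, k=10):
    rels = rels[:k]
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

# Relevance of the top-5 retrieved documents (1 = relevant).
print(round(ndcg([1, 0, 1, 0, 0]), 4))
```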
Retrieval-only performance on BRIGHT under different query variants.

| Method | Avg. | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Evaluate Retriever with Original Query** | | | | | | | | | | | | | |
| BM25 | 14.5 | 18.9 | 27.2 | 14.9 | 12.5 | 13.6 | 18.4 | 15.0 | 24.4 | 7.9 | 6.2 | 10.4 | 4.9 |
| SBERT | 14.9 | 15.1 | 20.4 | 16.6 | 22.7 | 8.2 | 11.0 | 15.3 | 26.4 | 7.0 | 5.3 | 20.0 | 10.8 |
| gte-Qwen1.5-7B | 22.5 | 30.6 | 36.4 | 17.8 | 24.6 | 13.2 | 22.2 | 14.8 | 25.5 | 9.9 | 14.4 | 27.8 | 32.9 |
| Qwen3-4B | 5.6 | 3.5 | 8.0 | 2.3 | 2.0 | 1.6 | 1.0 | 4.4 | 2.1 | 0.1 | 4.9 | 18.0 | 19.2 |
| OpenAI | 17.9 | 23.3 | 26.7 | 19.5 | 27.6 | 12.8 | 14.3 | 20.5 | 23.6 | 2.4 | 8.5 | 23.5 | 11.7 |
| | 20.0 | 22.7 | 34.8 | 19.6 | 27.8 | 15.7 | 20.1 | 17.1 | 29.6 | 3.6 | 9.3 | 23.8 | 15.9 |
| ReasonIR-8B | 24.4 | 26.2 | 31.4 | 23.3 | 30.0 | 18.0 | 23.9 | 20.5 | 35.0 | 10.5 | 14.7 | 31.9 | 27.2 |
| RaDeR-7B | 25.5 | 34.6 | 38.9 | 22.1 | 33.0 | 14.8 | 22.5 | 23.7 | 37.3 | 5.0 | 10.2 | 28.4 | 35.1 |
| Seed1.5-Embedding | 27.2 | 34.8 | 46.9 | 23.4 | 31.6 | 19.1 | 25.4 | 21.0 | 43.2 | 4.9 | 12.2 | 33.3 | 30.5 |
| DIVER-Retriever-0.6B | 25.2 | 36.4 | 41.9 | 29.0 | 31.0 | 21.2 | 24.6 | 23.2 | 15.6 | 6.8 | 8.4 | 33.2 | 31.7 |
| DIVER-Retriever-4B | 28.9 | 41.8 | 43.7 | 21.7 | 35.3 | 21.0 | 21.2 | 25.1 | 37.6 | 13.2 | 10.7 | 38.4 | 37.3 |
| **Evaluate Retriever with GPT-4 REASON-query** | | | | | | | | | | | | | |
| BM25 | 27.0 | 53.6 | 54.1 | 24.3 | 38.7 | 18.9 | 27.7 | 26.3 | 19.3 | 17.6 | 3.9 | 19.2 | 20.8 |
| SBERT | 17.8 | 18.5 | 26.3 | 17.5 | 27.2 | 8.8 | 11.8 | 17.5 | 24.3 | 10.3 | 5.0 | 22.3 | 23.5 |
| gte-Qwen1.5-7B | 24.8 | 35.5 | 43.1 | 24.3 | 34.3 | 15.4 | 22.9 | 23.9 | 25.4 | 5.2 | 4.6 | 28.7 | 34.6 |
| Qwen3-4B | 5.5 | 1.3 | 17.3 | 2.5 | 6.2 | 1.0 | 4.8 | 4.5 | 3.0 | 5.9 | 0.0 | 7.2 | 12.5 |
| OpenAI | 23.3 | 35.2 | 40.1 | 25.1 | 38.0 | 13.6 | 18.2 | 24.2 | 24.5 | 6.5 | 7.7 | 22.9 | 23.8 |
| | 26.2 | 36.4 | 45.6 | 25.6 | 38.2 | 18.7 | 29.5 | 17.9 | 31.1 | 3.7 | 10.0 | 27.8 | 30.4 |
| ReasonIR-8B | 29.9 | 43.6 | 42.9 | 32.7 | 38.8 | 20.9 | 25.8 | 27.5 | 31.5 | 19.6 | 7.4 | 33.1 | 35.7 |
| RaDeR-7B | 29.2 | 36.1 | 42.9 | 25.2 | 37.9 | 16.6 | 27.4 | 25.0 | 34.8 | 11.9 | 12.0 | 37.7 | 43.4 |
| DIVER-Retriever-4B | 32.1 | 51.9 | 53.5 | 29.5 | 41.2 | 21.4 | 27.5 | 26.1 | 33.5 | 11.7 | 9.5 | 39.3 | 39.7 |
| **Evaluate Retriever with DIVER-QExpand query** | | | | | | | | | | | | | |
| ReasonIR-8B | 32.6 | 49.4 | 44.7 | 32.4 | 44.0 | 26.6 | 31.8 | 29.0 | 32.3 | 12.8 | 9.1 | 40.7 | 38.4 |
| +BM25 (Hybrid) | 35.7 | 56.8 | 53.5 | 33.0 | 48.5 | 29.4 | 34.2 | 32.0 | 35.2 | 16.8 | 12.9 | 39.3 | 36.8 |
| DIVER-Retriever | 33.9 | 54.5 | 52.7 | 28.8 | 44.9 | 25.1 | 27.4 | 29.5 | 34.5 | 10.0 | 14.5 | 40.7 | 44.7 |
| +BM25 (Hybrid) | 37.2 | 60.0 | 55.9 | 31.8 | 47.9 | 27.1 | 33.9 | 31.9 | 35.1 | 23.1 | 16.8 | 36.9 | 46.6 |
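The "+BM25 (Hybrid)" rows fuse dense-retriever scores with BM25 scores. A common recipe is a per-document weighted sum of normalized scores; the sketch below assumes min-max normalization and an arbitrary weight `alpha=0.5` (not DIVER's published settings).

```python
# Sketch of dense + BM25 hybrid score fusion. The normalization scheme
# (min-max) and the weight alpha=0.5 are illustrative assumptions.

def min_max(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def hybrid_scores(dense, bm25, alpha=0.5):
    # Per-document weighted sum of normalized dense and BM25 scores.
    return [alpha * d + (1 - alpha) * b
            for d, b in zip(min_max(dense), min_max(bm25))]

dense = [0.82, 0.34, 0.67]  # e.g. cosine similarities from the dense retriever
bm25 = [12.1, 18.4, 3.2]    # e.g. raw BM25 scores for the same documents
fused = hybrid_scores(dense, bm25)
print([round(s, 3) for s in fused])
```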
```shell
sh run_all.sh
```
Sentence Transformers usage
```python
# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("AQ-MedAI/Diver-Retriever-4B")

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt.
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
```
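To turn the similarity matrix into per-query document rankings, sort document indices by descending score. A minimal sketch with hard-coded illustrative values (stand-ins for the tensor printed above):

```python
# Turn a query-by-document similarity matrix into rankings.
# The values below are illustrative stand-ins for the tensor
# returned by model.similarity(...).
similarity = [
    [0.92, 0.11],  # query 0 vs. documents 0 and 1
    [0.08, 0.87],  # query 1 vs. documents 0 and 1
]

def rank_documents(sim_row):
    # Indices of documents sorted by descending similarity.
    return sorted(range(len(sim_row)), key=lambda j: -sim_row[j])

for qi, row in enumerate(similarity):
    print(f"query {qi}: ranked docs {rank_documents(row)}")
```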
vLLM usage
```python
# Requires vllm>=0.8.5
import torch
from vllm import LLM

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instructions to the retrieved documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

model = LLM(model="AQ-MedAI/Diver-Retriever-4B", task="embed")
outputs = model.embed(input_texts)
embeddings = torch.tensor([o.outputs.embedding for o in outputs])
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
```

We recommend using swift to fine-tune our DIVER-Retriever-4B. Before starting training, make sure your environment is correctly configured.
```shell
pip install ms-swift -U
# Or install from source:
pip install git+https://github.com/modelscope/ms-swift.git
pip install transformers -U

# Optional packages
pip install deepspeed      # multi-GPU training
pip install liger-kernel   # saves GPU memory
pip install flash-attn --no-build-isolation
```

Training data should follow one of the JSONL formats below (plain text pairs for LLM embedding training, or image-text samples for MLLM training):

```
# LLM
{"query": "sentence1", "response": "sentence2"}

# MLLM
{"query": "<image>", "response": "sentence", "images": "/some/images.jpg"}
{"query": "<image>sentence1", "response": "<image>sentence2", "rejected_response": ["<image>sentence1", "<image>sentence2"], "images": ["/some/images.jpg", "/some/images.jpg", "/some/images.jpg", "/some/images.jpg"]}
```

The complete training command below uses the infonce loss as an example.
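As background, the infonce loss is an in-batch contrastive objective: each query's paired response is its positive, and the other responses in the batch act as negatives. A minimal plain-Python sketch for a single query (illustrative only, not ms-swift's actual implementation):

```python
import math

# Minimal in-batch InfoNCE for one query: cross-entropy over its
# similarities to all responses in the batch, where positive_index
# marks the paired (positive) response. Illustrative only.

def info_nce(sim_row, positive_index, temperature=0.05):
    logits = [s / temperature for s in sim_row]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[positive_index]

# Query 0's similarities to responses 0..2; response 0 is its positive.
print(round(info_nce([0.9, 0.7, 0.4], 0, temperature=0.1), 4))
```

Lower loss means the positive response already outscores the in-batch negatives by a wide margin.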
```shell
nproc_per_node=8

NPROC_PER_NODE=$nproc_per_node \
swift sft \
    --model DIVER/DIVER-Retriever-4B \
    --task_type embedding \
    --model_type qwen3_emb \
    --train_type full \
    --dataset your_dataset \
    --split_dataset_ratio 0.05 \
    --eval_strategy steps \
    --output_dir output \
    --eval_steps 20 \
    --num_train_epochs 5 \
    --save_steps 20 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --learning_rate 6e-6 \
    --loss_type infonce \
    --label_names labels \
    --dataloader_drop_last true \
    --deepspeed zero3
```

With 8 processes, a per-device batch size of 4, and 4 gradient-accumulation steps, the effective global batch size is 8 × 4 × 4 = 128.

If you find our work helpful, please feel free to let us know; we would greatly appreciate it.
```bibtex
@misc{DIVER,
      title={DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval},
      author={Meixiu Long and Duolin Sun and Dan Yang and Junjie Wang and Yue Shen and Jian Wang and Peng Wei and Jinjie Gu and Jiahai Wang},
      year={2025},
      eprint={2508.07995},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2508.07995},
}
```
We thank the prior related research efforts and their open-source releases: BRIGHT, ReasonIR, RaDeR, ThinkQE, Qwen3-Embedding.