
# DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval

While retrieval-augmented generation (RAG) excels at direct knowledge retrieval, it performs poorly on complex queries that require abstraction or multi-step reasoning. To close this gap, we developed DIVER, a retrieval pipeline designed specifically for these reasoning-intensive tasks. DIVER integrates four stages: document preprocessing, iterative LLM-driven query expansion, a specialized retriever fine-tuned on complex synthetic data, and a novel reranker that combines listwise and pointwise scoring. On the BRIGHT benchmark, DIVER sets a new state of the art (nDCG 45.8), significantly outperforming other reasoning-aware models. These results highlight the effectiveness of incorporating deep reasoning into retrieval for solving complex real-world problems. See the DIVER paper for details.
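The four stages above can be sketched end-to-end as follows. Every stage function here is a simplified stand-in (token-overlap scoring instead of the fine-tuned retriever, no real LLM calls), not the repository's actual implementation:

```python
# Minimal sketch of DIVER's four-stage flow with hypothetical stage functions;
# the real pipeline lives in this repository's scripts.

def preprocess(docs):
    # Stage 1: document preprocessing (here just whitespace normalization).
    return [" ".join(d.split()) for d in docs]

def expand_query(query, rounds=2):
    # Stage 2: iterative LLM-driven query expansion (stubbed: a real system
    # would append LLM-generated reasoning text on each round).
    expansions = [query]
    for _ in range(rounds):
        expansions.append(expansions[-1])  # placeholder for an LLM call
    return " ".join(expansions)

def retrieve(query, docs, k=2):
    # Stage 3: retrieval (stubbed with token overlap as a stand-in for
    # embedding similarity from the fine-tuned retriever).
    def score(d):
        return len(set(query.split()) & set(d.split()))
    return sorted(docs, key=score, reverse=True)[:k]

def rerank(query, candidates):
    # Stage 4: reranking (stubbed; the listwise + pointwise reranker goes here).
    return candidates

docs = preprocess(["The capital of  China is Beijing.", "Gravity is a force."])
query = "capital of China"
ranked = rerank(query, retrieve(expand_query(query), docs))
print(ranked[0])  # → The capital of China is Beijing.
```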

## 🎉 Updates

- [2025-11-20] 🚀 We released our reranking models Diver-GroupRank-7B and Diver-GroupRank-32B; the inference code is available at ./Retriever/rerank_groupwise.py. With test-time augmentation, our GroupRank-32B model reaches 46.8 on BRIGHT; see the paper.
- [2025-10-20] 🚀 We released the DIVER-Retriever-4B-1020 model on ModelScope and Hugging Face, scoring 31.9 on the BRIGHT benchmark.
- [2025-10-14] 🚀 We released the DIVER-Retriever-1.7B model on ModelScope and Hugging Face, scoring 27.3 on the BRIGHT benchmark.
- [2025-09-12] 🚀 We released listwise reranking code using Gemini; it can be found at ./Retriever/rerank_listwise.py and scores 43.9 on BRIGHT.
- [2025-09-05] 🚀 We released the DIVER-Retriever-0.6B model on ModelScope and Hugging Face, scoring 25.2 on the BRIGHT benchmark.
- [2025-08-28] 🚀 We released the DIVER-Retriever-4B model on ModelScope.
- [2025-08-24] 🏆 We updated DIVER V2, improving its score on the BRIGHT leaderboard to 45.8.
- [2025-08-18] 🚀 We open-sourced the full DIVER codebase, including inference and training.

## To-Do List

- ⬜ Open-source DIVER-VL-Embedding and DIVER-VL-Reranker: release code and models
- ✅ Open-source DIVER-Reranker: code and models released

## Model Downloads

The table below lists the available models so you can pick the one that fits your scenario. If you are in mainland China, the models are also available on ModelScope.cn for faster downloads.

| Model | #Total Params | Context Length | Download | BRIGHT |
|---|---|---|---|---|
| Diver-GroupRank-7B | 7B | 32K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-GroupRank-7B)<br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-GroupRank-7B) | |
| Diver-GroupRank-32B | 32B | 32K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-GroupRank-32B)<br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-GroupRank-32B) | 46.8 |
| DIVER-Retriever-4B-1020 | 4B | 40K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-4B-1020)<br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-4B-1020) | 31.9 |
| DIVER-Retriever-4B | 4B | 40K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-4B)<br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-4B) | 28.9 |
| DIVER-Retriever-1.7B | 1.7B | 40K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-1.7B)<br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-1.7B) | 27.3 |
| DIVER-Retriever-0.6B | 0.6B | 32K | [🤗 HuggingFace](https://huggingface.co/AQ-MedAI/Diver-Retriever-0.6B)<br>[🤖 ModelScope](https://www.modelscope.cn/models/AQ-MedAI/Diver-Retriever-0.6B) | 25.2 |

## Evaluation Results

### Overall Results

Performance comparison between DIVER and other baselines on the BRIGHT leaderboard. The best result on each dataset is highlighted in bold.

| Method | Avg. | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rank-R1-14B | 20.5 | 31.2 | 38.5 | 21.2 | 26.4 | 22.6 | 18.9 | 27.5 | 9.2 | 20.2 | 9.7 | 11.9 | 9.2 |
| Qwen1.5-7B with InteRank-3B | 27.4 | 51.2 | 51.4 | 22.4 | 31.9 | 17.3 | 26.6 | 22.4 | 24.5 | 23.1 | 13.5 | 19.3 | 25.5 |
| GPT4 with Rank1-32B | 29.4 | 49.7 | 35.8 | 22.0 | 37.5 | 22.5 | 21.7 | 35.0 | 18.8 | 32.5 | 10.8 | 22.9 | 43.7 |
| ReasonIR with QwenRerank | 36.9 | 58.2 | 53.2 | 32.0 | 43.6 | 28.8 | 37.6 | 36.0 | 33.2 | 34.8 | 7.9 | 32.6 | 45.0 |
| ReasonIR with Rank-R1-32B | 38.8 | 59.5 | 55.1 | 37.9 | 52.7 | 30.0 | 39.3 | 45.1 | 32.1 | 17.1 | 10.7 | 40.4 | 45.6 |
| RaDeR with QwenRerank | 39.2 | 58.0 | 59.2 | 33.0 | 49.4 | 31.8 | 39.0 | 36.4 | 33.5 | 33.3 | 10.8 | 34.2 | 51.6 |
| XRR2 | 40.3 | 63.1 | 55.4 | 38.5 | 52.9 | 37.1 | 38.2 | 44.6 | 21.9 | 35.0 | 15.7 | 34.4 | 46.2 |
| ReasonRank | 40.8 | 62.72 | 55.53 | 36.7 | 54.64 | 35.69 | 38.03 | 44.81 | 29.46 | 25.56 | 14.38 | 41.99 | 50.06 |
| DIVER | 41.6 | 62.2 | 58.7 | 34.4 | 52.9 | 35.6 | 36.5 | 42.9 | **38.9** | 25.4 | 18.3 | 40.0 | 53.1 |
| BGE Reasoner | 45.2 | 66.5 | **63.7** | 39.4 | 50.3 | 37 | 42.9 | 43.7 | 35.1 | **44.3** | 17.2 | 44.2 | **58.5** |
| DIVER V2 | **45.8** | **68** | 62.5 | **42.0** | **58.2** | **41.5** | **44.3** | **49.2** | 34.8 | 32.9 | **19.1** | **44.3** | 52.6 |

### DIVER Retriever Evaluation Results

**Evaluate Retriever with Original Query**

| Method | Avg. | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BM25 | 14.5 | 18.9 | 27.2 | 14.9 | 12.5 | 13.6 | 18.4 | 15.0 | 24.4 | 7.9 | 6.2 | 10.4 | 4.9 |
| SBERT | 14.9 | 15.1 | 20.4 | 16.6 | 22.7 | 8.2 | 11.0 | 15.3 | 26.4 | 7.0 | 5.3 | 20.0 | 10.8 |
| gte-Qwen1.5-7B | 22.5 | 30.6 | 36.4 | 17.8 | 24.6 | 13.2 | 22.2 | 14.8 | 25.5 | 9.9 | 14.4 | 27.8 | 32.9 |
| Qwen3-4B | 5.6 | 3.5 | 8.0 | 2.3 | 2.0 | 1.6 | 1.0 | 4.4 | 2.1 | 0.1 | 4.9 | 18.0 | 19.2 |
| OpenAI | 17.9 | 23.3 | 26.7 | 19.5 | 27.6 | 12.8 | 14.3 | 20.5 | 23.6 | 2.4 | 8.5 | 23.5 | 11.7 |
| Google | 20.0 | 22.7 | 34.8 | 19.6 | 27.8 | 15.7 | 20.1 | 17.1 | 29.6 | 3.6 | 9.3 | 23.8 | 15.9 |
| ReasonIR-8B | 24.4 | 26.2 | 31.4 | 23.3 | 30.0 | 18.0 | 23.9 | 20.5 | 35.0 | 10.5 | 14.7 | 31.9 | 27.2 |
| RaDeR-7B | 25.5 | 34.6 | 38.9 | 22.1 | 33.0 | 14.8 | 22.5 | 23.7 | 37.3 | 5.0 | 10.2 | 28.4 | 35.1 |
| Seed1.5-Embedding | 27.2 | 34.8 | 46.9 | 23.4 | 31.6 | 19.1 | 25.4 | 21.0 | 43.2 | 4.9 | 12.2 | 33.3 | 30.5 |
| DIVER-Retriever-0.6B | 25.2 | 36.4 | 41.9 | 29.0 | 31.0 | 21.2 | 24.6 | 23.2 | 15.6 | 6.8 | 8.4 | 33.2 | 31.7 |
| DIVER-Retriever-4B | 28.9 | 41.8 | 43.7 | 21.7 | 35.3 | 21.0 | 21.2 | 25.1 | 37.6 | 13.2 | 10.7 | 38.4 | 37.3 |

**Evaluate Retriever with GPT-4 REASON-query**

| Method | Avg. | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BM25 | 27.0 | 53.6 | 54.1 | 24.3 | 38.7 | 18.9 | 27.7 | 26.3 | 19.3 | 17.6 | 3.9 | 19.2 | 20.8 |
| SBERT | 17.8 | 18.5 | 26.3 | 17.5 | 27.2 | 8.8 | 11.8 | 17.5 | 24.3 | 10.3 | 5.0 | 22.3 | 23.5 |
| gte-Qwen1.5-7B | 24.8 | 35.5 | 43.1 | 24.3 | 34.3 | 15.4 | 22.9 | 23.9 | 25.4 | 5.2 | 4.6 | 28.7 | 34.6 |
| Qwen3-4B | 5.5 | 1.3 | 17.3 | 2.5 | 6.2 | 1.0 | 4.8 | 4.5 | 3.0 | 5.9 | 0.0 | 7.2 | 12.5 |
| OpenAI | 23.3 | 35.2 | 40.1 | 25.1 | 38.0 | 13.6 | 18.2 | 24.2 | 24.5 | 6.5 | 7.7 | 22.9 | 23.8 |
| Google | 26.2 | 36.4 | 45.6 | 25.6 | 38.2 | 18.7 | 29.5 | 17.9 | 31.1 | 3.7 | 10.0 | 27.8 | 30.4 |
| ReasonIR-8B | 29.9 | 43.6 | 42.9 | 32.7 | 38.8 | 20.9 | 25.8 | 27.5 | 31.5 | 19.6 | 7.4 | 33.1 | 35.7 |
| RaDeR-7B | 29.2 | 36.1 | 42.9 | 25.2 | 37.9 | 16.6 | 27.4 | 25.0 | 34.8 | 11.9 | 12.0 | 37.7 | 43.4 |
| DIVER-Retriever-4B | 32.1 | 51.9 | 53.5 | 29.5 | 41.2 | 21.4 | 27.5 | 26.1 | 33.5 | 11.7 | 9.5 | 39.3 | 39.7 |

**Evaluate Retriever with DIVER-QExpand Query**

| Method | Avg. | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ReasonIR-8B | 32.6 | 49.4 | 44.7 | 32.4 | 44.0 | 26.6 | 31.8 | 29.0 | 32.3 | 12.8 | 9.1 | 40.7 | 38.4 |
| +BM25 (Hybrid) | 35.7 | 56.8 | 53.5 | 33.0 | 48.5 | 29.4 | 34.2 | 32.0 | 35.2 | 16.8 | 12.9 | 39.3 | 36.8 |
| DIVER-Retriever | 33.9 | 54.5 | 52.7 | 28.8 | 44.9 | 25.1 | 27.4 | 29.5 | 34.5 | 10.0 | 14.5 | 40.7 | 44.7 |
| +BM25 (Hybrid) | 37.2 | 60.0 | 55.9 | 31.8 | 47.9 | 27.1 | 33.9 | 31.9 | 35.1 | 23.1 | 16.8 | 36.9 | 46.6 |
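The "+BM25 (Hybrid)" rows fuse dense-retriever scores with sparse BM25 scores. The tables do not specify the exact fusion rule, so the sketch below assumes a common scheme (min-max normalization followed by an equal-weight sum) purely for illustration:

```python
# Illustrative dense + BM25 hybrid scoring; the normalization and weighting
# are assumptions, not DIVER's documented fusion method.

def minmax(scores):
    """Rescale a list of scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(dense, bm25, alpha=0.5):
    """Fuse per-document dense and BM25 scores into one ranking score."""
    d, b = minmax(dense), minmax(bm25)
    return [alpha * x + (1 - alpha) * y for x, y in zip(d, b)]

dense = [0.82, 0.40, 0.77]   # e.g. cosine similarities from a dense retriever
bm25 = [3.1, 12.5, 0.0]      # raw BM25 scores for the same three documents
fused = hybrid_scores(dense, bm25)
best = max(range(len(fused)), key=fused.__getitem__)
print(best)  # → 0 (document 0 wins once both signals are combined)
```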

## One-Click Usage

### Inference

#### One-Click Reproduction

```shell
sh run_all.sh
```

### Retriever Usage (Hugging Face Transformers)

#### Using Sentence Transformers

```python
# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("AQ-MedAI/Diver-Retriever-4B")

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt.
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
```
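To turn the similarity matrix into an actual retrieval ranking, sort each row by descending score. The snippet below uses a small dummy matrix so it runs without downloading the model; with the real model, `similarity` from the code above would take its place:

```python
# Ranking documents from a query-document similarity matrix, shown with a
# dummy 2x2 matrix standing in for model.similarity(...) output.
import torch

similarity = torch.tensor([
    [0.85, 0.12],   # query 0 vs. documents 0 and 1
    [0.05, 0.78],   # query 1 vs. documents 0 and 1
])

# For each query, document indices sorted by descending similarity.
ranking = torch.argsort(similarity, dim=1, descending=True)
print(ranking.tolist())  # → [[0, 1], [1, 0]]
```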

#### vLLM Usage

```python
# Requires vllm>=0.8.5
import torch
from vllm import LLM

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instructions to the retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

model = LLM(model="AQ-MedAI/Diver-Retriever-4B", task="embed")

outputs = model.embed(input_texts)
embeddings = torch.tensor([o.outputs.embedding for o in outputs])
# Query-document similarity matrix (queries are the first two rows)
scores = embeddings[:2] @ embeddings[2:].T
print(scores)
```

## Training

We recommend fine-tuning our DIVER-Retriever-4B with swift. Before starting training, make sure your environment is properly set up.

```shell
pip install ms-swift -U
# Or install from source
pip install git+https://github.com/modelscope/ms-swift.git

pip install transformers -U

# Optional packages
pip install deepspeed       # multi-GPU training
pip install liger-kernel    # saves GPU memory
pip install flash-attn --no-build-isolation
```

### Training Data Preparation

```
# LLM
{"query": "sentence1", "response": "sentence2"}
# MLLM
{"query": "<image>", "response": "sentence", "images": "/some/images.jpg"}
{"query": "<image>sentence1", "response": "<image>sentence2", "rejected_response": ["<image>sentence1", "<image>sentence2"], "images": ["/some/images.jpg", "/some/images.jpg", "/some/images.jpg", "/some/images.jpg"]}
```
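Training files in this format are plain JSON Lines: one JSON object per line. A small helper for producing them programmatically (the file name `train.jsonl` and the example pairs are illustrative, not part of our released data):

```python
# Write query/response pairs as JSON Lines for swift embedding training.
import json

pairs = [
    ("What is the capital of China?", "The capital of China is Beijing."),
    ("Explain gravity", "Gravity is a force that attracts two bodies."),
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for query, response in pairs:
        f.write(json.dumps({"query": query, "response": response},
                           ensure_ascii=False) + "\n")

# Each line parses back as one independent JSON object:
with open("train.jsonl", encoding="utf-8") as f:
    first = json.loads(f.readline())
print(first["query"])  # → What is the capital of China?
```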

### Training Command

Taking the InfoNCE loss as an example, the full training command is:

```shell
nproc_per_node=8
NPROC_PER_NODE=$nproc_per_node \
swift sft \
    --model DIVER/DIVER-Retriever-4B \
    --task_type embedding \
    --model_type qwen3_emb \
    --train_type full \
    --dataset your_dataset \
    --split_dataset_ratio 0.05 \
    --eval_strategy steps \
    --output_dir output \
    --eval_steps 20 \
    --num_train_epochs 5 \
    --save_steps 20 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --learning_rate 6e-6 \
    --loss_type infonce \
    --label_names labels \
    --dataloader_drop_last true \
    --deepspeed zero3
```

## Citation

If you find our work helpful, please cite us; we would greatly appreciate it.

```bibtex
@misc{DIVER,
      title={DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval},
      author={Meixiu Long and Duolin Sun and Dan Yang and Junjie Wang and Yue Shen and Jian Wang and Peng Wei and Jinjie Gu and Jiahai Wang},
      year={2025},
      eprint={2508.07995},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2508.07995},
}
```

## Acknowledgments

We are grateful to prior related research and the open-source resources it has made available: BRIGHT, ReasonIR, RaDeR, ThinkQE, Qwen3-Embedding.

## Star History

Star History Chart