This repository contains the code and data processing scripts for the paper "ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation".
ArchRAG is a novel graph-based RAG approach that uses attributed communities organized hierarchically, and it introduces a novel LLM-based hierarchical clustering method. For more details, check out our paper.
Paper link: Arxiv
This project implements C-HNSW using a custom Faiss framework. Follow the steps below to set up the environment correctly.
We recommend using conda to manage the environment:
conda create -n archrag python=3.10 -y
conda activate archrag
Install the required Python packages using:
pip install -r requirements.txt
The C-HNSW component requires a modified version of Faiss. Please refer to this README for installation instructions.
Then add the repository root to PYTHONPATH (run this from the repository root):
export PYTHONPATH=$(pwd):$PYTHONPATH
Using the ArchRAG framework involves two steps: offline indexing and online retrieval.
Before constructing the ArchRAG index, we first use Microsoft GraphRAG to extract a knowledge graph (KG) from the corpus; please refer to its source code and instructions.
We provide a bash script for constructing the ArchRAG index:
bash dataset/index.sh
We provide a bash script for online retrieval on a given dataset:
bash dataset/query.sh
Corpus
{
"title": "FIRST TITLE",
"context": "FIRST TEXT",
"id": 0
}
{
"title": "SECOND TITLE",
"context": "SECOND TEXT",
"id": 1
}
Question
{
"question": "QUESTION 1",
"options": "DICT-style options for multiple-choice questions (Optional)",
"answer": "ANSWER",
"answer_idx": "Answer options for multiple-choice questions (Optional)",
"id": 0
}
One can use GraphRAG to construct the knowledge graph and then use its "final_entity" and "final_relationship" files.
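The corpus and question formats above can be checked before running the indexing script. The following is a minimal sketch, not part of the ArchRAG codebase: it assumes each file stores one JSON object per line (JSON Lines), and the helper names `iter_jsonl`, `validate`, `load_corpus` and `load_questions` are hypothetical. Only the keys shown above are treated as required; `options` and `answer_idx` stay optional because they apply only to multiple-choice questions.

```python
import json
from typing import Iterable, Iterator

# Required keys, taken from the Corpus and Question formats above.
CORPUS_KEYS = {"title", "context", "id"}
QUESTION_KEYS = {"question", "answer", "id"}
# "options" and "answer_idx" are optional (multiple-choice datasets only).


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line (assumed JSON Lines layout)."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc


def validate(records: Iterable[dict], required: set[str]) -> list[dict]:
    """Collect records, raising ValueError if one is not an object or lacks a key."""
    out = []
    for rec in records:
        if not isinstance(rec, dict):
            raise ValueError(f"expected a JSON object, got {type(rec).__name__}")
        missing = required - set(rec)
        if missing:
            raise ValueError(f"record {rec.get('id')!r} missing keys: {sorted(missing)}")
        out.append(rec)
    return out


def load_corpus(path: str) -> list[dict]:
    """Hypothetical helper: read and validate a corpus file."""
    with open(path, encoding="utf-8") as f:
        return validate(iter_jsonl(f), CORPUS_KEYS)


def load_questions(path: str) -> list[dict]:
    """Hypothetical helper: read and validate a question file."""
    with open(path, encoding="utf-8") as f:
        return validate(iter_jsonl(f), QUESTION_KEYS)


if __name__ == "__main__":
    sample = [
        '{"title": "FIRST TITLE", "context": "FIRST TEXT", "id": 0}',
        '{"title": "SECOND TITLE", "context": "SECOND TEXT", "id": 1}',
    ]
    docs = validate(iter_jsonl(sample), CORPUS_KEYS)
    print(len(docs), docs[1]["title"])  # 2 SECOND TITLE
```

If your files instead hold a single JSON array or pretty-printed multi-line objects, replace `iter_jsonl` with a call to `json.load` on the whole file; the `validate` step is unchanged.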