
About KnowBot: Multi-Agent Academic Research Assistant

Background and Motivation

While Retrieval-Augmented Generation (RAG) systems have become popular in general web search and chatbots, they often fail to meet key needs of academic research workflows:

Limitation in Existing Systems	How KnowBot Addresses It
❌ No source traceability	✅ Explicit citations and document-level attribution in every answer
❌ Broad, unfocused retrieval	✅ User-controlled, document-specific reference selection
❌ Poor document interaction	✅ Markdown previews, LLM summaries, and metadata tags
❌ Opaque reasoning	✅ Clear multi-source attribution for transparency
❌ Lack of academic workflow support	✅ Designed for research: reference refinement, follow-up Q&A, and controlled context

KnowBot was inspired by the need for a research assistant that acts as a thinking partner: one that guides users through an iterative, transparent, and explainable process of academic exploration instead of simply handing out answers.

System Overview

KnowBot uses a multi-agent architecture that mimics how human researchers iteratively explore, refine, and deepen their understanding of complex literature. Each agent handles one part of the workflow, and a state graph coordinates the agents.

Core Modules

1️⃣ Ingest: Load academic documents (PDF, JSON) into the system.

2️⃣ Process: Chunk documents, tag metadata, summarize chunks using Vertex AI, and embed text with HuggingFace BGE.

3️⃣ Index: Store vector embeddings and metadata in MongoDB Atlas with efficient vector search indexes.

4️⃣ Retrieve: Perform semantic search over the vector database, restricted to the documents the user chooses to include.

5️⃣ Interact: A multi-agent system runs query expansion, retrieval, answer generation, and decision-making with LLMs in a loop.

6️⃣ UI: A Streamlit interface supports semantic search, document preview, manual reference selection, and follow-up questions.
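Steps 3 and 4 can be sketched as a MongoDB Atlas `$vectorSearch` aggregation that only searches chunks belonging to the documents the user selected. This is a minimal sketch under stated assumptions, not KnowBot's code: the index name `vector_index`, the field names `embedding`, `doc_id`, and `text`, and the helper `build_retrieval_pipeline` are hypothetical, and filtering on `doc_id` assumes that field is declared as a `filter` field in the Atlas vector index definition. The function only builds the pipeline, so it runs without a database.

```python
from typing import Any


def build_retrieval_pipeline(
    query_vector: list[float],
    selected_doc_ids: list[str],
    index_name: str = "vector_index",  # hypothetical Atlas Vector Search index name
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict[str, Any]]:
    """Build a $vectorSearch pipeline restricted to user-selected documents."""
    if not selected_doc_ids:
        raise ValueError("select at least one reference document")
    vector_stage: dict[str, Any] = {
        "index": index_name,
        "path": "embedding",  # field holding the BGE vector (assumed name)
        "queryVector": query_vector,
        # Atlas requires numCandidates >= limit for approximate search.
        "numCandidates": max(num_candidates, limit),
        "limit": limit,
        # Pre-filter: only chunks whose doc_id the user kept as a reference.
        # Assumes doc_id is indexed as a "filter" field.
        "filter": {"doc_id": {"$in": selected_doc_ids}},
    }
    return [
        {"$vectorSearch": vector_stage},
        {
            "$project": {
                "_id": 0,
                "doc_id": 1,
                "text": 1,
                # Similarity score that Atlas attaches to each returned chunk.
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_retrieval_pipeline([0.1, 0.2, 0.3], ["paper_a", "paper_b"], limit=3)
print(pipeline[0]["$vectorSearch"]["filter"])
# {'doc_id': {'$in': ['paper_a', 'paper_b']}}
```

With PyMongo the pipeline would be passed to `collection.aggregate(pipeline)`. Because the filter is applied inside `$vectorSearch` rather than in a later `$match` stage, all `limit` results come from the selected references instead of being discarded after the search.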

Multi-Agent System

Agent Workflow

agent_start: Decides exploration mode ("explore" vs. "direct").
agent_expand: Generates follow-up or expanded queries to deepen exploration.
agent_retrieve: Searches the vector DB, retrieves relevant documents, and generates answers that cite their sources.
agent_decide: Determines whether to continue expanding or stop based on answer quality and iteration limits.
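The loop formed by these four agents can be sketched in plain Python. This is an illustrative, dependency-free approximation, not KnowBot's LangGraph graph: the in-memory `CORPUS`, the keyword-overlap retriever, the `how`/`why` rule in `agent_start`, reusing the newest cited passage as the next query in `agent_expand`, and retrieving for the original question before the first expansion are all hypothetical stand-ins for the real LLM and vector-search calls. In the actual system these functions would be registered as nodes on a LangGraph `StateGraph`, with `agent_decide` driving a conditional edge.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory corpus standing in for the MongoDB Atlas vector store.
CORPUS = {
    "paper_a": "retrieval augmented generation cites source documents",
    "paper_b": "source documents need careful chunking",
    "paper_c": "chunking strategies affect retrieval context",
}


@dataclass
class ResearchState:
    question: str
    mode: str = "direct"
    queries: list[str] = field(default_factory=list)
    answers: list[str] = field(default_factory=list)
    cited: list[str] = field(default_factory=list)
    new_sources: list[str] = field(default_factory=list)
    iteration: int = 0
    max_iterations: int = 3
    done: bool = False


def agent_start(state: ResearchState) -> ResearchState:
    # Stand-in for an LLM decision: open-ended questions trigger exploration.
    is_open = state.question.lower().startswith(("how", "why"))
    state.mode = "explore" if is_open else "direct"
    state.queries.append(state.question)
    return state


def agent_retrieve(state: ResearchState, k: int = 1) -> ResearchState:
    # Keyword overlap replaces vector similarity; already-cited docs are skipped.
    terms = set(state.queries[-1].lower().split())
    scored = [
        (len(terms & set(text.split())), doc_id)
        for doc_id, text in CORPUS.items()
        if doc_id not in state.cited
    ]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    hits = [doc_id for score, doc_id in ranked if score > 0][:k]
    for doc_id in hits:
        # Every answer line carries its source id for traceability.
        state.answers.append(f"{CORPUS[doc_id]} [{doc_id}]")
    state.cited.extend(hits)
    state.new_sources = hits
    state.iteration += 1
    return state


def agent_decide(state: ResearchState) -> ResearchState:
    # Stop in direct mode, at the iteration cap, or when nothing new was found.
    state.done = (
        state.mode == "direct"
        or state.iteration >= state.max_iterations
        or not state.new_sources
    )
    return state


def agent_expand(state: ResearchState) -> ResearchState:
    # Stand-in for LLM query expansion: follow up on the newest cited passage.
    state.queries.append(CORPUS[state.cited[-1]])
    return state


def run(question: str, max_iterations: int = 3) -> ResearchState:
    state = agent_start(ResearchState(question, max_iterations=max_iterations))
    while True:
        state = agent_decide(agent_retrieve(state))
        if state.done:
            return state
        state = agent_expand(state)


result = run("how does retrieval augmented generation work")
print(result.mode, result.cited)
# explore ['paper_a', 'paper_b', 'paper_c']
```

Keeping every agent as a function from state to state is what lets a graph framework such as LangGraph rewire the order of nodes without changing the agents themselves.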

Key Features Enabled by Multi-Agent Design

Iterative Exploration: Agents cooperate to expand questions and refine answers over multiple turns.
Explainability: Each answer cites documents, maintaining a clear trace of sources.
Follow-up Suggestions: Expanded queries are offered to users as relevant follow-up questions.
Adaptive Depth: The decision agent dynamically controls how long exploration continues.

Components and Tech Stack

Component	Technology
Language Models	Vertex AI (LLMs)
Embeddings	HuggingFace BGE
Vector Search	MongoDB Atlas
Agent Orchestration	LangGraph
Frontend UI	Streamlit
Deployment	Containerized / Cloud-ready (CI/CD pipelines)

User Interface Highlights

Semantic Search & Filtering: Search documents with semantic relevance and filter by metadata.
Reference Selection: Manually add/remove documents as contextual references.
Preview & Summarization: Inline markdown previews and AI-generated summaries.
Follow-up Q&A: Ask iterative questions with answers tied explicitly to cited sources.
Traceability: Full transparency with detailed source attribution.
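Reference selection and traceability can be illustrated by assembling the LLM context from only the user-selected chunks, each tagged with a numbered source marker, and then mapping the markers in the model's answer back to documents. This is a hypothetical sketch: the `[n]` marker convention and the helpers `build_context` and `cited_documents` are assumptions, not KnowBot's actual prompt format.

```python
import re


def build_context(
    chunks: list[dict[str, str]], selected: set[str]
) -> tuple[str, dict[int, str]]:
    """Number the chunks from user-selected documents and record each number's source."""
    lines: list[str] = []
    sources: dict[int, str] = {}
    for chunk in chunks:
        if chunk["doc_id"] not in selected:
            continue  # deselected references never reach the LLM
        n = len(sources) + 1
        sources[n] = chunk["doc_id"]
        lines.append(f"[{n}] ({chunk['doc_id']}) {chunk['text']}")
    return "\n".join(lines), sources


def cited_documents(answer: str, sources: dict[int, str]) -> list[str]:
    """Map [n] markers in an answer back to document ids, in first-citation order."""
    seen: list[str] = []
    for number in re.findall(r"\[(\d+)\]", answer):
        doc_id = sources.get(int(number))
        if doc_id is not None and doc_id not in seen:
            seen.append(doc_id)
    return seen


chunks = [
    {"doc_id": "paper_a", "text": "RAG grounds answers in retrieved text."},
    {"doc_id": "paper_b", "text": "Chunk size affects retrieval quality."},
    {"doc_id": "paper_c", "text": "Unrelated survey notes."},
]
context, sources = build_context(chunks, selected={"paper_a", "paper_b"})
answer = "RAG grounds answers in retrieved text [1], but chunking matters [2][1]."
print(cited_documents(answer, sources))  # ['paper_a', 'paper_b']
```

Ignoring marker numbers with no entry in `sources` gives the UI a simple check that every displayed citation points to a document the user actually selected.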

What We Learned

The power of multi-agent workflows in managing complex research tasks.
How to design stateful, iterative reasoning loops with LangGraph.
Importance of prompt engineering for reliable expansion and decision making.
Managing semantic drift via similarity scoring to keep queries focused.
Balancing explainability and usability in an academic AI assistant.
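The similarity-based drift check in the list above can be sketched as follows: each expanded query is compared with the original question, and it is discarded if their cosine similarity falls below a threshold. In KnowBot the vectors would come from the BGE embedding model; here a bag-of-words vector stands in so the example runs without a model, and the helper names and the 0.3 threshold are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter[str]:
    # Stand-in for a BGE embedding: term-frequency vector over lowercase words.
    return Counter(text.lower().split())


def cosine(a: Counter[str], b: Counter[str]) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_drift(question: str, expansions: list[str], threshold: float = 0.3) -> list[str]:
    """Keep only expanded queries that stay close to the original question."""
    anchor = embed(question)
    return [q for q in expansions if cosine(anchor, embed(q)) >= threshold]


question = "how does chunk size affect retrieval quality"
expansions = [
    "what chunk size gives the best retrieval quality",
    "history of academic publishing",
]
print(filter_drift(question, expansions))
# ['what chunk size gives the best retrieval quality']
```

Comparing every expansion against the original question, rather than against the previous expansion, keeps small drifts from compounding over several rounds. A real embedding model would also accept paraphrases that share no words, which this lexical stand-in would wrongly reject, so the threshold would need tuning on real queries.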

Challenges Faced

Maintaining shared state consistency across agents.
Crafting effective prompts for different agent roles.
Handling iterative query expansion without straying off-topic.
Designing a responsive UI for interactive document exploration.
Managing vector DB indexing and retrieval latency.

Limitations & Future Work

Current Limitations	Planned Improvements
Fixed chunk sizes breaking context	Dynamic chunking and context windowing
Basic metadata tagging	Advanced classification and smart tagging
Limited file type support	Add DOCX, HTML, and more
No quantitative benchmarking	Academic dataset evaluation
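The first limitation can be illustrated by comparing fixed-size chunking with a simple sentence-aware alternative that packs whole sentences into a character budget. This is a hypothetical sketch of one possible direction, not KnowBot's current chunker or its planned dynamic chunking: the character-based budget, the regex sentence splitter, and the function names are assumptions.

```python
import re


def fixed_chunks(text: str, size: int) -> list[str]:
    # Cuts every `size` characters, regardless of word or sentence boundaries.
    return [text[i : i + size] for i in range(0, len(text), size)]


def sentence_chunks(text: str, budget: int) -> list[str]:
    # Packs whole sentences into chunks of at most `budget` characters;
    # a single sentence longer than the budget becomes its own chunk.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > budget:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "RAG retrieves passages. It then cites them. Chunking decides what a passage is."
print(fixed_chunks(text, 50)[1])
# ng decides what a passage is.
print(sentence_chunks(text, 50))
# ['RAG retrieves passages. It then cites them.', 'Chunking decides what a passage is.']
```

The fixed-size split cuts "Chunking" in half, so neither chunk carries the full sentence into retrieval; the sentence-aware version keeps every sentence intact at the cost of uneven chunk lengths.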

Licensing and Contribution

KnowBot is open-source under the MIT License. We welcome contributions and issues via our GitHub repository.

KnowBot empowers researchers with an explainable, multi-agent academic assistant that evolves with the user’s inquiry — making research exploration more precise, transparent, and interactive.

Built With

gitlab
langchain
mongodb
python
streamlit
vertexai

Updates

stephanie0324 Chiang started this project — Jun 16, 2025 08:52 AM EDT
