About KnowBot: Multi-Agent Academic Research Assistant

Background and Motivation

While Retrieval-Augmented Generation (RAG) systems have become popular for general web search and chatbots, they often fail to meet key needs of academic research workflows:

Limitation in Existing Systems | How KnowBot Addresses It
❌ No source traceability | ✅ Explicit citations and document-level attribution in every answer
❌ Broad, unfocused retrieval | ✅ User-controlled, document-specific reference selection
❌ Poor document interaction | ✅ Markdown previews, LLM summaries, and metadata tags
❌ Opaque reasoning | ✅ Clear multi-source attribution for transparency
❌ Lack of academic workflow support | ✅ Designed for research: reference refinement, follow-up Q&A, and controlled context

KnowBot was inspired by the need for a research assistant that acts as a thinking partner, guiding users through an iterative, transparent, and explainable process of academic exploration rather than simply handing out answers.


System Overview

KnowBot uses a multi-agent architecture to mimic how human researchers iteratively explore complex literature and refine and deepen their understanding of it. Each agent specializes in one part of the workflow, and the agents are coordinated through a state graph.

Core Modules

1️⃣ Ingest: Load academic documents (PDF, JSON) into the system.

2️⃣ Process: Chunk documents, tag metadata, summarize chunks with Vertex AI, and embed the text with HuggingFace BGE.

3️⃣ Index: Store vector embeddings and metadata in MongoDB Atlas with vector search indexes.

4️⃣ Retrieve: Run semantic search over the vector database, restricted to the documents the user chooses to include.

5️⃣ Interact: A multi-agent system runs query expansion, retrieval, answer generation, and decision-making with LLMs in a loop.

6️⃣ UI: A Streamlit interface supports semantic search, document preview, manual reference selection, and follow-up questions.
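The Process, Index, and Retrieve steps can be sketched as two small Python helpers: a fixed-size chunker with overlap, and a builder for a MongoDB Atlas `$vectorSearch` aggregation that only searches the documents the user has selected. This is a minimal sketch under stated assumptions, not KnowBot's code: the chunk sizes, the `embedding` and `doc_id` field names, and the index name `chunk_vector_index` are all hypothetical, and `doc_id` would have to be declared as a filter field in the Atlas vector index for the `filter` clause to work.

```python
from typing import Any


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars.

    The default sizes are hypothetical; the source does not state KnowBot's values.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end of the text
            break
    return chunks


def build_vector_search_pipeline(
    query_vector: list[float],
    selected_doc_ids: list[str],
    index_name: str = "chunk_vector_index",  # hypothetical Atlas index name
    limit: int = 5,
) -> list[dict[str, Any]]:
    """Build an Atlas $vectorSearch aggregation restricted to user-selected documents."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": "embedding",             # hypothetical field holding the BGE vector
                "queryVector": query_vector,
                "numCandidates": limit * 20,     # oversample candidates to improve ANN recall
                "limit": limit,
                # Only chunks from the documents the user picked are eligible.
                "filter": {"doc_id": {"$in": selected_doc_ids}},
            }
        },
        {
            # Return the chunk text, its source document, and the similarity score.
            "$project": {
                "_id": 0,
                "doc_id": 1,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With PyMongo, the pipeline would be passed to `collection.aggregate(...)`, with the query vector produced by the same BGE model that embedded the chunks. Returning `doc_id` alongside each chunk is what lets the answer step cite its sources; raising `numCandidates` relative to `limit` trades some latency for recall.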


Multi-Agent System

Agent Workflow

  • agent_start: Decides exploration mode ("explore" vs. "direct").
  • agent_expand: Generates follow-up or expanded queries to deepen exploration.
  • agent_retrieve: Searches the vector database, retrieves relevant documents, and generates answers that cite their sources.
  • agent_decide: Determines whether to continue expanding or stop based on answer quality and iteration limits.
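The loop formed by these four agents can be sketched as a small state machine in plain Python. This is a minimal sketch, not KnowBot's implementation: the `ResearchState` fields, the `Agents` callables (which stand in for the LLM and vector-search calls), and the `max_iterations` limit of 3 are hypothetical. Here both modes answer the original question first, and only "explore" mode loops back through agent_expand. Each agent returns the name of the next agent; in LangGraph, the same functions would roughly correspond to nodes added with `StateGraph.add_node`, with the routing expressed through `add_conditional_edges`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ResearchState:
    """Shared state handed from agent to agent (field names are hypothetical)."""
    question: str
    mode: str = "explore"
    queries: list[str] = field(default_factory=list)   # queries for the current round
    answers: list[str] = field(default_factory=list)   # cited answers gathered so far
    iteration: int = 0
    max_iterations: int = 3                             # hypothetical iteration limit


@dataclass
class Agents:
    """Stand-ins for the LLM and vector-search calls (hypothetical interfaces)."""
    choose_mode: Callable[[str], str]          # returns "explore" or "direct"
    expand: Callable[[str], list[str]]         # returns follow-up queries
    answer: Callable[[str], str]               # retrieves chunks and writes a cited answer
    good_enough: Callable[[list[str]], bool]   # judges whether the answers suffice


def agent_start(s: ResearchState, a: Agents) -> str:
    # Pick the exploration mode and seed the first round with the user's question.
    s.mode = a.choose_mode(s.question)
    s.queries = [s.question]
    return "agent_retrieve"


def agent_expand(s: ResearchState, a: Agents) -> str:
    # Replace the round's queries with follow-ups generated from the previous round.
    s.queries = [f for q in s.queries for f in a.expand(q)]
    return "agent_retrieve"


def agent_retrieve(s: ResearchState, a: Agents) -> str:
    # Answer every query in the current round, then count the round.
    s.answers.extend(a.answer(q) for q in s.queries)
    s.iteration += 1
    return "agent_decide"


def agent_decide(s: ResearchState, a: Agents) -> str:
    # Stop in direct mode, at the iteration limit, or once the answers are judged sufficient.
    if s.mode == "direct" or s.iteration >= s.max_iterations or a.good_enough(s.answers):
        return "END"
    return "agent_expand"


NODES: dict[str, Callable[[ResearchState, Agents], str]] = {
    "agent_start": agent_start,
    "agent_expand": agent_expand,
    "agent_retrieve": agent_retrieve,
    "agent_decide": agent_decide,
}


def run(state: ResearchState, agents: Agents) -> ResearchState:
    # Follow the returned node names until an agent routes to "END".
    node = "agent_start"
    while node != "END":
        node = NODES[node](state, agents)
    return state
```

Injecting the LLM calls as plain callables keeps the control flow testable with stubs: for example, a `good_enough` that returns true once two answers exist stops an "explore" run after its second round.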

Key Features Enabled by Multi-Agent Design

  • Iterative Exploration: Agents cooperate to expand questions and refine answers over multiple turns.
  • Explainability: Each answer cites documents, maintaining a clear trace of sources.
  • Follow-up Suggestions: Expanded queries propose relevant follow-up questions to users.
  • Adaptive Depth: Decision agent controls the exploration length dynamically.

Components and Tech Stack

Component | Technology
Language Models | Vertex AI (LLMs)
Embeddings | HuggingFace BGE
Vector Search | MongoDB Atlas
Agent Orchestration | LangGraph
Frontend UI | Streamlit
Deployment | Containerized, cloud-ready (CI/CD pipelines)

User Interface Highlights

  • Semantic Search & Filtering: Search documents by semantic relevance and filter them by metadata.
  • Reference Selection: Manually add/remove documents as contextual references.
  • Preview & Summarization: Inline markdown previews and AI-generated summaries.
  • Follow-up Q&A: Ask iterative questions with answers tied explicitly to cited sources.
  • Traceability: Full transparency with detailed source attribution.

What We Learned

  • The power of multi-agent workflows in managing complex research tasks.
  • How to design stateful, iterative reasoning loops with LangGraph.
  • The importance of prompt engineering for reliable query expansion and decision-making.
  • How to manage semantic drift with similarity scoring so that expanded queries stay focused.
  • Balancing explainability and usability in an academic AI assistant.
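The source says semantic drift is managed with similarity scoring but does not say how. The following is a minimal sketch of one common approach, assuming each expanded query is embedded with the same model as the original question and kept only if its cosine similarity to the question clears a threshold; the function names and the 0.75 threshold are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_drift(
    question_vec: list[float],
    expansions: dict[str, list[float]],
    threshold: float = 0.75,  # hypothetical cut-off; would need tuning for real BGE vectors
) -> list[str]:
    """Keep expanded queries that stay close to the original question, most similar first."""
    scored = [(query, cosine(question_vec, vec)) for query, vec in expansions.items()]
    kept = [(query, score) for query, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [query for query, _ in kept]
```

If the embeddings are L2-normalized, the cosine reduces to a plain dot product. Comparing each expansion against the original question, rather than against the previous round's query, keeps drift from compounding across iterations.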

Challenges Faced

  • Maintaining shared state consistency across agents.
  • Crafting effective prompts for different agent roles.
  • Handling iterative query expansion without straying off-topic.
  • Designing a responsive UI for interactive document exploration.
  • Managing vector DB indexing and retrieval latency.

Limitations & Future Work

Current Limitations | Planned Improvements
Fixed chunk sizes can split context across chunk boundaries | Dynamic chunking and context windowing
Basic metadata tagging | Advanced classification and smart tagging
Limited file type support (currently PDF and JSON) | Add DOCX, HTML, and more
No quantitative benchmarking | Evaluation on academic datasets

Licensing and Contribution

KnowBot is open-source under the MIT License. We welcome contributions and issues via our GitHub repository.


KnowBot gives researchers an explainable, multi-agent academic assistant that evolves with the user’s inquiry, making research exploration more precise, transparent, and interactive.
