
Lumigo: A Q&A Chatbot for Academic Document Exploration


Ask your question: Lumigo not only retrieves relevant, trusted documents, but also automatically expands its academic database when no matching content is found.

🌐 Live Demo


About the project

🔍 Tech Stack
Built with Streamlit, LangChain, FAISS, and Google Vertex AI.

Lumigo is an intelligent academic research assistant powered by Retrieval-Augmented Generation (RAG). Designed for precision and explainability, it helps users search, explore, and question academic documents with full context awareness.

Lumigo Framework


Unlike generic RAG systems, Lumigo supports document-specific queries, generates follow-up questions, and provides transparent source attribution for all responses. It uses a local FAISS index for efficient, self-contained vector search, making it ideal for researchers, students, and professionals navigating complex literature with a focus on rigor and clarity.

It leverages:

🗃️ FAISS for local, high-speed vector search
🌐 HuggingFace Embeddings for semantic search
🤖 Vertex AI for LLM-based summarization and QA
🚀 Docker for containerized deployment

Lumigo Interface

| Area | Description |
| --- | --- |
| 🔍 Search Panel | Input your research question to perform semantic academic search across indexed papers. |
| 📚 Source Documents | Displays retrieved academic documents with LLM-generated summaries and an "Add to Ref" button for selective reference. |
| 📌 Reference Docs | Lists documents you've selected as references. These are used to generate answers and offer traceable, cited content. |
| 💬 Follow-up Questions | Suggests relevant follow-up research questions based on your current query and selected references. |

✨ Key Features

  • Document Ingestion Module: Supports loading academic documents in JSON and PDF formats. Upon ingestion, documents undergo chunking based on fixed character counts to create manageable text segments.
  • Metadata Tagging Module: Enriches each text chunk with relevant metadata, including document title, section headers, file source, and author details, enabling precise context retrieval and transparent source attribution.
  • Summarization Module: Utilizes Vertex AI large language models to generate concise summaries for each chunk, improving document preview capabilities and supporting efficient user exploration.
  • Embedding and Indexing Module: Employs HuggingFace embedding models to convert each chunk into dense semantic vectors. These embeddings and metadata are stored locally in a FAISS index, enabling rapid, self-contained similarity searches.
  • On-Demand Index Building: A "Build FAISS Index" button in the UI allows for easy, on-the-fly updates to the vector store whenever source documents are changed.
  • Retrieval-Augmented Generation (RAG) Module: Combines retrieved document chunks with large language models to generate contextually grounded, explainable answers. Users can interactively select which reference documents to include, ensuring transparency and control over the sources informing responses.
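The ingestion and metadata-tagging modules above can be sketched in a few lines. The chunk size, overlap, and metadata fields below are illustrative assumptions, not Lumigo's actual configuration:

```python
# Sketch of fixed-character-count chunking plus metadata tagging.
# Values (500-char chunks, 50-char overlap) and the metadata schema
# are assumptions for illustration; Lumigo's real pipeline may differ.

def chunk_document(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks with a small overlap so that
    sentences cut at a boundary still appear whole in one chunk."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size]
            for start in range(0, len(text), step)
            if text[start:start + chunk_size]]

def tag_chunks(chunks, title, source, authors):
    """Attach source metadata to every chunk, enabling the transparent
    source attribution described above."""
    return [
        {
            "text": chunk,
            "metadata": {
                "title": title,
                "source": source,
                "authors": authors,
                "chunk_id": i,
            },
        }
        for i, chunk in enumerate(chunks)
    ]

doc = "A" * 1200  # stand-in for text parsed from a PDF/JSON document
tagged = tag_chunks(chunk_document(doc), "Sample Paper", "sample.pdf", ["A. Author"])
print(len(tagged), tagged[0]["metadata"]["chunk_id"])  # 3 chunks, ids from 0
```

Because every chunk carries its own metadata record, the answer generator can cite the exact document and section each retrieved passage came from.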

The Implementation of RAG

  • User Interface Module: Provides a responsive, user-friendly frontend where users can perform semantic searches, view document previews and summaries, manage reference documents, and pose targeted questions. The interface emphasizes clarity and traceability, with explicit source citations for all generated answers.
  • CI/CD Pipeline: GitLab pipelines automate testing, building, and deployment, ensuring continuous integration and delivery for robust development workflows.

New Updates

🎉 Lumigo was born on May 21, 2025

🆕 Recent Updates

  • 2025/07/22 – Migrated vector storage from MongoDB to a local FAISS index, simplifying setup and removing external database dependencies.
  • 2025/07/22 – Added a "Build FAISS Index" button to the UI, allowing users to re-index their source documents on-demand without restarting the application.
📜 Timeline (older updates)
  • 2025/06/15 – Added Multi-Agent workflow for better search and query expansion.

(back to top)

🚀 Getting Started

  1. Clone the repository:

    git clone https://github.com/stephanie0324/Lumigo.git
    cd Lumigo
  2. Build the Docker image (this installs all dependencies inside the container):

    bash script/build-docker-image.sh
  3. Build the FAISS Index:

    Before running the application for the first time, you must generate the vector index from your source documents. You can do this via the UI or the command line.

    python src/core/build_faiss_index.py
  4. Set up your .env file:

    • The .env file is in the deploy/ folder
    • Fill in the required keys:
      • OPENAI_API_KEY
      • PROJECT_ID (for Google Cloud)
      • LOCATION (for Google Cloud)
      • GOOGLE_APPLICATION_CREDENTIALS (credential filepath)
      • HOST_PORT (optional, defaults to a standard port)
  5. Launch the app:

    cd deploy
    docker-compose up -d
    
    # for development
    bash script/run-dev-mode.sh
    streamlit run main.py --server.port=7860

    > [!NOTE]
    > Prod mode (docker compose): credential.json should be in the deploy/ folder.
    > Dev mode: the credential file goes under the src/ folder.
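Conceptually, the index build in step 3 embeds every chunk once and stores the vectors for similarity search. The toy sketch below uses a plain term-count embedding and brute-force cosine similarity purely to show the mechanics; Lumigo itself uses HuggingFace neural embeddings and a FAISS index for the same job:

```python
import math

def embed(text, vocab):
    """Toy term-count embedding over a fixed vocabulary, normalized to
    unit length. A stand-in for a HuggingFace sentence-embedding model."""
    words = text.lower().split()
    vec = [float(words.count(w)) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_index(chunks):
    """Embed every chunk up front, like the 'Build FAISS Index' step."""
    vocab = sorted({w for c in chunks for w in c.lower().split()})
    return vocab, [(c, embed(c, vocab)) for c in chunks]

def search(index, query, k=2):
    """Return the k chunks closest to the query by cosine similarity.
    FAISS performs this same search, only much faster and at scale."""
    vocab, rows = index
    q = embed(query, vocab)
    scored = sorted(
        ((sum(a * b for a, b in zip(q, v)), c) for c, v in rows),
        reverse=True,
    )
    return [c for _, c in scored[:k]]

index = build_index([
    "transformers for natural language processing",
    "graph neural networks for molecules",
    "attention is all you need",
])
print(search(index, "language models", k=1))
# -> ['transformers for natural language processing']
```

Re-running build_index after documents change is exactly what the "Build FAISS Index" button does for the real vector store.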


🛠️ How to Use

  1. Search for academic content
    Enter your question in the search panel. The system performs a semantic search over the indexed documents.

  2. Explore and select references
    Browse the retrieved documents and their AI-generated summaries. Choose relevant ones by clicking "Add to Ref".

  3. Ask your question
    With references selected, ask a question. The system will use only those documents to generate an accurate, grounded answer.

  4. View source and follow-ups
    Check the sources of the answer, and explore suggested follow-up questions to continue your research.

💡 Tip: Use the "Build FAISS Index" button in the sidebar's "Developer Tools" section to re-index your documents after making changes to the src/data directory.
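The "use only selected references" behavior in step 3 amounts to assembling the answer prompt exclusively from the documents you added to the reference list. A minimal sketch (the prompt wording and the `build_grounded_prompt` helper are illustrative assumptions; Lumigo sends its own prompt to Vertex AI):

```python
def build_grounded_prompt(question, selected_refs):
    """Assemble an answer prompt that sees only user-selected references,
    mirroring the 'Add to Ref' flow. Numbered sources let the model
    cite passages as [n] for traceable answers."""
    context = "\n\n".join(
        f"[{i + 1}] {ref['title']}: {ref['text']}"
        for i, ref in enumerate(selected_refs)
    )
    return (
        "Answer the question using ONLY the numbered sources below, "
        "and cite sources as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

refs = [
    {"title": "Attention Is All You Need", "text": "Introduces the Transformer."},
]
prompt = build_grounded_prompt("What architecture did Vaswani et al. propose?", refs)
print(prompt.splitlines()[0])
```

Because the model never sees unselected documents, every citation in the answer maps back to a reference you explicitly chose.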

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Other Projects

(back to top)

Connect with me:

steph0324
