
Lumigo: A Q&A Chatbot for Academic Document Exploration


Ask your question: Lumigo not only retrieves relevant, trusted documents, but also automatically expands its academic database when no matching content is found.

🌐 Live Demo


About the project

🔍 Tech Stack
Built with Streamlit, LangChain, FAISS, and Google Vertex AI.

Lumigo is an intelligent academic research assistant powered by Retrieval-Augmented Generation (RAG). Designed for precision and explainability, it helps users search, explore, and question academic documents with full context awareness.

Lumigo Framework


Unlike generic RAG systems, Lumigo supports document-specific queries, generates follow-up questions, and provides transparent source attribution for all responses. It uses a local FAISS index for efficient, self-contained vector search, making it ideal for researchers, students, and professionals navigating complex literature with a focus on rigor and clarity.

It leverages:

🗃️ FAISS for local, high-speed vector search
🌐 HuggingFace Embeddings for semantic search
🤖 Vertex AI for LLM-based summarization and QA
🚀 Docker for containerized deployment

Lumigo Interface

| Area | Description |
| --- | --- |
| 🔍 Search Panel | Input your research question to perform semantic academic search across indexed papers. |
| 📚 Source Documents | Displays retrieved academic documents with LLM-generated summaries and an "Add to Ref" button for selective reference. |
| 📌 Reference Docs | Lists documents you've selected as references. These are used to generate answers and offer traceable, cited content. |
| 💬 Follow-up Questions | Suggests relevant follow-up research questions based on your current query and selected references. |

✨ Key Features

  • Document Ingestion Module: Supports loading academic documents in JSON and PDF formats. Upon ingestion, documents undergo chunking based on fixed character counts to create manageable text segments.
  • Metadata Tagging Module: Enriches each text chunk with relevant metadata, including document title, section headers, file source, and author details, enabling precise context retrieval and transparent source attribution.
  • Summarization Module: Utilizes Vertex AI large language models to generate concise summaries for each chunk, improving document preview capabilities and supporting efficient user exploration.
  • Embedding and Indexing Module: Employs HuggingFace embedding models to convert each chunk into dense semantic vectors. These embeddings and metadata are stored locally in a FAISS index, enabling rapid, self-contained similarity searches.
  • On-Demand Index Building: A "Build FAISS Index" button in the UI allows for easy, on-the-fly updates to the vector store whenever source documents are changed.
  • Retrieval-Augmented Generation (RAG) Module: Combines retrieved document chunks with large language models to generate contextually grounded, explainable answers. Users can interactively select which reference documents to include, ensuring transparency and control over the sources informing responses.
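The ingestion and metadata-tagging modules above can be sketched in a few lines. The chunk size, overlap, and metadata fields below are illustrative assumptions, not Lumigo's actual configuration:

```python
# Sketch of fixed-character-count chunking plus metadata tagging.
# Values (500-char chunks, 50-char overlap) and the metadata schema
# are assumptions for illustration; Lumigo's real pipeline may differ.

def chunk_document(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks with a small overlap so that
    sentences cut at a boundary still appear whole in one chunk."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size]
            for start in range(0, len(text), step)
            if text[start:start + chunk_size]]

def tag_chunks(chunks, title, source, authors):
    """Attach source metadata to every chunk, enabling the transparent
    source attribution described above."""
    return [
        {
            "text": chunk,
            "metadata": {
                "title": title,
                "source": source,
                "authors": authors,
                "chunk_id": i,
            },
        }
        for i, chunk in enumerate(chunks)
    ]

doc = "A" * 1200  # stand-in for text parsed from a PDF/JSON document
tagged = tag_chunks(chunk_document(doc), "Sample Paper", "sample.pdf", ["A. Author"])
print(len(tagged), tagged[0]["metadata"]["chunk_id"])  # 3 chunks, ids from 0
```

Because every chunk carries its own metadata record, the answer generator can cite the exact document and section each retrieved passage came from.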

The Implementation of RAG

  • User Interface Module: Provides a responsive, user-friendly frontend where users can perform semantic searches, view document previews and summaries, manage reference documents, and pose targeted questions. The interface emphasizes clarity and traceability, with explicit source citations for all generated answers.
  • CI/CD Pipeline: GitLab pipelines automate testing, building, and deployment, ensuring continuous integration and delivery for robust development workflows.

New Updates

🎉 Lumigo was born on May 21, 2025

🆕 Recent Updates

  • 2025/07/22 – Migrated vector storage from MongoDB to a local FAISS index, simplifying setup and removing external database dependencies.
  • 2025/07/22 – Added a "Build FAISS Index" button to the UI, allowing users to re-index their source documents on-demand without restarting the application.
📜 Timeline (older updates)
  • 2025/06/15 – Added Multi-Agent workflow for better search and query expansion.

(back to top)

🚀 Getting Started

  1. Clone the repository:

    git clone https://github.com/stephanie0324/Lumigo.git
    cd Lumigo
  2. Build the Docker image (this installs all dependencies inside the container):

    bash script/build-docker-image.sh
  3. Build the FAISS Index:

    Before running the application for the first time, you must generate the vector index from your source documents. You can do this via the UI or the command line.

    python src/core/build_faiss_index.py
  4. Set up your .env file:

    • The .env file is in the deploy/ folder
    • Fill in the required keys:
      • OPENAI_API_KEY
      • PROJECT_ID (for Google Cloud)
      • LOCATION (for Google Cloud)
      • GOOGLE_APPLICATION_CREDENTIALS (credential filepath)
      • HOST_PORT (optional, defaults to a standard port)
  5. Launch the app:

    cd deploy
    docker-compose up -d
    
    # for development
    bash script/run-dev-mode.sh
    streamlit run main.py --server.port=7860

    > [!NOTE]
    > Prod mode (docker compose): credential.json should be in the deploy/ folder.
    > Dev mode: the credential file goes under the src/ folder.
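Conceptually, the index build in step 3 embeds every chunk once and stores the vectors for similarity search. The toy sketch below uses a plain term-count embedding and brute-force cosine similarity purely to show the mechanics; Lumigo itself uses HuggingFace neural embeddings and a FAISS index for the same job:

```python
import math

def embed(text, vocab):
    """Toy term-count embedding over a fixed vocabulary, normalized to
    unit length. A stand-in for a HuggingFace sentence-embedding model."""
    words = text.lower().split()
    vec = [float(words.count(w)) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_index(chunks):
    """Embed every chunk up front, like the 'Build FAISS Index' step."""
    vocab = sorted({w for c in chunks for w in c.lower().split()})
    return vocab, [(c, embed(c, vocab)) for c in chunks]

def search(index, query, k=2):
    """Return the k chunks closest to the query by cosine similarity.
    FAISS performs this same search, only much faster and at scale."""
    vocab, rows = index
    q = embed(query, vocab)
    scored = sorted(
        ((sum(a * b for a, b in zip(q, v)), c) for c, v in rows),
        reverse=True,
    )
    return [c for _, c in scored[:k]]

index = build_index([
    "transformers for natural language processing",
    "graph neural networks for molecules",
    "attention is all you need",
])
print(search(index, "language models", k=1))
# -> ['transformers for natural language processing']
```

Re-running build_index after documents change is exactly what the "Build FAISS Index" button does for the real vector store.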


🛠️ How to Use

  1. Search for academic content
    Enter your question in the search panel. The system performs a semantic search over the indexed documents.

  2. Explore and select references
    Browse the retrieved documents and their AI-generated summaries. Choose relevant ones by clicking "Add to Ref".

  3. Ask your question
    With references selected, ask a question. The system will use only those documents to generate an accurate, grounded answer.

  4. View source and follow-ups
    Check the sources of the answer, and explore suggested follow-up questions to continue your research.

💡 Tip: Use the "Build FAISS Index" button in the sidebar's "Developer Tools" section to re-index your documents after making changes to the src/data directory.
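The "use only selected references" behavior in step 3 amounts to assembling the answer prompt exclusively from the documents you added to the reference list. A minimal sketch (the prompt wording and the `build_grounded_prompt` helper are illustrative assumptions; Lumigo sends its own prompt to Vertex AI):

```python
def build_grounded_prompt(question, selected_refs):
    """Assemble an answer prompt that sees only user-selected references,
    mirroring the 'Add to Ref' flow. Numbered sources let the model
    cite passages as [n] for traceable answers."""
    context = "\n\n".join(
        f"[{i + 1}] {ref['title']}: {ref['text']}"
        for i, ref in enumerate(selected_refs)
    )
    return (
        "Answer the question using ONLY the numbered sources below, "
        "and cite sources as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

refs = [
    {"title": "Attention Is All You Need", "text": "Introduces the Transformer."},
]
prompt = build_grounded_prompt("What architecture did Vaswani et al. propose?", refs)
print(prompt.splitlines()[0])
```

Because the model never sees unselected documents, every citation in the answer maps back to a reference you explicitly chose.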

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Other Projects

(back to top)

Connect with me:

steph0324
