# WIKI-RAG: Retrieval-Augmented Generation (RAG) Q&A App using Streamlit and Wikipedia

WIKI-RAG is a Streamlit-powered Retrieval-Augmented Generation (RAG) app that answers user questions based on Wikipedia articles.
It uses:
- SentenceTransformers to create semantic embeddings.
- FAISS for efficient vector-based retrieval.
- A Hugging Face question-answering model (roberta-base-squad2) for final answer extraction.
## Features

- 🔍 Wikipedia Retrieval – Fetches and chunks Wikipedia content.
- 🧠 QA Model – Answers questions using Hugging Face's roberta-base-squad2.
- 🤖 Semantic Search – Dense embeddings using SentenceTransformers.
- ⚡ Vector Search – FAISS-powered similarity search.
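The "fetch and chunk" step above can be sketched in plain Python. The chunk size and overlap values below are illustrative assumptions, not taken from the app's source:

```python
def split_text(document, chunk_size=500, overlap=100):
    """Split a document into overlapping character chunks.

    Overlap keeps sentences that straddle a chunk boundary
    retrievable from both neighboring chunks.
    (chunk_size/overlap defaults are assumed for illustration.)
    """
    chunks = []
    start = 0
    while start < len(document):
        chunks.append(document[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# A 1200-character document with a 400-character stride yields 3 chunks.
chunks = split_text("x" * 1200)
print(len(chunks))  # → 3
```

Character-based chunking is the simplest variant; a real pipeline might instead split on sentence or paragraph boundaries to keep chunks semantically coherent.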
## Project Structure

```
WIKI-RAG/
├── app.py            # Streamlit app
├── notebook.ipynb    # Notebook for testing in Colab
├── requirements.txt  # Dependencies
├── README.md         # Documentation
└── .gitignore
```

## Setup

1. Clone the Repository:
```shell
git clone https://github.com/BVNAHUSH/WIKI-RAG.git
cd WIKI-RAG
```
2. Create a Virtual Environment & Install Dependencies:
```shell
python -m venv venv
venv\Scripts\activate     # Windows
# or
source venv/bin/activate  # Mac/Linux
pip install -r requirements.txt
```
3. Run the App:
```shell
streamlit run app.py
```

4. Open in your browser: http://localhost:8501
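The dependency list is not reproduced in this README; based on the libraries named above, requirements.txt would plausibly contain at least the following (versions unpinned, exact package names assumed):

```
streamlit
wikipedia
sentence-transformers
faiss-cpu
transformers
torch
```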
## ☁ Run in Google Colab

You can also test the RAG pipeline in Google Colab using the included notebook.
## 🧪 Example Queries
Topic: Artificial Intelligence
Question: Who is known as the father of AI?
Topic: James Webb Space Telescope
Question: What is the mission of JWST?
Topic: World War II
Question: When did World War II start, and who were the Axis powers?

## Workflow Steps & Functions
- get_wikipedia_content(topic) → Fetches the Wikipedia article text for the selected topic.
- split_text(document) → Splits the article into smaller overlapping chunks for better retrieval.
- create_embeddings(chunks) → Generates embeddings for all chunks using SentenceTransformer.
- build_faiss_index(embeddings) → Stores the embeddings in a FAISS vector index for fast similarity search.
- retrieve_chunks(query) → Encodes the user query and retrieves the top-k most similar chunks from FAISS.
- answer_question(query, context) → Uses the Hugging Face QA pipeline (roberta-base-squad2) to extract the answer from the retrieved chunks.
- Streamlit UI → Collects user input (topic + question) and displays the number of chunks, the retrieved chunks, and the final answer.
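The retrieval step in the workflow above can be illustrated with plain NumPy standing in for FAISS: an inner-product index (such as FAISS's IndexFlatIP) scores every chunk vector against the query vector and returns the highest-scoring ones. The function shape and the toy vectors here are a hypothetical sketch; in the real app the vectors come from SentenceTransformer:

```python
import numpy as np

def retrieve_chunks(query_vec, chunk_vecs, top_k=2):
    """Return indices of the top_k chunks most similar to the query.

    Brute-force inner-product search, the same computation a FAISS
    IndexFlatIP performs (FAISS just does it much faster at scale).
    """
    scores = chunk_vecs @ query_vec          # one similarity score per chunk
    order = np.argsort(-scores)[:top_k]      # best scores first
    return order.tolist()

# Toy 2-D unit-ish vectors standing in for real sentence embeddings.
chunk_vecs = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.7, 0.7]])
query_vec = np.array([1.0, 0.1])

print(retrieve_chunks(query_vec, chunk_vecs))  # → [0, 2]
```

The retrieved chunks are then concatenated into a context string and handed to the QA pipeline, which extracts a short answer span from that context rather than generating free text.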