A sophisticated multi-agent system built with LangGraph that enables voice-to-voice product discovery through an intelligent pipeline of specialized agents. The application processes natural language queries (text or voice) and provides intelligent product recommendations using RAG (Retrieval-Augmented Generation) and web search capabilities.
This demo showcases the ASR and TTS features: users receive grounded product recommendations through natural spoken interaction.
Demo video: `demo_recording.mp4`
- 🎤 Voice Input: Automatic Speech Recognition (ASR) using OpenAI Whisper
- 🔊 Voice Output: Text-to-Speech (TTS) using OpenAI's GPT-4o-mini-tts
- 📝 Text Input: Traditional text-based query interface
- 🤖 Multi-Agent Pipeline: Specialized agents working in sequence:
  - Router Agent: Identifies tasks and extracts constraints
  - Planner Agent: Creates retrieval strategies
  - Retriever Agent: Fetches relevant data from multiple sources
  - Answer/Critic Agent: Synthesizes final responses with citations
- 🔍 RAG-Based Search: Semantic search over private product catalog using ChromaDB
- 🌐 Web Search Integration: Live product comparison via Serper.dev API
- 💬 Streamlit UI: Interactive web interface for easy interaction
- 🔄 MCP Server: FastAPI-based Model Context Protocol server for tool exposure
- 🔌 Flexible LLM Support: Works with OpenAI or local Ollama models
Router Agent: Analyzes user input to identify:
- Main task/query
- Constraints (budget, materials, brands)
- Safety concerns
Planner Agent: Creates retrieval plan:
- Data source selection (private catalog, live web, or both)
- Fields to retrieve
- Comparison criteria
Retriever Agent: Executes retrieval:
- Calls RAG search on private ChromaDB catalog
- Optionally calls web search for live price comparison
- Aggregates results
Answer/Critic Agent: Synthesizes response:
- Creates final answer using retrieved knowledge
- Cites specific data points
- Flags safety concerns if present
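The four-stage flow above can be sketched as a plain-Python pipeline. This is a minimal stdlib sketch of the sequencing only; the actual implementation uses LangGraph nodes in `agents/graph/graph.py`, and the stub logic in each function here is purely illustrative:

```python
from typing import Any, Dict

# Each "agent" is a function that reads and extends a shared state dict,
# mirroring how the LangGraph nodes pass state along the pipeline.

def router_node(state: Dict[str, Any]) -> Dict[str, Any]:
    # Stub: the real router uses an LLM to extract the task and constraints.
    state["intent"] = "product_search: " + state["input"]
    return state

def planner_node(state: Dict[str, Any]) -> Dict[str, Any]:
    # Stub: the real planner selects data sources and comparison criteria.
    state["plan"] = "search private catalog; compare prices on the web"
    return state

def retriever_node(state: Dict[str, Any]) -> Dict[str, Any]:
    # Stub: the real retriever calls rag.search / web.search via the MCP server.
    state["knowledge"] = "[retrieved product data]"
    return state

def answer_critic_node(state: Dict[str, Any]) -> Dict[str, Any]:
    # Stub: the real node synthesizes a cited answer from the knowledge.
    state["response"] = "Based on " + state["knowledge"] + ": ..."
    state["done"] = True
    return state

def run_pipeline(query: str) -> Dict[str, Any]:
    state: Dict[str, Any] = {"input": query, "response": "", "done": False}
    for node in (router_node, planner_node, retriever_node, answer_critic_node):
        state = node(state)
    return state
```

The key design point is that every node receives and returns the same state object, so stages can be reordered or extended without changing the others.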
The MCP (Model Context Protocol) server exposes tools via FastAPI:
- `POST /tools/rag.search` - Semantic product search
- `POST /tools/web.search` - Web/shopping search
- `GET /tools` - Tool discovery endpoint
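A call to the `rag.search` endpoint can be built with the stdlib alone. The payload fields follow the `RagSearchInput` schema documented later in this README; the helper name is hypothetical:

```python
import json
import urllib.request
from typing import Optional, Tuple

def build_rag_search_request(base_url: str, query: str, top_k: int = 3,
                             price: Optional[dict] = None) -> Tuple[str, bytes]:
    """Build the URL and JSON body for a POST /tools/rag.search call."""
    body = {"query": query, "top_k": top_k}
    if price is not None:
        body["price"] = price  # ChromaDB-style filter, e.g. {"$lt": 15}
    return base_url + "/tools/rag.search", json.dumps(body).encode("utf-8")

url, body = build_rag_search_request(
    "http://localhost:8001", "eco-friendly puzzle", top_k=5, price={"$lt": 15}
)

# With the MCP server running, the request could be sent like this:
# req = urllib.request.Request(url, data=body,
#                              headers={"Content-Type": "application/json"})
# with urllib.request.urlopen(req) as resp:
#     products = json.loads(resp.read())["products"]
```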
- Python: 3.8-3.12
- ffmpeg: For audio processing
- API Keys:
  - OpenAI API key (for LLM and TTS) OR Ollama (for local LLM)
  - Serper API key (for web search)
- ChromaDB: Vector database for product catalog (included in project)
```bash
git clone <repository-url>
cd GenAI_Agentic_VoiceToVoice_Product_Discovery
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Install ffmpeg:

macOS:
```bash
brew install ffmpeg
```

Linux:
```bash
sudo apt-get install ffmpeg
```

Windows: Download from ffmpeg.org and add it to PATH.

For microphone recording functionality:
```bash
pip install soundfile sounddevice
```

Create a `.env` file in the project root:
```
# LLM Configuration
OPENAI_API_KEY=your_openai_api_key_here
MODEL_PROVIDER=openai                    # or "ollama" for local models
OPENAI_MODEL=gpt-4                       # optional, default: gpt-4
OLLAMA_MODEL=llama3.1                    # optional, default: llama3.1
OLLAMA_BASE_URL=http://localhost:11434   # optional, default: http://localhost:11434

# Web Search
SERPER_API_KEY=your_serper_api_key_here

# MCP Server
MCP_BASE_URL=http://0.0.0.0:8001
```

If you need to rebuild the product index:
- Ensure `agents/tools/data/products.parquet` exists
- Run the index build script: `2. build_index.py`
- The ChromaDB collection will be created at `agents/tools/data/chroma_toys/`
First, start the MCP server (required for tool access):

```bash
cd agents
python mcp_server.py
```

The server will run on http://0.0.0.0:8001.

Then, in a separate terminal, start the Streamlit app:

```bash
streamlit run streamlit_app.py
```

The application will open in your browser at http://localhost:8501.
You can also run the voice-to-voice pipeline directly:

```bash
cd agents
python main.py
```

This will:
- Record/load audio from `agents/recording/recording0.wav`
- Transcribe it using Whisper
- Process the query through the agent graph
- Output the text response
- Generate TTS audio output
```
GenAI_Agentic_VoiceToVoice_Product_Discovery/
├── agents/
│   ├── graph/
│   │   ├── __init__.py
│   │   └── graph.py                # LangGraph multi-agent pipeline
│   ├── recording/
│   │   └── recording0.wav          # Sample audio input
│   ├── tools/
│   │   ├── data/
│   │   │   ├── chroma_toys/        # ChromaDB persistent vector store
│   │   │   └── products.parquet    # Cleaned product metadata
│   │   ├── rag_search.py           # RAG search (Chroma + filters)
│   │   ├── web_search.py           # Serper.dev web/shopping search tool
│   │   └── __init__.py
│   ├── data_analysis.ipynb         # Exploratory analysis (optional)
│   ├── llm_judge.ipynb             # LLM evaluation/testing notebook
│   ├── main.py                     # CLI entry for the voice-to-voice pipeline
│   ├── mcp_server.py               # FastAPI MCP server exposing rag.search & web.search
│   ├── tts.py                      # Text-to-speech implementation (gpt-4o-mini-tts)
│   ├── whisper_ars.ipynb           # Whisper ASR exploration notebook
│   └── whisper_ars.py              # Whisper "medium" ASR script
│
├── .env.example                    # Environment variable template
├── .gitignore
│
├── 1. data_preprocessing.ipynb     # Clean Amazon dataset → features/ingredients/brand
├── 2. build_index.py               # Build embeddings + Chroma index
├── 3. rag_logic.ipynb              # Core RAG engine
├── 4. eval_rag.ipynb               # Recall@K and custom query evaluation
│
├── README.md                       # Project documentation
├── requirements.txt                # Default environment
├── requirements_python12.txt       # Python 3.12 compatible environment
└── streamlit_app.py                # Streamlit UI for voice-to-voice demo
```
- Open the Streamlit app
- Select "Text Input"
- Enter: "I need an eco-friendly puzzle under $15"
- Click "Submit"
- View the agent's response with product recommendations
- Select "Record Audio" or "Audio Upload"
- Record/upload your query (e.g., "Find me a safe toy for a 3-year-old")
- The system will:
  - Transcribe your voice
  - Process the query through the agents
  - Return recommendations
  - Optionally speak the response (if TTS is enabled)
```python
from agents.graph.graph import app

result = app.invoke({
    "input": "I need a stainless steel cleaner under $20"
})
print(result["response"])
```

OpenAI (Cloud):
- Models: `gpt-4`, `gpt-4o`, `gpt-4-turbo`, `gpt-4o-mini`, `gpt-3.5-turbo`
- Requires: `OPENAI_API_KEY`
Ollama (Local):
- Models: `llama3.1`, `llama3.2`, `mistral`, `qwen2.5`, etc.
- Requires: Ollama installed and running locally
- Set: `MODEL_PROVIDER=ollama` in `.env`
The default Whisper model is `medium`. To change it:
- Edit `agents/whisper_ars.py` or `streamlit_app.py`
- Options: `tiny`, `base`, `small`, `medium`, `large`
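Whisper resamples input audio internally (via ffmpeg), so any sample rate works, but when transcriptions come back empty it helps to sanity-check the recording itself. A stdlib sketch, assuming a WAV input such as `agents/recording/recording0.wav`; the helper name is hypothetical:

```python
import wave

def wav_info(path: str) -> dict:
    """Return basic parameters of a WAV file, useful for debugging ASR input."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate_hz": rate,
            "duration_s": frames / float(rate),
        }

# Example: wav_info("agents/recording/recording0.wav")
# A duration near zero usually means the recording step failed silently.
```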
TTS uses OpenAI's `gpt-4o-mini-tts` model with the `coral` voice. To customize:
- Edit `agents/tts.py`
- Available voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`, `coral`
The state object passed between agents:
```python
{
    "input": str,                    # User query
    "response": str,                 # Final response
    "done": bool,                    # Completion flag
    "intent": Optional[str],         # Task identification from Router
    "plan": Optional[str],           # Retrieval plan from Planner
    "knowledge": Optional[str],      # Retrieved data from Retriever
    "retrieved_context": Optional[List[Dict[str, Any]]]  # Tool messages from retrieval
}
```

Input (`RagSearchInput`):
```python
{
    "query": str,                # Required: Natural language query
    "top_k": int,                # Optional: Number of results (1-20, default: 3)
    "price": Optional[object],   # Optional: ChromaDB-style price filter
                                 # Example: {"$lt": 30} or {"$gte": 10}
                                 # Supported operators: $lt, $lte, $gt, $gte, $eq, $in
    "rating": Optional[object],  # Optional: ChromaDB-style rating filter (0-5)
                                 # Example: {"$gte": 3.5}
                                 # Supported operators: $lt, $lte, $gt, $gte, $eq
    "brand": Optional[str]       # Optional: Brand name filter
}
```

Output (`RagSearchOutput`):
```json
{
  "products": [
    {
      "sku": "PROD001",
      "title": "Eco-Friendly Puzzle Set",
      "doc_id": "PROD001",
      "price": 12.99,
      "rating": 4.5,
      "brand": "LEGO",
      "category": "Puzzles",
      "score": 0.85
    }
  ]
}
```

Input (`WebSearchInput`):
```json
{
  "query": "eco-friendly puzzle under $15",
  "max_results": 5,
  "mode": "shopping"
}
```

Output (`WebSearchOutput`):
```json
{
  "results": [
    {
      "title": "Eco-Friendly Puzzle - Amazon",
      "url": "https://amazon.com/...",
      "snippet": "Sustainable puzzle made from...",
      "price": 13.99,
      "availability": "In stock",
      "rating": 4.3,
      "rating_count": 1250
    }
  ],
  "note": null
}
```

To add a new tool:
- Create a tool function in `agents/tools/`
- Define input/output schemas
- Add an endpoint to `agents/mcp_server.py`
- Update `retrieval_tool()` in `agents/graph/graph.py` if needed
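The pattern behind the tool endpoints can be sketched with a plain registry. This is a stdlib illustration of the idea only; the real server uses FastAPI routes in `agents/mcp_server.py`, and the names and stub handler below are hypothetical:

```python
from typing import Any, Callable, Dict, List

# Registry mapping tool names to a description and a handler function.
TOOLS: Dict[str, Dict[str, Any]] = {}

def register_tool(name: str, description: str):
    """Decorator registering a handler under a tool name."""
    def wrap(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        TOOLS[name] = {"description": description, "handler": fn}
        return fn
    return wrap

@register_tool("rag.search", "Semantic product search over the private catalog")
def rag_search(payload: dict) -> dict:
    # Stub: the real handler queries the ChromaDB collection.
    return {"products": [], "query": payload["query"]}

def list_tools() -> List[dict]:
    """What a GET /tools discovery endpoint would return."""
    return [{"name": n, "description": t["description"]} for n, t in TOOLS.items()]

def call_tool(name: str, payload: dict) -> dict:
    """What a POST /tools/<name> endpoint would dispatch to."""
    return TOOLS[name]["handler"](payload)
```

Registering tools in one place keeps discovery (`GET /tools`) and dispatch (`POST /tools/<name>`) automatically in sync as tools are added.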
Edit agent prompts in `agents/graph/graph.py`:
- `router_node()`: Task identification logic
- `planner_node()`: Retrieval planning logic
- `answer_critic_node()`: Response synthesis logic
Test individual components:

```python
# Test RAG search
from agents.tools.rag_search import rag_search_tool
result = rag_search_tool({"query": "puzzle", "top_k": 3})

# Test web search
from agents.tools.web_search import web_search_tool
result = web_search_tool({"query": "puzzle", "max_results": 3})
```

If the agents cannot reach the MCP server:
- Ensure the MCP server is running: `python agents/mcp_server.py`
- Check that `MCP_BASE_URL` in `.env` matches the server address
- Verify port 8001 is not in use
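Whether port 8001 is already taken can be checked with a few lines of stdlib Python (the helper name is hypothetical):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds,
        # i.e. when a listener already occupies the port.
        return s.connect_ex((host, port)) == 0

# If port_in_use(8001) is True while your MCP server is NOT running,
# some other process holds the port.
```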
If Whisper model download or loading fails:
- Ensure sufficient disk space (models can be 1-3 GB)
- Check your internet connection for the first-time download
- Try a smaller model (`base` instead of `medium`)
If the ChromaDB collection is missing:
- Run `2. build_index.py` to create the collection
- Verify that `agents/tools/data/chroma_toys/` exists
- Check that the collection name matches `products_toys` in `rag_search.py`
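The first two checks above can be scripted. A small stdlib sketch using the paths from the project layout (the helper name is hypothetical):

```python
from pathlib import Path

def check_index_artifacts(root: str = ".") -> dict:
    """Report which index artifacts are present under the project root."""
    base = Path(root) / "agents" / "tools" / "data"
    return {
        "products.parquet": (base / "products.parquet").is_file(),
        "chroma_toys/": (base / "chroma_toys").is_dir(),
    }

# Any False entry means `2. build_index.py` needs to be (re)run.
```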
If microphone recording fails:
- Install the audio libraries: `pip install soundfile sounddevice`
- Check microphone permissions
- Verify an audio device is available