
GenAI Agentic Voice-to-Voice Product Discovery

A sophisticated multi-agent system built with LangGraph that enables voice-to-voice product discovery through an intelligent pipeline of specialized agents. The application processes natural language queries (text or voice) and provides intelligent product recommendations using RAG (Retrieval-Augmented Generation) and web search capabilities.

🎬 Demo

This demo showcases the ASR and TTS features: the user speaks a query and receives grounded product recommendations through natural spoken interaction.

demo_recording.mp4

🎯 Features

  • 🎤 Voice Input: Automatic Speech Recognition (ASR) using OpenAI Whisper
  • 🔊 Voice Output: Text-to-Speech (TTS) using OpenAI's GPT-4o-mini-tts
  • 📝 Text Input: Traditional text-based query interface
  • 🤖 Multi-Agent Pipeline: Specialized agents working in sequence:
    • Router Agent: Identifies tasks and extracts constraints
    • Planner Agent: Creates retrieval strategies
    • Retriever Agent: Fetches relevant data from multiple sources
    • Answer/Critic Agent: Synthesizes final responses with citations
  • 🔍 RAG-Based Search: Semantic search over private product catalog using ChromaDB
  • 🌐 Web Search Integration: Live product comparison via Serper.dev API
  • 💬 Streamlit UI: Interactive web interface for easy interaction
  • 🔄 MCP Server: FastAPI-based Model Context Protocol server for tool exposure
  • 🔌 Flexible LLM Support: Works with OpenAI or local Ollama models

🏗️ Architecture

System Components

(Pipeline architecture diagram: pipeline.drawio)

Agent Pipeline Flow

  1. Router Agent: Analyzes user input to identify:
    • Main task/query
    • Constraints (budget, materials, brands)
    • Safety concerns
  2. Planner Agent: Creates the retrieval plan:
    • Data source selection (private catalog, live web, or both)
    • Fields to retrieve
    • Comparison criteria
  3. Retriever Agent: Executes retrieval:
    • Calls RAG search on the private ChromaDB catalog
    • Optionally calls web search for live price comparison
    • Aggregates results
  4. Answer/Critic Agent: Synthesizes the response:
    • Creates the final answer using retrieved knowledge
    • Cites specific data points
    • Flags safety concerns if present
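The four-stage flow above can be sketched as a sequential pipeline over a shared state dict. This is an illustrative sketch only: the real nodes in agents/graph/graph.py are LLM-backed LangGraph nodes, and the function bodies here are stubs.

```python
# Illustrative sketch of the four-stage agent flow. The real agents in
# agents/graph/graph.py are LLM-backed; these bodies are stand-in stubs.

def router_node(state: dict) -> dict:
    # Identify the task and extract constraints (budget, brand, safety).
    state["intent"] = f"find_product: {state['input']}"
    return state

def planner_node(state: dict) -> dict:
    # Decide which sources to query and which fields to compare.
    state["plan"] = "rag.search on private catalog; web.search for live prices"
    return state

def retriever_node(state: dict) -> dict:
    # Execute the plan against the MCP tools and aggregate the results.
    state["knowledge"] = "[stub] retrieved product rows"
    return state

def answer_critic_node(state: dict) -> dict:
    # Synthesize a cited answer and flag any safety concerns.
    state["response"] = f"Based on: {state['knowledge']}"
    state["done"] = True
    return state

def run_pipeline(user_input: str) -> dict:
    state = {"input": user_input, "done": False}
    for node in (router_node, planner_node, retriever_node, answer_critic_node):
        state = node(state)
    return state
```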

MCP Server

The MCP (Model Context Protocol) server exposes tools via FastAPI:

  • POST /tools/rag.search - Semantic product search
  • POST /tools/web.search - Web/shopping search
  • GET /tools - Tool discovery endpoint
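A minimal client for the rag.search endpoint might look like the sketch below. The payload field names follow the RagSearchInput schema documented later in this README; the base URL assumes the server is reachable at localhost:8001.

```python
import json
import urllib.request

def build_rag_payload(query, top_k=3, price=None, rating=None, brand=None):
    """Build a /tools/rag.search request body, dropping unset filters."""
    payload = {"query": query, "top_k": top_k,
               "price": price, "rating": rating, "brand": brand}
    return {k: v for k, v in payload.items() if v is not None}

def call_rag_search(base_url, payload):
    # POST the payload to the MCP server (assumes it is already running).
    req = urllib.request.Request(
        f"{base_url}/tools/rag.search",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires `python agents/mcp_server.py` to be running):
# result = call_rag_search("http://localhost:8001",
#                          build_rag_payload("eco-friendly puzzle",
#                                            price={"$lt": 15}))
```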

📋 Prerequisites

  • Python: 3.8-3.12
  • ffmpeg: For audio processing
  • API Keys:
    • OpenAI API key (for LLM and TTS) OR Ollama (for local LLM)
    • Serper API Key (for web search)
  • ChromaDB: Vector database for product catalog (included in project)

🚀 Installation

1. Clone the Repository

git clone <repository-url>
cd GenAI_Agentic_VoiceToVoice_Product_Discovery

2. Create Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Install ffmpeg

macOS:

brew install ffmpeg

Linux:

sudo apt-get install ffmpeg

Windows: Download from ffmpeg.org and add to PATH.

5. Optional: Install Audio Libraries for Recording

For microphone recording functionality:

pip install soundfile sounddevice

6. Set Up Environment Variables

Create a .env file in the project root:

# LLM Configuration
OPENAI_API_KEY=your_openai_api_key_here
MODEL_PROVIDER=openai  # or "ollama" for local models
OPENAI_MODEL=gpt-4  # optional, default: gpt-4
OLLAMA_MODEL=llama3.1  # optional, default: llama3.1
OLLAMA_BASE_URL=http://localhost:11434  # optional, default: http://localhost:11434
SERPER_API_KEY=your_serper_api_key_here

# MCP Server
MCP_BASE_URL=http://0.0.0.0:8001
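These variables can be read with os.getenv using the defaults noted above. The helper name below is hypothetical, shown only to illustrate the provider/default logic:

```python
import os

def get_llm_config() -> dict:
    """Read LLM settings from the environment, with the defaults noted above."""
    provider = os.getenv("MODEL_PROVIDER", "openai")
    cfg = {"provider": provider}
    if provider == "openai":
        cfg["model"] = os.getenv("OPENAI_MODEL", "gpt-4")
        cfg["api_key"] = os.getenv("OPENAI_API_KEY")
    else:  # ollama
        cfg["model"] = os.getenv("OLLAMA_MODEL", "llama3.1")
        cfg["base_url"] = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
    return cfg
```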

7. Set Up Product Catalog (Optional)

If you need to rebuild the product index:

  1. Ensure agents/tools/data/products.parquet exists
  2. Run the build-index script: 2. build_index.py
  3. The ChromaDB collection will be created at agents/tools/data/chroma_toys/

🎮 Running the Application

Start the MCP Server

First, start the MCP server (required for tool access):

cd agents
python mcp_server.py

The server listens on port 8001 (reachable at http://localhost:8001 from the same machine).

Start the Streamlit App

In a new terminal:

streamlit run streamlit_app.py

The application will open in your browser at http://localhost:8501.

Alternative: Command-Line Interface

You can also run the voice-to-voice pipeline directly:

cd agents
python main.py

This will:

  1. Record/load audio from agents/recording/recording0.wav
  2. Transcribe using Whisper
  3. Process through the agent graph
  4. Output text response
  5. Generate TTS audio output

📁 Project Structure

GenAI_Agentic_VoiceToVoice_Product_Discovery/
├── agents/
│   ├── graph/
│   │   ├── __init__.py
│   │   └── graph.py              # LangGraph multi-agent pipeline
│   ├── recording/
│   │   └── recording0.wav        # Sample audio input
│   ├── tools/
│   │   ├── data/
│   │   │   ├── chroma_toys/      # ChromaDB persistent vector store
│   │   │   └── products.parquet  # Cleaned product metadata
│   │   ├── rag_search.py         # RAG search (Chroma + filters)
│   │   ├── web_search.py         # Serper.dev web/shopping search tool
│   │   └── __init__.py
│   ├── data_analysis.ipynb       # Exploratory analysis (optional)
│   ├── llm_judge.ipynb           # LLM evaluation notebook
│   ├── main.py                   # CLI entry point for the voice-to-voice pipeline
│   ├── mcp_server.py             # FastAPI MCP server exposing rag.search & web.search
│   ├── tts.py                    # Text-to-speech implementation (gpt-4o-mini-tts)
│   ├── whisper_ars.ipynb         # Whisper ASR exploration notebook
│   └── whisper_ars.py            # Whisper "medium" ASR script
│
├── .env.example                  # Environment variable template
├── .gitignore
│
├── 1. data_preprocessing.ipynb   # Clean Amazon dataset → features/ingredients/brand
├── 2. build_index.py             # Build embeddings + Chroma index
├── 3. rag_logic.ipynb            # Core RAG Engine
├── 4. eval_rag.ipynb             # Recall@K and custom query evaluation
│
├── README.md                     # Project documentation
├── requirements.txt              # Default environment
├── requirements_python12.txt     # Python 3.12 compatible environment
└── streamlit_app.py              # Streamlit UI for voice-to-voice demo

💡 Usage Examples

Example 1: Text Query via Streamlit

  1. Open the Streamlit app
  2. Select "Text Input"
  3. Enter: "I need an eco-friendly puzzle under $15"
  4. Click "Submit"
  5. View the agent's response with product recommendations

Example 2: Voice Query

  1. Select "Record Audio" or "Audio Upload"
  2. Record/upload your query (e.g., "Find me a safe toy for a 3-year-old")
  3. The system will:
    • Transcribe your voice
    • Process through agents
    • Return recommendations
    • Optionally speak the response (if TTS enabled)

Example 3: Direct API Usage

from agents.graph.graph import app

result = app.invoke({
    "input": "I need a stainless steel cleaner under $20"
})

print(result['response'])

🔧 Configuration

Model Provider Options

OpenAI (Cloud):

  • Models: gpt-4, gpt-4o, gpt-4-turbo, gpt-4o-mini, gpt-3.5-turbo
  • Requires: OPENAI_API_KEY

Ollama (Local):

  • Models: llama3.1, llama3.2, mistral, qwen2.5, etc.
  • Requires: Ollama installed and running locally
  • Set: MODEL_PROVIDER=ollama in .env

Whisper Model

The default Whisper model is medium. To change:

  • Edit agents/whisper_ars.py or streamlit_app.py
  • Options: tiny, base, small, medium, large

TTS Configuration

TTS uses OpenAI's gpt-4o-mini-tts model with voice coral. To customize:

  • Edit agents/tts.py
  • Available voices: alloy, echo, fable, onyx, nova, shimmer, coral

📊 Data Models

AgentState

The state object passed between agents:

{
  "input": str,              # User query
  "response": str,           # Final response
  "done": bool,             # Completion flag
  "intent": Optional[str],   # Task identification from Router
  "plan": Optional[str],    # Retrieval plan from Planner
  "knowledge": Optional[str], # Retrieved data from Retriever
  "retrieved_context": Optional[List[Dict[str, Any]]] # Tool messages from retrieval
}
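The same state can be written as a TypedDict. Field names are taken from the listing above; the exact class definition in agents/graph/graph.py may differ:

```python
from typing import Any, Dict, List, Optional, TypedDict

class AgentState(TypedDict, total=False):
    input: str                  # User query
    response: str               # Final response
    done: bool                  # Completion flag
    intent: Optional[str]       # Task identification from Router
    plan: Optional[str]         # Retrieval plan from Planner
    knowledge: Optional[str]    # Retrieved data from Retriever
    retrieved_context: Optional[List[Dict[str, Any]]]  # Tool messages

# total=False lets agents fill in fields incrementally as the state
# flows through the pipeline.
state: AgentState = {"input": "eco-friendly puzzle under $15", "done": False}
```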

RAG Search

Input (RagSearchInput):

{
  "query": str,                    # Required: Natural language query
  "top_k": int,                    # Optional: Number of results (1-20, default: 3)
  "price": Optional[object],       # Optional: ChromaDB-style price filter
                                    #   Example: {"$lt": 30} or {"$gte": 10}
                                    #   Supported operators: $lt, $lte, $gt, $gte, $eq, $in
  "rating": Optional[object],      # Optional: ChromaDB-style rating filter (0-5)
                                    #   Example: {"$gte": 3.5}
                                    #   Supported operators: $lt, $lte, $gt, $gte, $eq
  "brand": Optional[str]           # Optional: Brand name filter
}
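A small validator for the filter fields can be sketched from the operator lists above (the helper name and structure are illustrative, not part of the project's API):

```python
# Operator sets as documented for RagSearchInput: price also allows $in.
PRICE_OPS = {"$lt", "$lte", "$gt", "$gte", "$eq", "$in"}
RATING_OPS = {"$lt", "$lte", "$gt", "$gte", "$eq"}

def validate_filter(filt, allowed_ops):
    """Check a ChromaDB-style filter like {"$lt": 30} uses only allowed operators."""
    if filt is None:        # filters are optional
        return True
    return isinstance(filt, dict) and all(op in allowed_ops for op in filt)

# validate_filter({"$lt": 30}, PRICE_OPS)   -> True
# validate_filter({"$ne": 30}, PRICE_OPS)   -> False ($ne is not supported)
```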

Output (RagSearchOutput):

{
    "products": [
        {
            "sku": "PROD001",
            "title": "Eco-Friendly Puzzle Set",
            "doc_id": "PROD001",
            "price": 12.99,
            "rating": 4.5,
            "brand": "LEGO",
            "category": "Puzzles",
            "score": 0.85
        }
    ]
}

Web Search

Input (WebSearchInput):

{
    "query": "eco-friendly puzzle under $15",
    "max_results": 5,
    "mode": "shopping"
}

Output (WebSearchOutput):

{
    "results": [
        {
            "title": "Eco-Friendly Puzzle - Amazon",
            "url": "https://amazon.com/...",
            "snippet": "Sustainable puzzle made from...",
            "price": 13.99,
            "availability": "In stock",
            "rating": 4.3,
            "rating_count": 1250
        }
    ],
    "note": null
}

🛠️ Development

Adding New Tools

  1. Create tool function in agents/tools/
  2. Define input/output schemas
  3. Add endpoint to agents/mcp_server.py
  4. Update retrieval_tool() in agents/graph/graph.py if needed

Customizing Agents

Edit agent prompts in agents/graph/graph.py:

  • router_node(): Task identification logic
  • planner_node(): Retrieval planning logic
  • answer_critic_node(): Response synthesis logic

Testing

Test individual components:

# Test RAG search
from agents.tools.rag_search import rag_search_tool
result = rag_search_tool({"query": "puzzle", "top_k": 3})

# Test web search
from agents.tools.web_search import web_search_tool
result = web_search_tool({"query": "puzzle", "max_results": 3})

🐛 Troubleshooting

MCP Server Connection Error

  • Ensure MCP server is running: python agents/mcp_server.py
  • Check MCP_BASE_URL in .env matches server address
  • Verify port 8001 is not in use

Whisper Model Loading Issues

  • Ensure sufficient disk space (models can be 1-3GB)
  • Check internet connection for first-time download
  • Try smaller model (base instead of medium)

ChromaDB Collection Not Found

  • Run 2. build_index.py to create the collection
  • Verify agents/tools/data/chroma_toys/ exists
  • Check collection name matches products_toys in rag_search.py
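A quick stdlib sanity check for the persistent store (path from the steps above; this only verifies the directory exists and is non-empty, not that the collection itself is valid):

```python
from pathlib import Path

def chroma_store_present(root: str = "agents/tools/data/chroma_toys") -> bool:
    """Return True if the persistent Chroma directory exists and is non-empty."""
    p = Path(root)
    return p.is_dir() and any(p.iterdir())

# If this returns False, run `2. build_index.py` to (re)create the
# products_toys collection.
```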

Audio Recording Not Working

  • Install audio libraries: pip install soundfile sounddevice
  • Check microphone permissions
  • Verify audio device is available
