An interactive AI-powered podcast co-host that can engage in natural conversations, process context from documents, and respond with voice.
-
🎙️ Voice Interaction
- Real-time voice recording
- Speech-to-text using OpenAI's Whisper API
- Text-to-speech using ElevenLabs API
-
📚 Document Processing
- PDF upload and processing
- Vector storage using FAISS
- Context-aware responses
-
🤖 AI Conversation
- GPT-4 powered responses
- Memory of conversation history
- Natural, podcast-style interaction
-
💻 Modern Interface
- Clean UI with Tailwind CSS
- Real-time audio visualization
- Easy-to-use controls
- Backend: FastAPI (Python)
- Frontend: HTML, JavaScript, Tailwind CSS
- AI/ML:
- OpenAI GPT-4 & Whisper
- ElevenLabs TTS
- LangChain
- FAISS Vector Store
-
Clone the repository:
git clone https://github.com/yourusername/ai-podcast-cohost.git cd ai-podcast-cohost -
Install dependencies:
pip install -r requirements.txt
-
Create a
.envfile in the project root:OPENAI_API_KEY=your_openai_api_key ELEVENLABS_API_KEY=your_elevenlabs_api_key -
Run the application:
python app/main.py
-
Open your browser and navigate to
http://127.0.0.1:53997
-
Upload Context (Optional)
- Click "Upload PDF" to add background knowledge
- The system will process and index the content
-
Start Conversation
- Use the voice recorder: Click "Start Recording" to speak
- Or type messages in the text input
-
Interact with AI
- View your transcribed speech
- See AI responses in text form
- Listen to AI responses through the audio player
ai-podcast-cohost/
├── app/
│ ├── static/
│ │ ├── uploads/ # Temporary file storage
│ │ └── audio/ # Generated audio files
│ ├── templates/
│ │ └── index.html # Main UI template
│ ├── main.py # FastAPI application
│ └── utils.py # Core functionality
├── requirements.txt
└── .env
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.