AI-powered search for the Lex Fridman podcast.
This is a testbed for exploring Langchain functionality.
Start with Whisper transcriptions of the first 325 episodes, via @karpathy:
https://karpathy.ai/lexicap/index.html
Text splitting and OpenAI embeddings done via Langchain in scripts/get_data.ipynb.
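The splitting step can be pictured with a minimal, dependency-free sketch. This is not the code in scripts/get_data.ipynb and not Langchain's splitter; the function name `split_text` and the `chunk_size` / `chunk_overlap` values are assumptions chosen for illustration. It only shows the general idea of cutting a transcript into overlapping chunks before each chunk is embedded.

```python
# Illustrative only: a character-based splitter with overlap.
# `split_text`, `chunk_size`, and `chunk_overlap` are hypothetical names,
# not taken from the notebook in this repo.

def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `chunk_overlap` characters so that a sentence
    cut at a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Toy usage with a very small chunk size so the overlap is visible.
example_chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(example_chunks)
```

In a real pipeline each chunk would then be sent to an embedding model and stored alongside its metadata (for example the episode link), which is what makes the source documents displayable later.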
Store embeddings in Pinecone.
Use Langchain VectorDBQAChain to embed the user query and perform similarity search on Pinecone embeddings. Synthesize the answer from relevant chunks with ChatGPT. The relevant chunks with metadata (links) are displayed as source documents in the UI.
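The retrieve-then-synthesize flow described above can be sketched in a few lines of plain Python. This is a toy stand-in under stated assumptions, not the VectorDBQAChain implementation: the in-memory `toy_index`, the three-dimensional vectors, the `source` labels, and the helper names `cosine_similarity`, `top_k`, and `build_prompt` are all hypothetical. In the app, the vectors come from OpenAI embeddings, the search runs in Pinecone, and the prompt is sent to ChatGPT.

```python
import math

# Illustrative only: in-memory similarity search plus prompt assembly.
# All names and data below are hypothetical placeholders.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float], index: list[dict], k: int = 2) -> list[dict]:
    """Return the k index entries most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(query_vector, entry["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a prompt that asks the model to answer from the retrieved chunks."""
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy index: each entry keeps its text, a made-up vector, and a source label.
toy_index = [
    {"text": "chunk about topic A", "vector": [0.9, 0.1, 0.0], "source": "episode-1"},
    {"text": "chunk about topic B", "vector": [0.0, 1.0, 0.0], "source": "episode-2"},
    {"text": "chunk mixing A and B", "vector": [0.7, 0.7, 0.0], "source": "episode-3"},
]

# Pretend this vector is the embedding of the user's query.
matches = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print(build_prompt("What was said about topic A?", matches))
```

Keeping the `source` label on every entry is what allows the UI to show the retrieved chunks as linked source documents next to the synthesized answer.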
This builds on the excellent: https://github.com/mckaywrigley/wait-but-why-gpt
Thanks to Mckay Wrigley for his work on the UI and app design.
Of course, thanks to Lex Fridman for the excellent podcast and to Karpathy for the Whisper transcriptions.
If you have any questions, feel free to reach out to me on Twitter!