Inspiration

Many students have started using ChatGPT in their courses, myself included. In my experience, ChatGPT lacks course-specific knowledge: its answers often aren't at the right level of detail, or they use notation I'm not familiar with. ChatGPT just doesn't know the course I'm studying! So we asked: how can we build a system like ChatGPT, but tuned for each & every course?

What it does

TA Chat is an AI chatbot designed to answer course-specific questions. There's a separate chatbot for every course, each essentially acting as a TA. Whether you need explanations, clarifications, or code, TA Chat can help!

  • Tuned to each course - Our AI is uniquely specialized for every course you're enrolled in, ensuring it has an in-depth understanding of your specific course material.
  • Familiar notation - You'll receive responses that use the same notation as your coursework.
  • Cites your textbook - Need references? No problem! TA Chat provides citations to your textbook to back up its responses.

How we built it

TA Chat uses a RAG (retrieval-augmented generation) system built on top of Cohere's LLMs. RAG works by finding the pages in your textbook that are related to your question, & sending those pages along with the question to an LLM. This allows the LLM's responses to be tailored to your textbook.
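
The step of combining the retrieved pages with the question before calling the LLM can be sketched roughly as below. This is a minimal illustration rather than TA Chat's actual code: the `Page` type, the `build_prompt` function, and the prompt wording are all assumptions made for the example, and retrieval is assumed to have already happened.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One textbook page (hypothetical structure, for illustration only)."""
    number: int
    text: str


def build_prompt(question: str, pages: list[Page]) -> str:
    """Combine the student's question with retrieved pages into one prompt."""
    # Label each excerpt with its page number so the model can cite it.
    context = "\n\n".join(f"[Page {p.number}]\n{p.text}" for p in pages)
    return (
        "Answer the question using only the textbook excerpts below. "
        "Cite page numbers for any claims.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Pretend these two pages came back from the retrieval step.
pages = [
    Page(12, "A stack is a last-in, first-out (LIFO) data structure."),
    Page(13, "A queue is a first-in, first-out (FIFO) data structure."),
]
prompt = build_prompt("What is a stack?", pages)
print(prompt)
```

Tagging each excerpt with its page number is one simple way to make textbook citations possible in the response.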

The RAG system uses LangChain & FAISS for retrieving pages. We used Cohere's beta Chat API for query generation as well as for generating responses from your question & the retrieved textbook pages.
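
To give a feel for what page retrieval does conceptually, here is a pure-Python sketch of nearest-neighbor search over embedding vectors. It is a stand-in written for illustration and does not use the FAISS or LangChain APIs. The function names, the tiny hand-made vectors, and the choice of cosine similarity as the score are all assumptions; in a real system the vectors would come from an embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_pages(query_vec: list[float], page_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k pages most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(page_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]


# Toy 3-dimensional "embeddings", one per textbook page (made up for the example).
page_vecs = [
    [1.0, 0.0, 0.0],  # page 0
    [0.0, 1.0, 0.0],  # page 1
    [0.7, 0.7, 0.0],  # page 2
]
query_vec = [0.9, 0.4, 0.0]

best = top_k_pages(query_vec, page_vecs, k=2)
print(best)  # [2, 0]
```

This brute-force loop compares the query against every page, which is fine for a single textbook. A dedicated similarity-search library becomes worthwhile as the number of vectors grows.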

The whole system was deployed on Streamlit.

Challenges we ran into

Cohere's Chat API is so new that it's not in LangChain yet! We had to send requests through Cohere's raw API & connect that with LangChain ourselves.

As with most ML problems, data quality was a big challenge. Figuring out how to split textbooks into chunks is a critical part of this RAG system. If we simply split by page, we might separate information that belongs together. Converting PDFs to Markdown and then splitting on section headers could avoid this problem. However, free PDF-to-Markdown libraries weren't very good at handling complex textbooks, so we didn't go down this route and stuck with splitting by page. Our research suggests paid software may perform significantly better.
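
For reference, the header-based splitting idea described above (the route we did not take) could look something like the sketch below. It assumes the textbook has already been converted to clean Markdown, which was the hard part in practice. The `split_on_headers` function is a hypothetical helper written for this example, not part of any library.

```python
import re


def split_on_headers(markdown: str) -> list[str]:
    """Split Markdown text into chunks, starting a new chunk at each header line."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # A header is a line starting with 1 to 6 '#' characters and a space.
        if re.match(r"#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that turned out empty (e.g. blank lines before the first header).
    return [c for c in chunks if c]


sample = (
    "# Stacks\n"
    "A stack is LIFO.\n"
    "Push adds an item.\n"
    "# Queues\n"
    "A queue is FIFO.\n"
)
chunks = split_on_headers(sample)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk keeps a whole section together regardless of where page breaks fell in the PDF. One known limitation of this simple version: a `#` at the start of a line inside a code listing would be mistaken for a header.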

Accomplishments that we're proud of

Launching a product that is usable! I hope people find it useful!

What we learned

How to use Cohere, LangChain, and related tech! We had never used these technologies before.

What's next for TA Chat

More courses! Hopefully better retrieval & better course-specific responses.

Built With

  • cohere
  • faiss
  • langchain
  • llms
  • python
  • rag
  • streamlit