GPTutor

This project was created for HackNYU 2023. It aims to build a tutor in the form of a bot. The bot uses the GPT model's ability to parse natural language while treating the course content as the source of ground truth. This allows it to answer questions that are specific to the course content and gives students a more personalized experience. Additionally, the professor can add to the bot's knowledge base by supplying correct answers to questions the bot gets wrong, so the bot learns and can answer similar questions better in the future.

Demo

Note that all of this information comes either from the professor's website, slides, and lectures, or from the texts that he has prescribed. There is no connection to the internet.

Answering course-specific questions

Answering questions about the subject material

Answering with code

Asking for confirmation

Using this command, the professor can be called in to fact-check what the bot has stated. If the answer is incorrect, the professor corrects it, and finally using /done retrains the model for similar future queries.

Instant Retraining for future queries

Since there is no connection to the internet, the bot cannot answer this question. However, it can call on the professor to answer it, and once he has, the bot is instantly capable of answering similar questions in the future. We achieve this not by retraining the entire model, but by simply adding the embeddings to the knowledge base, which achieves the same effect.
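The "add embeddings instead of retraining" idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `embed` is a toy bag-of-words stand-in for a real embedding model, and the `KnowledgeBase` class, its method names, and the sample strings are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class KnowledgeBase:
    """Hypothetical in-memory store of (embedding, text) pairs."""

    def __init__(self):
        self.entries = []

    def add(self, text):
        # "Instant retraining": embed the new text and append it.
        # No model weights are touched.
        self.entries.append((embed(text), text))

    def best_match(self, query):
        if not self.entries:
            return None
        q = embed(query)
        return max(self.entries, key=lambda entry: cosine(q, entry[0]))[1]


kb = KnowledgeBase()
kb.add("Lab 2 covers writing a small shell.")  # placeholder course text
# Placeholder professor correction, added after the bot could not answer:
kb.add("Q: When is the midterm? A: The midterm is in week 8.")
print(kb.best_match("when is the midterm"))
```

Because lookup is a similarity search over stored entries, a newly added answer is retrievable on the very next query.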

Inspiration

Each professor teaches their course material in a unique manner. They might pick and choose the concepts they teach or use technical vocabulary in unconventional ways. Furthermore, the order in which they teach these concepts varies from professor to professor. All this can make searching for answers on external resources confusing for students, who may be inundated with concepts they haven't learned yet, or with concepts they have learned but which are now discussed in a different manner or context than they were in class. And since we cannot increase the number of teachers, or the amount of extra time teachers spend in office hours, our GPTBot can fill this gap by providing students a source of inquiry that is always available to them.

What it does

Professors upload their course material (readings, lectures, syllabus, etc.) to our server. Our system preprocesses this data, including transcribing the lectures to text. Once we construct a corpus from all the lesson materials taught up to the present moment, we use it to fine-tune a GPT-3 instance.

Students can then interact with a bot on their class's Discord server and ask it questions in the same way they would ask their professor. In this regard, it's like having infinite clones of their professor! Except these clones are available 24/7 and have instant, accurate recall.

Furthermore, students can improve the bot by giving correct answers a thumbs-up, which adds the Q&A interaction to the model's corpus, meaning that whenever that question is asked in the future, the bot will know the correct answer. Answers students think are incorrect can be given a thumbs-down, in which case the professor will be pinged. Once the professor gives the correct answer, this answer is added to the corpus in the same way as above.
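The feedback flow described above could be modeled along these lines. This is only a sketch under assumptions: the `FeedbackRouter` class, the reaction names, and the return values are hypothetical, and a real bot would hook these methods to Discord reaction events rather than call them directly.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackRouter:
    """Hypothetical router for student reactions on bot answers."""

    corpus: list = field(default_factory=list)   # accepted (question, answer) pairs
    pending: list = field(default_factory=list)  # questions awaiting the professor

    def on_reaction(self, question, answer, reaction):
        if reaction == "thumbs_up":
            # A confirmed answer is added to the corpus for future queries.
            self.corpus.append((question, answer))
            return "added"
        if reaction == "thumbs_down":
            # A disputed answer is escalated to the professor.
            self.pending.append(question)
            return "professor_pinged"
        return "ignored"

    def on_professor_answer(self, question, answer):
        # The professor's answer is stored the same way as a thumbs-up.
        if question in self.pending:
            self.pending.remove(question)
        self.corpus.append((question, answer))


router = FeedbackRouter()
router.on_reaction("What is a fork?", "It creates a child process.", "thumbs_up")
router.on_reaction("When is Lab 2 due?", "Unsure.", "thumbs_down")
router.on_professor_answer("When is Lab 2 due?", "See the syllabus.")
```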

How we built it

We first built a proof of concept using OpenAI GPT-3 (Davinci) on a small website created by a professor, to check whether we could get answers to our prompts.
Next, we built a Discord bot using discord.py and Python.
We then turned the proof-of-concept code into a Flask application so that it could service prompts coming in from the Discord bot.
Finally, to extend the proof of concept, we added lecture transcription using the Hugging Face Speech2Text/Google Cloud Speech API.

Technical Specification

We first convert the entire course material into embeddings, which lets us find content relevant to a query in later steps.
Next, we compute an embedding for the query. Using a similarity score, we retrieve the relevant sections from the knowledge base.
Finally, using prompt engineering and the strong natural language processing capabilities of GPT-3, we generate an answer to the query. Along with the query, we pass the relevant sections from the knowledge base to the model, so that its answer is relevant to both the query and the course material.
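The retrieval and prompt-building steps above can be sketched as follows. This is an illustration under assumptions rather than the project's actual code: the embeddings are tiny hand-written placeholder vectors instead of output from a real embedding model, cosine similarity is assumed as the similarity score, and the function names and prompt wording are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_sections(query_vec, knowledge_base, k=2):
    """Return the texts of the k sections most similar to the query.

    knowledge_base is a list of (embedding, text) pairs.
    """
    ranked = sorted(
        knowledge_base,
        key=lambda item: cosine_similarity(query_vec, item[0]),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]


def build_prompt(question, sections):
    """Combine the retrieved sections and the question into one prompt."""
    context = "\n\n".join(sections)
    return (
        "Answer the question using only the course material below. "
        "If the answer is not in the material, say you do not know.\n\n"
        f"Course material:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder vectors and texts, for illustration only.
knowledge_base = [
    ([1.0, 0.0, 0.0], "Section A text"),
    ([0.0, 1.0, 0.0], "Section B text"),
    ([0.7, 0.7, 0.0], "Section C text"),
]
query_vec = [0.9, 0.1, 0.0]

sections = top_k_sections(query_vec, knowledge_base, k=2)
prompt = build_prompt("What does section A say?", sections)
print(prompt)
```

The resulting prompt string is what would be sent to the language model. Limiting retrieval to the top `k` sections keeps the prompt within the model's input limit.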

Challenges we ran into

Some challenges we ran into were:

Rate limits on OpenAI API calls on the free tier
The large volume of course data and the limit on how much text a single GPT-3 query can handle
Building embedding mappings for all of the text
Setting up the Discord bot, creating a thread, and having the bot reply within that thread
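Two common ways to work around the first two challenges are retrying with exponential backoff and splitting long text into overlapping chunks. The sketch below is one possible approach and not necessarily what the project did: the function names and parameters are hypothetical, word count is used as a rough proxy for tokens, and `RuntimeError` stands in for whatever rate-limit exception the API client raises.

```python
import time


def split_into_chunks(text, max_words=200, overlap=20):
    """Split text into word-based chunks that overlap slightly.

    Word count is only a rough proxy for the model's token limit.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying with exponentially growing delays on failure.

    RuntimeError is a stand-in for the API client's rate-limit error.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** attempt)


sample = " ".join(f"w{i}" for i in range(10))
print(split_into_chunks(sample, max_words=4, overlap=1))
```

The `sleep` parameter is injectable so that the retry logic can be exercised without actually waiting.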

Accomplishments that we're proud of

The Discord bot
The GPT model that answers questions relevant to the slides/lectures

What we learned

Prompt engineering
Discord bot development
A full-stack experience of combining an AI model, a Flask app, and a Discord client app

What's next for Tutor Bot

Integration into Brightspace, Campuswire, and any other platform primarily used by NYU students to interact with their coursework, professors, and classmates.
Train the GPTutor model on additional lecture videos and homework/assignments for each course and professor, to give our answers more context.
