Inspiration

Choosing courses for a new semester is exciting but sometimes difficult. At a large school like UVA, students have access to thousands of course offerings each semester. Having to pick just 4-5 classes from this vast selection often leaves students in a tough spot.

Current tools for discovering new courses at UVA include SIS (the official UVA course catalog and scheduling software) and Lou’s List (a great third-party UVA course catalog). However, because the course database is so large, it is difficult to use traditional keyword searches or course filters to find courses that exactly match what students are looking for.

To help streamline the process of discovering new courses, we decided to build UVA Course Explorer. By harnessing the power of large language models, UVA Course Explorer allows students to describe what they want to learn in natural language and returns the courses that best fit their description.

What it does

UVA Course Explorer is a semantic search engine for UVA courses. Using the same family of deep learning models powering ChatGPT, we are able to process user queries with strong natural language understanding and match these queries with relevant courses.

Our engine supports the kinds of queries that traditional keyword-based search engines handle well. Examples of such queries include “C programming” or “Neuroscience”.

Additionally, our engine can answer more general, complex queries phrased in natural, free-flowing language. Examples of such queries include:

  • “Learn to make a rocket that can go to Mars and beyond”
  • “A discussion about the limits of our understanding of the observable universe”
  • “A class about the impacts of European colonization in the Americas”
  • “How do tiny mutations along DNA result in cells becoming malignant?”
  • “What are the societal impacts of artificial intelligence?”

The engine works by creating vector embeddings of the English course descriptions and of the query that a user enters. These embedding vectors can be viewed as representing what an AI “thinks about” a piece of text. By finding the course descriptions whose embeddings are most similar to the query embedding, we are able to find the courses most relevant to a user’s query.

How we built it

The application is split into two parts: a React-based front end and a Flask-based back end. The React front end is responsible for rendering and presenting the data to the user in a friendly and efficient manner. We made use of React’s component architecture to streamline rendering and create a fast UX. The application also makes use of concurrency and asynchronous updates to different parts of the UI, allowing the user to quickly access and interact with the data before all the API calls return.

We used the Student Information System (SIS) API to acquire data for all UVA courses in the upcoming semester. We used the OpenAI Embeddings API to generate vector representations of the course descriptions we obtained. Whenever the user enters a query, we use the same API to generate a query embedding and calculate the cosine similarity between that embedding and the description embedding of every course. We then present the top 10 closest matching courses to the user, along with course timings, up-to-date enrollment metrics, and other relevant information.
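The ranking step described above can be illustrated with a minimal sketch. This is not our actual back-end code: the function names, the course IDs, and the toy 3-dimensional vectors are all hypothetical stand-ins, and a real deployment would compare full-size embeddings returned by the embeddings API.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_matches(query_embedding, course_embeddings, k=10):
    """Return the k (course_id, score) pairs most similar to the query."""
    scored = (
        (course_id, cosine_similarity(query_embedding, vector))
        for course_id, vector in course_embeddings.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy data: 3-dimensional vectors stand in for real embeddings.
toy_courses = {
    "COURSE_A": [1.0, 0.0, 0.0],
    "COURSE_B": [0.7, 0.7, 0.0],
    "COURSE_C": [0.0, 0.0, 1.0],
}
results = top_matches([1.0, 0.1, 0.0], toy_courses, k=2)
print(results)
```

In this toy example the query vector points almost exactly along COURSE_A, so that course ranks first, followed by COURSE_B. Scanning every course this way is simple and works at the scale of a single semester's catalog, assuming the course embeddings are computed once ahead of time rather than per query.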

At the center of our UI, we also have an interactive visualization that depicts the embeddings of all courses in 3D space. We use Principal Component Analysis (PCA) to reduce the 1536-dimensional embedding vectors to 3 dimensions. Users can filter the output of this graph to visually explore and compare courses from different subject areas. We built this component of our project using the Plotly library.
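For readers curious about the dimensionality reduction, here is a rough sketch of PCA using NumPy's singular value decomposition. It is an illustration under assumptions, not our exact implementation: the function name `pca_reduce` is hypothetical, and random vectors are used in place of real course embeddings.

```python
import numpy as np


def pca_reduce(embeddings, n_components=3):
    """Project row vectors onto their top principal components."""
    X = np.asarray(embeddings, dtype=float)
    centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Hypothetical stand-in data: 50 random vectors with 1536 dimensions each.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(50, 1536))
points_3d = pca_reduce(fake_embeddings)
print(points_3d.shape)
```

Each row of `points_3d` is one course's position in the 3D plot. The three axes are the directions along which the embeddings vary the most, so courses with similar descriptions tend to land near each other.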

Challenges we ran into

We had a few challenges handling client-server communication and dynamically updating client elements while maintaining a clean UI. Some interesting design decisions include creating a dropdown card for each course and embedding the 3D Plotly graph within our React-based client application.

There was a learning curve to working with the SIS API. We were fortunate to have found some documentation on the API from UVA’s SDE2 course, but we still needed some trial and error to figure out how to structure our API calls and which routes to access. Furthermore, the SIS API has a bit of latency, so we had to be clever about how to load our data efficiently and create a smooth experience for the user.

Accomplishments that we're proud of

We are proud of our integration between the backend and the polished and aesthetic custom front end, both of which come together in a clean, usable application for the user.

What we learned

Starting with the basics, we learned a lot about web development including languages such as JavaScript and HTML/CSS/JSX, frameworks such as React, and client-server interactions. We also learned how to create functional applications that are designed to serve and help users in meaningful and efficient ways.

We also learned some basic techniques for managing API latency to create a smooth experience for the user.

What's next for UVA Course Explorer

We believe UVA Course Explorer can make the course selection process easier and more enjoyable, so we would like to host it and make it accessible to the broader UVA community. To do this, we will need to use a vector database to store and efficiently search across the course description embeddings. We will also need to deploy our back end onto a scalable cloud service.

Acknowledgements

We would like to acknowledge Stanford’s Cardinal Compass (https://www.searchclasses.org/), a semantic search engine built for Stanford courses. This project served as proof to us that a search engine built on embedding matching could yield useful results for finding courses.

We would also like to express our sincere gratitude to GitHub Copilot and ChatGPT for their assistance in this project. We hope we have been good users ☺️.
