Inspiration

CollegeKit came out of our collective annoyance at the college application process, as it was cumbersome and disorganized. Moreover, it was difficult to connect with other students who had already gone through the college application process. To address these challenges, we created CollegeKit as a comprehensive tool that provides organized information and connects students with valuable resources and peer knowledge.

What it does

CollegeKit serves as a starter pack for students applying to college or currently enrolled in college and includes the following three tools:

Profile Matcher: The Profile Matcher helps students find similar students on Reddit through r/collegeresults: a subreddit where students share their demographics, high school experience, college application results, and advice for future students. By exploring profiles of successful applicants, students can gain insights into their extracurricular activities, essays, and more. This feature is particularly useful, as students can learn what they need to do to increase their chances of admission to their dream universities.

24/7 Essay Reviewer: Getting good feedback from others was challenging when we were writing our college essays, so we made an Essay Reviewer trained on data from College Essay Guy, a renowned resource for essay help. Students can rely on this to guide them through the essay writing process and offer suggestions and feedback.

Search Tool: The Search Tool was made to simplify the process of finding specific information about colleges. College websites are extremely disorganized and this makes it difficult to locate specific details. he Search Tool makes it easy to find the information students need, such as application deadlines, program details, or specific admission requirements.

How we built it

For the user interface, CollegeKit was built using Streamlit, which made it extremely easy to create a simple and elegant solution. Here's how we built each tool:

Profile Matcher: To create the Profile Matcher, Reddit posts were scraped using Langchain's Reddit Posts Loader. The Sentence Transformers library from HuggingFace was used to embed the data, and Pinecone was used to store the embeddings. Finally, semantic search was used to match the user's profile with the stored embeddings.

Essay Review: The essay review functionality was implemented using ChatGPT and details from CollegeEssayGuy's guide for what admissions officers look for in college essays were added to the prompt. To accommodate longer essays, the gpt-3.5-turbo model with the 16k context window was used.

Search Tool: The Search Tool leveraged the DuckDuckGo Search Wrapper from Langchain. When a user entered a query, the Search Tool fetched the search results from DuckDuckGo. The obtained results were then fed into ChatGPT to generate a response that included relevant information and links.

Challenges we ran into

Vectorization with HuggingFace Sentence Transformers: Sentence Transformers requires additional packages to be downloaded, so it posed a memory limitation issue when deploying the app on Streamlit. Streamlit has a 1 GB memory limit and users could only match with a few profiles before the app crashed. To overcome this challenge, we deployed our app on Hugging Face spaces, which provided 16 GB of memory. This solution allowed us to handle the memory requirements of Sentence Transformers effectively.

Reddit Post Scraping: Initially, we planned to use the PRAW package for scraping Reddit posts. However, we discovered that Langchain had a Reddit document loader, which seemed to be a more suitable option. However, due to Reddit's API limitations and ongoing API changes, we were only able to retrieve 815 results. This presented a challenge in obtaining a comprehensive dataset for the Profile Matcher.

Custom Search Agent: For the search tool, our vision was to create a Langchain agent with DuckDuckGo's wrapper that would provide results to users. We wanted to give our agent a custom prompt and other features, however we kept getting errors. Then we realized that we could still fetch DuckDuckGo results but instead of creating an agent, we could just give the search results and user's question to the ChatGPT API and just have it return the answer.

Accomplishments that we're proud of

One of the accomplishments that we're most proud of is overcoming our limited programming knowledge. Despite our initial limitations, we persevered and were able to create an application that incorporated many different technologies.

What we're most proud of is that we made something that we would actually use. We've faced the challenges of the college application process and are proud that we've been able to make something that helps students just like us.

What we learned

Simple is better: Initially, two of our team members wanted to use complex frameworks like React, Next.js, and Tailwind, but then we decided to leverage the power of Streamlit, which provided a user-friendly interface and fast deployment. Embracing simplicity allowed us to quickly deploy our ideas and iterate upon them.

Combining tools and technologies: We learned how to combine various tools and technologies, like Langchain, Sentence Transformers, Reddit, DuckDuckGo and more.

DEADLINES!!!: We misread the deadline, but were fortunate to receive an extension. Other times, we may not be so lucky. From this experience, we will definitely never be late for a hackathon ever again :)

What's next for CollegeKit

Before marketing, we want to add a few features:

Expanding the Profile Matcher: We want to add more data for the profile matcher. For instance, one of our team members is Bengali and wasn't able to find any Bengali students included in the database. To accomplish this, we may have to include posts from other platforms, not just r/collegeresults.

Adding Authentication and Chat History: While we know how to create these features, for the hackathon, we focused on reducing barriers. However, as we move forward, we plan to implement these features to enhance user experience.

To gain users, we plan to do the following.

Collecting Feedback and Conducting User Testing: We will share our app with are friends who are already in college or currently applying to college, and will also conduct user testing with them to further refine and improve the platform.

Marketing and Promotion: Once we've reached out to our network of friends, we'll start actually promoting CollegeKit. First, we plan to post about our tools and features on relevant subreddits, such as r/ApplyingToCollege and r/collegeresults, where students actively seek information and resources. Then, we will begin marketing on TikTok as it has broad reach and a high potential for virality.

Built With

  • huggingface
  • langchain
  • openai
  • pinecone
  • python
  • sentence-transformers
  • streamlit
Share this project:

Updates

posted an update

We resolved the 1000 post limit! With datetime libraries we can scrape from specific time periods and get over one hundred-thousand unique posts. The other solution involves using unique keywords with praw with search capabilities which we can use to get similar results.

Log in or sign up for Devpost to join the conversation.