Inspiration
Inquery is inspired by the challenges students face in connecting with professors who suit their research goals and aspirations. The Purdue CS Undergrad Research page categorizes professors into 11 different areas of research. This gives a general idea of the research happening at Purdue but misses the specifics and nuances within each category. Our goal is to make finding and contacting professors more efficient and seamless.
What it does
Inquery web-scrapes Purdue's CS faculty site to build a database of professors. This database stores each professor's name, education, displayed publications, and external links, as well as their associated research areas. We used Beautiful Soup to pull data from the Purdue CS research page and parsed it with natural language processing libraries such as NLTK and spaCy.
Users can easily look up a research topic; if it is similar to any entry in the database, the corresponding details about professors and their projects/research are displayed in a user-friendly format.
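A minimal sketch of that similarity lookup using only the standard library's difflib; the professor records, field names, and threshold below are invented placeholders, not our actual schema:

```python
from difflib import SequenceMatcher

# Toy stand-in for the Firestore data: one record per professor.
PROFESSORS = [
    {"name": "Prof. A", "areas": ["machine learning", "computer vision"]},
    {"name": "Prof. B", "areas": ["programming languages", "compilers"]},
]

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def search(topic: str, threshold: float = 0.6):
    """Return professors whose research areas resemble the queried topic."""
    results = []
    for prof in PROFESSORS:
        best = max(similarity(topic, area) for area in prof["areas"])
        if best >= threshold:
            results.append((prof["name"], best))
    # Strongest matches first.
    return sorted(results, key=lambda r: r[1], reverse=True)
```

A fuzzy ratio rather than exact string equality lets a query like "machine learning" still surface a professor even when the stored area is worded slightly differently.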
How we built it
Each team member focused on writing the code for one of the necessary steps. Once each step was complete, we modified our scripts to communicate with one another.
To get data from the web page to the search engine, we scraped the webpages for necessary information and stored it on Firebase. The search engine uses this database and an inputted research topic to find results for the user.
Description of web scraping:
To obtain data for each professor, we used Beautiful Soup to scrape the faculty websites. We download each page's HTML and hand it to Beautiful Soup, which parses the markup so Python can read it.
Finding each publication title and its corresponding link was much harder. Extracting only the text from the page yields the titles but not their links, so we converted the HTML to a string and used find and remove operations to isolate the titles and links. Some tags recur consistently, but there are inconsistencies as well, which is why a long list of replace commands is needed.
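For illustration, on a page with consistent markup Beautiful Soup can keep each title paired with its link by walking the anchor tags directly; the HTML snippet and class name below are invented, not Purdue's actual markup:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a faculty publications list.
html = """
<ul class="publications">
  <li><a href="https://example.edu/paper1">A Study of Parsing</a></li>
  <li><a href="https://example.edu/paper2">Scraping at Scale</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Each <a> tag carries both the visible title and its href,
# so the title stays paired with its link without string surgery.
publications = [
    (a.get_text(strip=True), a["href"])
    for a in soup.select("ul.publications a")
]
```

The string find/replace route described above was needed because the real pages were not this consistent.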
Description of database:
To store our data, we used Cloud Firestore: a database that stores info as documents within a collection. Each document represents a professor and stores their data as individual variables. We created a Python method that takes in parameters and uploads them to the database. This is done by setting a reference to a document within the profdata collection and adding data to it. If the document does not exist already, Firebase will automatically create it. This allows us to easily edit and expand the database.
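A sketch of that upload path, with invented field names; upload_prof assumes a Cloud Firestore client is passed in, and set() creates the document automatically if it does not already exist:

```python
def build_prof_doc(name, education, pub_titles, pub_links, areas):
    """Assemble one professor's document; publications are parallel arrays."""
    return {
        "name": name,
        "education": education,
        "pub_titles": pub_titles,  # index-aligned with pub_links
        "pub_links": pub_links,
        "areas": areas,
    }

def upload_prof(db, doc):
    """Write the professor's document into the profdata collection."""
    # db is a firestore client; set() creates or overwrites the document.
    db.collection("profdata").document(doc["name"]).set(doc)
```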
Challenges we ran into
Uploading data from the web scraper was challenging. Originally, the data was stored as a map but had to be restructured into parallel arrays. Collecting the publication titles and links was especially difficult, since it required converting the HTML to a string and narrowing it down to individual titles with multiple find() and replace() calls. Extra lines were needed to account for inconsistencies across the websites.
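The map-to-parallel-arrays change amounts to splitting a {title: link} map into two index-aligned lists, roughly:

```python
def split_publications(pub_map):
    """Convert a {title: link} map into two index-aligned lists."""
    titles = list(pub_map)
    links = [pub_map[t] for t in titles]
    return titles, links
```

Because Python dicts preserve insertion order, titles[i] and links[i] stay paired after the split.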
Accomplishments we’re proud of
We are proud of creating a database of 30+ professors spanning 11 CS research areas, and of paving the way for future scalability. Our goal was to let users accurately and easily find professors based on search inquiries, and we accomplished it.
What we learned
Overall, our team became more comfortable with web development, database management, and interlacing different components together to create a functional web app.
Shelly: I learned more about part of speech tagging and keyword extraction as well as web scraping by being challenged by such a large and diverse dataset to analyze.
Malcolm: I learned more about how to import data into Firebase and how to use Beautiful Soup to scrape links to different web pages.
Evan: I learned more about web dev and integrating databases into websites, specifically using Flask, Python, and Firebase.
Ryan: I learned more about CSS, Flask, and web development in general. I was also able to familiarize myself with the Firebase database.
What’s next for Inquery
Our future goals are to build a more robust algorithm for summarizing researchers' bios and for scraping professors' individual pages. In the future, we would like to add databases for other colleges and universities to build a larger platform for users. Additional features, such as letting professors specify the qualifications of the students they are looking to work with, would also be beneficial.

