ResumeBias
Blind Web App
Disrupt the District 2018
Demo
Purpose
Blind supports equality and diversity in the hiring process. White male applicants receive up to 36% more callbacks than their peers on the basis of their names. Blind strips the name from the resume to reduce racial or gender bias during the first round of hiring.
How
Blind uses natural language processing to identify the name and email on the resume. It then generates an unbiased resume where both pieces of information are blocked out.
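The redaction step can be illustrated with a minimal sketch. It is not Blind's actual code: the email pattern is a simplified one in the spirit of the regular-expressions.info reference, and the name step uses a plain lookup against a set of known names as a stand-in for the natural language processing. The names `redact_emails`, `redact_names`, and `known_names` are hypothetical.

```python
import re
from typing import Set

# Simplified email pattern (an assumption, not necessarily the one Blind uses).
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

BLOCK = "\u2588"  # full block character used to black out text


def redact_emails(text: str) -> str:
    """Replace each email address with block characters of equal length."""
    return EMAIL_RE.sub(lambda m: BLOCK * len(m.group(0)), text)


def redact_names(text: str, known_names: Set[str]) -> str:
    """Black out every word found in known_names (lowercase entries).

    A name-list lookup stands in here for NLP-based name detection.
    """
    def mask(match):
        word = match.group(0)
        return BLOCK * len(word) if word.lower() in known_names else word

    return re.sub(r"[A-Za-z]+", mask, text)


if __name__ == "__main__":
    sample = "Jane Doe, jane.doe@example.com, Software Engineer"
    print(redact_names(redact_emails(sample), {"jane", "doe"}))
```

Replacing each match with a run of the same length keeps the layout of the line unchanged, which matters when the text is later placed back onto a page.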
Usage
Upload your resume in PDF format on the demo website. Choose how much information you want to remove by clicking the buttons below the submission box. The new PDF will automatically download onto your computer.
Dependencies
Python 3.6+
nltk (Stanford NLP module optional)
numpy
pdfminer.six
PyPDF2
unidecode
Note:
Run nltk.download() and choose the "book" collection before running the program.
Challenges
Coordinating all of the dependencies among our computers and AWS. Identifying and removing non-standard characters. Finding both a person's first and last name.
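As a rough illustration of the non-standard character problem, the sketch below folds accented text to plain ASCII using only the standard library. It is a swapped-in technique, not the unidecode package listed under Dependencies, and the function name `to_ascii` is hypothetical.

```python
import unicodedata


def to_ascii(text):
    """Fold text to ASCII: split accented letters into base letter plus
    combining mark, then drop everything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("José Núñez"))
```

This approach silently drops characters that have no ASCII decomposition, so a transliteration library is likely the better choice when names in non-Latin scripts need to survive.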
Future Improvements
We would have liked to include the improved version of the natural language processing, but it was too large for AWS in its current state. We would also like to remove other potential sources of bias on resumes, such as age or location.
Sources Used:
NLTK
Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O'Reilly Media Inc. (nltk.org)
Email Regex
http://www.regular-expressions.info/email.html
Stanford NLP with NLTK
NFL Players Dataset
https://raw.githubusercontent.com/theliamcrawford/6-Degrees-of-NFL-Players/master/names.txt