Inspiration

The Myers–Briggs Type Indicator (MBTI) is an introspective self-report questionnaire that indicates differing psychological preferences in how people perceive the world and make decisions. We wanted to use a variant of this test to analyze the data available about a candidate and determine whether their personality is a good match for a given person or company.

What it does

The web app takes a Twitter handle (user ID) and analyzes the user's latest 200 posts to identify their MBTI personality type. We created an API that accepts the Twitter user ID and evaluates the posts made by that user. The NLP predictor runs on a model trained on the Kaggle dataset (https://www.kaggle.com/datasnaek/mbti-type), which was sourced from the Personality Cafe website and the data of its users.

How I built it

The NLP model was built using a heavily imbalanced dataset, which made this an imbalanced classification problem.

Generally, there are two approaches to tackle this kind of problem:

Sampling: There are several sampling techniques that address this problem, including random sampling, SMOTE, and modified SMOTE.

Ensemble methods.

We used a random sampling technique called under-sampling. This approach randomly removes instances from the majority class until the majority class is the same size as the minority class.
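As a rough illustration of random under-sampling, here is a minimal sketch in plain Python. The function name `undersample`, the fixed seed, and the toy posts are assumptions made for this example; they are not taken from the project's code.

```python
import random
from collections import defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop instances so every class is as small as the smallest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # The minority class size is the target size for every class.
    target = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        # rng.sample picks `target` items without replacement.
        for sample in rng.sample(items, target):
            balanced.append((sample, label))

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 8 introvert posts and 2 extrovert posts.
posts = [f"post {i}" for i in range(10)]
labels = ["I"] * 8 + ["E"] * 2
balanced = undersample(posts, labels)
```

After the call, `balanced` holds two posts per class. The trade-off is that most majority-class data is thrown away, which is the cost of this approach compared with over-sampling methods such as SMOTE.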

A separate model was trained for each pair: Introvert/Extrovert (IE), Intuition/Sensing (NS), Feeling/Thinking, and Perception/Judgment. The measured accuracy came out to be around 82%, although this figure is misleading for the first two models, IE and NS, because their data is heavily skewed towards a single characteristic value. The accuracy is high there simply because most of the data belongs to one class, for example Introvert in the IE model.
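To see why plain accuracy can look good on a skewed pair, the sketch below compares it with balanced accuracy (the mean of per-class recall). The labels are made up for illustration and are not the project's data; the "model" here is a deliberately degenerate one that always predicts the majority class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Made-up labels for illustration only: 8 introverts, 2 extroverts.
y_true = ["I"] * 8 + ["E"] * 2
# A degenerate "model" that always predicts the majority class.
y_pred = ["I"] * 10

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The always-Introvert predictor scores 0.8 on accuracy but only 0.5 on balanced accuracy, which is the same as guessing. Reporting a metric of this kind alongside accuracy would make results on the IE and NS models easier to interpret.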

Challenges I ran into

The main challenge with this dataset is that it is hugely imbalanced. I tackled this problem by training a separate model for each characteristic pair. Even so, some individual pairs, such as the IE pair, remain imbalanced.
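Training one model per pair means each four-letter type has to be turned into four binary labels first. The following is a minimal sketch of that step; the names `AXES` and `split_type` are hypothetical and chosen only for this example.

```python
# One two-letter string per characteristic pair (names are illustrative).
AXES = ("IE", "NS", "FT", "PJ")


def split_type(mbti_type):
    """Turn a four-letter type such as 'INTJ' into one label per pair."""
    letters = mbti_type.strip().upper()
    if len(letters) != 4:
        raise ValueError(f"expected four letters, got {mbti_type!r}")

    labels = {}
    for axis in AXES:
        # Exactly one letter of the type must belong to each pair.
        found = [ch for ch in letters if ch in axis]
        if len(found) != 1:
            raise ValueError(f"{mbti_type!r} has no unique letter for {axis}")
        labels[axis] = found[0]
    return labels


print(split_type("INTJ"))  # {'IE': 'I', 'NS': 'N', 'FT': 'T', 'PJ': 'J'}
```

Each of the four resulting label columns can then be balanced and fed to its own binary classifier.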

Accomplishments that I'm proud of

We were able to make use of the data we had on hand and build an extensible web app that performs acceptably well. This feature helps gauge a candidate's personality and creates a reasonable mapping between the candidate and the job they are applying for.

What I learned

We now have a better understanding of the imbalanced classification problem, and we came up with a method to make the best use of an imbalanced dataset.

What's next for JudgeMe

We plan to implement this as a skill for Olivia, the chatbot from Paradox AI. Candidates could grant access to their Twitter user ID and have their compatibility with the job and company they are applying to analyzed.
