- Extract NN-ADJ bigrams
- Extract NN unigrams
- Extract top n-grams using TF-IDF
- Sentiment Analysis of Reviews on Aspects (unigrams or bigrams)
- Visualize sentiment for the extracted keywords using bar plots/WordCloud
Presentation and Tutorial Available at - https://mediaspace.illinois.edu/media/t/1_ammbs24f?st=0
Project Documentation.pdf
- Final Project V4.ipynb (whole source code)
- Test.ipynb (Code for testers to run)
- requirements.txt (libraries to install)
- sentiment_analyzer.joblib (trained sentiment classifier model)
- preprocess_airbnb.csv (Airbnb data with the necessary n-grams, extracted by the source code, for use in topic extraction and sentiment analysis)
- preprocess_hotel.csv (Hotel data with the necessary n-grams for use in topic extraction and sentiment analysis)
https://nbviewer.jupyter.org/github/richameher/CourseProject/blob/main/code/Final_proj%20V4.html
python3 -m venv py3-env-final-proj
source py3-env-final-proj/bin/activate
pip install jupyter
python3 -m ipykernel install --user --name=final-proj
(final-proj will be used as the env in the Jupyter notebook)
pip install -r requirements.txt
jupyter notebook
9. Change the file paths in the notebook to point to the preprocessed files (under the data folder), then run all the cells in the notebook
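
As a minimal sketch of the path change in the last step, the snippet below builds the two file paths from a single folder setting and fails early if a file is missing. The `DATA_DIR` value and the `resolve_data_files` helper are hypothetical names used for illustration only; they are not taken from the notebook.

```python
from pathlib import Path

# Assumption: the preprocessed CSVs sit in a local "data" folder.
DATA_DIR = Path("data")
PREPROCESSED_FILES = ("preprocess_airbnb.csv", "preprocess_hotel.csv")


def resolve_data_files(data_dir=DATA_DIR, names=PREPROCESSED_FILES):
    """Return {file name: full path}, raising if any file is missing."""
    data_dir = Path(data_dir)
    paths = {name: data_dir / name for name in names}
    missing = [name for name, path in paths.items() if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {data_dir}: {', '.join(missing)}")
    return paths
```

Calling `resolve_data_files("path/to/data")` before running the notebook cells would surface a wrong folder immediately rather than partway through the analysis.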
- Read Original Datasets (Check Proposal.pdf Data for the links)
- Clean Text: tokenize; remove punctuation, tabs, whitespace, stopwords, and common words
- Extract bigrams: create a bigrams column containing all adjacent word pairs from the review text
- Create a bigram_list column and keep bigrams that are NN-ADJ pairs
- Create a unigram_list column and keep only unigrams that are NN
- Train a Logistic Regression classifier on the reviews and sentiment labels of the hotel dataset, since only the hotel dataset has ratings (map ratings to sentiments first)
- Use WordCloud to visualize frequent bigrams
- Use TF-IDF to extract top n keywords from unigrams or bigrams
- Use the trained sentiment classifier to classify the sentiment for the keywords
- Plot a bar graph of the keywords, with sentiment as labels and sentiment probability/topic extent on the Y-axis
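
The bigram and TF-IDF steps above can be illustrated with a small standard-library sketch. This is not the notebook's code: the function names, the smoothed IDF formula, and the toy reviews are assumptions chosen for illustration, and the part-of-speech filtering (NN, NN-ADJ) is left out.

```python
import math
from collections import Counter


def adjacent_bigrams(tokens):
    """Return every pair of neighbouring tokens, in order."""
    return list(zip(tokens, tokens[1:]))


def top_tfidf_terms(documents, n=3):
    """Rank terms by their TF-IDF score summed over all documents.

    `documents` is a list of term lists (unigrams or bigrams).
    """
    num_docs = len(documents)
    # Document frequency: the number of documents each term occurs in.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = Counter()
    for doc in documents:
        if not doc:
            continue
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            # Smoothed IDF (an assumption): terms found in every
            # document still keep a non-zero weight.
            idf = math.log((1 + num_docs) / (1 + doc_freq[term])) + 1
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(n)]


# Toy, already-cleaned reviews (made up for this example).
reviews = [
    ["clean", "room", "great", "view"],
    ["room", "small", "noisy"],
    ["friendly", "manager", "clean", "room"],
]
print(adjacent_bigrams(reviews[0]))
print(top_tfidf_terms(reviews, n=2))
```

The same `top_tfidf_terms` function works on bigrams if each document is passed as a list of tuples from `adjacent_bigrams`.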
[Hotel Review Bigram WordCloud]
[Hotel Review Sentiment-Topic Extent Bar Plot]
In the WordCloud, we can observe that people tend to talk about the quality of rooms. Features like safety are more often associated with hotels than with Airbnbs. Hotels also have their own websites, so people talk about the online booking system as well. As for the bar plot, we can see that hotel aspects such as “theft” and “suite” are associated with negative sentiment, while the highest positive sentiment is observed for aspects like “room”, “view” and “manager”.
(Note: the y-axis can also show sentiment probability instead of topic extent; see Test.ipynb)
Completed By Richa Meherwal
Free Topic- Using Topic Mining and Sentiment Analysis to compare customer level satisfaction in Airbnb vs Hotels

