NLP Sentiment Analysis

This repo demonstrates two approaches to sentiment analysis: a lexicon-based analysis in R and a fine-tuned transformer classifier in Python.

R Analysis: CHILDES Corpus Sentiment Visualization

File: NLP_sentiment_analysis.R

This analysis uses child speech from the Child Language Data Exchange System (CHILDES), processing 3.7 million words from real-world conversations to provide insights into how children's speech conveys positive and negative emotions.

Overview:

Use the childesr package to access the CHILDES TalkBank database.
Download and cache the CHILDES English-North America corpus (3.7M+ tokens).
Apply stemming and remove stop words for dimensionality reduction.
Perform lexicon-based sentiment analysis with the Bing sentiment lexicon.
Identify the top 20 most frequent positive and negative words in children's speech.
Use ggplot2 to create a publication-ready visualization.
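
The lexicon-based counting step above can be sketched in a few lines. This is a minimal Python illustration, not the R script itself: the toy lexicon and stop-word list are hypothetical stand-ins for the Bing lexicon and the stop words used in the analysis, stemming is omitted, and the function name `top_sentiment_words` is invented for this example.

```python
from collections import Counter

# Toy stand-in for a sentiment lexicon (assumption: the real Bing lexicon
# maps several thousand words to "positive" or "negative").
TOY_LEXICON = {
    "happy": "positive",
    "good": "positive",
    "love": "positive",
    "sad": "negative",
    "bad": "negative",
    "hurt": "negative",
}

# Tiny illustrative stop-word list (hypothetical, not the list used in the R script).
STOP_WORDS = {"the", "a", "is", "i", "my", "and", "it"}


def top_sentiment_words(tokens, lexicon, stop_words, n=20):
    """Count lexicon words per sentiment and return the n most frequent of each."""
    counts = {"positive": Counter(), "negative": Counter()}
    for token in tokens:
        word = token.lower()
        if word in stop_words:
            continue  # drop stop words before the lexicon lookup
        sentiment = lexicon.get(word)
        if sentiment is not None:
            counts[sentiment][word] += 1
    return {label: counter.most_common(n) for label, counter in counts.items()}


tokens = "I love my dog and it is good but the cut hurt and I love it".split()
result = top_sentiment_words(tokens, TOY_LEXICON, STOP_WORDS)
print(result["positive"])
print(result["negative"])
```

Words missing from the lexicon (here "dog", "but", "cut") are simply ignored, which is the usual behavior of a lexicon join: only matched words contribute to the counts.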

Python Analysis: RoBERTa Transformer for Binary Sentiment Classification

File: NLP_sentiment_analysis_RoBERTa.ipynb

This analysis applies deep learning to text classification with RoBERTa (Robustly Optimized BERT Pretraining Approach), a transformer model with 125 million parameters. The model is fine-tuned on the NLTK movie_reviews corpus to classify reviews as positive or negative. The current, quick-and-dirty setup reaches 90%+ accuracy, precision, recall, and F1-score after 3 training epochs.

Overview:

Load the NLTK movie_reviews corpus and the pre-trained RoBERTa model.
Tokenize text and convert tokenized text to PyTorch tensors for batching.
Configure training arguments such as number of epochs.
Evaluate performance with accuracy, precision, recall, and F1-score metrics.
Use plotnine (ggplot2 for Python) to visualize accuracy by training epoch.
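
For reference, the four evaluation metrics listed above can be computed from predicted and true labels as follows. This is a dependency-free sketch under stated assumptions: labels are encoded as 1 (positive) and 0 (negative), the function name `binary_metrics` is hypothetical, and the example labels are made up for illustration. The notebook may compute these metrics differently, for example with a library helper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only (not results from the notebook).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision and recall are reported for the positive class only; on a balanced corpus all four numbers tend to sit close together.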
