A collection of course materials, tutorials, and projects covering the full pipeline of social media analytics — from data collection to advanced NLP modelling — with a focus on Indonesian-language content and the Twitter/X platform.
social-media-analytics/
├── Tutorial1_TextMining/ # Text mining fundamentals
├── Tutorial2_Topic Modelling/ # Topic modelling with LDA & variants
├── Tutorial3_Data Collection/ # Twitter data collection (API & Twint)
├── tugas_1/ # Assignment 1: basic data analysis
├── proyek tengah semester/ # Mid-term project: user profiling
├── proyek akhir semester/ # Final project: stance detection
├── graph1.ipynb # Graph/network analysis basics
├── graph2.ipynb # Extended graph analysis
├── pagerank.ipynb # PageRank algorithm on social networks
└── script.py # Twitter API utility script (Tweepy)
Covers the core NLP pipeline applied to social media text:
- Text preprocessing (tokenization, stopword removal, normalization)
- Word Embeddings (Word2Vec, GloVe)
- Transformer-based Language Models (BERT, IndoBERT)
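
As a rough illustration of the preprocessing step listed above, here is a minimal standard-library sketch of tokenization, stopword removal, and basic normalization for a tweet. The stopword set, the regular expressions, and the `preprocess` helper are assumptions made for this example; the tutorial notebooks may use nltk or a fuller stopword list such as stopwordsID.csv.

```python
import re

# Hypothetical mini stopword set for illustration only; a real run would
# load a full Indonesian stopword list instead.
STOPWORDS = {"yang", "dan", "di", "ke", "ini", "itu"}

URL_RE = re.compile(r"https?://\S+")   # links
MENTION_RE = re.compile(r"@\w+")       # @username mentions
TOKEN_RE = re.compile(r"[a-z0-9]+")    # runs of lowercase letters/digits


def preprocess(text, stopwords=STOPWORDS):
    """Lowercase, strip URLs and mentions, tokenize, and drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text)  # the '#' of a hashtag is dropped, its word is kept
    return [tok for tok in tokens if tok not in stopwords]


tokens = preprocess(
    "Harga BBM naik lagi dan ini berita yang ramai di @kompascom "
    "https://t.co/abc123 #BBM"
)
print(tokens)
```

Keeping the hashtag word while dropping the `#` is a design choice of this sketch; some analyses prefer to keep hashtags as separate features.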
Unsupervised discovery of latent topics from Twitter corpora:
- Latent Dirichlet Allocation (LDA)
- Indonesian-language datasets (e.g., trending topics on Twitter)
- Preprocessing with colloquial lexicon & abbreviation dictionaries
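
The lexicon-based preprocessing mentioned above can be sketched as a dictionary lookup applied to each token before topic modelling. The CSV excerpt, its column names (`slang`, `formal`), and both helper functions are hypothetical; the actual layout of colloquial-indonesian-lexicon.csv may differ.

```python
import csv
import io

# Hypothetical excerpt standing in for a colloquial lexicon file.
# The column names are assumptions for this sketch.
LEXICON_CSV = """slang,formal
gk,tidak
yg,yang
dgn,dengan
"""


def load_lexicon(csv_text):
    """Build a {colloquial form: standard form} mapping from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["slang"]: row["formal"] for row in reader}


def normalize(tokens, lexicon):
    """Replace each token found in the lexicon; leave other tokens unchanged."""
    return [lexicon.get(tok, tok) for tok in tokens]


lexicon = load_lexicon(LEXICON_CSV)
normalized = normalize(["harga", "yg", "naik", "dgn", "cepat"], lexicon)
print(normalized)
```

Normalizing abbreviations first means that variants such as "yg" and "yang" are counted as one term, which tends to give cleaner topics.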
Methods for collecting social media data:
- Twitter API v2 via tweepy
- Twint for scraping without API rate limits
- Structured storage of collected tweets
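
One possible shape for the structured storage step is a JSON Lines file with one tweet per line. The sketch below uses only the standard library; the `TweetRecord` fields and the file name are assumptions for this example and do not reflect the schema used in the tutorial notebooks.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class TweetRecord:
    # Hypothetical minimal schema; real collections usually carry more fields.
    tweet_id: str
    author: str
    created_at: str
    text: str


def save_jsonl(records, path):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(asdict(rec), ensure_ascii=False) + "\n")


def load_jsonl(path):
    """Read the file back into TweetRecord objects, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [TweetRecord(**json.loads(line)) for line in f if line.strip()]


records = [
    TweetRecord("1", "user_a", "2023-01-01T00:00:00Z", "contoh tweet pertama"),
    TweetRecord("2", "user_b", "2023-01-01T00:05:00Z", "contoh tweet kedua"),
]
path = os.path.join(tempfile.mkdtemp(), "tweets.jsonl")
save_jsonl(records, path)
loaded = load_jsonl(path)
print(len(loaded), "tweets loaded")
```

JSON Lines is chosen here because new tweets can be appended without rewriting the file; CSV via pandas would work equally well for flat records.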
Predicting user attributes from tweet content and profile metadata:
- Gender classification using TF-IDF, LSTM, and Transformer models
- Occupation classification using large Transformer models
- Exploratory Data Analysis (EDA) and error analysis included
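
To make the TF-IDF features mentioned above concrete, the following standard-library sketch computes TF-IDF weights for a few tokenized tweets. It uses the smoothed inverse document frequency `ln((1 + n) / (1 + df)) + 1`, which is the default in scikit-learn's `TfidfVectorizer`, but omits the L2 normalization that scikit-learn also applies. The toy documents and the `tfidf` helper are made up for illustration and are not the project's actual feature pipeline.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return per-document {term: weight} dicts and the idf table.

    docs is a list of token lists. Weight = raw term count * smoothed idf.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors, idf


# Toy tokenized tweets (illustrative only).
docs = [
    ["aku", "suka", "bola"],
    ["aku", "suka", "masak"],
    ["aku", "nonton", "bola"],
]
vectors, idf = tfidf(docs)
for term in sorted(idf, key=idf.get):
    print(f"{term}: {idf[term]:.3f}")
```

Terms that appear in every tweet, such as "aku", receive the lowest weight, while terms unique to one tweet receive the highest, which is what lets a downstream classifier focus on distinguishing vocabulary.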
End-to-end analysis of opinion and influence on Twitter:
- Tweet collection via Twint
- Stance detection (e.g., pro/against a topic) using fine-tuned Transformers
- Network analysis: retweet/like graphs, PageRank-based influence scoring
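
The PageRank-based influence scoring listed above can be sketched with a plain power iteration over a retweet graph. This is a minimal standard-library version under stated assumptions: an edge `(a, b)` means user `a` retweeted user `b`, the usernames are invented, and the damping factor of 0.85 is the conventional default. The project notebooks may instead rely on networkx.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over directed (source, target) edges.

    Rank held by nodes without outgoing edges (dangling nodes) is
    redistributed uniformly so that the scores always sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical retweet edges: (retweeter, original author).
retweets = [
    ("ani", "budi"),
    ("citra", "budi"),
    ("dedi", "budi"),
    ("ani", "citra"),
]
scores = pagerank(retweets)
for user, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{user}: {score:.3f}")
```

Pointing edges from the retweeter to the original author means that being retweeted raises a user's score, which matches the intuition of influence flowing toward the accounts whose content gets amplified.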
| Category | Tools |
|---|---|
| Data Collection | tweepy, twint |
| Data Processing | pandas, numpy |
| NLP | nltk, scikit-learn, gensim |
| Deep Learning | transformers (HuggingFace), tensorflow / pytorch |
| Network Analysis | networkx |
| Visualization | matplotlib, seaborn |
| Environment | Python 3, Jupyter Notebook |
- Clone the repository

      git clone https://github.com/nichsedge/social-media-analytics.git
      cd social-media-analytics

- Set up a virtual environment

      python -m venv .env
      source .env/bin/activate      # Linux/macOS
      .env\Scripts\activate.bat     # Windows

- Install dependencies (per tutorial/project folder as needed)

      pip install tweepy pandas numpy nltk scikit-learn gensim transformers networkx matplotlib seaborn

- Open notebooks

      jupyter notebook

- Some notebooks use Indonesian-language datasets and lexicons (e.g., colloquial-indonesian-lexicon.csv, stopwordsID.csv).
- Twitter API credentials in script.py are for reference only; replace with your own keys before running.
- Large dataset files (.csv) may not be included in the repository due to size constraints.
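
One way to follow the note about supplying your own keys is to read them from environment variables rather than editing them into the script. The sketch below is an assumption about how that could look: the variable names and the `load_credentials` helper are hypothetical and are not part of script.py.

```python
import os

# Hypothetical environment variable names; adjust to whatever your setup uses.
REQUIRED_KEYS = ("TWITTER_API_KEY", "TWITTER_API_SECRET", "TWITTER_BEARER_TOKEN")


def load_credentials(env=os.environ):
    """Return the required credentials, failing early if any are missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing Twitter API credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


# Demonstration with placeholder values; in real use, call load_credentials()
# with no argument after exporting the variables in your shell.
creds = load_credentials(
    {
        "TWITTER_API_KEY": "placeholder-key",
        "TWITTER_API_SECRET": "placeholder-secret",
        "TWITTER_BEARER_TOKEN": "placeholder-token",
    }
)
print(sorted(creds))
```

Keeping keys out of the source file also avoids committing them to the repository by accident.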
This repository is intended for educational purposes as part of a Social Media Analytics course.