📊 Social Media Analytics

A collection of course materials, tutorials, and projects covering the full pipeline of social media analytics — from data collection to advanced NLP modelling — with a focus on Indonesian-language content and the Twitter/X platform.


📁 Repository Structure

social-media-analytics/
├── Tutorial1_TextMining/          # Text mining fundamentals
├── Tutorial2_Topic Modelling/     # Topic modelling with LDA & variants
├── Tutorial3_Data Collection/     # Twitter data collection (API & Twint)
├── tugas_1/                       # Assignment 1: basic data analysis
├── proyek tengah semester/        # Mid-term project: user profiling
├── proyek akhir semester/         # Final project: stance detection
├── graph1.ipynb                   # Graph/network analysis basics
├── graph2.ipynb                   # Extended graph analysis
├── pagerank.ipynb                 # PageRank algorithm on social networks
└── script.py                      # Twitter API utility script (Tweepy)

📚 Tutorials

Tutorial 1 — Text Mining

Covers the core NLP pipeline applied to social media text:

  • Text preprocessing (tokenization, stopword removal, normalization)
  • Word embeddings (Word2Vec, GloVe)
  • Transformer-based language models (BERT, IndoBERT)
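A minimal preprocessing sketch follows. It assumes the slang lexicon and stopword list are already loaded into memory. The tiny `SLANG_TO_FORMAL` and `STOPWORDS` samples stand in for the lexicon files listed in the Notes section. The regexes and the `preprocess` name are illustrative assumptions, not code taken from the notebooks.

```python
import re

# Illustrative stand-ins for colloquial-indonesian-lexicon.csv and stopwordsID.csv;
# the real files are much larger and their exact contents are not reproduced here.
SLANG_TO_FORMAL = {"gak": "tidak", "yg": "yang", "bgt": "banget", "nyesel": "menyesal"}
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z]+")


def preprocess(tweet: str) -> list[str]:
    """Lowercase, strip URLs/mentions, tokenize, normalize slang, drop stopwords."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)      # remove links first so they are not tokenized
    text = MENTION_RE.sub(" ", text)  # remove @username mentions
    text = text.replace("#", " ")     # keep hashtag words, drop the '#' symbol
    tokens = TOKEN_RE.findall(text)   # alphabetic tokens only; digits and emoji are dropped
    normalized = [SLANG_TO_FORMAL.get(tok, tok) for tok in tokens]
    return [tok for tok in normalized if tok not in STOPWORDS]


print(preprocess("Filmnya bagus bgt, gak nyesel nonton yg ini! @teman https://t.co/xyz #film"))
# → ['filmnya', 'bagus', 'banget', 'tidak', 'menyesal', 'nonton', 'film']
```

Normalization runs before stopword removal on purpose. `yg` only becomes the stopword `yang` after the lexicon lookup, so reversing the order would let it through.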

Tutorial 2 — Topic Modelling

Unsupervised discovery of latent topics from Twitter corpora:

  • Latent Dirichlet Allocation (LDA)
  • Indonesian-language datasets (e.g., trending topics on Twitter)
  • Preprocessing with colloquial lexicon & abbreviation dictionaries
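To make the LDA mechanics concrete, here is a small collapsed Gibbs sampler in pure Python. gensim, which is listed under Technologies & Libraries, provides `gensim.models.LdaModel`. That class fits LDA with online variational Bayes rather than Gibbs sampling. The sampler below is swapped in only because its update rule is easy to read. The hyperparameters (`alpha`, `beta`, `iterations`) and the toy documents are arbitrary assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists (already preprocessed).
    Returns (doc_topic, topic_word): smoothed per-document topic distributions
    and per-topic word distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    # Count tables: document-topic, topic-word, and total tokens per topic.
    ndk = [[0] * num_topics for _ in docs]
    nkw = [defaultdict(int) for _ in range(num_topics)]
    nk = [0] * num_topics

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (nkw[t][w] + beta) / (nk[t] + V * beta) for w in vocab}
        for t in range(num_topics)
    ]
    return doc_topic, topic_word


# Toy usage: two obvious themes (flooding vs. elections).
docs = [
    ["banjir", "hujan", "jakarta", "banjir"],
    ["hujan", "banjir", "genangan"],
    ["pemilu", "debat", "capres"],
    ["debat", "capres", "pemilu", "pemilu"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for t, dist in enumerate(topic_word):
    print(t, sorted(dist, key=dist.get, reverse=True)[:3])
```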

Tutorial 3 — Data Collection

Methods for collecting social media data:

  • Twitter API v2 via tweepy
  • Twint, an unofficial scraper that collects tweets without going through the API (and therefore without its rate limits)
  • Structured storage of collected tweets
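The sketch below pairs the API route with CSV storage. It assumes tweepy 4.x. In that version, `tweepy.Client.search_recent_tweets` calls the v2 recent-search endpoint and returns a `Response` whose `.data` holds `Tweet` objects, or `None` when nothing matches. The client is passed in rather than constructed inside the function, so the storage step can be tested without network access. The function names, the chosen fields, and the CSV layout are hypothetical and are not taken from script.py.

```python
import csv

# Hypothetical column layout for the stored tweets.
FIELDS = ["id", "created_at", "author_id", "text"]


def fetch_recent_tweets(client, query, max_results=100):
    """Run one recent-search request and flatten the tweets into dicts.

    `client` is expected to be a tweepy.Client (tweepy 4.x); it is injected so the
    function can also be exercised with a stand-in object.
    """
    response = client.search_recent_tweets(
        query=query,
        max_results=max_results,  # the v2 endpoint accepts 10-100 per request
        tweet_fields=["created_at", "author_id"],
    )
    tweets = response.data or []  # .data is None when the query matches nothing
    return [
        {
            "id": t.id,
            "created_at": t.created_at.isoformat() if t.created_at else "",
            "author_id": t.author_id,
            "text": t.text,
        }
        for t in tweets
    ]


def save_tweets_csv(rows, path):
    """Write flattened tweets to a UTF-8 CSV file with a fixed column order."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Example usage (needs your own bearer token and network access):
# import tweepy
# client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")
# rows = fetch_recent_tweets(client, "banjir lang:id -is:retweet", max_results=50)
# save_tweets_csv(rows, "tweets.csv")
```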

🎓 Projects

Mid-Term Project — User Profiling

Predicting user attributes from tweet content and profile metadata:

  • Gender classification using TF-IDF, LSTM, and Transformer models
  • Occupation classification using large Transformer models
  • Includes exploratory data analysis (EDA) and error analysis
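The TF-IDF baseline can be sketched without libraries. The sketch weights each term by its raw count times a smoothed inverse document frequency, L2-normalizes the result, and assigns the class whose centroid has the highest cosine similarity. The idf formula mirrors the scikit-learn `TfidfVectorizer` default (`smooth_idf=True`): idf(t) = ln((1 + n) / (1 + df(t))) + 1. The nearest-centroid classifier is a stand-in chosen for brevity, not the project's actual model. The toy documents and the placeholder labels `class_a` / `class_b` are illustrative only.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed idf, as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """Raw term counts times idf, L2-normalized; terms unseen in training are ignored."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def centroid(vectors):
    """Sum of sparse vectors rescaled to unit length (same direction as the mean)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    norm = math.sqrt(sum(x * x for x in total.values()))
    return {t: x / norm for t, x in total.items()} if norm else {}


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def train(docs, labels):
    idf = fit_idf(docs)
    by_label = {}
    for doc, label in zip(docs, labels):
        by_label.setdefault(label, []).append(tfidf(doc, idf))
    return idf, {label: centroid(vs) for label, vs in by_label.items()}


def predict(doc, idf, centroids):
    vec = tfidf(doc, idf)
    # Both sides are unit length, so the dot product equals cosine similarity.
    return max(centroids, key=lambda label: dot(vec, centroids[label]))


train_docs = [
    ["main", "bola", "gol"],
    ["pertandingan", "bola", "liga"],
    ["resep", "masak", "kue"],
    ["masak", "sayur", "resep"],
]
train_labels = ["class_a", "class_a", "class_b", "class_b"]
idf, centroids = train(train_docs, train_labels)
print(predict(["gol", "liga"], idf, centroids))  # → class_a
```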

Final Project — Stance Detection & Network Analysis

End-to-end analysis of opinion and influence on Twitter:

  • Tweet collection via Twint
  • Stance detection (e.g., pro/against a topic) using fine-tuned Transformers
  • Network analysis: retweet/like graphs, PageRank-based influence scoring
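PageRank-based influence scoring can be sketched with power iteration over a directed retweet graph. The sketch makes three assumptions. First, an edge u → v means "u retweeted v", so rank flows toward the original authors. Second, the damping factor 0.85 matches the default `alpha` of `networkx.pagerank`. Third, dangling nodes (users who retweeted nobody) spread their rank uniformly. On a simple unweighted graph, `networkx.pagerank(G, alpha=0.85)` should give closely matching scores. This version just exposes the update rule. The user names are made up.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs.

    Duplicate edges are collapsed, so the graph is treated as unweighted.
    """
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}
    out_links = {n: set() for n in nodes}
    for src, dst in edges:
        out_links[src].add(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes is redistributed evenly across all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Every other node splits its damped rank equally among its out-links.
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new[dst] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Edge (u, v): user u retweeted user v.
retweets = [
    ("ani", "budi"), ("citra", "budi"), ("dodi", "budi"),
    ("budi", "eka"), ("ani", "eka"),
]
scores = pagerank(retweets)
for user, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{user}: {score:.3f}")
```

In this toy graph, `budi` collects three retweets, but `eka` still ranks first. The reason is that `budi`, the most-retweeted user, retweets `eka`. That chained influence is exactly what a raw retweet count misses.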

🛠️ Technologies & Libraries

Category            Tools
------------------  ------------------------------------------------
Data Collection     tweepy, twint
Data Processing     pandas, numpy
NLP                 nltk, scikit-learn, gensim
Deep Learning       transformers (HuggingFace), tensorflow / pytorch
Network Analysis    networkx
Visualization       matplotlib, seaborn
Environment         Python 3, Jupyter Notebook

🚀 Getting Started

  1. Clone the repository

    git clone https://github.com/nichsedge/social-media-analytics.git
    cd social-media-analytics
  2. Set up a virtual environment

    python -m venv .env
    source .env/bin/activate        # Linux/macOS
    .env\Scripts\activate.bat       # Windows
  3. Install dependencies (per tutorial/project folder as needed)

    pip install tweepy pandas numpy nltk scikit-learn gensim transformers networkx matplotlib seaborn notebook
    pip install torch                # or tensorflow; transformers needs a backend to run models
  4. Open notebooks

    jupyter notebook

⚠️ Notes

  • Some notebooks use Indonesian-language datasets and lexicons (e.g., colloquial-indonesian-lexicon.csv, stopwordsID.csv).
  • The Twitter API credentials in script.py are for reference only; replace them with your own keys before running.
  • Large dataset files (.csv) may not be included in the repository due to size constraints.
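Rather than editing keys into script.py, one option is to read them from environment variables at runtime, which keeps them out of version control. This is a minimal sketch. The variable name `TWITTER_BEARER_TOKEN` and the helper `load_bearer_token` are hypothetical and are not names used by script.py. If the script uses OAuth 1.0a consumer and access keys instead of a bearer token, each key can be loaded the same way.

```python
import os


def load_bearer_token(var_name="TWITTER_BEARER_TOKEN"):
    """Return the bearer token from the environment, failing loudly if it is unset."""
    # TWITTER_BEARER_TOKEN is a hypothetical variable name chosen for this sketch.
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your own Twitter/X API bearer token."
        )
    return token


# Example (tweepy 4.x):
# import tweepy
# client = tweepy.Client(bearer_token=load_bearer_token())
```

On Linux/macOS, the variable can be set with `export TWITTER_BEARER_TOKEN=...` in the shell before launching Jupyter.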

📄 License

This repository is intended for educational purposes as part of a Social Media Analytics course.

About

Social media analytics course materials: text mining, topic modelling, data collection (Twitter/X API & Twint), stance detection, user profiling (gender & occupation), and network analysis with PageRank.
