This project builds a moderation pipeline for Google-style reviews. The goal is to automatically flag reviews that violate policy, classifying each review into one of four categories:
- advertisement: promotional content, referral links, sales pitches.
- irrelevant: unrelated to the business (random stories, off-topic text).
- no_visit_rant: rants/complaints where reviewer admits no visit.
- none: all other legitimate, on-topic reviews.
We compare multiple approaches:
- Baseline (TF-IDF + Logistic Regression)
- Zero-Shot Classification (Hugging Face NLI models)
- Few-Shot Prompting (LLMs guided by src/prompts/policy.md)
The pipeline produces labeled datasets, model predictions, evaluation metrics, and prompt artifacts for analysis.
```bash
git clone <repo-url>
cd review-moderation-ml

python3 -m venv .venv
source .venv/bin/activate

pip install -U pip
pip install pandas scikit-learn transformers torch accelerate datasets evaluate matplotlib rich joblib python-dotenv jupyter
```
Place the raw Google Reviews dataset into data/raw/. For JSON data, run:

```bash
python src/data/ingest_json.py
```
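The ingestion script itself is not shown here; a minimal sketch of what src/data/ingest_json.py might do, assuming one JSON object per line with a `text` field and an optional `rating` field (both field names are illustrative, not confirmed by the repo):

```python
import csv
import json
from pathlib import Path

def ingest_json(raw_path: str, out_path: str) -> int:
    """Normalise raw JSONL reviews into a flat CSV for the pipeline.

    Assumes each line is a JSON object with at least a "text" field;
    "rating" is optional. Returns the number of rows kept.
    """
    rows = []
    for line in Path(raw_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        text = (rec.get("text") or "").strip()
        if not text:
            continue  # drop empty reviews
        rows.append({"text": text, "rating": rec.get("rating", "")})

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "rating"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```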
Run zero-shot classification (Hugging Face):

```bash
python src/models/hf_batch_infer.py
```
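Under the hood, Hugging Face zero-shot classification scores each candidate label by NLI entailment against a hypothesis like "This review is {label}.", then normalises the scores. A toy sketch of that final scoring step, with made-up entailment logits standing in for real model output:

```python
import math

POLICY_LABELS = ["advertisement", "irrelevant", "no_visit_rant", "none"]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def pick_label(entailment_logits):
    """Given one entailment logit per candidate label (as an NLI model
    would produce), return the best label and its normalised score."""
    probs = softmax(entailment_logits)
    best = max(range(len(POLICY_LABELS)), key=lambda i: probs[i])
    return POLICY_LABELS[best], probs[best]
```

The actual script presumably uses the transformers `zero-shot-classification` pipeline, which performs this label-vs-hypothesis scoring with a real NLI model.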
Train and evaluate the baseline (TF-IDF + Logistic Regression):

```bash
python src/models/baseline.py
```
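A minimal sketch of the TF-IDF + Logistic Regression baseline; the hyperparameters and training examples below are illustrative, not the project's actual configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

LABELS = ["advertisement", "irrelevant", "no_visit_rant", "none"]

def build_baseline() -> Pipeline:
    """TF-IDF features feeding a Logistic Regression classifier."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])

# Toy training data, one example per policy category.
texts = [
    "Visit www.dealz.example for 50% off, use my referral code!",
    "I like turtles.",
    "Never been here but the owner seems rude online.",
    "Best pizza in town, friendly staff.",
]
labels = ["advertisement", "irrelevant", "no_visit_rant", "none"]

model = build_baseline()
model.fit(texts, labels)
pred = model.predict(["Click my link for discounts!"])[0]
```

In practice the real script would train on the labeled dataset from the ingestion step and hold out a test split for the evaluation metrics.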
Generate evaluation visualisations:

```bash
python src/eval/visualise.py
```
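The visualisation script is not shown; the per-class metrics it would plot can be computed with scikit-learn over the policy labels. A sketch with made-up predictions:

```python
from sklearn.metrics import classification_report, confusion_matrix

LABELS = ["advertisement", "irrelevant", "no_visit_rant", "none"]

# Illustrative gold labels and model predictions.
y_true = ["none", "advertisement", "irrelevant", "none", "no_visit_rant"]
y_pred = ["none", "advertisement", "none", "none", "no_visit_rant"]

# Per-class precision/recall/F1, with zero_division=0 so empty
# classes don't raise warnings.
report = classification_report(y_true, y_pred, labels=LABELS, zero_division=0)
cm = confusion_matrix(y_true, y_pred, labels=LABELS)
print(report)
```

The confusion matrix `cm` is what a heatmap (e.g. via matplotlib) would render, rows being true labels and columns predictions.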
Instructions and few-shot examples are in src/prompts/policy.md. Optionally, run a few-shot demo with a small instruction-tuned LLM.
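A few-shot prompt in the spirit of src/prompts/policy.md can be assembled programmatically; the instruction wording and examples below are illustrative, not the actual contents of policy.md:

```python
POLICY_LABELS = ["advertisement", "irrelevant", "no_visit_rant", "none"]

# Hypothetical few-shot examples, one per policy category.
FEW_SHOT = [
    ("Use code SAVE10 at my site for discounts!", "advertisement"),
    ("My cat knocked over a vase today.", "irrelevant"),
    ("Never visited, but I heard this place is terrible.", "no_visit_rant"),
    ("Great service and the pasta was excellent.", "none"),
]

def build_prompt(review: str) -> str:
    """Assemble an instruction + few-shot prompt ending at 'Label:'
    so the LLM completes with a single category name."""
    lines = [
        "Classify the review into one of: " + ", ".join(POLICY_LABELS) + ".",
        "",
    ]
    for text, label in FEW_SHOT:
        lines.append(f"Review: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Review: {review}")
    lines.append("Label:")
    return "\n".join(lines)
```

The completion returned by the instruction-tuned LLM would then be matched against POLICY_LABELS to produce the final prediction.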
- Sean See — Data ingestion, JSON/CSV cleaning pipeline, repository setup.
- Cyril Pedrina — Zero-shot classification pipeline (Hugging Face), pseudo-label generation.
- Jade Ng — Baseline model (TF-IDF + Logistic Regression), training, and evaluation metrics.
- Eleanor Cheak — Prompt engineering (policy.md), few-shot examples, and LLM experimentation.
- Zelda Seow — Documentation and final presentation delivery.