You will fine-tune a compact DistilBERT encoder to decide whether a news article is fake or real, export the model to an 8-bit ONNX file, and run it fully client-side in a single-page React / Vite app.
The popular Fake and Real News corpus (~44 k articles) fits on a laptop, and INT8 quantisation shrinks the model to ≈ 18 MB, keeping browser latency below 200 ms with WebGPU.
Build a browser-only fake-news detector that takes an article (headline + text) and returns:

- a fake-news probability (0 = real, 1 = fake)
- a quick verdict badge

All inference must remain in the user's tab; no server calls.
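The logits-to-badge mapping the UI needs is a few lines; here is a minimal sketch (the function name and the 0.5 threshold are our assumptions, not part of the brief):

```python
import math

def fake_probability(logits, threshold=0.5):
    """Map a pair of [real, fake] logits to a fake probability and a verdict badge."""
    exp = [math.exp(x) for x in logits]
    p_fake = exp[1] / sum(exp)          # softmax over the two classes
    badge = "FAKE" if p_fake >= threshold else "REAL"
    return p_fake, badge
```

For example, logits of `[0.0, 2.0]` yield a fake probability of about 0.88 and the `FAKE` badge; the same arithmetic drives the probability bar in the browser.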
Misinformation detection is topical, involves longer inputs than tweets, and requires more than surface-level cues—perfect for demonstrating practical NLP skills without huge compute.
| Component | Default Pick | Rationale | Key Spec |
|---|---|---|---|
| Dataset | Fake and Real News (Kaggle) | 23 k fake + 21 k real; permissive CC0-like licence | CSV (~35 MB) |
| Base model | `distilbert-base-uncased` | 66 M params, 6 layers, ≈ 97 % of BERT at 40 % size | 250 MB FP32 → ≤ 20 MB INT8 |
| Exporter | `optimum-cli export onnx --quantize dynamic` | One-command ONNX + quantisation | 18–22 MB ONNX |
| Browser EP | (A) Transformers.js (WASM/WebGPU) (B) ONNX Runtime Web (WebGPU) | Both support INT8; WebGPU halves latency | < 200 ms on desktop |
| Frontend | React + Vite | Instant HMR dev-server, zero-config static build | Bundle ≤ 1 MB gz |
- Public Git repo with a concise `README.md` describing
  - dataset provenance & licence link
  - training hyper-parameters
  - evaluation metrics (F1, accuracy)
- `model.onnx` (≤ 25 MB) and `tokenizer.json` committed under `/public/model/`.
- Browser demo at `<your-url>/fakenews/` featuring
  - textarea (or drop-zone) for article text
  - probability bar ± verdict badge
  - console log of inference latency
- Short screencast (< 2 min) or live URL proving the app works offline (reload with Wi-Fi disabled).
```bash
conda create -n fakenews python=3.10
conda activate fakenews
pip install "transformers>=4.40" datasets evaluate accelerate "optimum[onnxruntime,gpu]"
npm install onnxruntime-web@1.17.0 @huggingface/transformers   # JS side
```

ONNX Runtime Web ≥ 1.17 adds official WebGPU support.
```python
from datasets import load_dataset

# The Kaggle CSV can be loaded directly once downloaded locally
news = load_dataset(
    "csv",
    data_files={"train": "train.csv", "test": "test.csv"},
)
```

Split 10 % of the training set into validation; truncate/clean text as desired.
- Task: sequence classification
- Epochs: 2
- Learning rate: 2e-5
- `max_length`: 384
- Batch size: 8–16 (GPU-dependent)
Monitor F1 on the validation set.
Keep the best checkpoint (`save_total_limit=1`) and log metrics with `evaluate`.
```bash
optimum-cli export onnx --model path/to/best onnx/ --quantize dynamic
```

Expect an ~18 MB INT8 model.
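Since the deliverables cap `model.onnx` at 25 MB, a small guard run before committing can catch a bloated export early (the function name and default limit are ours, taken from the brief's constraint):

```python
import os

def check_model_size(path, limit_mb=25):
    """Fail fast if the exported ONNX file exceeds the deliverable size cap."""
    size_mb = os.path.getsize(path) / 1e6
    assert size_mb <= limit_mb, f"{path} is {size_mb:.1f} MB (limit {limit_mb} MB)"
    return size_mb
```

Running it on the quantised export should report roughly 18 MB; a number near the FP32 size means the `--quantize` step was skipped.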
Option A, Transformers.js:

```js
import { pipeline } from "@huggingface/transformers";

const clf = await pipeline(
  "text-classification",
  "/public/model",
  { quantized: true }
);
const { label, score } = (await clf(articleText))[0];
```

Option B, ONNX Runtime Web:

```js
import * as ort from "onnxruntime-web";

const session = await ort.InferenceSession.create("/model.onnx", {
  executionProviders: ["webgpu", "wasm"]
});
```

Good luck, and help us separate facts from fiction!