nash-pillai/Hacklytics

Mostly False

A lie detector for the internet: a real-time fact-checker that runs in your browser, verifying claims on YouTube, Twitter/X, LinkedIn, and Reddit against 12,000+ verified fact-checks from PolitiFact, FactCheck.org, and other organizations.

Architecture

Browser Extension (WXT + React + TypeScript)
  |  content scripts: YouTube, Twitter/X, LinkedIn, Reddit
  |  UI: inline annotations, toast notifications, popup, side panel
  |
  v  HTTP POST /api/check/text
FastAPI Backend (Python)
  |-- Claim Detection (ClaimBuster or heuristic fallback)
  |-- VectorAI DB semantic search (12,779 embedded claims via gRPC)
  |-- Google Fact Check Tools API
  |-- Azure OpenAI GPT-4o LLM fallback
  |
Actian VectorAI DB (Docker, gRPC on port 50051)
  |-- FAISS HNSW index, cosine similarity
  |-- 1536-dim embeddings from text-embedding-3-small

Next.js Web App (secondary)
  |-- Landing page, video upload analysis, stats dashboard
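The extension-to-backend hop in the diagram can be sketched as a minimal HTTP client. The payload and field names below are assumptions for illustration, not the repo's actual request schema; only the endpoint path and default port come from this README.

```python
# Hypothetical client for POST /api/check/text. The payload field names
# ("text", "source") are assumptions, not the backend's actual schema.
import json
from urllib import request

API_URL = "http://localhost:8000/api/check/text"  # backend default from Quick Start

def build_payload(text: str, source: str = "twitter") -> dict:
    """Bundle scraped text with the platform it came from."""
    return {"text": text, "source": source}

def check_text(text: str, source: str = "twitter") -> dict:
    """POST a claim to the backend and return the parsed JSON response."""
    req = request.Request(
        API_URL,
        data=json.dumps(build_payload(text, source)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

In the real extension this request originates from a content script; the sketch just shows the wire format one might expect at the boundary.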

Prerequisites

  • Python 3.11+
  • Node.js 20+ and pnpm
  • Docker (for Actian VectorAI DB)
  • yt-dlp (required for YouTube transcript extraction in backend /api/youtube/transcript)
  • Azure OpenAI resource with deployed models (see below)
  • Google Fact Check Tools API key

Quick Start

1. Start VectorAI DB

docker compose up -d

This starts Actian VectorAI DB, exposing port 8100 (HTTP, unused) and port 50051 (gRPC).
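Before moving on to the backend, it can help to confirm the gRPC port is actually accepting connections. A small check like this (not part of the repo) does the job:

```python
# Sanity check that the VectorAI DB gRPC port is listening. This is a
# plain TCP probe, not a gRPC health check.
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("VectorAI gRPC up:", port_open("localhost", 50051))
```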

2. Backend

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your Azure OpenAI and Google Fact Check API keys

# Run the data pipeline (one-time setup)
python -m app.data.ingest           # Download LIAR dataset
python -m app.data.normalize        # Normalize claim labels
python -m app.data.embed_and_load   # Embed claims and load into VectorAI DB

# Start the API server
# NixOS users: set LD_LIBRARY_PATH for gRPC (see NixOS Notes below)
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

3. Extension (Chrome)

cd extension
pnpm install
pnpm build        # Production build to .output/chrome-mv3/

Load the extension:

  1. Open chrome://extensions/
  2. Enable Developer mode
  3. Click Load unpacked
  4. Select extension/.output/chrome-mv3/

For development with hot-reload:

pnpm dev

4. Web App (optional)

cd web
pnpm install
pnpm dev

Environment Variables

Copy backend/.env.example to backend/.env and fill in:

  • AZURE_OPENAI_API_KEY -- Azure OpenAI resource key
  • AZURE_OPENAI_ENDPOINT -- Azure OpenAI endpoint URL (e.g. https://eastus2.api.cognitive.microsoft.com/)
  • AZURE_OPENAI_API_VERSION -- API version (default: 2024-10-21)
  • AZURE_OPENAI_CHAT_DEPLOYMENT -- GPT-4o deployment name for the LLM fallback
  • AZURE_OPENAI_EMBEDDING_DEPLOYMENT -- text-embedding-3-small deployment name
  • AZURE_OPENAI_WHISPER_DEPLOYMENT -- Whisper deployment name (optional, for video upload)
  • GOOGLE_FACTCHECK_API_KEY -- Google Fact Check Tools API key
  • CLAIMBUSTER_API_KEY -- ClaimBuster API key (optional; heuristic fallback used when absent)
  • CHECK_WORTHINESS_THRESHOLD -- Minimum score to fact-check a sentence (default: 0.5)
  • VECTORDB_URL -- VectorAI DB HTTP address (default: http://localhost:8100; not used for gRPC)

Getting API Keys

Azure OpenAI:

# Create resource and deploy models
az cognitiveservices account create -n <name> -g <rg> --kind OpenAI --sku S0 -l eastus2
az cognitiveservices account deployment create -n <name> -g <rg> \
  --deployment-name gpt4o --model-name gpt-4o --model-version 2024-08-06 --model-format OpenAI
az cognitiveservices account deployment create -n <name> -g <rg> \
  --deployment-name embed3small --model-name text-embedding-3-small --model-version 1 --model-format OpenAI

Google Fact Check Tools API:

gcloud services enable factchecktools.googleapis.com --project <project-id>
gcloud alpha services api-keys create --display-name="Mostly False FactCheck" --project <project-id>

Data Pipeline

The backend uses a multi-source dataset of fact-checked claims:

python -m app.data.ingest           # Downloads LIAR dataset (12,800+ claims)
python -m app.data.normalize        # Normalizes verdicts to TRUE/FALSE/MIXED/etc.
python -m app.data.embed_and_load   # Generates embeddings via Azure OpenAI,
                                    # loads into VectorAI DB (takes ~15 min with S0 tier)
python -m app.data.demo_cache       # Pre-caches results for demo scenarios

The embed step has retry logic for Azure rate limits (S0 tier: ~1 req/sec).
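The retry logic amounts to exponential backoff around the embedding call. This is a sketch of the pattern, where embed_batch is a stand-in for the actual Azure OpenAI call, not the repo's implementation:

```python
# Exponential-backoff retry around a batch embedding call. embed_batch is
# a placeholder for the real Azure OpenAI embeddings request.
import time

def embed_with_retry(embed_batch, texts, max_retries: int = 5, base_delay: float = 1.0):
    """Call embed_batch(texts), backing off exponentially on failures."""
    for attempt in range(max_retries):
        try:
            return embed_batch(texts)
        except Exception:  # in practice, catch the SDK's rate-limit error specifically
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```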

Fact-Check Pipeline

Each claim goes through a 4-stage pipeline:

  1. Check-Worthiness -- ClaimBuster API (or heuristic fallback). Sentences below threshold are skipped.
  2. VectorAI DB -- Semantic search over 12,779 embedded claims. Matches above 0.80 cosine similarity return a verified verdict.
  3. Google Fact Check API -- Searches existing human fact-checks. Returns verified verdict with source URL.
  4. LLM Fallback -- Azure OpenAI GPT-4o evaluates the claim. Returns ai_analysis verdict with confidence score.
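The four stages above can be sketched as a fallthrough, where each stage either settles the claim or defers to the next. Every stage function here is an illustrative stand-in for the backend's actual services; only the two thresholds come from this README:

```python
# Illustrative 4-stage fallthrough. The stage callables (worthiness,
# vector_search, google_lookup, llm_judge) are hypothetical stand-ins.

SIMILARITY_THRESHOLD = 0.80   # VectorAI DB cosine-similarity cutoff
WORTHINESS_THRESHOLD = 0.5    # CHECK_WORTHINESS_THRESHOLD default

def fact_check(sentence, worthiness, vector_search, google_lookup, llm_judge):
    """Run one sentence through the pipeline; return a verdict dict or None."""
    # 1. Check-worthiness: skip sentences below threshold
    if worthiness(sentence) < WORTHINESS_THRESHOLD:
        return None
    # 2. Semantic search over embedded claims
    match, score = vector_search(sentence)
    if match is not None and score >= SIMILARITY_THRESHOLD:
        return {"verdict": match["verdict"], "source": "vectordb", "verified": True}
    # 3. Existing human fact-checks via Google Fact Check Tools
    hit = google_lookup(sentence)
    if hit is not None:
        return {"verdict": hit["verdict"], "source": hit["url"], "verified": True}
    # 4. LLM fallback
    verdict, confidence = llm_judge(sentence)
    return {"verdict": verdict, "source": "ai_analysis",
            "confidence": confidence, "verified": False}
```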

Extension Features

  • YouTube -- Extracts video transcript from captions, fact-checks claims, shows toast notifications synced to video playback
  • Twitter/X -- Scans tweet text via MutationObserver, shows inline verdict badges below tweets
  • LinkedIn -- Scans post text in feed, shows inline verdict badges
  • Reddit -- Scans shreddit-post (and legacy post containers), fact-checks post title/body claims, and renders inline verdict badges
  • Scam highlighting -- Claims flagged as scams are rendered as SCAM with high-risk emphasis in badges/cards/toasts
  • Popup -- Truth Meter gauge, score summary, recent claims, disable-per-site toggle
  • Side Panel -- Full claim list with explanations, sources, confidence badges, and timeline-aware view for video content

Extension Build

cd extension
pnpm build              # Chrome → .output/chrome-mv3/
pnpm build:firefox      # Firefox → .output/firefox-mv2/
pnpm zip                # Chrome zip for distribution
pnpm zip:firefox        # Firefox zip for distribution

NixOS Notes

On NixOS, grpcio requires libstdc++.so.6, which isn't on the default library path. The vectordb.py service handles this automatically by preloading the library via ctypes.CDLL. If running the backend manually:

export LD_LIBRARY_PATH=$(find /nix/store -path '*gcc-*-lib/lib/libstdc++.so.6' -exec file {} \; | grep 64-bit | head -1 | cut -d: -f1 | xargs dirname):$LD_LIBRARY_PATH
uvicorn app.main:app --reload --port 8000

Challenge Coverage

  • SafetyKit -- Misinformation detection and intervention via real-time fact-checking
  • Actian VectorAI DB -- 12,779 embedded claims with FAISS HNSW index for semantic retrieval
  • Sphinx AI -- Dataset pattern analysis and insights
  • Best Overall -- Multi-stage pipeline with production-grade architecture
