Mostly False — A Lie Detector for the Internet

Tagline: Real-time fact-checking, everywhere you browse.


🎯 Inspiration

The internet is full of claims — in videos, tweets, and posts — and most go unchecked. We wanted to bring fact-checking to the content instead of asking people to leave the page, copy a claim, and search for the truth. What if every check-worthy claim could be verified in real time, with verdicts and sources right where you're reading or watching?

We built Mostly False to do exactly that: a lie detector that runs in your browser on YouTube, Twitter/X, and LinkedIn, using a pipeline backed by 50,000+ verified fact-checks from PolitiFact, Snopes, and 26 other organizations.


✨ What It Does

  • Browser extension (YouTube, Twitter/X, LinkedIn)
    Install once; Mostly False runs in the background. It finds check-worthy claims on the page, runs them through our fact-check pipeline, and surfaces inline verdict badges (True, Mostly True, Mixed, Mostly False, False) with tooltips. Click any badge to open the side panel for the full breakdown — source, explanation, and a Truth Meter score for the whole page.

  • Dataset & stats (web app)
    A Next.js landing page and a stats dashboard show the scale of our fact-check dataset (50,000+ claims), verdict distribution, and high-level insights — the same data that powers every check.

  • Scam detection (same pipeline)
    We use the same implementation to flag scam-like content, with the same badges and UI; those claims are simply marked so users can spot them at a glance. Built with the SafetyKit challenge in mind: AI for human safety (misinformation, scams, manipulation).


🛠 How We Built It

Architecture

  • Browser extension — WXT, React 19, TypeScript. Content scripts inject into YouTube, Twitter/X, and LinkedIn; we extract text, send check-worthy sentences to the backend via REST, and render inline annotations and a side panel (Truth Meter + claim cards). Cross-browser (Chrome MV3, Firefox MV2).

  • Backend — FastAPI (Python 3.12). Single fact-check pipeline used by the extension:

    1. Check-worthiness — Azure OpenAI (gpt-4o-mini) or ClaimBuster API scores sentences so we only check factual claims, not opinions or filler.
    2. Known false patterns — Quick rules for common misinformation (e.g. flat earth, vaccine–autism, Holocaust denial) with citations.
    3. Actian VectorAI DB — 50,000+ fact-check claims embedded (Azure OpenAI) and stored; we do semantic similarity search (gRPC) to find matching fact-checks.
    4. Google Fact Check API — When we have a key, we query it for additional verified results.
    5. LLM fallback — When there’s no strong match, we use Azure OpenAI (gpt-4o-mini) to evaluate the claim and label the result as AI analysis so users know the source.

  • Web app — Next.js 16, React 19, Tailwind. Landing page and stats dashboard (dataset size, verdict breakdown, insights). Calls the same backend API for stats; no video upload.

  • YouTube transcript API — For YouTube pages we can fetch captions via yt-dlp (run in a thread with asyncio.to_thread on Windows) and run the same fact-check pipeline on that text.

  • Data pipeline — We ingest LIAR, MultiFC, and ClaimBuster datasets, normalize verdicts, generate embeddings (Azure OpenAI), and load into Actian VectorAI DB for semantic search.
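
The five backend stages above amount to a cascade: skip sentences that are not check-worthy, try each verified source in order, and fall back to the LLM only when nothing matches. The sketch below illustrates that control flow only. The names `Verdict`, `run_pipeline`, and the `toy_*` functions are hypothetical stand-ins, not the project's actual code, and the real stages call Azure OpenAI, Actian VectorAI DB, and the Google Fact Check API rather than string checks.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    label: str        # one of: True, Mostly True, Mixed, Mostly False, False
    source: str       # "Verified" for a matched fact-check, "AI analysis" for the LLM fallback
    detail: str = ""


# A stage takes the claim text and returns a Verdict, or None to defer to the next stage.
Stage = Callable[[str], Optional[Verdict]]


def run_pipeline(
    sentence: str,
    is_check_worthy: Callable[[str], bool],
    verified_stages: list[Stage],
    llm_fallback: Callable[[str], Verdict],
) -> Optional[Verdict]:
    """Return a verdict for a sentence, or None if it is not worth checking."""
    if not is_check_worthy(sentence):
        return None
    for stage in verified_stages:
        verdict = stage(sentence)
        if verdict is not None:
            return verdict  # first verified match wins
    return llm_fallback(sentence)


# Toy stand-ins for the real services (assumptions for illustration only).
def toy_check_worthy(sentence: str) -> bool:
    return " is " in sentence or any(ch.isdigit() for ch in sentence)


def toy_known_patterns(sentence: str) -> Optional[Verdict]:
    lowered = sentence.lower()
    if "earth" in lowered and "flat" in lowered:
        return Verdict("False", "Verified", "known false pattern")
    return None


def toy_vector_search(sentence: str) -> Optional[Verdict]:
    return None  # pretend the vector DB found no strong match


def toy_llm(sentence: str) -> Verdict:
    return Verdict("Mixed", "AI analysis", "no strong match")


if __name__ == "__main__":
    stages = [toy_known_patterns, toy_vector_search]
    for text in ["The Earth is flat.", "Wow, what a day!", "Coffee is a vegetable."]:
        print(text, "->", run_pipeline(text, toy_check_worthy, stages, toy_llm))
```

Passing the stages in as a list keeps their order explicit and makes it easy to leave out an optional stage, such as the Google Fact Check API when no key is configured.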

Tech Stack

Layer       Technologies
Extension   WXT, React 19, TypeScript, Tailwind 4, Framer Motion, Lucide
Backend     FastAPI, Python 3.12, Pydantic 2, httpx, gRPC, yt-dlp
Web         Next.js 16, React 19, Tailwind 4, Framer Motion
Data / ML   Actian VectorAI DB, Azure OpenAI (embeddings, Whisper, gpt-4o-mini), Google Fact Check API, ClaimBuster (optional)
Infra       Docker (VectorAI DB), REST

😤 Challenges We Ran Into

  • Windows + asyncio subprocesses — The event loop in some environments didn’t support create_subprocess_exec (used for yt-dlp for YouTube transcripts), causing NotImplementedError. We fixed it by running yt-dlp via synchronous subprocess.run inside asyncio.to_thread so we don’t depend on the loop supporting subprocesses.

  • Python 3.14 vs pydantic-core — At the time, pydantic-core (Rust/PyO3) supported Python only up to 3.13, so our 3.14 environment could not use it. We recreated the venv with Python 3.12 and documented that version for the backend.

  • Single pipeline for the extension — We kept one fact-check pipeline (check-worthiness → known false patterns → vector DB → Google → LLM) and one set of FastAPI routes and schemas so the extension gets consistent verdicts and we only maintain one implementation.
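
The asyncio workaround described above can be sketched as follows. This is a minimal illustration, not the project's code: `run_in_thread` is a hypothetical helper name, and the demo runs the Python interpreter as a stand-in command because the write-up does not show the exact yt-dlp command line.

```python
import asyncio
import subprocess
import sys


async def run_in_thread(cmd: list[str], timeout: float = 60.0) -> str:
    """Run a blocking subprocess in a worker thread and return its stdout.

    subprocess.run does not need the event loop to support subprocesses,
    so this avoids the NotImplementedError that create_subprocess_exec
    can raise on some Windows event loops.
    """
    completed = await asyncio.to_thread(
        subprocess.run,
        cmd,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


if __name__ == "__main__":
    # Stand-in command; the real call would pass the yt-dlp command line instead.
    print(asyncio.run(run_in_thread([sys.executable, "-c", "print('ok')"])))
```

The event loop stays responsive while the worker thread waits on the child process, so other requests can still be served while a transcript is being fetched.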


🏆 Accomplishments We're Proud Of

  • One pipeline, one extension — The same fact-check logic powers every request from the browser; users get the same verdicts and sources on YouTube, Twitter/X, and LinkedIn.

  • 50,000+ fact-checks, semantic search — Using Actian VectorAI DB for meaning-based retrieval (not just keywords) lets us surface relevant fact-checks even when wording differs.

  • Transparent sourcing — Every result is labeled as Verified (matched to a known fact-check) or AI analysis (LLM fallback), so users know where the verdict came from.

  • Real-time UX — Inline badges and a side panel on the page. No extra tabs, no copy-paste.

  • SafetyKit alignment — We’re targeting misinformation and scam-like content with the same pipeline, putting fact-checking and safety in one place.
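
To illustrate what meaning-based retrieval involves, here is a small sketch of nearest-neighbor matching over embeddings. In the project the search runs inside Actian VectorAI DB over gRPC; the cosine metric, the 0.8 threshold, and the names `cosine_similarity` and `best_match` are assumptions chosen for the example, and the two-dimensional vectors stand in for real embedding vectors.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    query: list[float],
    index: list[tuple[str, list[float]]],
    threshold: float = 0.8,  # assumed cutoff for a "strong match"
) -> Optional[tuple[str, float]]:
    """Return (claim_text, score) for the most similar entry above the threshold."""
    best: Optional[tuple[str, float]] = None
    for text, embedding in index:
        score = cosine_similarity(query, embedding)
        if score >= threshold and (best is None or score > best[1]):
            best = (text, score)
    return best


if __name__ == "__main__":
    toy_index = [
        ("Vaccines cause autism", [1.0, 0.0]),
        ("The Earth is flat", [0.0, 1.0]),
    ]
    print(best_match([0.9, 0.1], toy_index))  # close to the first entry
    print(best_match([1.0, 1.0], toy_index))  # equally far from both, below threshold
```

When no entry clears the threshold, the function returns None, which corresponds to the "no strong match" case that hands the claim to the LLM fallback.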


📚 What We Learned

  • A single backend pipeline for the extension (many small requests) works well when check-worthiness and fact-check logic live in one place and the extension just sends text and gets back results.
  • Semantic search over fact-checks is powerful but depends on good normalization of verdicts and labels across datasets and publishers (LIAR, MultiFC, ClaimBuster, PolitiFact, etc.).
  • On Windows, asyncio subprocess support can vary; offloading blocking subprocess calls to a thread via asyncio.to_thread is a robust cross-platform approach.
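
As an illustration of the verdict normalization mentioned above, the sketch below maps source-specific labels onto the five badge verdicts. The raw labels are in the style used by LIAR and PolitiFact, but the mapping itself (for example, treating "barely-true" as Mostly False) and the names `LABEL_MAP` and `normalize_verdict` are assumptions for the example, not the project's actual table.

```python
from typing import Optional

# The five verdicts shown as badges in the extension.
VERDICTS = ("True", "Mostly True", "Mixed", "Mostly False", "False")

# Assumed mapping from cleaned-up source labels to the five verdicts.
LABEL_MAP = {
    "true": "True",
    "mostly true": "Mostly True",
    "half true": "Mixed",
    "mixture": "Mixed",
    "barely true": "Mostly False",
    "mostly false": "Mostly False",
    "false": "False",
    "pants fire": "False",
    "pants on fire": "False",
}


def normalize_verdict(raw: str) -> Optional[str]:
    """Map a raw dataset label to one of VERDICTS, or None if it is unknown."""
    # Lowercase, treat hyphens/underscores as spaces, collapse whitespace.
    key = " ".join(raw.lower().replace("-", " ").replace("_", " ").split())
    return LABEL_MAP.get(key)


if __name__ == "__main__":
    for label in ["pants-fire", " Half-True ", "mostly_true", "unproven"]:
        print(repr(label), "->", normalize_verdict(label))
```

Returning None for unknown labels, instead of guessing, lets the ingestion step count or log unmapped rows before they reach the vector database.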

🔮 What's Next

  • More platforms — Extend content scripts to more sites (e.g. Reddit, news comment sections).
  • Scam detection — Expand the “mark as mostly false / scam” path with dedicated signals and clearer in-UI treatment.
  • Performance — Batch or cache check-worthiness and embeddings where possible to reduce latency and API cost.
  • Mobile / share — Explore a shareable “fact-check this” flow (e.g. paste URL → get a report).
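
One possible shape for the caching idea in the Performance item is an in-memory cache keyed by a hash of the normalized text, so repeated sentences do not trigger another embedding call. This is an exploratory sketch under stated assumptions: `EmbeddingCache` is a hypothetical class, the `embed` callable stands in for the Azure OpenAI embedding request, and a production version would likely need eviction and a shared store.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Memoize an embedding function by normalized text."""

    def __init__(self, embed: Callable[[str], list[float]]):
        self._embed = embed
        self._store: dict[str, list[float]] = {}
        self.misses = 0  # number of times the underlying function was called

    @staticmethod
    def _key(text: str) -> str:
        # Lowercase and collapse whitespace so trivial variants share one entry.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, text: str) -> list[float]:
        key = self._key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed(text)
        return self._store[key]


if __name__ == "__main__":
    def fake_embed(text: str) -> list[float]:
        # Placeholder for a real embedding API call.
        return [float(len(text))]

    cache = EmbeddingCache(fake_embed)
    cache.get("The Earth is flat.")
    cache.get("  the earth   is FLAT. ")  # same key after normalization
    print("embedding calls:", cache.misses)
```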
