A lie detector for the internet: a real-time fact-checker that runs in your browser, verifying claims on YouTube, Twitter/X, LinkedIn, and Reddit against 12,000+ verified fact-checks from PolitiFact, FactCheck.org, and other organizations.
```
Browser Extension (WXT + React + TypeScript)
|-- content scripts: YouTube, Twitter/X, LinkedIn, Reddit
|-- UI: inline annotations, toast notifications, popup, side panel
|
v  HTTP POST /api/check/text
FastAPI Backend (Python)
|-- Claim Detection (ClaimBuster or heuristic fallback)
|-- VectorAI DB semantic search (12,779 embedded claims via gRPC)
|-- Google Fact Check Tools API
|-- Azure OpenAI GPT-4o LLM fallback
|
v
Actian VectorAI DB (Docker, gRPC on port 50051)
|-- FAISS HNSW index, cosine similarity
|-- 1536-dim embeddings from text-embedding-3-small

Next.js Web App (secondary)
|-- Landing page, video upload analysis, stats dashboard
```
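The retrieval layer above can be sketched in a few lines: the backend embeds each claim into a 1536-dim vector and looks up the nearest stored claims by cosine similarity. Below is a minimal pure-Python stand-in (the real system delegates this scan to a FAISS HNSW index inside Actian VectorAI DB; the helper names and toy 3-dim vectors here are illustrative only):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, claims, threshold=0.80):
    """Return (claim, score) pairs above the similarity threshold, best first.

    `claims` is a list of (text, vector) pairs; in the real system the
    vectors are 1536-dim text-embedding-3-small embeddings and this linear
    scan is replaced by an approximate FAISS HNSW lookup.
    """
    scored = [(text, cosine(query_vec, vec)) for text, vec in claims]
    scored = [(t, s) for t, s in scored if s >= threshold]
    return sorted(scored, key=lambda p: p[1], reverse=True)

# Toy 3-dim example standing in for 1536-dim embeddings
claims = [("The moon landing was faked", [0.9, 0.1, 0.0]),
          ("Vaccines cause autism", [0.0, 1.0, 0.2])]
matches = search([0.95, 0.05, 0.0], claims)
```

Only the first claim clears the 0.80 threshold, so `matches` contains a single high-similarity hit.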
- Python 3.11+
- Node.js 20+ and pnpm
- Docker (for Actian VectorAI DB)
- yt-dlp (required for YouTube transcript extraction in the backend's /api/youtube/transcript endpoint)
- Azure OpenAI resource with deployed models (see below)
- Google Fact Check Tools API key
```
docker compose up -d
```

This starts Actian VectorAI DB with port 8100 (HTTP, unused) and port 50051 (gRPC).

```
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your Azure OpenAI and Google Fact Check API keys

# Run the data pipeline (one-time setup)
python -m app.data.ingest          # Download LIAR dataset
python -m app.data.normalize       # Normalize claim labels
python -m app.data.embed_and_load  # Embed claims and load into VectorAI DB

# Start the API server
# NixOS users: set LD_LIBRARY_PATH for gRPC (see NixOS Notes below)
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

```
cd extension
pnpm install
pnpm build  # Production build to .output/chrome-mv3/
```

Load the extension:
- Open chrome://extensions/
- Enable Developer mode
- Click Load unpacked
- Select extension/.output/chrome-mv3/

For development with hot-reload:

```
pnpm dev
```

```
cd web
pnpm install
pnpm dev
```

Copy backend/.env.example to backend/.env and fill in:
| Variable | Description |
|---|---|
| `AZURE_OPENAI_API_KEY` | Azure OpenAI resource key |
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint URL (e.g. https://eastus2.api.cognitive.microsoft.com/) |
| `AZURE_OPENAI_API_VERSION` | API version (default: 2024-10-21) |
| `AZURE_OPENAI_CHAT_DEPLOYMENT` | GPT-4o deployment name for LLM fallback |
| `AZURE_OPENAI_EMBEDDING_DEPLOYMENT` | text-embedding-3-small deployment name |
| `AZURE_OPENAI_WHISPER_DEPLOYMENT` | Whisper deployment name (optional, for video upload) |
| `GOOGLE_FACTCHECK_API_KEY` | Google Fact Check Tools API key |
| `CLAIMBUSTER_API_KEY` | ClaimBuster API key (optional; heuristic fallback used when absent) |
| `CHECK_WORTHINESS_THRESHOLD` | Minimum score to fact-check a sentence (default: 0.5) |
| `VECTORDB_URL` | VectorAI DB address (default: http://localhost:8100; not used for gRPC) |
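As a sketch of how these variables map to runtime settings, here is a hypothetical loader mirroring the table above: required keys fail fast, optional ones fall back to the documented defaults. The function name and return shape are illustrative; the actual backend may load configuration differently (e.g. via pydantic-settings):

```python
import os

def load_settings(env=os.environ):
    """Validate required settings and apply the documented defaults."""
    required = [
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_CHAT_DEPLOYMENT",
        "AZURE_OPENAI_EMBEDDING_DEPLOYMENT",
        "GOOGLE_FACTCHECK_API_KEY",
    ]
    missing = [k for k in required if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return {
        "api_version": env.get("AZURE_OPENAI_API_VERSION", "2024-10-21"),
        "check_worthiness_threshold": float(env.get("CHECK_WORTHINESS_THRESHOLD", "0.5")),
        "vectordb_url": env.get("VECTORDB_URL", "http://localhost:8100"),
        "claimbuster_key": env.get("CLAIMBUSTER_API_KEY"),  # optional: None -> heuristic fallback
    }
```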
Azure OpenAI:

```
# Create resource and deploy models
az cognitiveservices account create -n <name> -g <rg> --kind OpenAI --sku S0 -l eastus2
az cognitiveservices account deployment create -n <name> -g <rg> \
  --deployment-name gpt4o --model-name gpt-4o --model-version 2024-08-06 --model-format OpenAI
az cognitiveservices account deployment create -n <name> -g <rg> \
  --deployment-name embed3small --model-name text-embedding-3-small --model-version 1 --model-format OpenAI
```

Google Fact Check Tools API:

```
gcloud services enable factchecktools.googleapis.com --project <project-id>
gcloud alpha services api-keys create --display-name="Mostly False FactCheck" --project <project-id>
```

The backend uses a multi-source dataset of fact-checked claims:
```
python -m app.data.ingest          # Downloads LIAR dataset (12,800+ claims)
python -m app.data.normalize       # Normalizes verdicts to TRUE/FALSE/MIXED/etc.
python -m app.data.embed_and_load  # Generates embeddings via Azure OpenAI,
                                   # loads into VectorAI DB (takes ~15 min with S0 tier)
python -m app.data.demo_cache      # Pre-caches results for demo scenarios
```

The embed step has retry logic for Azure rate limits (S0 tier: ~1 req/sec).
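The retry logic amounts to the standard exponential-backoff pattern; here is a minimal sketch of the kind of loop the embed step needs (the actual implementation lives in `app.data.embed_and_load` and may differ):

```python
import time

def with_retries(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying on failure with exponential backoff.

    Useful for Azure S0-tier rate limits (~1 req/sec): each failed
    attempt waits base_delay * 2**attempt seconds (1s, 2s, 4s, ...)
    before retrying, and the last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

A call like `with_retries(lambda: client.embeddings.create(...))` then survives transient 429 responses instead of aborting the whole load.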
Each claim goes through a 4-stage pipeline:
- Check-Worthiness -- ClaimBuster API (or heuristic fallback). Sentences below the threshold are skipped.
- VectorAI DB -- Semantic search over 12,779 embedded claims. Matches above 0.80 cosine similarity return a `verified` verdict.
- Google Fact Check API -- Searches existing human fact-checks. Returns a `verified` verdict with a source URL.
- LLM Fallback -- Azure OpenAI GPT-4o evaluates the claim. Returns an `ai_analysis` verdict with a confidence score.
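The four stages form a short-circuiting cascade: cheap checks run first, and the first stage that produces a verdict wins. A sketch of that control flow, with the stage functions passed in as callables (the names and return shapes here are hypothetical, not the backend's actual API):

```python
def check_claim(sentence, worthiness, vector_match, google_match, llm_verdict,
                threshold=0.5):
    """Run one sentence through the 4-stage cascade; first hit wins."""
    # Stage 1: skip sentences below the check-worthiness threshold
    if worthiness(sentence) < threshold:
        return {"status": "skipped"}
    # Stage 2: semantic search; >= 0.80 cosine similarity is a verified hit
    hit = vector_match(sentence)
    if hit and hit["similarity"] >= 0.80:
        return {"status": "verified", "verdict": hit["verdict"], "source": "vectordb"}
    # Stage 3: existing human fact-checks via Google Fact Check Tools
    fc = google_match(sentence)
    if fc:
        return {"status": "verified", "verdict": fc["verdict"], "source": fc["url"]}
    # Stage 4: GPT-4o fallback, returning a verdict plus confidence score
    verdict, confidence = llm_verdict(sentence)
    return {"status": "ai_analysis", "verdict": verdict, "confidence": confidence}

# Example wiring with stub stages: the vector store scores a close match,
# so the cascade stops at stage 2 without calling Google or the LLM.
res = check_claim(
    "The earth is flat",
    worthiness=lambda s: 0.9,
    vector_match=lambda s: {"similarity": 0.93, "verdict": "FALSE"},
    google_match=lambda s: None,
    llm_verdict=lambda s: ("FALSE", 0.7),
)
```

The ordering matters for cost: the vector lookup and Google search are far cheaper than a GPT-4o call, so the LLM only sees claims nothing else could resolve.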
- YouTube -- Extracts video transcript from captions, fact-checks claims, shows toast notifications synced to video playback
- Twitter/X -- Scans tweet text via MutationObserver, shows inline verdict badges below tweets
- LinkedIn -- Scans post text in feed, shows inline verdict badges
- Reddit -- Scans `shreddit-post` (and legacy post containers), fact-checks post title/body claims, and renders inline verdict badges
- Scam highlighting -- Claims flagged as scams are rendered as `SCAM` with high-risk emphasis in badges/cards/toasts
- Popup -- Truth Meter gauge, score summary, recent claims, disable-per-site toggle
- Side Panel -- Full claim list with explanations, sources, confidence badges, and timeline-aware view for video content
```
cd extension
pnpm build          # Chrome → .output/chrome-mv3/
pnpm build:firefox  # Firefox → .output/firefox-mv2/
pnpm zip            # Chrome zip for distribution
pnpm zip:firefox    # Firefox zip for distribution
```

On NixOS, grpcio requires libstdc++.so.6, which isn't in the default library path. The vectordb.py service handles this automatically by preloading the library via ctypes.CDLL. If running the backend manually:

```
export LD_LIBRARY_PATH=$(find /nix/store -path '*gcc-*-lib/lib/libstdc++.so.6' -exec file {} \; | grep 64-bit | head -1 | cut -d: -f1 | xargs dirname):$LD_LIBRARY_PATH
uvicorn app.main:app --reload --port 8000
```

- SafetyKit -- Misinformation detection and intervention via real-time fact-checking
- Actian VectorAI DB -- 12,779 embedded claims with FAISS HNSW index for semantic retrieval
- Sphinx AI -- Dataset pattern analysis and insights
- Best Overall -- Multi-stage pipeline with production-grade architecture