nash-pillai/Hacklytics

Mostly False

A lie detector for the internet: a real-time fact-checker that runs in your browser, verifying claims on YouTube, Twitter/X, LinkedIn, and Reddit against 12,000+ verified fact-checks from PolitiFact, FactCheck.org, and other organizations.

Architecture

Browser Extension (WXT + React + TypeScript)
  |  content scripts: YouTube, Twitter/X, LinkedIn, Reddit
  |  UI: inline annotations, toast notifications, popup, side panel
  |
  v  HTTP POST /api/check/text
FastAPI Backend (Python)
  |-- Claim Detection (ClaimBuster or heuristic fallback)
  |-- VectorAI DB semantic search (12,779 embedded claims via gRPC)
  |-- Google Fact Check Tools API
  |-- Azure OpenAI GPT-4o LLM fallback
  |
Actian VectorAI DB (Docker, gRPC on port 50051)
  |-- FAISS HNSW index, cosine similarity
  |-- 1536-dim embeddings from text-embedding-3-small

Next.js Web App (secondary)
  |-- Landing page, video upload analysis, stats dashboard
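The extension-to-backend hop in the diagram can be sketched as a minimal HTTP client. The payload and field names below are assumptions for illustration, not the repo's actual request schema; only the endpoint path and default port come from this README.

```python
# Hypothetical client for POST /api/check/text. The payload field names
# ("text", "source") are assumptions, not the backend's actual schema.
import json
from urllib import request

API_URL = "http://localhost:8000/api/check/text"  # backend default from Quick Start

def build_payload(text: str, source: str = "twitter") -> dict:
    """Bundle scraped text with the platform it came from."""
    return {"text": text, "source": source}

def check_text(text: str, source: str = "twitter") -> dict:
    """POST a claim to the backend and return the parsed JSON response."""
    req = request.Request(
        API_URL,
        data=json.dumps(build_payload(text, source)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

In the real extension this request originates from a content script; the sketch just shows the wire format one might expect at the boundary.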

Prerequisites

  • Python 3.11+
  • Node.js 20+ and pnpm
  • Docker (for Actian VectorAI DB)
  • yt-dlp (required for YouTube transcript extraction in backend /api/youtube/transcript)
  • Azure OpenAI resource with deployed models (see below)
  • Google Fact Check Tools API key

Quick Start

1. Start VectorAI DB

docker compose up -d

This starts Actian VectorAI DB, exposing port 8100 (HTTP, unused) and port 50051 (gRPC).
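Before moving on to the backend, it can help to confirm the gRPC port is actually accepting connections. A small check like this (not part of the repo) does the job:

```python
# Sanity check that the VectorAI DB gRPC port is listening. This is a
# plain TCP probe, not a gRPC health check.
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("VectorAI gRPC up:", port_open("localhost", 50051))
```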

2. Backend

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your Azure OpenAI and Google Fact Check API keys

# Run the data pipeline (one-time setup)
python -m app.data.ingest           # Download LIAR dataset
python -m app.data.normalize        # Normalize claim labels
python -m app.data.embed_and_load   # Embed claims and load into VectorAI DB

# Start the API server
# NixOS users: set LD_LIBRARY_PATH for gRPC (see NixOS Notes below)
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

3. Extension (Chrome)

cd extension
pnpm install
pnpm build        # Production build to .output/chrome-mv3/

Load the extension:

  1. Open chrome://extensions/
  2. Enable Developer mode
  3. Click Load unpacked
  4. Select extension/.output/chrome-mv3/

For development with hot-reload:

pnpm dev

4. Web App (optional)

cd web
pnpm install
pnpm dev

Environment Variables

Copy backend/.env.example to backend/.env and fill in:

  • AZURE_OPENAI_API_KEY -- Azure OpenAI resource key
  • AZURE_OPENAI_ENDPOINT -- Azure OpenAI endpoint URL (e.g. https://eastus2.api.cognitive.microsoft.com/)
  • AZURE_OPENAI_API_VERSION -- API version (default: 2024-10-21)
  • AZURE_OPENAI_CHAT_DEPLOYMENT -- GPT-4o deployment name for the LLM fallback
  • AZURE_OPENAI_EMBEDDING_DEPLOYMENT -- text-embedding-3-small deployment name
  • AZURE_OPENAI_WHISPER_DEPLOYMENT -- Whisper deployment name (optional, for video upload)
  • GOOGLE_FACTCHECK_API_KEY -- Google Fact Check Tools API key
  • CLAIMBUSTER_API_KEY -- ClaimBuster API key (optional; heuristic fallback used when absent)
  • CHECK_WORTHINESS_THRESHOLD -- Minimum score to fact-check a sentence (default: 0.5)
  • VECTORDB_URL -- VectorAI DB HTTP address (default: http://localhost:8100; not used for gRPC)

Getting API Keys

Azure OpenAI:

# Create resource and deploy models
az cognitiveservices account create -n <name> -g <rg> --kind OpenAI --sku S0 -l eastus2
az cognitiveservices account deployment create -n <name> -g <rg> \
  --deployment-name gpt4o --model-name gpt-4o --model-version 2024-08-06 --model-format OpenAI
az cognitiveservices account deployment create -n <name> -g <rg> \
  --deployment-name embed3small --model-name text-embedding-3-small --model-version 1 --model-format OpenAI

Google Fact Check Tools API:

gcloud services enable factchecktools.googleapis.com --project <project-id>
gcloud alpha services api-keys create --display-name="Mostly False FactCheck" --project <project-id>

Data Pipeline

The backend uses a multi-source dataset of fact-checked claims:

python -m app.data.ingest           # Downloads LIAR dataset (12,800+ claims)
python -m app.data.normalize        # Normalizes verdicts to TRUE/FALSE/MIXED/etc.
python -m app.data.embed_and_load   # Generates embeddings via Azure OpenAI,
                                    # loads into VectorAI DB (takes ~15 min with S0 tier)
python -m app.data.demo_cache       # Pre-caches results for demo scenarios

The embed step has retry logic for Azure rate limits (S0 tier: ~1 req/sec).
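The retry logic amounts to exponential backoff around the embedding call. This is a sketch of the pattern, where embed_batch is a stand-in for the actual Azure OpenAI call, not the repo's implementation:

```python
# Exponential-backoff retry around a batch embedding call. embed_batch is
# a placeholder for the real Azure OpenAI embeddings request.
import time

def embed_with_retry(embed_batch, texts, max_retries: int = 5, base_delay: float = 1.0):
    """Call embed_batch(texts), backing off exponentially on failures."""
    for attempt in range(max_retries):
        try:
            return embed_batch(texts)
        except Exception:  # in practice, catch the SDK's rate-limit error specifically
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```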

Fact-Check Pipeline

Each claim goes through a 4-stage pipeline:

  1. Check-Worthiness -- ClaimBuster API (or heuristic fallback). Sentences below threshold are skipped.
  2. VectorAI DB -- Semantic search over 12,779 embedded claims. Matches above 0.80 cosine similarity return a verified verdict.
  3. Google Fact Check API -- Searches existing human fact-checks. Returns verified verdict with source URL.
  4. LLM Fallback -- Azure OpenAI GPT-4o evaluates the claim. Returns ai_analysis verdict with confidence score.
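The four stages above can be sketched as a fallthrough, where each stage either settles the claim or defers to the next. Every stage function here is an illustrative stand-in for the backend's actual services; only the two thresholds come from this README:

```python
# Illustrative 4-stage fallthrough. The stage callables (worthiness,
# vector_search, google_lookup, llm_judge) are hypothetical stand-ins.

SIMILARITY_THRESHOLD = 0.80   # VectorAI DB cosine-similarity cutoff
WORTHINESS_THRESHOLD = 0.5    # CHECK_WORTHINESS_THRESHOLD default

def fact_check(sentence, worthiness, vector_search, google_lookup, llm_judge):
    """Run one sentence through the pipeline; return a verdict dict or None."""
    # 1. Check-worthiness: skip sentences below threshold
    if worthiness(sentence) < WORTHINESS_THRESHOLD:
        return None
    # 2. Semantic search over embedded claims
    match, score = vector_search(sentence)
    if match is not None and score >= SIMILARITY_THRESHOLD:
        return {"verdict": match["verdict"], "source": "vectordb", "verified": True}
    # 3. Existing human fact-checks via Google Fact Check Tools
    hit = google_lookup(sentence)
    if hit is not None:
        return {"verdict": hit["verdict"], "source": hit["url"], "verified": True}
    # 4. LLM fallback
    verdict, confidence = llm_judge(sentence)
    return {"verdict": verdict, "source": "ai_analysis",
            "confidence": confidence, "verified": False}
```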

Extension Features

  • YouTube -- Extracts video transcript from captions, fact-checks claims, shows toast notifications synced to video playback
  • Twitter/X -- Scans tweet text via MutationObserver, shows inline verdict badges below tweets
  • LinkedIn -- Scans post text in feed, shows inline verdict badges
  • Reddit -- Scans shreddit-post (and legacy post containers), fact-checks post title/body claims, and renders inline verdict badges
  • Scam highlighting -- Claims flagged as scams are rendered as SCAM with high-risk emphasis in badges/cards/toasts
  • Popup -- Truth Meter gauge, score summary, recent claims, disable-per-site toggle
  • Side Panel -- Full claim list with explanations, sources, confidence badges, and timeline-aware view for video content

Extension Build

cd extension
pnpm build              # Chrome → .output/chrome-mv3/
pnpm build:firefox      # Firefox → .output/firefox-mv2/
pnpm zip                # Chrome zip for distribution
pnpm zip:firefox        # Firefox zip for distribution

NixOS Notes

On NixOS, grpcio requires libstdc++.so.6, which isn't on the default library path. The vectordb.py service handles this automatically by preloading the library via ctypes.CDLL. If running the backend manually:

export LD_LIBRARY_PATH=$(find /nix/store -path '*gcc-*-lib/lib/libstdc++.so.6' -exec file {} \; | grep 64-bit | head -1 | cut -d: -f1 | xargs dirname):$LD_LIBRARY_PATH
uvicorn app.main:app --reload --port 8000

Challenge Coverage

  • SafetyKit -- Misinformation detection and intervention via real-time fact-checking
  • Actian VectorAI DB -- 12,779 embedded claims with FAISS HNSW index for semantic retrieval
  • Sphinx AI -- Dataset pattern analysis and insights
  • Best Overall -- Multi-stage pipeline with production-grade architecture
