SourceSauce – AI Source Credibility Analyzer

Inspiration

AI-generated content (often dismissed as "AI slop") is everywhere, and most people can't tell the difference between a credible source and a content farm. AI content detectors exist, but nothing actually verifies the sources that AI tools like ChatGPT cite during web searches. Every link looks equally trustworthy on the surface, and that felt like a real problem worth solving.

What It Does

SourceSauce is a Chrome extension that scores the credibility of every source an AI chatbot cites, in real time, without leaving the chat. It analyzes domain reputation and bias signals, runs a machine learning model to detect AI-generated content, and then attaches a color-coded badge to each link so you know what you're clicking before you click it.
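The write-up doesn't specify the exact score cutoffs behind the badges; as a rough illustration, the score-to-color mapping might look like the following sketch (the threshold values and the 0-100 scale are assumptions, not SourceSauce's actual cutoffs):

```python
def badge_color(score: float) -> str:
    """Map a 0-100 credibility score to a badge color.

    Thresholds here are illustrative assumptions only.
    """
    if score >= 70:
        return "green"   # likely credible
    if score >= 40:
        return "yellow"  # mixed signals, read with care
    return "red"         # low credibility or likely AI-generated
```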

How We Built It

The Chrome extension uses a MutationObserver to detect new links as they appear, intercepts fetch() calls and SSE streams to extract citation URLs, and proxies requests through background.js to avoid CORS issues. A popup dashboard shows all scored sources with full reasoning breakdowns.

The backend runs on a FastAPI server deployed on a DigitalOcean Ubuntu 24.04 droplet (4GB RAM, 2 vCPU). It handles URLs in parallel using ThreadPoolExecutor, scrapes and normalizes article text, and runs Hello-SimpleAI/chatgpt-detector-roberta for AI authorship scoring. A weighted credibility engine combines AI detection, domain trust scores across 350+ verified domains, content type, article length, and burstiness into a single final score. MongoDB Atlas caches results, cutting compute time by over 50% and keeping badge updates near instant.
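The actual weights and signal scales of the credibility engine aren't published; a minimal sketch of how a weighted combiner over the five signals described above might work (all weights, the 0-1 per-signal normalization, and the neutral 0.5 default are assumptions):

```python
# Illustrative weights; the real engine's values are not published,
# so these numbers are assumptions for the sketch. They sum to 1.0.
WEIGHTS = {
    "ai_detection": 0.35,  # 1.0 = confidently human-written
    "domain_trust": 0.30,  # from the 350+ verified-domain registry
    "content_type": 0.15,
    "length": 0.10,
    "burstiness": 0.10,
}

def credibility_score(signals: dict[str, float]) -> float:
    """Combine per-signal scores (each normalized to 0..1) into a 0..100 score.

    Missing signals fall back to a neutral 0.5 rather than dragging
    the score to zero.
    """
    total = sum(WEIGHTS[name] * signals.get(name, 0.5) for name in WEIGHTS)
    return round(100 * total, 1)
```

For example, a well-known domain hosting human-looking text would score high on `ai_detection` and `domain_trust` and land in the 70s, while an unknown domain with machine-sounding text would sink toward the bottom of the scale.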

Challenges We Ran Into

AI content detection accuracy was the biggest headache. Midway through, it became clear that no single model was reliable enough on its own, so we tested several alternatives and even looked into training a custom model. Time constraints eventually forced us back to RoBERTa, which gave the best tradeoff between accuracy and practicality given the deadline.

Accomplishments That We're Proud Of

Getting a full pipeline working end to end, from intercepting browser streams to running ML inference and serving cached results in real time, within a hackathon timeframe felt like a real win. The multi-signal scoring engine producing a single interpretable score is something we think makes the tool genuinely useful, not just technically interesting.

What We Learned

Combining imperfect signals into a reliable composite score is harder than expected. Each signal has its own failure modes, and calibrating the weighted engine took a lot of iteration. Real-world scraping is also significantly messier than any clean benchmark dataset.

What's Next for SourceSauce

We plan to adopt or train a better AI detection model to push detection accuracy to 85-90%, add sentence-level highlighting to show exactly which parts of an article seem machine-written, replace the static domain registry with a dynamic API-driven trust list, and add multilingual support for non-English content.
