Inspiration
We've all done it - clicked "I Agree" on privacy policies we'll never read. But what if buried in those 10,000 words is permission to sell your data or collect your biometric information? During a team discussion about TikTok's privacy concerns, we realized something: everyone talks about privacy, but there's no easy way for regular people to actually evaluate it. We wanted to change that. What if you could get an instant privacy risk score before clicking accept?
What it does
Privacy Forensics gives you X-ray vision into privacy policies. Point it at any policy webpage for a quick gist of the risks, or run a deep analysis to get:
- Risk scores (0-100) across four categories: Data Resale, Biometric Collection, Indefinite Retention, and Vague Language
- AI-powered analysis that validates findings against GDPR, with specific article citations
- A visual dashboard showing exactly which clauses are risky and why
- A browser extension that auto-detects privacy policies and shows risk badges in real time
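As a sketch of how the deep analysis could roll the four category scores into a single 0-100 number, here is a weighted average. The weights and function name are illustrative assumptions, not the actual values Privacy Forensics uses:

```python
# Hypothetical weights per category - illustrative, not the app's real values.
CATEGORY_WEIGHTS = {
    "data_resale": 0.35,
    "biometric_collection": 0.25,
    "indefinite_retention": 0.20,
    "vague_language": 0.20,
}

def overall_risk(category_scores: dict) -> float:
    """Weighted average of per-category scores, each clamped to 0-100."""
    total = sum(
        CATEGORY_WEIGHTS[name] * min(max(score, 0.0), 100.0)
        for name, score in category_scores.items()
    )
    return round(total, 1)

print(overall_risk({
    "data_resale": 80,
    "biometric_collection": 20,
    "indefinite_retention": 40,
    "vague_language": 60,
}))
```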
How we built it
Frontend: React + Vite, Tailwind CSS, and Chart.js for visualizations.
Backend: Flask API with two-phase analysis:
1. Regex patterns that catch sophisticated legal language (not just "sell data" but "share with analytics partners")
2. AI validation using GPT-4o via Backboard.io to verify findings against GDPR
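To make phase one concrete, here is a minimal sketch of euphemism-catching regexes feeding findings to a later AI-validation step. These specific patterns and names are illustrative assumptions, not the project's actual rule set:

```python
import re

# Illustrative euphemism patterns - examples in the spirit of the pipeline,
# not the real Privacy Forensics rules.
EUPHEMISM_PATTERNS = {
    "data_resale": [
        r"\bshare\b.{0,60}\b(analytics|advertising|marketing)\s+partners\b",
        r"\bprovide\b.{0,60}\bpartners\b.{0,40}\binsights\b",
        r"\blegitimate\s+business\s+interests?\b",
    ],
    "indefinite_retention": [
        r"\bretain\b.{0,60}\bas\s+long\s+as\s+(necessary|needed)\b",
    ],
}

def flag_clauses(policy_text: str) -> list:
    """Return (category, matched snippet) pairs for later AI validation."""
    findings = []
    for category, patterns in EUPHEMISM_PATTERNS.items():
        for pattern in patterns:
            for match in re.finditer(pattern, policy_text, re.IGNORECASE):
                findings.append((category, match.group(0)))
    return findings

text = "We provide our analytics partners with insights about how you use the service."
print(flag_clauses(text))
```

Each finding carries its category and the matched snippet, so phase two can ask the model to confirm or reject that exact clause.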
Browser Extension: Chrome extension with content scripts that detect privacy policies and communicate with our API.

The tricky part was catching companies that avoid trigger words. Facebook doesn't say "we sell your data"; it says "we provide analytics partners with insights." We built patterns to catch these euphemisms.
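A minimal sketch of what the Flask endpoint behind that API could look like, with both phases stubbed out. The route name, payload shape, and stub logic are assumptions for illustration; the real service wires in the full pattern matcher and the GPT-4o/GDPR validation via Backboard.io:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def phase_one_regex(text: str) -> list:
    # Stub: the real system runs the full euphemism-pattern scan here.
    if "analytics partners" in text.lower():
        return [{"category": "data_resale", "snippet": "analytics partners"}]
    return []

def phase_two_validate(findings: list) -> list:
    # Stub: the real system asks GPT-4o to confirm each finding and
    # attach a specific GDPR article citation.
    for finding in findings:
        finding["gdpr_article"] = "pending AI validation"
    return findings

@app.route("/api/analyze", methods=["POST"])
def analyze():
    text = request.get_json(force=True).get("policy_text", "")
    findings = phase_two_validate(phase_one_regex(text))
    return jsonify({"findings": findings, "risk_flagged": bool(findings)})
```

The extension's content script would POST the detected policy text to this endpoint and render the returned findings as a risk badge.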
Challenges we ran into
The legal language problem: Companies hire lawyers specifically to avoid privacy red flags, so we had to evolve from simple keywords to contextual patterns. Facebook initially scored 10.8 (way too low!) because we weren't catching its carefully worded ad-monetization language.
The WhatsApp paradox: After improving our patterns, WhatsApp scored 38.4 despite having great privacy practices. We learned that vague legal language doesn't equal bad privacy - a company can have terrible policies but use specific words, or vice versa.
Accomplishments that we're proud of
- Built a working system in 24 hours that actually differentiates privacy practices (4.2 vs 30.5 vs 65.3)
- Solved the sophisticated-language problem: we catch phrases like "legitimate business interests" that other tools miss
- Created both a web app AND a browser extension
- AI integration that provides real value with specific GDPR citations, not just buzzword compliance
- A professional UI that makes complex regulatory analysis actually understandable
What we learned
Technical: Regex can be surprisingly powerful when you think semantically, not just literally. Prompt engineering matters more than we thought. React + Vite is fast and fun.
Privacy: GDPR Article 5(1)(b) is now burned into our brains. Companies use euphemisms like "analytics" and "service providers" to hide data sharing. Privacy practices exist on a spectrum, not a binary.
Product: Real-world testing is everything; policies from actual companies revealed edge cases we never imagined. Users need simple scores first, details second.
What's next for Privacy Forensics
- Multi-language support
- Mobile app with QR code scanning
- Historical tracking ("this company's score dropped 23 points last month")
- Real-time alerts when policies change
- Side-by-side policy comparison