Polao

Inspiration

In today’s world, American news consumption feels more polarized than ever—and research consistently shows that perceived media bias erodes trust and drives echo chambers. Polao is our answer: a place to line up multiple perspectives side-by-side, compare coverage, surface bias signals, and help people form their own informed opinion. “Polao” (rice!) is about balanced portions: a daily, nutritious news diet.

What it does

Clusters a story and surfaces Left / Center / Right coverage in one view.

Computes a Bias Support Meter (average lean across sources).

Generates a neutral summary plus counter-bias summaries (“what you’d think if you only read left/right”).

Framing Lens toggle: highlights loaded language and rewrites headlines neutrally.

AI Copilot: chat over just the clustered articles (RAG) with citations.

Daily newsletter: a balanced digest with L/C/R links and the bias meter.

Auth: secure login, role-based gates for admin tools, and user preferences.

How we built it

Frontend: Next.js (App Router) + Tailwind; responsive, accessible UI. Landing page, cluster view, framing highlighter, newsletter preview.

Data & NLP: Ingestion from News APIs → normalize → de-duplicate using cosine similarity over title and domain.
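A minimal sketch of the de-duplication step, under stated assumptions: titles are compared with a plain bag-of-words cosine similarity, and two articles count as duplicates only when they share a domain and their title similarity reaches a threshold. The `title`/`domain` field names, the 0.8 threshold, and the same-domain rule are illustrative choices, not details confirmed by the pipeline above.

```python
import math
import re
from collections import Counter


def title_vector(title):
    """Lowercased bag-of-words counts for a title."""
    return Counter(re.findall(r"[a-z0-9]+", title.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dedupe(articles, threshold=0.8):
    """Keep the first article of each near-duplicate group.

    Assumption: a duplicate is an article from the same domain whose
    title is at least `threshold` similar to an already-kept title.
    """
    kept = []
    for art in articles:
        vec = title_vector(art["title"])
        is_dup = any(
            art["domain"] == k["domain"]
            and cosine(vec, title_vector(k["title"])) >= threshold
            for k in kept
        )
        if not is_dup:
            kept.append(art)
    return kept
```

This pairwise comparison is quadratic in the number of kept articles, so a larger corpus would likely need blocking by domain or an approximate nearest-neighbor index.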

Embeddings: Google Gemini text embeddings for high-recall semantic search and topic features (big-dataset friendly).

Bias detector: BERT classifier (Hugging Face Transformers) for article-level stance/bias; final score = 0.6 * source_prior + 0.4 * BERT_stance.
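The scoring formula and the Bias Support Meter (average lean across sources) can be sketched as follows. The weights come from the formula above; the score scale (-1.0 for left through +1.0 for right) and the 0.2 cutoff used to label a score are assumptions made only for illustration.

```python
def article_bias_score(source_prior, bert_stance):
    """Blend the source-level prior with the article-level stance.

    Both inputs are assumed to lie in [-1.0, 1.0] (left to right).
    """
    return 0.6 * source_prior + 0.4 * bert_stance


def bias_support_meter(scores):
    """Average lean across the articles in one story cluster."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)


def lean_label(score, cutoff=0.2):
    """Map a score to left/center/right; the cutoff is hypothetical."""
    if score <= -cutoff:
        return "left"
    if score >= cutoff:
        return "right"
    return "center"


# Example: three articles covering one story.
scores = [
    article_bias_score(-1.0, 0.0),  # left-leaning source, neutral article
    article_bias_score(0.0, 0.0),   # center source, neutral article
    article_bias_score(1.0, 0.0),   # right-leaning source, neutral article
]
meter = bias_support_meter(scores)
```

Because the source prior carries the larger weight, an article from a strongly leaning outlet stays on that side unless the stance model disagrees sharply.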

Clustering: Gemini embeddings → K-Means (mini-batch for scale) to form story clusters across large corpora.
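To illustrate the idea behind mini-batch K-Means, here is a small pure-Python sketch that updates centers from random batches with a per-center learning rate of 1/count. It is not our production code: the farthest-point initialization, batch size, and iteration count are assumptions, and a real pipeline would more likely use a library implementation such as scikit-learn's `MiniBatchKMeans` on the Gemini embedding vectors.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def farthest_point_init(points, k):
    """Deterministic init: start at points[0], then repeatedly add the
    point farthest from all centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def mini_batch_kmeans(points, k, batch_size=4, n_iters=50, seed=0):
    """Return (labels, centers) after `n_iters` mini-batch updates."""
    rng = random.Random(seed)
    centers = farthest_point_init(points, k)
    counts = [0] * k
    for _ in range(n_iters):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch first, then move the centers.
        assigned = [(nearest(p, centers), p) for p in batch]
        for i, p in assigned:
            counts[i] += 1
            eta = 1.0 / counts[i]  # step size shrinks as a center sees more points
            centers[i] = [(1 - eta) * c + eta * x for c, x in zip(centers[i], p)]
    labels = [nearest(p, centers) for p in points]
    return labels, centers


# Toy 2-D "embeddings": two well-separated story clusters.
embeddings = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
              (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
labels, centers = mini_batch_kmeans(embeddings, k=2)
```

Each iteration touches only `batch_size` points, which is what keeps the approach workable on large corpora.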

Summaries & Chatbot: Gemini API (prompt-grounded to the cluster) returns neutral + counter-bias summaries and powers the RAG chatbot; all answers must cite 2–3 URLs from the cluster.
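The "must cite 2–3 URLs from the cluster" rule can be enforced with a post-generation check along these lines. This is a sketch under assumptions: the function names are hypothetical, URLs are pulled out with a simple regular expression, and an answer that cites any URL outside the cluster is rejected, which is a stricter reading than the rule above requires.

```python
import re

# Simple URL matcher; stops at whitespace and common closing punctuation.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def extract_urls(answer):
    """Distinct URLs in the answer, in order of first appearance."""
    urls = []
    for raw in URL_RE.findall(answer):
        url = raw.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls


def has_valid_citations(answer, cluster_urls, min_cites=2, max_cites=3):
    """True if the answer cites 2-3 distinct URLs, all from the cluster."""
    allowed = set(cluster_urls)
    cited = extract_urls(answer)
    if any(url not in allowed for url in cited):
        return False  # assumption: out-of-cluster citations are not allowed
    return min_cites <= len(cited) <= max_cites
```

An answer that fails the check could be regenerated or withheld, so the chatbot never shows an uncited claim.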

Framing Lens: lexicon + POS patterns to highlight emotive/hedge/blame language; optional headline “neutralizer” rewrite, checked with NLI (natural language inference).
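The lexicon half of the Framing Lens can be sketched as a word lookup that wraps matches in a category tag. The word lists and the `[category:word]` markup are placeholders invented for this example, and the POS patterns and NLI check are left out.

```python
import re

# Hypothetical mini-lexicon; a real one would be far larger.
LEXICON = {
    "emotive": {"slams", "outrage", "shocking", "disaster"},
    "hedge": {"reportedly", "allegedly", "may", "could"},
    "blame": {"blamed", "fault", "responsible"},
}

# Invert the lexicon once: word -> category.
WORD_TO_CATEGORY = {
    word: category for category, words in LEXICON.items() for word in words
}


def highlight(text):
    """Wrap each lexicon word in [category:word], leaving other text as is."""
    def mark(match):
        word = match.group(0)
        category = WORD_TO_CATEGORY.get(word.lower())
        return f"[{category}:{word}]" if category else word

    return re.sub(r"[A-Za-z']+", mark, text)


print(highlight("Senator slams shocking bill, reportedly"))
```

In the UI the tags would map to highlight colors per category rather than inline brackets.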

Storage & Jobs: Supabase/Firestore; CRON workers for ingest/cluster/newsletter; cached hot topics.

Auth: Auth0 Universal Login with passkey/biometric support, email verification, and RBAC for admin features.

Challenges we ran into

Learning ML from scratch: We had zero ML background. Figuring out embeddings, BERT, evaluation—and training our own bias/stance model—was the steepest curve.

Big, clean dataset: Finding a large, relevant corpus and cleaning/de-duping/labeling it took time but was crucial for accuracy.

Clustering + event grouping: Getting stable K-Means clusters and pulling all articles for the same event/topic meant diving deep into multiple API docs and lots of tuning.

First-time Auth0 with MFA/biometrics: Our first Auth0 integration—wiring Universal Login, MFA, and passkeys—was tricky but now solid.

Accomplishments that we're proud of

Shipped our own model overnight: We built and trained a bias/stance model in one night, hitting strong accuracy on our training set after rapid iteration and cleanup.

Actionable bias signal, not just labels: We combined Gemini embeddings + BERT stance with source priors to turn bias into an explainable score (left/center/right) with citations—so users see why an article leans a certain way.

A working Gemini-powered pipeline (embeddings + chatbot + summaries) that scales to large article sets.

What we learned

We learned to start simple and scale: Gemini embeddings + K-Means gave us fast, reliable clustering, and layering a BERT stance model with source priors added the nuance needed to reflect bias realistically. Along the way, we discovered that data hygiene beats tricks—cleaning, de-duping, and labeling a large corpus improved accuracy more than any fancy tweak. We also saw that great ML still needs great plumbing: rate limits, caching, and background jobs mattered just as much as the models for a smooth UX. Most importantly, building and training our own model—end to end—taught us how to balance practicality with rigor under tight time pressure.