⚖️JurAI: Multi-Agent Compliance Judge


💡Problem Statement

Automating geo-regulation compliance with LLMs. Every product rollout risks non-compliance with region-specific laws, exposing companies to legal and reputational risk.


🛠️Functionality and Features

JurAI is a multi-agent compliance pipeline that simulates a jury of AI models. Each juror reviews a product feature, a critic challenges its reasoning, and a judge consolidates the final verdict. The system integrates Retrieval-Augmented Generation (RAG) to ground reasoning in real legislation and to draw additional context from past verdicts.

Key Features

1. Dual RAG Context Retrieval

  • The first RAG retrieves past verdicts for similar features
  • The second RAG retrieves relevant region-specific legislation
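The two retrieval passes feed one grounded prompt. The sketch below illustrates the idea with placeholder retrievers; `retrieve_past_verdicts` and `retrieve_legislation` are hypothetical stand-ins for the LightRAG and LangChain-based pipelines, not the actual APIs.

```python
# Hypothetical sketch: merging the two RAG contexts into one juror prompt.
# Both retriever functions below are placeholders, not the real API.

def retrieve_past_verdicts(feature: str) -> list[str]:
    # Stand-in for the LightRAG query over prior case verdicts.
    return ["Past verdict: similar curfew feature ruled compliant in Utah."]

def retrieve_legislation(feature: str, region: str) -> list[str]:
    # Stand-in for the legislation retriever, filtered by region.
    return [f"{region}: minors' accounts require parental-consent controls."]

def build_juror_context(feature: str, region: str) -> str:
    """Merge both retrieval passes into one grounded prompt context."""
    past = retrieve_past_verdicts(feature)
    laws = retrieve_legislation(feature, region)
    return "\n".join(
        ["## Past verdicts", *past,
         "## Relevant legislation", *laws,
         "## Feature under review", feature]
    )
```

Keeping the two context types under separate headings lets jurors cite precedent and statute independently.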

2. Jury–Critic–Judge Pipeline

  • Multiple Jury Agents analyze the feature independently
  • Each Jury has its own Critic Agent that reviews, points out weaknesses, and forces revisions
  • A Judge Agent merges jury outputs, removes duplicates, and delivers one clean verdict
  • Runs with five parallel jurors and critics to refine compliance reports iteratively
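The deliberation flow above can be sketched as follows. This is a simplified illustration with stubbed model calls; `call_model` stands in for the real Google ADK agent invocations, and the bounded revision loop reflects the design goal of avoiding infinite debate.

```python
# Hypothetical sketch of the jury–critic–judge loop with stubbed model calls.
from concurrent.futures import ThreadPoolExecutor

JUROR_MODELS = ["deepseek-chat", "qwen3-235b", "llama-4-maverick",
                "kimi-k2", "gemini-2.5-flash"]  # five parallel jurors

def call_model(model: str, prompt: str) -> str:
    # Placeholder: the real system dispatches to each LLM provider here.
    return f"[{model}] analysis of: {prompt}"

def juror_with_critic(model: str, feature: str, rounds: int = 2) -> str:
    """One juror drafts a verdict; its critic forces revisions each round."""
    draft = call_model(model, f"Assess compliance of: {feature}")
    for _ in range(rounds):  # bounded rounds prevent infinite debate loops
        critique = call_model(model, f"Critique: {draft}")
        draft = call_model(model, f"Revise using critique: {critique}")
    return draft

def judge(feature: str) -> str:
    """Run all jurors in parallel, then consolidate into one verdict."""
    with ThreadPoolExecutor(max_workers=len(JUROR_MODELS)) as pool:
        opinions = list(pool.map(
            lambda m: juror_with_critic(m, feature), JUROR_MODELS))
    merged = "\n".join(dict.fromkeys(opinions))  # drop duplicate opinions
    return call_model("gpt-5-mini", f"Consolidate verdicts:\n{merged}")
```

Running jurors in a thread pool mirrors the five-parallel-jurors design, and de-duplicating opinions before the judge call matches the "removes duplicates" step.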

3. Diversity of Models

  • Juries do not all run on the same LLM (e.g., one on DeepSeek, another on GPT-5-mini)
  • Critics and Judge can also mix models
  • Prevents a single model’s blind spots from dominating the outcome

4. Region-Aware Compliance

  • Supports EU and US jurisdictions (California, Florida, Utah, National)
  • Uses real ingested legislation for region-specific analysis
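One illustrative way to wire jurisdictions to the ingested sources is a simple lookup; the mapping below mirrors the files listed under Assets, though the region keys are hypothetical, not the production identifiers.

```python
# Illustrative mapping from supported jurisdictions to ingested legislation
# files (names taken from the Assets section; region keys are assumptions).
REGION_SOURCES = {
    "EU": ["digital_services_act_wiki.txt"],
    "US-CA": ["USCA_SB976.txt"],
    "US-FL": ["florida_state_law.txt"],
    "US-UT": ["Utah Social Media Regulation Act - Wikipedia.html"],
    "US-National": [
        "US law on reporting child sexual abuse content to NCMEC.txt"],
}

def legislation_for(region: str) -> list[str]:
    """Return the legislation sources to retrieve for a jurisdiction."""
    if region not in REGION_SOURCES:
        raise ValueError(f"Unsupported jurisdiction: {region}")
    return REGION_SOURCES[region]
```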

5. Structured and Consistent Outputs

  • Always produces structured JSON, never free-text
  • Every statement backed by a citation for auditability
  • Easy for both lawyers and automated pipelines to parse
  • Terminology glossary dynamically updated to keep reasoning consistent
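A minimal sketch of what a citation-backed structured verdict could look like, using stdlib dataclasses. The field names here are illustrative assumptions, not the production schema.

```python
# Minimal sketch (not the production schema) of a structured verdict where
# every statement carries a citation for auditability.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Finding:
    statement: str
    citation: str  # e.g. "California SB976 §27002" — required, never omitted

@dataclass
class Verdict:
    feature: str
    region: str
    compliant: bool
    findings: list[Finding] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

verdict = Verdict(
    feature="Default-private accounts for minors",
    region="US-CA",
    compliant=True,
    findings=[Finding("Defaults satisfy minor-protection settings",
                      "California SB976 §27002")],
)
```

Because the output is always well-formed JSON, downstream pipelines can validate it mechanically while lawyers read the same payload.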

6. Real-Time Streaming

  • Deliberation results are streamed live via Server-Sent Events (SSE) for immediate visibility and transparency
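Under the hood, SSE is a simple text framing; sse-starlette produces this wire format for the backend. The sketch below shows the framing itself, with an assumed event name for illustration.

```python
# Sketch of the SSE wire format streamed to the frontend. In the real app
# sse-starlette handles this framing; "juror_verdict" is an assumed event name.
import json

def sse_event(event: str, data: dict) -> str:
    """Frame one Server-Sent Event: event name, JSON payload, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

frame = sse_event("juror_verdict", {"juror": "deepseek-chat", "status": "done"})
```

Each deliberation step becomes one such event, so the UI can render jurors' progress as it happens rather than waiting for the final verdict.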

Results

  • The results for the given dataset are uploaded as a .csv file in the next section

🧑‍💻Technologies Used

Frontend

  • Next.js + TypeScript – interactive web-based UI for submitting features and viewing jury reports

Backend

  • FastAPI (Python) – REST API for feature detail processing and jury execution
  • sse-starlette – live-streaming of jury decisions

AI/ML Core

  • Google ADK Agents SDK – orchestration of jurors, critics, and judge
  • Google GenAI SDK – content handling & pipeline communication
  • OpenAI GPT-5 Mini – final judge + critic
  • DeepSeek Chat – juror agent
  • Qwen 3 235B, LLaMA 4 Maverick, Moonshot Kimi K2, Gemini 2.5 Flash – diverse juror/critic pairings for cross-model deliberation
  • LightRAG - past case ingestion and retrieval
  • LangChain-tools powered RAG - ingestion and retrieval of legislative documents

Hosting and Deployment

  • Render - hosts backend server
  • Vercel - deploys frontend web-UI

📂Assets

Region-specific legislation files ingested into RAG

  • digital_services_act_wiki.txt (EU DSA)
  • USCA_SB976.txt (California SB976)
  • florida_state_law.txt (Florida privacy law)
  • Utah Social Media Regulation Act - Wikipedia.html
  • US law on reporting child sexual abuse content to NCMEC.txt

🔨Challenges & Reflection

  • Designing a multi-agent debate structure that stays efficient, avoids infinite loops, and still refines reasoning.
  • Managing RAG ingestion and retrieval pipelines across different regions and heterogeneous data formats (Wikipedia text, legislation PDFs, HTML).
  • Coordinating outputs from five different LLM families into a consistent final verdict.
  • Debugging streaming issues between frontend (Next.js) and backend (FastAPI).
  • Debugging deployment-related issues

🚀Next Steps

  • Expand to More Regions – extend compliance coverage beyond US/EU to APAC, LATAM, and other emerging regulatory markets.
  • Benchmark Against Human Reviews – compare jury verdicts with real compliance lawyer assessments to measure accuracy and reliability.
  • Batch Processing – support bulk feature submissions to reduce latency and lower API costs.
  • Interactive Feedback Loop – allow users to refine verdicts by providing additional context, clarifications, or corrections.
  • Customizable Jury – let users select which LLMs (OpenAI, DeepSeek, Qwen, LLaMA, Moonshot, Gemini) power their juries and critics directly from the UI.
