⚖️JurAI: Multi-Agent Compliance Judge


💡Problem Statement

Automating geo-regulation compliance with LLMs. Every product rollout risks non-compliance with region-specific laws, exposing companies to legal and reputational risk.


🛠️Functionality and Features

JurAI is a multi-agent compliance pipeline that simulates a jury of AI models. Each juror reviews a product feature, a critic challenges its reasoning, and a judge consolidates the final verdict. The system integrates Retrieval-Augmented Generation (RAG) to ground reasoning in real legislation and to draw additional context from past verdicts.

Key Features

1. Dual RAG Context Retrieval

  • The first RAG retrieves past verdicts for similar features
  • The second RAG retrieves relevant region-specific legislation
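The two retrieval passes feed one grounded prompt. The sketch below illustrates the idea with placeholder retrievers; `retrieve_past_verdicts` and `retrieve_legislation` are hypothetical stand-ins for the LightRAG and LangChain-based pipelines, not the actual APIs.

```python
# Hypothetical sketch: merging the two RAG contexts into one juror prompt.
# Both retriever functions below are placeholders, not the real API.

def retrieve_past_verdicts(feature: str) -> list[str]:
    # Stand-in for the LightRAG query over prior case verdicts.
    return ["Past verdict: similar curfew feature ruled compliant in Utah."]

def retrieve_legislation(feature: str, region: str) -> list[str]:
    # Stand-in for the legislation retriever, filtered by region.
    return [f"{region}: minors' accounts require parental-consent controls."]

def build_juror_context(feature: str, region: str) -> str:
    """Merge both retrieval passes into one grounded prompt context."""
    past = retrieve_past_verdicts(feature)
    laws = retrieve_legislation(feature, region)
    return "\n".join(
        ["## Past verdicts", *past,
         "## Relevant legislation", *laws,
         "## Feature under review", feature]
    )
```

Keeping the two context types under separate headings lets jurors cite precedent and statute independently.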

2. Jury–Critic–Judge Pipeline

  • Multiple Jury Agents analyze the feature independently
  • Each Jury has its own Critic Agent that reviews, points out weaknesses, and forces revisions
  • A Judge Agent merges jury outputs, removes duplicates, and delivers one clean verdict
  • Runs with five parallel jurors and critics to refine compliance reports iteratively
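The deliberation flow above can be sketched as follows. This is a simplified illustration with stubbed model calls; `call_model` stands in for the real Google ADK agent invocations, and the bounded revision loop reflects the design goal of avoiding infinite debate.

```python
# Hypothetical sketch of the jury–critic–judge loop with stubbed model calls.
from concurrent.futures import ThreadPoolExecutor

JUROR_MODELS = ["deepseek-chat", "qwen3-235b", "llama-4-maverick",
                "kimi-k2", "gemini-2.5-flash"]  # five parallel jurors

def call_model(model: str, prompt: str) -> str:
    # Placeholder: the real system dispatches to each LLM provider here.
    return f"[{model}] analysis of: {prompt}"

def juror_with_critic(model: str, feature: str, rounds: int = 2) -> str:
    """One juror drafts a verdict; its critic forces revisions each round."""
    draft = call_model(model, f"Assess compliance of: {feature}")
    for _ in range(rounds):  # bounded rounds prevent infinite debate loops
        critique = call_model(model, f"Critique: {draft}")
        draft = call_model(model, f"Revise using critique: {critique}")
    return draft

def judge(feature: str) -> str:
    """Run all jurors in parallel, then consolidate into one verdict."""
    with ThreadPoolExecutor(max_workers=len(JUROR_MODELS)) as pool:
        opinions = list(pool.map(
            lambda m: juror_with_critic(m, feature), JUROR_MODELS))
    merged = "\n".join(dict.fromkeys(opinions))  # drop duplicate opinions
    return call_model("gpt-5-mini", f"Consolidate verdicts:\n{merged}")
```

Running jurors in a thread pool mirrors the five-parallel-jurors design, and de-duplicating opinions before the judge call matches the "removes duplicates" step.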

3. Diversity of Models

  • Juries do not all run on the same LLM (e.g., one on DeepSeek, another on GPT-5-mini)
  • Critics and Judge can also mix models
  • Prevents a single model’s blind spots from dominating the outcome

4. Region-Aware Compliance

  • Supports EU and US jurisdictions (California, Florida, Utah, National)
  • Uses real ingested legislation for region-specific analysis
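One illustrative way to wire jurisdictions to the ingested sources is a simple lookup; the mapping below mirrors the files listed under Assets, though the region keys are hypothetical, not the production identifiers.

```python
# Illustrative mapping from supported jurisdictions to ingested legislation
# files (names taken from the Assets section; region keys are assumptions).
REGION_SOURCES = {
    "EU": ["digital_services_act_wiki.txt"],
    "US-CA": ["USCA_SB976.txt"],
    "US-FL": ["florida_state_law.txt"],
    "US-UT": ["Utah Social Media Regulation Act - Wikipedia.html"],
    "US-National": [
        "US law on reporting child sexual abuse content to NCMEC.txt"],
}

def legislation_for(region: str) -> list[str]:
    """Return the legislation sources to retrieve for a jurisdiction."""
    if region not in REGION_SOURCES:
        raise ValueError(f"Unsupported jurisdiction: {region}")
    return REGION_SOURCES[region]
```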

5. Structured and Consistent Outputs

  • Always produces structured JSON, never free-text
  • Every statement backed by a citation for auditability
  • Easy for both lawyers and automated pipelines to parse
  • Terminology glossary dynamically updated to keep reasoning consistent
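A minimal sketch of what a citation-backed structured verdict could look like, using stdlib dataclasses. The field names here are illustrative assumptions, not the production schema.

```python
# Minimal sketch (not the production schema) of a structured verdict where
# every statement carries a citation for auditability.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Finding:
    statement: str
    citation: str  # e.g. "California SB976 §27002" — required, never omitted

@dataclass
class Verdict:
    feature: str
    region: str
    compliant: bool
    findings: list[Finding] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

verdict = Verdict(
    feature="Default-private accounts for minors",
    region="US-CA",
    compliant=True,
    findings=[Finding("Defaults satisfy minor-protection settings",
                      "California SB976 §27002")],
)
```

Because the output is always well-formed JSON, downstream pipelines can validate it mechanically while lawyers read the same payload.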

6. Real-Time Streaming

  • Deliberation results are streamed live via Server-Sent Events (SSE) for immediate visibility and transparency
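Under the hood, SSE is a simple text framing; sse-starlette produces this wire format for the backend. The sketch below shows the framing itself, with an assumed event name for illustration.

```python
# Sketch of the SSE wire format streamed to the frontend. In the real app
# sse-starlette handles this framing; "juror_verdict" is an assumed event name.
import json

def sse_event(event: str, data: dict) -> str:
    """Frame one Server-Sent Event: event name, JSON payload, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

frame = sse_event("juror_verdict", {"juror": "deepseek-chat", "status": "done"})
```

Each deliberation step becomes one such event, so the UI can render jurors' progress as it happens rather than waiting for the final verdict.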

Results

  • The results for the given dataset are uploaded as a .csv file in the next section

🧑‍💻Technologies Used

Frontend

  • Next.js + TypeScript – interactive web-based UI for submitting features and viewing jury reports

Backend

  • FastAPI (Python) – REST API for feature detail processing and jury execution
  • sse-starlette – live-streaming of jury decisions

AI/ML Core

  • Google ADK Agents SDK – orchestration of jurors, critics, and judge
  • Google GenAI SDK – content handling & pipeline communication
  • OpenAI GPT-5 Mini – final judge + critic
  • DeepSeek Chat – juror agent
  • Qwen 3 235B, LLaMA 4 Maverick, Moonshot Kimi K2, Gemini 2.5 Flash – diverse juror/critic pairings for cross-model deliberation
  • LightRAG - past case ingestion and retrieval
  • LangChain-tools powered RAG - ingestion and retrieval of legislative documents

Hosting and Deployment

  • Render - hosts backend server
  • Vercel - deploys frontend web-UI

📂Assets

Region-specific legislation files ingested into RAG

  • digital_services_act_wiki.txt (EU DSA)
  • USCA_SB976.txt (California SB976)
  • florida_state_law.txt (Florida privacy law)
  • Utah Social Media Regulation Act - Wikipedia.html
  • US law on reporting child sexual abuse content to NCMEC.txt

🔨Challenges & Reflection

  • Designing a multi-agent debate structure that stays efficient, avoids infinite loops, and still refines reasoning.
  • Managing RAG ingestion and retrieval pipelines across different regions and heterogeneous data formats (Wikipedia text, legislation PDFs, HTML).
  • Coordinating outputs from five different LLM families into a consistent final verdict.
  • Debugging streaming issues between frontend (Next.js) and backend (FastAPI).
  • Debugging deployment-related issues

🚀Next Steps

  • Expand to More Regions – extend compliance coverage beyond US/EU to APAC, LATAM, and other emerging regulatory markets.
  • Benchmark Against Human Reviews – compare jury verdicts with real compliance lawyer assessments to measure accuracy and reliability.
  • Batch Processing – support bulk feature submissions to reduce latency and lower API costs.
  • Interactive Feedback Loop – allow users to refine verdicts by providing additional context, clarifications, or corrections.
  • Customizable Jury – let users select which LLMs (OpenAI, DeepSeek, Qwen, LLaMA, Moonshot, Gemini) power their juries and critics directly from the UI.
