
Argus — CMS Proactive Program Integrity

91% of eventually-revoked providers detected from billing patterns alone — before CMS acted on revocation.

CMS loses an estimated $60 billion annually to improper payments across Medicare and Medicaid. Current detection is largely reactive — fraud is identified after payments are made, forcing a costly "pay-and-chase" cycle. Argus is a decision-support system that identifies anomalous provider billing patterns, surfaces evidence-backed risk cases, and provides explainable scores that human reviewers can act on proactively, in real time.

| Resource | Link |
| --- | --- |
| Live App | argus.precise-lab.com |
| GitHub | precisesoft/cms-fraud-detection |
| Architecture | docs/architecture-v3.md |
| Demo Script | docs/demo-script.md |
| Responsible AI | docs/responsible-ai-considerations.md |
| Path to Pilot | docs/path-to-cms-pilot.md |

Judge access note: the live website is protected by HTTP basic auth during evaluation. If you have login issues, contact [email protected].

Dashboard


Judge Quick Links

| Deliverable | Link |
| --- | --- |
| Demo Script | docs/demo-script.md |
| Architecture | docs/architecture-v3.md |
| Risk Scoring Methodology | docs/risk-scoring-methodology.md |
| Responsible AI Considerations | docs/responsible-ai-considerations.md |
| Isolation Forest Model Card | docs/model-card-isolation-forest.md |
| AI and Open Source Disclosure | docs/ai-oss-disclosure.md |
| Path to CMS Pilot | docs/path-to-cms-pilot.md |
| Architecture Diagrams | docs/diagrams/ |

Judging Snapshot

| Criterion | Why Argus Scores Well | Evidence |
| --- | --- | --- |
| Mission Relevance | Targets CMS improper payments with a proactive provider-risk workflow instead of post-payment recovery alone. | Problem framing, Validated Results, Investigation Workflow |
| Technical Soundness | End-to-end working system: ETL, scoring engine, fairness analysis, investigation UI, and live deployment. | Architecture, Live app, Demo script |
| Explainability & Responsible AI | Deterministic scoring, named signals, evidence provenance, fairness monitoring, and human-in-the-loop review. | Explainability and Responsible AI, Risk scoring methodology, Model card |
| Feasibility for Adoption | Uses public CMS data today, avoids PHI, deploys on AWS, and has a documented path from MVP to government pilot. | Feasibility for Government Adoption, Path to CMS Pilot |
| Innovation | Combines dual scoring, per-provider ML explainability, evidence graph views, and AI-assisted investigation. | Why Argus Stands Out, Risk scoring methodology, Architecture |
| Demo Clarity | Includes a guided narrative, live screens, and judge-ready supporting documents. | Judge Quick Links, Demo Script, Judge Deliverables |

How It Works

  19GB real CMS data          13 explainable signals           Evidence-backed cases
 (4 public datasets)    ──>  (risk + legitimacy scoring)  ──>  for human investigators
                              + ML anomaly detection            + AI-generated narratives
  1. Ingest — 19GB of real, public Medicare data: 9.66M service lines across 10,282 providers from data.cms.gov. No PHI. Core scoring and validation use real public CMS data; synthetic records are only used in separate admin/demo workflows.
  2. Score — Every provider-service case receives two independent scores: a risk score (how anomalous vs. peers) and a legitimacy score (how many trust indicators exist). 13 named signals, each with a threshold, weight, and data-source citation. Plus an independent Isolation Forest anomaly score with per-provider feature importance.
  3. Investigate — Analysts review flagged cases with full signal breakdowns, peer comparison charts, evidence graphs, and AI-generated narratives. The system explains; the human decides.
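The signal engine described above can be sketched as a table of named, weighted thresholds summed into a score. This is a minimal illustration, not the production engine: the signal names, thresholds, weights, and citations below are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signal:
    """One named scoring signal (names and values here are illustrative)."""
    name: str
    threshold: float
    weight: int
    source: str  # data-source citation

    def fires(self, value: float) -> bool:
        return value >= self.threshold

# Hypothetical signals; the real engine defines 13, each documented.
SIGNALS = [
    Signal("charge_to_payment_ratio", 4.0, 15, "data.cms.gov by-provider-and-service"),
    Signal("services_per_beneficiary", 20.0, 10, "data.cms.gov by-provider"),
]

def risk_score(features: dict[str, float]) -> tuple[int, list[str]]:
    """Sum the weights of signals whose thresholds are exceeded."""
    fired = [s.name for s in SIGNALS if s.fires(features.get(s.name, 0.0))]
    score = sum(s.weight for s in SIGNALS if s.name in fired)
    return score, fired

score, fired = risk_score({"charge_to_payment_ratio": 5.2,
                           "services_per_beneficiary": 8.0})
# score == 15; only the charge-ratio signal fired
```

Because each signal carries its own threshold, weight, and citation, the resulting score decomposes into an auditable list of named contributions.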

Validated Results

We didn't just build a scoring system — we validated it. We took all 335 revoked providers in our dataset, removed the revocation flag, and re-scored them using only behavioral signals.

| Metric | Current Result |
| --- | --- |
| Revoked providers detected, blind provider-level | 306 / 335 (91.34%) |
| Revoked cases detected, blind case-level | 779 / 862 (90.37%) |
| Non-revoked provider baseline flagging | 51.47% |
| Detection lift over non-revoked baseline | 1.77x |
| Felony-related revocations detected | 100% |
| Billing-abuse revocations detected | 94.12% |

Methodology: The retrospective validation endpoint (/api/validation) removes the revoked_provider signal and re-scores providers using only behavioral signals, including peer comparisons, enrollment context, charge patterns, and concentration metrics. In the current artifact, 306 of 335 revoked providers still land in the review or high_risk bands without using the revocation label.
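As a rough sketch of the retrospective check, assuming the band names from the description above (the real endpoint's band labels and data structures may differ):

```python
def detection_rate(bands: list[str]) -> float:
    """Fraction of blind-rescored revoked providers that still land in an
    actionable band after the revocation signal is removed."""
    hits = sum(1 for band in bands if band in {"review", "high_risk"})
    return hits / len(bands)

# 306 of 335 revoked providers re-flagged without the revocation label
rate = detection_rate(["high_risk"] * 306 + ["low_risk"] * 29)
print(f"{rate:.2%}")  # prints "91.34%"
```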

Source of truth: data/validation/retrospective_results.json powers the API response and the judge-facing validation story.

Try it live: argus.precise-lab.com/api/validation (behind HTTP basic auth during judging; see the judge access note above).


Why Argus Stands Out

Dual Scoring — Risk and Legitimacy

Traditional fraud detection produces a single risk score. Argus computes two independent scores for every case. A provider with high volume (risk signal) who is enrolled, Medicare-participating, and peer-aligned on all other metrics (legitimacy signals) won't be flagged — the legitimacy score contextualizes the risk. This reduces false positives and ensures providers aren't flagged on a single anomalous metric.
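One way to picture the interaction of the two scores is a flagging rule in which legitimacy offsets risk before the threshold check. The combination rule and thresholds below are invented for illustration, not the production logic:

```python
def should_flag(risk: float, legitimacy: float,
                risk_threshold: float = 60.0,
                legit_offset: float = 0.5) -> bool:
    """Flag only when risk still clears the bar after legitimacy context
    is applied. All constants here are illustrative."""
    adjusted = risk - legitimacy * legit_offset
    return adjusted >= risk_threshold

# High-volume but enrolled, peer-aligned provider: legitimacy absorbs the risk
assert should_flag(risk=70.0, legitimacy=40.0) is False
# Same risk with few trust indicators: flagged for review
assert should_flag(risk=70.0, legitimacy=10.0) is True
```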

Per-Provider ML Explainability

The Isolation Forest anomaly model provides per-provider feature importance via a leave-one-out approximation — not a global average, but the specific features that drive this provider's anomaly score. Risk-increasing features are shown in red and protective features in green, all computed in under 100ms.
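A leave-one-out approximation of this kind can be sketched with scikit-learn's `IsolationForest`: replace one feature at a time with the peer-group mean and measure how the anomaly score moves. The data, feature count, and baseline choice below are illustrative, not the production implementation:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                  # synthetic peer population
provider = np.array([[0.1, 6.0, 0.2, -0.1]])   # strongly anomalous on feature 1

model = IsolationForest(random_state=0).fit(X)
base = model.score_samples(provider)[0]        # lower score = more anomalous

def loo_importance(model, x, baseline_means):
    """Approximate each feature's contribution by swapping it for the
    peer mean and measuring the change in anomaly score."""
    contributions = {}
    for j in range(x.shape[1]):
        x_mod = x.copy()
        x_mod[0, j] = baseline_means[j]
        contributions[j] = model.score_samples(x_mod)[0] - base
    return contributions  # positive = removing the feature made the case look normal

contrib = loo_importance(model, provider, X.mean(axis=0))
# feature 1 should dominate this provider's anomaly explanation
```

A positive contribution marks a risk-increasing feature (red in the UI) and a negative one a protective feature (green).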

Real-Time Scoring

Claims stream via Server-Sent Events, scored by the 13-signal engine in under 50 milliseconds. No batch jobs, no overnight processing — proactive detection as payments arrive.

Built-In Fairness Monitoring

A dedicated /api/fairness endpoint computes flagging rate disparities across geography and specialty using statistical parity difference, disparate impact ratio (EEOC four-fifths rule), and outlier detection. Configurable threshold. No demographic variables are used in scoring.
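These disparity metrics reduce to simple arithmetic over per-group flagging rates. A sketch with invented group rates (the endpoint's actual grouping and configurable threshold may differ):

```python
def fairness_metrics(flag_rates: dict[str, float]) -> dict:
    """Statistical parity difference and disparate impact ratio across
    groups (e.g. states or specialties; keys here are illustrative)."""
    rates = list(flag_rates.values())
    spd = max(rates) - min(rates)   # statistical parity difference
    di = min(rates) / max(rates)    # disparate impact ratio
    return {
        "statistical_parity_difference": spd,
        "disparate_impact_ratio": di,
        "passes_four_fifths": di >= 0.8,  # EEOC four-fifths rule
    }

m = fairness_metrics({"TX": 0.10, "CA": 0.12, "NY": 0.09})
# di = 0.09 / 0.12 = 0.75, so this invented example fails the four-fifths check
```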

AI-Assisted Investigation

AWS Bedrock Claude powers three capabilities: text-to-SQL (analysts ask questions in plain English), risk narratives (structured signals summarized in plain language), and chat (conversational investigation). All AI output is advisory — the scoring engine is fully deterministic and AI-free.


Explainability and Responsible AI

Argus is designed as a decision-support system, not an autonomous enforcement engine.

  • Human in the loop: analysts review evidence packages, take explicit case actions, and retain final decision authority.
  • Explainable outputs: every case exposes the signals, thresholds, peer baselines, evidence provenance, and narrative explanation behind its score.
  • Responsible ML use: the deterministic scoring engine remains primary; the Isolation Forest model is supportive and documented through a formal model card.
  • Fairness monitoring: the /api/fairness workflow measures flagging disparities across state and specialty using statistical parity difference, disparate impact ratio, and outlier analysis.

Supporting docs: docs/responsible-ai-considerations.md, docs/model-card-isolation-forest.md, docs/risk-scoring-methodology.md


Feasibility for Government Adoption

Argus is designed to deploy into a government environment with minimal rework.

| Capability | Current Position |
| --- | --- |
| Data strategy | Operates on public CMS data today. Connecting agency claims feeds is a data-source swap, not an architectural rewrite. |
| Cloud and AI foundation | AWS-based deployment path with Bedrock-backed AI services; current architecture is aligned to a government deployment model and documents a GovCloud path. |
| Security posture | CI runs secrets scanning, SAST, dependency auditing, SBOM generation, and container scanning on every change. |
| Deployment path | AWS-based architecture with Terraform, ECR, EKS, and GitOps deployment via ArgoCD. |
| Governance | Audit logging, RBAC, deterministic scoring, and responsible AI documentation are already built into the system. |
| Integration path | Designed to complement CMS FPS and downstream UCM-style investigator workflows rather than replace them outright. |
| Pilot readiness | A documented MVP → pilot → production path exists in docs/path-to-cms-pilot.md. |

Bottom line: the path from hackathon MVP to agency pilot is primarily a data connection and integration exercise, not a rebuild.


The Product

Dashboard — Aggregate Risk Overview

Dashboard

Total providers scored, risk distribution breakdown, geographic heatmap of state-level flagging patterns, and top-risk cases.

Live Payment Monitor — Real-Time Scoring (disabled in the hosted build due to high CPU utilization; will be shown in the live demo)

Live Payment Monitor

Claims stream across a US map with pulsing risk dots. Each claim scored in <50ms. Click a flagged claim to investigate.

Provider Detail — Signal Breakdown

Provider Detail

Full signal decomposition: which signals fired, how many points each contributed, peer baseline comparisons, evidence graph, ML anomaly score with per-provider feature importance.

Claims Simulator — Pre-Payment Screening

Claim Simulator

Submit a hypothetical claim and watch the scoring engine extract signals, compute risk + legitimacy, and generate an AI narrative — simulating what pre-payment screening would look like.

Fairness Dashboard — Bias Monitoring

Fairness Analysis

Statistical parity and disparate impact metrics across states and specialties. Outlier detection flags systemic bias.

Investigation Workflow — Case Management

Investigation Workflow

Triaged case queue with approve/flag/deny/escalate actions, audit trail, and AI chat sidebar for natural-language data queries.

Claims Detail — Service-Line Evidence

Claim Detail

Each flagged claim has a dedicated case view with service-line details, z-score metrics, hybrid scoring context, and case-specific AI assistance.
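The z-score metrics mentioned here are standard peer-baseline arithmetic: how many standard deviations a provider's value sits from its specialty peer group. A minimal sketch with invented numbers:

```python
import statistics

def peer_z_score(value: float, peer_values: list[float]) -> float:
    """Standard deviations a provider's metric sits above its peer group."""
    mean = statistics.fmean(peer_values)
    std = statistics.stdev(peer_values)
    return (value - mean) / std

# Provider bills $900 per service; specialty peers cluster near $300
z = peer_z_score(900.0, [280.0, 300.0, 310.0, 295.0, 315.0])
# z is large and positive, so the charge pattern is far outside the peer norm
```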

Data Operations — Ingestion and Recalibration

Data Operations

Admin users can ingest raw CMS data, seed demo datasets for sandbox workflows, recalibrate deterministic scores, and retrain models from a tracked pipeline run.


Execution Metrics

These metrics reflect the repository state as of March 25, 2026.

| Metric | Current Value |
| --- | --- |
| Issues tracked | 239 |
| Open issues | 6 |
| Pull requests opened | 230 |
| Pull requests merged | 205 |
| Open pull requests | 1 |
| Total commits in repository history | 300 |
| Commits on default-branch history | 283 |

Argus was built through a disciplined AI-assisted delivery process with human review, automated CI/CD, and GitOps deployment. Full process detail lives in docs/development-process.md.

Delivery Discipline

  • Working software first: live product, live API, judge-ready diagrams, and a rehearsed demo script all exist in the repo.
  • Operational rigor: GitHub Actions runs quality, security, build, scan, release, and deploy stages; ArgoCD handles cluster sync.
  • Traceable execution: daily scoreboards in docs/agile/ record progress, blockers, and decisions across the sprint.
  • AI under review: AI-assisted development accelerated delivery, but final architecture, merge, and release decisions remained human-reviewed.

Architecture

Full specification: Architecture (v3)

System Architecture

Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Frontend | Vite + React 19 + TypeScript + Tailwind v4 + Recharts | 12-page SPA with responsive design |
| Backend | Python 3.12 + FastAPI + psycopg (async) | 14 REST endpoints, auto-documented |
| Database | PostgreSQL 16 (EKS StatefulSet, 20Gi gp3) | Relational queries, provider/case data |
| Graph | Neo4j 5 Community (EKS StatefulSet, 10Gi gp3) | Evidence relationships, network analysis |
| Scoring | Deterministic rule engine (13 signals) + Isolation Forest | Auditable, reproducible, peer comparison |
| AI | AWS Bedrock (Claude Sonnet 4.6 + Haiku 4.5) | Narratives, text-to-SQL, chat — FedRAMP High |
| ETL | DuckDB + Polars | 19GB data pipeline |
| CI/CD | GitHub Actions → ECR → ArgoCD | 8-stage unified pipeline with GitOps |
| Infra | AWS EKS + Istio + Terraform | Container-native, horizontally scalable |

Data Flow

Data Pipeline

Scoring Engine

Scoring Engine

Deployment

Deployment Architecture

All Diagrams

| Diagram | Description |
| --- | --- |
| System Architecture | Full-stack component map |
| Deployment Architecture | CI/CD → EKS pipeline |
| Data Pipeline | 19GB ETL flow |
| Scoring Engine | Dual scoring with signal provenance |
| Evidence Graph | Neo4j relationship model |
| AI Reasoning | Text-to-SQL + narrative flow |
| Demo User Journey | 5-7 min demo script |
| Signal Taxonomy | Risk + legitimacy signal definitions |
| Fairness Evaluation | Responsible AI metrics pipeline |
| Path to CMS Pilot | MVP → Pilot → Production roadmap |

Public Data Sources

All datasets are publicly available and currently downloadable. No PHI is used.

Active (used in scoring pipeline)

| Dataset | Source | Use |
| --- | --- | --- |
| Medicare Physician & Other Practitioners — by Provider and Service | data.cms.gov | Core billing patterns, service volumes, charges, peer baselines |
| Medicare Physician & Other Practitioners — by Provider | data.cms.gov | Provider-level totals (benes, services, payments) |
| Public Provider Enrollment | data.cms.gov | Enrollment status verification |
| Revoked Providers (Q1 2026) | data.cms.gov | Revocation flag for risk scoring |

Reference only (not used in current pipeline)

| Dataset | Source | Notes |
| --- | --- | --- |
| OIG LEIE Exclusion List | oig.hhs.gov | Potential enrichment; weak NPI join coverage |
| Medicare Part D Prescribers | data.cms.gov | Potential prescribing-pattern enrichment |

Documentation

Judge Deliverables

| Deliverable | Document |
| --- | --- |
| Risk Scoring Methodology | docs/risk-scoring-methodology.md |
| Responsible AI Considerations | docs/responsible-ai-considerations.md |
| AI & Open Source Disclosure | docs/ai-oss-disclosure.md |
| Path to CMS Pilot (5-min brief) | docs/path-to-cms-pilot.md |
| Demo Script (5-7 min) | docs/demo-script.md |
| Isolation Forest Model Card | docs/model-card-isolation-forest.md |
| Development Process | docs/development-process.md |
| Architecture (v3) | docs/architecture-v3.md |
| Architecture Diagrams | docs/diagrams/ |
| User Personas | docs/personas.md |

Additional Documentation

Pre-sprint research (historical)

Quickstart

# Clone
git clone https://github.com/precisesoft/cms-fraud-detection.git
cd cms-fraud-detection

# Start all services (Postgres, Neo4j, API, Frontend)
docker compose up -d

# --- Backend ---
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest --cov=src -q                    # Run backend tests
uvicorn src.api.app:create_app --factory --host 0.0.0.0 --port 8000

# --- Frontend ---
cd frontend
npm install
npm run dev                            # http://localhost:3000
npm test                               # Run frontend tests

# API docs: http://localhost:8000/docs

Sprint Timeline

| Phase | Due | Status | Key Deliverables |
| --- | --- | --- | --- |
| Phase 0: Project Spine | Mar 14 | Done | Monorepo, CI/CD, Dockerfiles, branch protection |
| Phase 1: Data Foundation | Mar 18 | Done | 19GB ETL, 13K cases + 10K providers, Neo4j graph |
| Phase 2: Scoring + API | Mar 20 | Done | Scoring engine, all REST endpoints, peer baselines |
| Phase 3: AI Signals | Mar 22 | Done | Text-to-SQL, risk narratives, anomaly detection |
| Phase 4: User Interface | Mar 24 | Done | Claims simulator, investigation workflow, chat sidebar |
| Phase 4b: Live Monitor | Mar 24 | Done | SSE real-time payment monitor, ML explainability UI |
| Phase 5: Ship | Mar 25 | Done | Demo script, AI/OSS disclosure, judge access |

Demo Day: March 27, 2026 — Reston, Virginia


Team

  • Arun Sanna — Lead, AI/ML Engineering, Architecture
  • Bibek Poudel — Backend, Infrastructure
  • Rahul Vadera
  • Reaz Rahman

About

Proactive CMS provider fraud detection with explainable AI
