
Verifiable Agent Demo

This repository is the walkthrough demo for the execution-evidence path.

It is the guided walkthrough surface across the stack; it is neither the canonical architecture hub nor the canonical evidence-profile spec.

Navigation

Fastest runnable path

This repo proves the path; agent-evidence is the evidence substrate; aro-audit is the audit control plane.

Fastest local run:

python3 -m demo.agent

Fastest enterprise sandbox artifact chain:

python3 examples/enterprise_sandbox_demo/run.py

The sandbox run writes artifacts/enterprise_sandbox_demo/ with:

  • intent.json
  • policy.json
  • trace.jsonl
  • sep.bundle.json
  • replay_verdict.json
  • audit_receipt.json
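
A minimal post-run check can confirm the chain is complete. This sketch is illustrative only: `check_bundle` and the synthetic bundle below are assumptions, not part of the demo; after a real run, point it at artifacts/enterprise_sandbox_demo/ instead.

```python
import pathlib
import tempfile

# Expected artifact names from the enterprise sandbox chain.
EXPECTED = [
    "intent.json", "policy.json", "trace.jsonl",
    "sep.bundle.json", "replay_verdict.json", "audit_receipt.json",
]

def check_bundle(out_dir):
    """Return the expected artifact names missing from out_dir."""
    out = pathlib.Path(out_dir)
    return [name for name in EXPECTED if not (out / name).exists()]

# Demonstrated against a synthetic bundle written into a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED:
        (pathlib.Path(tmp) / name).write_text("{}")
    print(check_bundle(tmp))  # → []
```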

Start here

Depends on

Status

  • active walkthrough demo
  • research annexes remain secondary to the demo path
  • not a canonical implementation repo

Shared doctrine:

Sandbox controls execution; portable evidence verifies execution.

  1. Governance decides what should be allowed.
  2. Execution integrity proves what actually happened.
  3. Audit evidence exports artifacts for independent review.
```mermaid
flowchart LR
    Persona["Persona (POP)"] --> Intent["Intent Object (AIP)"]
    Intent --> Governance["Governance Check"]
    Governance --> Trace["Execution Trace"]
    Trace --> Audit["Audit Evidence (ARO)"]
```
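
The layered path above can be sketched in miniature. Everything here (the field names, the policy shape, the allowed_tasks key) is a hypothetical illustration of the control flow, not the demo's actual schemas.

```python
import time

def run_with_evidence(persona, task, policy):
    """Hypothetical miniature of the Persona -> Intent -> Governance ->
    Trace -> Audit path; all field names are illustrative only."""
    intent = {"persona": persona, "task": task, "ts": time.time()}
    if task not in policy["allowed_tasks"]:  # Governance Check
        return {"intent": intent, "trace": [], "verdict": "denied"}
    trace = [{"step": 1, "event": "executed", "task": task}]  # Execution Trace
    return {"intent": intent, "trace": trace, "verdict": "allowed"}  # Audit Evidence

receipt = run_with_evidence("analyst", "weekly_report",
                            {"allowed_tasks": ["weekly_report"]})
print(receipt["verdict"])  # → allowed
```

Governance decides before execution, the trace records what happened, and the returned object is the exportable evidence: the same three-step doctrine stated above.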

What this demo proves

  • a portable persona-oriented entry point can be projected into runtime
  • explicit intent and action objects can be emitted before execution
  • result objects can be emitted after execution
  • execution steps can be recorded as inspectable evidence
  • audit-facing artifacts can be exported as bounded outputs
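
The intent-before / result-after pattern can be sketched as follows. The `emit` helper, file layout, and payloads are hypothetical and do not reflect the demo's actual interaction/ schemas.

```python
import json
import pathlib
import tempfile

def emit(obj_dir, name, payload):
    """Write one interaction object as pretty-printed JSON; return its path."""
    path = pathlib.Path(obj_dir) / f"{name}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path

with tempfile.TemporaryDirectory() as tmp:
    emit(tmp, "intent", {"goal": "organize client visit notes"})  # before execution
    emit(tmp, "action", {"tool": "notes_organizer"})              # the step taken
    emit(tmp, "result", {"status": "ok"})                         # after execution
    print(sorted(p.name for p in pathlib.Path(tmp).iterdir()))
    # → ['action.json', 'intent.json', 'result.json']
```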

Architecture Path in this Demo

  • Persona Layer -> POP-aligned persona context carried into the run
  • Interaction Layer -> intent, action, and result objects emitted under interaction/
  • Governance Layer -> referenced as the control checkpoint for runtime policy and budget constraints
  • Execution Integrity Layer -> runtime execution trace and verifiable execution context
  • Audit Evidence Layer -> ARO-style exported evidence artifacts

This repository does not claim a full Token Governor integration. It demonstrates a minimal aligned path across the broader stack, with explicit governance checkpoint references in the emitted interaction and result objects.

It now also includes one fixed enterprise sandbox artifact chain for the scenario "organize client visit notes -> generate weekly report -> request approval", while still not claiming a general full-stack Token Governor integration.

How to read this demo

This demo is a guided path across layers. It is not the normative specification for each layer, and it points outward to the canonical repositories for those layers: digital-biosphere-architecture, persona-object-protocol, agent-intent-protocol, token-governor, and aro-audit.

Execution Evidence Demo Note

See docs/execution-evidence-demo-note.md.

Expected Artifacts

Repo-tracked sample bundle:

  • interaction/intent.json
  • interaction/action.json
  • interaction/result.json
  • evidence/example_audit.json
  • evidence/result.json
  • evidence/sample-manifest.json
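
One way such a sample manifest could bind the bundle together is with content digests. This is a sketch of the idea only; the actual layout of evidence/sample-manifest.json may differ.

```python
import hashlib
import json
import pathlib
import tempfile

def build_manifest(paths):
    """Map each file name to a sha256 hex digest of its bytes
    (illustrative layout; the tracked manifest may use another format)."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in map(pathlib.Path, paths)
    }

with tempfile.TemporaryDirectory() as tmp:
    f = pathlib.Path(tmp) / "result.json"
    f.write_text(json.dumps({"status": "ok"}))
    manifest = build_manifest([f])
    print(len(manifest["result.json"]))  # → 64 (sha256 hex digest length)
```

A digest-based manifest makes tampering with any artifact in the bundle detectable by re-hashing and comparing.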

Additional tracked example:

  • evidence/crew_demo_audit.json

Current concrete examples in this repository include:

  • docs/quick-walkthrough.md
  • docs/interaction-flow.md
  • docs/shortest-validation-loop.md

Run the Demo

Scripted wrapper

bash scripts/run_demo.sh

This local wrapper writes fresh output under artifacts/demo_output/.

Fastest external demo path

bash scripts/run_demo.sh
make killer-demo
python3 -m http.server --directory docs 8000

The receipt for the enterprise sandbox chain is checked through the canonical ARO surface aro_audit.receipt_validation with the minimal profile.
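
The idea of a minimal-profile receipt check can be sketched as below. The required field names and verdict values here are hypothetical; the canonical check remains aro_audit.receipt_validation.

```python
REQUIRED_MINIMAL = ("receipt_id", "bundle_hash", "verdict")  # illustrative fields

def validate_receipt_minimal(receipt):
    """Return (ok, reason); a stand-in for a minimal receipt profile,
    not the canonical aro_audit implementation."""
    missing = [k for k in REQUIRED_MINIMAL if k not in receipt]
    if missing:
        return False, f"missing fields: {missing}"
    if receipt["verdict"] != "pass":
        return False, f"verdict was {receipt['verdict']!r}"
    return True, "ok"

print(validate_receipt_minimal(
    {"receipt_id": "r1", "bundle_hash": "abc", "verdict": "pass"}))
# → (True, 'ok')
```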

Existing CrewAI demo path

bash scripts/setup_framework_venv.sh
.venv/bin/python crew/crew_demo.py

Environment notes:

  • Python 3 is sufficient for the minimal local path.
  • Refresh the tracked deterministic sample bundle with python3 scripts/refresh_demo_samples.py.
  • The optional CrewAI and LangChain paths should run from a git-ignored local .venv/ created by scripts/setup_framework_venv.sh.
  • The pinned framework helper environment currently uses crewai 1.10.1, langchain 1.2.12, and langchain-core 1.2.18.
  • CrewAI currently requires Python <3.14.
  • Both demo paths use deterministic local mock data and do not require external API calls.

Repository Automation

  • The Mermaid render workflow opens PRs to main only through a dedicated GitHub App.
  • Configure repository variable PROTOCOL_BOT_APP_ID and repository secret PROTOCOL_BOT_PRIVATE_KEY under Settings -> Secrets and variables -> Actions.
  • The default repository GITHUB_TOKEN remains read-only and is not used for auto-PR promotion.

Research Evaluation Annex

This repository now includes a paper-ready evaluation harness for Execution Evidence Architecture for Agentic Software Systems: From Intent Objects to Verifiable Audit Receipts.

Primary entry points:

  • make eval-baseline
  • make eval-evidence
  • make eval-external-baseline
  • make eval-framework-pair
  • make eval-langchain-pair
  • make eval-ablation
  • make falsification-checks
  • make human-review-kit
  • make review-sample
  • make compare
  • make paper-eval
  • make top-journal-pack

Supporting material:

Generated outputs:

  • artifacts/runs/<task_id>/<mode>/
  • docs/paper_support/comparison-summary.md
  • docs/paper_support/comparison-summary.csv
  • artifacts/metrics/comparison-summary.json
  • docs/paper_support/external-baseline-summary.md
  • docs/paper_support/framework-pair-summary.md
  • docs/paper_support/langchain-pair-summary.md
  • docs/paper_support/ablation-summary.md
  • docs/paper_support/falsification-summary.md
  • artifacts/human_review/synthetic-review-summary.json

Research Manuscript Draft

The repository also includes a manuscript draft grounded in the current implemented harness and checked-in metrics:

Related Repositories

Minimal Reference Surface

  • interaction/ for explicit interaction objects
  • evidence/ for audit and result artifacts
  • demo/ and crew/ for runnable entry points
  • integration/ for persona and intent adapters
  • docs/spec/ for schema notes and example payloads

Further Reading