AI builds, tests, and verifies your Compose app
ComposeProof is an MCP server that gives AI assistants the full loop: render UI headlessly, mock backend APIs on-device, generate edge-case tests, and verify everything — no emulator needed.
Works with any MCP client
The problem with AI + Android today
AI coding assistants are powerful at writing code, but blind when it comes to UI. The feedback loop is broken.
AI writes code but can't verify it works
Your AI assistant writes Compose UI code but never sees the result. Empty lists, overflowing text, broken error states — all shipped to QA because nobody tested the edge cases.
Backend state is untested visually
Most apps depend on API responses. What does your screen look like when the API returns an error? An empty list? 500 items? Nobody mocks the backend to find out.
Edge-case testing is manual or nonexistent
Paparazzi and Roborazzi test the states you write. But who writes the empty string test? The RTL test? The 200-character name test? Nobody generates edge cases automatically.
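For a single `String` parameter, the edge cases above are mechanical to enumerate. A minimal sketch of the inputs a generator might emit — the values are illustrative, not ComposeProof's actual output:

```kotlin
// Hypothetical sketch: edge-case inputs an analyzer might generate
// for a composable that takes a `name: String` parameter.
fun edgeCasesForString(): List<String> = listOf(
    "",                              // empty string
    "   ",                           // blank / whitespace only
    "مرحبا بالعالم",                 // RTL script
    "a".repeat(200),                 // 200-character overflow test
    "👩‍👩‍👧‍👦 family emoji"          // multi-codepoint grapheme cluster
)
```

Each value would then be wrapped in its own `@Preview` function and rendered headlessly.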
Three layers. End-to-end coverage.
From headless rendering to API mocking to in-process inspection — AI gets the full feedback loop, not just screenshots.
Headless Rendering
Build-time / CI
AI analyzes composable types, generates edge-case @Preview functions, renders them headlessly, and verifies — no device needed.
API Mocking & Device
Run-time / Zero-install
Mock any API endpoint on a live device — zero app code changes. AI controls what the backend returns, then screenshots the result.
Embedded Agent
In-process / SDK
An in-app debug SDK gives the AI direct access to app internals — permissions, lifecycle, navigation, DataStore, coroutines — from inside the process.
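The API-mocking layer boils down to a route table the AI controls: map an endpoint to a canned response, intercept the app's request, answer from the table. A minimal sketch of that idea — names and shapes are illustrative, not ComposeProof's actual API:

```kotlin
// Illustrative sketch: endpoint → canned-response routing,
// the core idea behind on-device API mocking.
data class MockResponse(val status: Int, val body: String)

class MockRoutes {
    private val routes = mutableMapOf<String, MockResponse>()

    // The AI stubs an endpoint with whatever backend state it wants to test.
    fun stub(method: String, path: String, response: MockResponse) {
        routes["$method $path"] = response
    }

    // An intercepted request is answered from the table, or 404s.
    fun resolve(method: String, path: String): MockResponse =
        routes["$method $path"] ?: MockResponse(404, """{"error":"no stub"}""")
}
```

Stub `GET /orders` with `200 []` to screenshot the empty state, then restub it with a `500` body for the error state — the app code never changes.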
See it in action
A real session testing StickerExplode — a Compose Multiplatform sticker canvas app.
Generated Health Report
One command generates a self-contained HTML report — assertions, screenshots, compose intelligence, and context graph. Share it, archive it, diff it.
Token Cost
| MCP Tool Call | Input | Output |
|---|---|---|
| insights (project overview) | ~200 | ~500 |
| list_previews (discover 4 @Preview) | ~200 | ~300 |
| take_device_screenshot (before) | ~200 | ~800 |
| device_interact (swipe drag) | ~300 | ~800 |
| take_device_screenshot (after) | ~200 | ~800 |
| get_context (scope=structure) | ~200 | ~2,000 |
| generate_report | ~300 | ~500 |
| Total | ~1,600 | ~5,700 |
| Grand total | ~7,300 tokens | |
Full project analysis — discover, screenshot, interact, and report — for under 7,500 tokens. That's less than reading a single large file.
Session Flow
This isn't a mock — this is a real session captured from Claude Code + ComposeProof.
Watch the demo videos
One prompt, zero human intervention. AI builds the app, discovers UI patterns, tests interactions, and reports results.
Recorded on a Pixel 9 Pro Fold with StickerExplode demo app.
Build, deploy & launch
AI runs preflight to detect the device, picks the right Gradle variant, builds, installs, and launches — all from a single natural language prompt.
AI discovers the interaction model
The FAB tap fails, so the AI reads source code, discovers the FAB opens a bottom sheet sticker tray, and switches strategy. Every action auto-screenshots.
Drag testing with coordinate recalculation
ADB swipe doesn't trigger Compose gestures. The AI switches to raw touch events, detects resolution mismatch, recalculates coordinates, and successfully drags stickers.
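The recalculation itself is proportional scaling between the resolution the screenshot was captured at and the resolution the touch-input surface expects. A minimal sketch — the resolutions below are illustrative, not the Fold's actual values:

```kotlin
// Map a tap/drag point from screenshot resolution to input-surface
// resolution. Integer math avoids floating-point rounding drift.
data class Size(val width: Int, val height: Int)

fun rescale(x: Int, y: Int, from: Size, to: Size): Pair<Int, Int> = Pair(
    (x.toLong() * to.width / from.width).toInt(),
    (y.toLong() * to.height / from.height).toInt()
)

// Example: screenshot at 1080x2424, input surface at 2076x2152 (illustrative).
// rescale(540, 1212, Size(1080, 2424), Size(2076, 2152)) == Pair(1038, 1076)
```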
Results & persistent screenshots
AI produces a structured test report (3/3 pass), notes z-ordering and animations. Every screenshot from the session is saved to disk for human review.
Full uncut session
6 minutes — from prompt to test report, no edits.
40 tools shipped. More coming.
Every tool is an MCP endpoint your AI can call. Edge-case testing, API mocking, headless rendering, device interaction, and more.
insights Project overview — preview count, golden coverage, device status
render Render any @Preview headlessly via Roborazzi → PNG
list_previews Discover all @Preview functions — file, line, params
verify Single-call PASS/FAIL: render + golden + accessibility
render_batch Render/verify multiple previews with compact summary
diff Golden management: verify, record, or update baselines
preflight Check device + app state — connected, installed, screen
inspect_ui_tree Dump live Compose/View hierarchy with a11y warnings
device_interact Tap, swipe, type, scroll — AI navigates the app
get_recomposition_stats Find recomposition hotspots via compiler metrics
take_device_screenshot Capture device screen, auto-saved to disk
build_and_deploy Gradle build + install APK on device
get_build_status Check build success and APK version match
get_network_logs Capture OkHttp HTTP traffic from logcat
manage_proxy Set/clear device HTTP proxy
get_feature_flags Read/write SharedPreferences
inspect_permissions Runtime permissions — granted, denied, rationale needed
inspect_process_lifecycle Activity/Fragment lifecycle states for all components
inspect_navigation_graph Navigation graph, back stack, deep link patterns
inspect_datastore Jetpack DataStore preferences — all keys and values
inspect_coroutine_state Active coroutines — state, dispatchers, job hierarchy
execute_deeplink Fire a deep link URI and report which handler resolved
simulate_process_death Recreate Activity to test save/restore state handling
accessibility-checker WCAG 2.1 audit — touch targets, contrast, TalkBack
compose-performance Recomposition traps — unstable params, lambda allocations
kmp-architect KMP architecture — shared code, expect/actual patterns
ui-reviewer Visual quality — spacing, typography, Material 3 compliance
screenshot-test-writer Generate test code for Paparazzi, Roborazzi, or goldens
spec-verifier Full spec-driven verification — parse spec, map, verify, report
generate_edge_cases Analyze composable types and suggest edge-case @Preview tests
mock_api Start/stop mock API server — intercept real API calls on device
semantic_ui_query Query Compose UI tree by semantic role, text, or test tag
profile_lazy_list Profile LazyColumn/LazyRow scroll performance
inspect_compose_state Read remembered/derived state values from live composables
track_recompositions Count recompositions per composable per frame
analyze_stability Report stability classification of composable params
inspect_shared_preferences Read/write SharedPreferences from inside the app
inspect_viewmodel_state Snapshot ViewModel state fields and StateFlow values
inspect_current_screen Get current screen route and visible composables
inspect_network_logs In-process HTTP traffic capture via OkHttp interceptor
detect_memory_leaks (planned) LeakCanary heap analysis with reference chains
profile_startup (planned) Cold/warm/hot start breakdown with bottlenecks
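Because every tool is a standard MCP endpoint, any client invokes it with a plain JSON-RPC `tools/call` request. A sketch of the wire shape — the `arguments` payload here is illustrative, not ComposeProof's documented schema:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "render",
    "arguments": { "preview": "OrderScreenPreview" }
  }
}
```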
Up and running in 3 steps
Zero-install architecture. No changes to your project's build files.
Run the setup wizard
One command. It installs the binary, lets you pick your AI agents (Claude Code, Gemini CLI, Cursor), configures MCP, and optionally installs the Compose UI skill.
npx composeproof
AI builds and renders
Your AI assistant writes code, renders previews headlessly, generates edge-case tests, and iterates on UI autonomously.
You: "build the order screen"
AI: writes code → renders 12 edge cases → PASS ✓
Verify end-to-end
Mock backend APIs on a real device, screenshot every state, and confirm the full loop works. No Postman, no manual QA.
AI: mock_api GET /orders → empty, error, 500 items
take_screenshot × 3 states
✓ all states render correctly

AI Skill
Teach AI the render-review-fix loop
ComposeProof ships a skill — a behavioral instruction set that teaches AI agents how to do Compose UI work, rather than just handing them tools.
Without the skill
Default AI behavior
With the skill loaded
AI knows the rules
Render after every change
AI checks its own work visually, not just at the end
Read screenshots critically
Checklist: layout, spacing, contrast, states, overflow
Mock and verify APIs
Swap backend responses on-device, screenshot every state
Recover from errors
Decision tree for failures instead of flailing
The full loop, not just screenshots
AI writes the code, renders it headlessly, generates edge-case tests, mocks backend APIs on a real device, and screenshots every state. One prompt, full verification.
Works with Claude Code, Gemini CLI, Cursor, and any MCP client. Same 40 tools, same skill, every agent.
$ claude "test OrderScreen with every edge case"
→ generate_edge_cases OrderScreen.kt
→ render_batch 12 previews
→ mock_api GET /orders → empty, error, 500 items
→ take_screenshot × 3 device states
PASS 15/15 — 0 overflows, 0 crashes
Roadmap
40 tools shipped across 5 waves. Edge-case testing & API mocking shipped. Multiplatform next.
Headless Rendering & Verification
Ready
render · list_previews · verify · render_batch · diff · insights

Device Inspection & Interaction
Ready
preflight · inspect_ui_tree · device_interact · take_device_screenshot · build_and_deploy · get_build_status · get_recomposition_stats · get_network_logs · manage_proxy · get_feature_flags

Embedded Agent — Runtime Inspection
Ready
inspect_permissions · inspect_process_lifecycle · inspect_navigation_graph · inspect_datastore · inspect_coroutine_state · execute_deeplink · simulate_process_death

AI Skills & Session Learning
Ready
npx composeproof installer · compose-ui-workflow skill · MCP instructions · expert prompts (6) · license gating

Edge-Case Testing & API Mocking
Ready
generate_edge_cases · mock_api · composable type analysis · WireMock integration · ADB proxy orchestration · stale proxy safety

Multiplatform & CI
Planned
CMP rendering · Gradle plugin · GitHub Action · HTML reports · SSE transport

Get started
One command. Works on any Compose project, no build file changes.
- ✓ Headless rendering + screenshots
- ✓ Golden diffing + visual regression
- ✓ Device interaction + inspection
- ✓ HTML reports + expert prompts
- ✓ ~30 MCP tools, all open source
- ✓ Everything in Free
- ✓ Compose Intelligence (5 tools)
- ✓ Embedded Agent (11 tools)
- ✓ Companion app
- ✓ Session screenshots in reports