Skip to content

metacogdev/testwall

Repository files navigation

testwall

Enforce test immutability for agentic TDD workflows.

LLM coding agents routinely cheat test gates — weakening assertions, deleting failing tests, modifying config, or special-casing inputs. Research (ImpossibleBench, arxiv 2510.20270) shows frontier models exploit test cases 76% of the time when given write access, but cheating drops to near zero when tests are read-only. testwall enforces that boundary.

How it works

testwall init       # snapshot test files + compute SHA-256 checksums
testwall lock       # chmod 444 — agent can read but not modify
testwall run        # restore from snapshot, then execute tests
testwall verify     # check checksums — exit 1 on any mismatch
testwall accept     # verify + unlock + clean up snapshot

Even if an agent bypasses file permissions, testwall run restores the original tests from snapshot before executing them. testwall verify catches any tampering at the checksum level.

Install

# Rust
cargo install testwall

# Python
pip install testwall

# Node
npm install -g testwall

Quick start

# 1. Initialize — snapshots all test files matching default patterns
testwall init

# 2. Lock test files before handing off to an implementing agent
testwall lock

# 3. Agent implements... then run tests against the immutable snapshot
testwall run

# 4. If tests pass and nothing was tampered with, accept the result
testwall accept

Commands

testwall init [-p PATTERN...] [-c CMD]

Scan for test files, compute checksums, and store snapshots in .testwall/.

Without -p, uses built-in patterns for Python, Rust, JavaScript/TypeScript, Go, Java, and Kotlin — plus common config files like pytest.ini, jest.config.*, and .cargo/config.toml.

testwall init                              # auto-detect
testwall init -p "tests/**/*.py" -p "conftest.py"  # explicit patterns
testwall init -c "pytest -x"              # record the test command

testwall lock

Set all snapshotted test files to read-only (chmod 444).

testwall unlock

Restore write permissions on test files.

testwall run [-c CMD] [-- extra args]

Restore test files from snapshot, then execute the test runner. This is the tamper-proof execution path — even if the agent modified the working copies, the originals run.

testwall run                    # use command from init or auto-detect
testwall run -c "pytest"        # override test command
testwall run -- -x --no-header  # forward args to test runner

testwall verify [--report-only]

Compare current test file checksums against the manifest. Exits with code 1 if any file was modified or deleted.

testwall verify                 # fail on mismatch
testwall verify --report-only   # print report, always exit 0

testwall accept

The merge gate. Runs verification, then unlocks files and cleans up the snapshot directory. Rejects if any tampering is detected.

testwall status

Show the current manifest: file count, lock state, snapshot presence, patterns, and test command.

Default patterns

testwall ships with patterns for common test conventions:

Ecosystem Patterns
Python test_*.py, *_test.py, tests/**/*.py, conftest.py
Rust tests/**/*.rs
JS/TS **/*.test.{js,ts,tsx}, **/*.spec.{js,ts,tsx}
Go **/*_test.go
Java/Kotlin src/test/**/*.java, src/test/**/*.kt
Config pytest.ini, setup.cfg, jest.config.*, vitest.config.*, .cargo/config.toml

Typical workflow

  You (test author)          testwall            Agent (implementer)
  ─────────────────          ────────            ───────────────────
  Write tests
          ├──── testwall init ────►
          ├──── testwall lock ────►
          │                              Agent implements code
          │                              Agent tries to edit tests → DENIED
          │                              Agent runs testwall run
          │                        ◄──── tests execute from snapshot
          │                              Tests pass
          ├──── testwall accept ──►
          │     ✓ checksums match
          │     ✓ files unlocked
          │     ✓ snapshot cleaned

What it catches

  • Weakened assertions (assert x > 0assert True)
  • Deleted test cases
  • Modified test config (conftest.py, jest.config.*)
  • Special-cased test inputs
  • Swapped test runner flags
  • Any byte-level change to snapshotted files

License

MIT

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors