Build a formally verified compiler from full JavaScript (ECMAScript 2020) to WebAssembly, written entirely in Lean 4. The compiler's correctness—semantic preservation from source to target—is proved mechanically in Lean's type theory. The compiled output runs on any standard Wasm runtime. End-to-end behavior is validated against Node.js using Test262.
The ECMAScript 2020 Language Specification is the normative reference for all semantic decisions:
Every agent implementing or formalizing JS semantics must cite the specific section of this spec justifying the behavior. Lean docstrings on semantic rules should include the spec section number:
/-- ECMA-262 §13.15.2 Runtime Semantics: Evaluation — AssignmentExpression -/
inductive AssignmentEval : ... → Prop whereWhen the spec is ambiguous, document the ambiguity in ARCHITECTURE.md and pick the behavior that matches Node.js (v20 LTS).
The compiler's "Linux kernel moment" is compiling the TypeScript compiler (tsc) to WebAssembly and using it to typecheck real TypeScript projects. tsc is ~130,000 lines of JavaScript, exercises every corner of the language, and is the single most widely-used JS tool in the world.
Secondary targets (in rough order of difficulty):
- Prettier (~50k LOC) — format JS/TS files, diff output against native Prettier
- Babel (~70k LOC) — transpile modern JS, compare output against native Babel
- Three.js examples — compile a Three.js demo, render in a browser via Wasm
- TypeScript compiler (~130k LOC) — the flagship
The flagship repos are tracked as git submodules under tests/flagship/.
# 1. Build the compiler
lake build
# 2. Write a simple JS program
echo 'var x = 1 + 2; var y = x * 3;' > hello.js
# 2b. Printing example
echo 'var x = 41; console.log(x + 1);' > print.js
# 3. Inspect intermediate representations
lake exe verifiedjs hello.js --emit=core # Core IL (desugared JS)
lake exe verifiedjs hello.js --emit=flat # Flat IL (closure-converted)
lake exe verifiedjs hello.js --emit=anf # ANF IL (A-normal form)
lake exe verifiedjs hello.js --emit=wasmIR # Wasm IR (structured control flow)
lake exe verifiedjs hello.js --emit=wat # WebAssembly Text Format
lake exe verifiedjs print.js --emit=core # inspect console.log lowering
# 4. Compile to .wasm binary
lake exe verifiedjs hello.js -o hello.wasm
lake exe verifiedjs print.js -o print.wasm
# 4b. Compile an ES module entry with linked relative imports/exports
lake exe verifiedjs src/main.js --module -o app.wasm
# 5. Run with wasmtime (simple arithmetic programs work)
wasmtime hello.wasm
wasmtime print.wasm
# 6. Run the e2e test suite
bash tests/e2e/run_e2e.sh- Runtime semantics are still stubbed: Runtime helper functions are now emitted as valid Wasm and execute under
wasmtime, but many helpers (__rt_call, property/global/object ops) still return placeholders instead of full ECMA-262 behavior - Globals: Unbound identifiers are lowered via a runtime global-lookup stub (
__rt_getGlobal) that currently returnsundefined; semantic global object behavior is not implemented yet - Float precision: Numbers are currently lowered as i32 pointers; proper NaN-boxing/tagged pointers needed
- Module support is linker-based (partial):
--moduleresolves relative imports/exports across files (including named/default/namespace imports and re-exports), but host module semantics and full live-binding behavior are still incomplete - Language features: Classes, for-in/of, destructuring, optional chaining are stubbed in elaboration
The compiler is a pipeline of formally verified passes. Each pass transforms between two intermediate languages and is equipped with a Lean proof of semantic preservation (a forward simulation). Composition yields end-to-end correctness.
JavaScript source
│
├─ [1] Lexing + Parsing (outside TCB; validated by test suite)
▼
JS.AST ← full ECMAScript 2020 abstract syntax
│
├─ [2] Elaboration + desugaring
▼
JS.Core ← normalized core: destructuring/for-in/classes → primitives
│
├─ [3] Closure conversion + environment representation
▼
JS.Flat ← first-order; closures → structs + function indices
│
├─ [4] ANF conversion (side effects sequenced, all subexpressions named)
▼
JS.ANF ← A-normal form
│
├─ [5] [STUB] Optimization passes (no-op identity for now)
▼
JS.ANF ← unchanged
│
├─ [6] Lowering to Wasm IR
▼
Wasm.IR ← structured control flow, Wasm types, linear memory ops
│
├─ [7] Wasm IR → Wasm AST
▼
Wasm.AST ← abstract Wasm module (mirrors WasmCert-Coq)
│
├─ [8] Binary encoding (outside TCB; validated by wasm-tools + Valex-style checker)
▼
.wasm binary
Each intermediate language has four components that must exist before the pass is considered complete:
Syntax.lean— inductive type definitionsSemantics.lean— small-step LTS as an inductive relationStep : State → Trace → State → PropInterp.lean— executable reference interpreter (def interp : Program → IO (List Trace))Print.lean— pretty-printer producing human-readable output
The CLI exposes each IL:
lake exe verifiedjs input.js --emit=core # print JS.Core
lake exe verifiedjs input.js --emit=flat # print JS.Flat
lake exe verifiedjs input.js --emit=anf # print JS.ANF
lake exe verifiedjs input.js --emit=wasmIR # print Wasm.IR
lake exe verifiedjs input.js --emit=wat # print Wasm text format
lake exe verifiedjs input.js --run=core # interpret at JS.Core level
lake exe verifiedjs input.js --run=flat # interpret at JS.Flat level
lake exe verifiedjs input.js --run=anf # interpret at JS.ANF level
lake exe verifiedjs input.js --run=wasmIR # interpret at Wasm.IR levelWhen an end-to-end test fails, bisect by running interpreters at each level to isolate which pass introduced the bug.
For each pass P : IL_in → IL_out, prove:
theorem P_correct (s : IL_in.Program) (t : IL_out.Program) (h : P s = some t) :
∀ b, IL_out.Behaves t b → ∃ b', IL_in.Behaves s b' ∧ BehaviorRefines b b'JavaScript has no undefined behavior (unlike C), so BehaviorRefines is exact equality on defined programs. Compose all pass theorems in EndToEnd.lean.
-- VerifiedJS/ANF/Optimize.lean
/-- Identity pass. Future optimizations go here, each with a simulation proof. -/
def optimize (p : ANF.Program) : ANF.Program := p
theorem optimize_correct (p : ANF.Program) :
∀ b, ANF.Behaves (optimize p) b ↔ ANF.Behaves p b := by
intro b; constructor <;> (intro h; exact h)Port the relevant subset of WasmCert-Coq's Rocq formalization to Lean 4:
Wasm.Syntax— module structure, instructions, types (from WasmCert-Coqtheories/datatypes.v)Wasm.Validation— type checking rulesWasm.Execution— small-step reduction, store, stack, framesWasm.Numerics— i32/i64/f32/f64 operations
Reference repos:
- WasmCert-Coq (
github.com/WasmCert/WasmCert-Coq): canonical Rocq formalization of Wasm 2.0 - lean-wasm (
github.com/T-Brick/lean-wasm): existing Lean 4 Wasm formalization
Target Wasm 1.0 MVP + bulk memory + mutable globals.
JS lexing is context-sensitive (/ is division or regex depending on prior token). Hand-written lexer producing a Token stream. Recursive descent parser — JS grammar is not LR(1).
The parser is outside the verified TCB. Validate by:
- Test262 parse-only tests
- Round-trip:
parse ∘ print ≈ idon the AST - Differential: parse with VerifiedJS, parse with Acorn/Babel, compare ASTs
- Flagship parser gates:
./scripts/parse_flagship_failfast.sh --full
- Value representation: NaN-boxing in f64, or tagged pointers in i64
- Heap: bump allocator + mark-sweep GC compiled into the Wasm module
- Strings: interned, UTF-16, stored in linear memory
- Objects: property maps as linked hash tables in linear memory
- Closures:
(function_index, environment_pointer)pairs - Prototypes: chain traversal in linear memory
- Generators/async: CPS transform in JS.Flat, state machines in Wasm.IR
The runtime is linked into every output module. Its semantics are axiomatic—document the TCB boundary in ARCHITECTURE.md.
The project uses the Lean LSP via the lean-lsp-mcp MCP server (github.com/oOo0oOo/lean-lsp-mcp). Agents use LSP feedback as a primary signal, not just lake build exit codes.
- Before editing: query
lean_diagnostic_messageson the target file. - After editing: save, wait for re-elaboration, query diagnostics again. Fix new errors before committing.
- For proofs: use
lean_goalto inspect proof state at a specific line/column. - For exploration: use
lean_hover_infofor types,lean_completionsfor lemmas. - For search: use
lean_leansearch(LeanSearch),lean_loogle(Loogle),lean_local_search(ripgrep).
Verified code development is a loop:
define interface/types → write implementation → attempt proof →
├─ proof succeeds → done
└─ proof fails →
├─ adjust implementation to make it more provable → retry
├─ refine the theorem statement → retry
└─ restructure the IL definition → retry (coordinate via ARCHITECTURE.md)
Based on the ECMAScript specification, write a parser for javascript. Iterate on the parser and AST until we can parse ALL benchmarks, and pretty-print them to equivalent ASTs.
Define all IL Syntax.lean types, all pass function signatures, all correctness theorem statements (with sorry bodies), and all Semantics.lean relation signatures. These are the contracts that enable parallel work.
Agents work on different modules in parallel. The interfaces stay roughly consistent. Each module (Source, Core, Flat, ANF, Wasm) can have independent agents working on:
- Implementation (pass, interpreter, pretty-printer)
- Proofs (correctness theorems for the pass)
- Tests (unit tests, e2e tests)
Code and proofs are dependent but can alternate: one agent implements a pass, another works on the proof for a previously-implemented pass.
Implementation and proofs co-evolve. When stuck:
- Try automation first (see Proof Automation below)
- Break the goal into lemmas, sorry each, try automation on each
- If stuck >30 minutes on a single goal, file in
PROOF_BLOCKERS.mdwith the goal state - If a property has been attempted 3+ times and failed, flag it in
PROOF_BLOCKERS.mdwithESCALATE:prefix.
sorry is a coordination mechanism. Unchecked proliferation defeats verification.
- Every sorry must have a
-- TODO:comment explaining what remains. - Sorrys in
Proofs/are expected during development. Sorrys inSyntax.leanorSemantics.leanare bugs. - Sorry count tracked in CI. Threshold: 100 → 50 → 20 → 0.
- Sorrys must not appear in
EndToEnd.leanafter Phase 2.
Run ./scripts/sorry_report.sh to generate the current sorry report.
Maximize automation. Manual proof terms are a last resort.
decide— decidable propositionssimp [lemma₁, lemma₂]— rewritingomega— linear arithmetic overNat,Intgrind— congruence closure + case splittingcanonical— type inhabitation solver (github.com/chasenorman/Canonical)aesop— automated reasoning with custom rule setsnative_decide— kernel evaluation for large finite checks
Agents work on different modules simultaneously. The pipeline's modular structure (Source, Core, Flat, ANF, Wasm) provides natural boundaries for parallel work. Each module has well-defined interfaces (Syntax.lean) that remain stable.
Parallelization strategy:
- Different modules: Two agents can safely work on
Core/Elaborate.leanandWasm/Lower.leansimultaneously — no conflicts. - Code vs proof on same module: One agent implements
Flat.ClosureConvert, another provesProofs/ClosureConvertCorrect.lean. The proof agent works against the current interface; if implementation changes, the proof adapts. - Parser is independent: Parser work never conflicts with backend passes.
All work is driven by coordination files. Agents read these on startup and pick greedily.
TASKS.md — the master task list. Maintained by all agents. Priority-ordered.
PROGRESS.md — per-pass status. Updated after completing work.
PROOF_BLOCKERS.md — goals agents are stuck on. Check this before starting proof work to avoid duplicating failed attempts.
Every agent, on every spawn:
- Read
TASKS.md. Pick the highest-priority unchecked task. - Read
PROGRESS.mdto understand what's done and what's in flight. - If the task is proof work, read
PROOF_BLOCKERS.mdto avoid repeating failed attempts. - Do the task. Run
lake buildto verify. Runbash tests/e2e/run_e2e.shfor e2e checks. - Update
TASKS.md(mark done) andPROGRESS.md(update status). - If the task revealed additional required work, add new unchecked items to
TASKS.md.
| Type | Description |
|---|---|
| implement | Write a pass, interpreter, or pretty-printer. |
| test | Write unit tests, e2e tests, add Test262 coverage. |
| prove | Work on a simulation proof. Follow automation-first. |
| parser | Improve parser coverage. Run flagship gates to verify. |
| review | Check for regressions, duplicated code, architectural drift. |
| flagship | Work on compiling a flagship target (Prettier/Babel/tsc). |
Mitigation: Strict regression gate. lake build + bash tests/e2e/run_e2e.sh before pushing. If any previously-passing test fails, fix it first.
Mitigation: All test runners print one summary line per suite to stdout. Full output goes to test_logs/. Pre-compute aggregate statistics. Never make the agent count things manually.
Mitigation: Agents work on different modules. The pipeline's modular structure provides natural separation. Use TASKS.md to claim tasks before starting.
Mitigation: PROGRESS.md, TASKS.md, and PROOF_BLOCKERS.md are the agent's context. Read these first. Keep them short and current.
Mitigation: Shared helpers go in Util.lean. When reviewing, grep for duplicate helper functions and coalesce.
Mitigation: PROOF_BLOCKERS.md records failed proof attempts. Read it before starting proof work.
#eval do
let src := "let x = 1 + 2; console.log(x);"
let ast ← JS.parse src
let core ← JS.elaborate ast
let result ← JS.Core.interp core
assert! result.stdout == ["3"]Located in tests/e2e/. The test harness (tests/e2e/run_e2e.sh) runs:
- Pipeline stage tests: parse → core → flat → anf → wasmIR → wat → .wasm for each test file
- Metamorphic tests: Core vs Flat vs ANF interpreter traces must match (semantic preservation)
- Wasm validation: compiled .wasm runs on wasmtime
- Node.js comparison: test files are valid JS that runs identically in Node.js
Use scripts/run_test262_compare.sh to compare Node.js syntax acceptance against VerifiedJS compilation on the embedded Test262 suite:
# Fast deterministic sample
./scripts/run_test262_compare.sh --fast --sample 60 --seed local
# Larger sample (used by full pipeline test runs)
./scripts/run_test262_compare.sh --full --sample 500 --seed localThe runner emits machine-parseable lines:
TEST262_PASS: Node--checkaccepts and VerifiedJS compilesTEST262_FAIL: unexpected VerifiedJS compile failureTEST262_XFAIL: compile failure classified as known limitationTEST262_SKIP: metadata-based or limitation-based skip
Current skip/xfail filters intentionally avoid unsupported Test262 categories and known frontend/runtime gaps (negative tests, harness includes/flags, module/raw/async harness requirements, fixture files, and tests that rely on unsupported globals or stubbed features).
The harness injects a small prelude that defines Test262Error, assert, assert.sameValue, and assert.notSameValue when missing, so plain Test262 assertion-style tests can run without external harness includes.
Known backend structural bugs that currently produce Wasm validation failures are classified as TEST262_XFAIL known-backend:wasm-validation (instead of hard FAIL) to keep regressions focused on newly introduced issues.
wasmtimeon every.wasmoutputwasm-tools validatefor spec conformance- Node.js as differential oracle
lake build # full build
lake exe verifiedjs input.js -o output.wasm # compile
lake exe verifiedjs input.js --parse-only # parser-only check
wasmtime output.wasm # run
lake exe verifiedjs input.js --emit=core # inspect IL
lake exe verifiedjs input.js --run=anf # interpret at ANF
lake test # Lean unit tests
bash tests/run_tests.sh --fast --profile pipeline # unit+e2e+test262+wasm summary
bash tests/e2e/run_e2e.sh # e2e tests
./scripts/run_test262_compare.sh --fast --sample 60 --seed local
./scripts/sorry_report.sh # sorry report
./scripts/parse_flagship_failfast.sh --full # parser completion gateverifiedjs/
├── README.md ← this file
├── TASKS.md ← master task list
├── PROGRESS.md ← per-pass status
├── PROOF_BLOCKERS.md ← stuck goals with failed approaches
├── lakefile.lean
├── lean-toolchain
│
├── VerifiedJS/
│ ├── Source/
│ │ ├── Lexer.lean
│ │ ├── Parser.lean
│ │ ├── AST.lean ← SPEC: §11–15
│ │ └── Print.lean
│ │
│ ├── Core/
│ │ ├── Syntax.lean ← SPEC: desugared subset
│ │ ├── Semantics.lean ← SPEC: §8, §9
│ │ ├── Interp.lean
│ │ ├── Print.lean
│ │ └── Elaborate.lean
│ │
│ ├── Flat/
│ │ ├── Syntax.lean
│ │ ├── Semantics.lean
│ │ ├── Interp.lean
│ │ ├── Print.lean
│ │ └── ClosureConvert.lean
│ │
│ ├── ANF/
│ │ ├── Syntax.lean
│ │ ├── Semantics.lean
│ │ ├── Interp.lean
│ │ ├── Print.lean
│ │ ├── Convert.lean
│ │ └── Optimize.lean ← identity pass (with proof)
│ │
│ ├── Wasm/
│ │ ├── Syntax.lean
│ │ ├── Semantics.lean
│ │ ├── IR.lean
│ │ ├── IRInterp.lean
│ │ ├── IRPrint.lean
│ │ ├── Lower.lean
│ │ ├── Emit.lean
│ │ ├── Print.lean ← WAT printer
│ │ └── Binary.lean
│ │
│ ├── Proofs/
│ │ ├── ElaborateCorrect.lean
│ │ ├── ClosureConvertCorrect.lean
│ │ ├── ANFConvertCorrect.lean
│ │ ├── OptimizeCorrect.lean
│ │ ├── LowerCorrect.lean
│ │ ├── EmitCorrect.lean
│ │ └── EndToEnd.lean ← no sorrys allowed after Phase 2
│ │
│ ├── Util.lean ← shared helpers
│ └── Driver.lean
│
├── tests/
│ ├── unit/
│ ├── e2e/ ← handcrafted JS tests + run_e2e.sh
│ ├── test262/ ← git submodule
│ └── flagship/ ← git submodules: prettier, babel, TypeScript
│
└── scripts/
├── run_test262_compare.sh
├── sorry_report.sh
├── parse_flagship_failfast.sh
└── validate_wasm.sh
The compiler targets all of ECMAScript 2020 (SPEC: https://tc39.es/ecma262/2020/). No feature is excluded. Features are implemented incrementally but none are out of scope.
Formalize incrementally: 0. Implement parser/lexer end to end, make sure it runs on everything first.
- Computational core: primitives, let/const, functions, closures, objects, arrays, control flow (SPEC §6–8, §12–13)
- Classes, prototypes,
this(SPEC §9, §14.6) - Generators, async/await (SPEC §14.4, §14.7–14.8, §25.6)
- Proxy, Symbol, WeakMap (SPEC §19.4, §23.3, §26)
- Modules, eval, regex (SPEC §15, §18.2.1, §21.2)
Each increment extends Syntax.lean and Semantics.lean monotonically. Design the semantics so existing proofs do not break when new features are added.
- Read
TASKS.mdfirst. Pick the highest-priority unclaimed task. - Cite the spec. Every semantic rule in Lean must reference the ECMA-262 §section it implements.
- Every IL must be inspectable. Syntax + Semantics + Interpreter + Printer. No exceptions.
- Automation first in proofs. Try
decide/simp/omega/grind/canonical/aesopbefore manual proof. - Use the LSP. Query diagnostics, goal states, hover, LeanSearch/Loogle.
- Every change must build and pass e2e tests. No regressions.
- No context pollution. Print one-line summaries. Log details to files.
- Update coordination files (
TASKS.md,PROGRESS.md) after completing work. - Small commits. One logical change per commit.
- When stuck on a proof >30 minutes, file in
PROOF_BLOCKERS.mdwith the goal state and failed approaches. - When an e2e test fails, bisect using
--run=<IL>interpreters. - Design for provability. If an implementation is correct but hard to prove, refactor it.
- Watch for code duplication. Coalesce into
Util.lean.