macabeus/mizuchi

Mizuchi

🐉 Forge C from the ashes of assembly. What the compiler consumed, the dragon returns.

Mizuchi automates the cycle of writing C code, compiling, and comparing against a target binary, towards the goal of fully automatic matching decompilation.

It orchestrates a plugin-based pipeline that can leverage programmatic and AI-powered tools to automatically decompile assembly functions to C source code that produces byte-for-byte identical machine code when compiled.

  • ✨ Automatic retries with detailed context on compilation or match failures
  • 🐍 Integration with Claude, m2c, decomp-permuter, and objdiff
  • 🗺️ Decomp Atlas, a powerful webapp to browse functions and generate rich prompts in one click
  • 📊 Beautiful Report UI to visualize the pipeline result

📚 Learn about this project and its benchmarks in this post

Screenshots:

  • Achieve fully matching code automatically
  • Even partial matches provide a good start
  • Explore the function cloud by similarity
  • Pick your next function to decompile based on scoring
  • Build rich prompts to decompile a function in a single click

⚙️ What is Matching Decompilation?

Matching decompilation is the art of converting assembly back into C source code that, when compiled, produces byte-for-byte identical machine code. It's popular in the retro gaming community for recreating the source code of classic games. For example, Super Mario 64 and The Legend of Zelda: Ocarina of Time have been fully match-decompiled.
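To make "byte-for-byte identical" concrete, here is a minimal sketch of the comparison idea. It is not how objdiff works internally (objdiff compares object files and understands symbols and relocations); the helper name and the sample bytes are purely illustrative.

```python
from typing import Optional


def first_mismatch(target: bytes, candidate: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(target, candidate)):
        if a != b:
            return offset
    # One blob is a strict prefix of the other: they diverge at the shorter length.
    if len(target) != len(candidate):
        return min(len(target), len(candidate))
    return None


# Illustrative byte strings standing in for compiled machine code.
target = bytes.fromhex("0120704700000000")
candidate = bytes.fromhex("0121704700000000")

print(first_mismatch(target, target))     # None, i.e. a full match
print(first_mismatch(target, candidate))  # 1, the second byte differs
```

A function only counts as matched when this kind of check finds no difference at all; anything else is a partial match.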

Learn more by watching my talk.

Installation

npm install
npm run build && npm run build:ui

m2c Setup (Optional)

To enable the m2c programmatic phase:

git submodule update --init vendor/m2c
./scripts/setup-m2c.sh

decomp-permuter Setup (Optional)

To enable decomp-permuter (brute-force mutation matching), which works both in the programmatic phase and as a background task during the AI-powered phase:

git submodule update --init vendor/decomp-permuter
./scripts/setup-decomp-permuter.sh

Requirements

  • An ANTHROPIC_API_KEY environment variable, or a prior login with Claude Code so that credentials are cached locally
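A quick pre-flight check for the environment variable could look like the sketch below. It only inspects ANTHROPIC_API_KEY; it does not detect a cached Claude Code login, and the helper name is hypothetical rather than part of Mizuchi.

```python
import os
from typing import Mapping


def has_api_key(env: Mapping[str, str] = os.environ) -> bool:
    """True when ANTHROPIC_API_KEY is present and not blank."""
    return bool(env.get("ANTHROPIC_API_KEY", "").strip())


if not has_api_key():
    print("ANTHROPIC_API_KEY is not set; log in with Claude Code or export the key.")
```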

Quick Start

  1. Create a configuration file: Copy the example config and customize it for your project.
cp mizuchi.example.yaml /path/to/your/decomp/project/mizuchi.yaml
  2. Index your codebase:
npm start -- index-codebase --config /path/to/your/decomp/project/mizuchi.yaml
  3. Start the Decomp Atlas server:
npm start -- atlas --config /path/to/your/decomp/project/mizuchi.yaml
  4. Generate prompts: Open Decomp Atlas at http://localhost:3000/, browse the functions, and generate the prompts.
  5. Run the pipelines:
npm start -- run --config /path/to/your/decomp/project/mizuchi.yaml

Pipeline Overview

Mizuchi executes a pipeline of plugins:

Pipeline Diagram
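The generate, compile, compare, and retry cycle can be sketched roughly as follows. This is a simplified illustration, not Mizuchi's actual plugin API: every name here (`Attempt`, `run_with_retries`, the stub functions) is hypothetical, and the stubs stand in for the Claude Runner, Compiler, and Objdiff plugins.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    code: str      # C source produced for this attempt
    feedback: str  # compiler error or diff summary explaining the failure


def run_with_retries(
    generate: Callable[[List[Attempt]], str],
    compile_c: Callable[[str], Optional[str]],
    compare: Callable[[str], Optional[str]],
    max_retries: int = 3,
) -> Optional[str]:
    """Return matching C code, or None once the retries are exhausted."""
    history: List[Attempt] = []
    for _ in range(max_retries):
        code = generate(history)
        # Each check returns None on success or a message describing the failure.
        failure = compile_c(code) or compare(code)
        if failure is None:
            return code
        # Failed attempts are kept so the next generation gets detailed context.
        history.append(Attempt(code, failure))
    return None


TARGET = "int add(int a, int b) { return a + b; }"


def fake_generate(history: List[Attempt]) -> str:
    # The first try is wrong on purpose; the retry pretends to use the feedback.
    return TARGET if history else "int add(int a, int b) { return a - b; }"


def fake_compile(code: str) -> Optional[str]:
    return None  # pretend everything compiles


def fake_compare(code: str) -> Optional[str]:
    return None if code == TARGET else "mismatch: expected add, got sub"


result = run_with_retries(fake_generate, fake_compile, fake_compare)
print(result == TARGET)  # True: matched on the second attempt
```

Passing the whole history back into the generator is what lets a retry differ from the first attempt instead of repeating it.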

📌 Roadmap: See the issues tab for planned features.

Output

Mizuchi generates three output files:

| File | Description |
| --- | --- |
| run-results-{timestamp}.json | Complete execution data including plugin results, timing, and success/failure status |
| run-report-{timestamp}.html | Visual report with success rates, metrics, and per-prompt breakdown |
| claude-cache.json | Cached Claude API responses keyed by prompt content hash |
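As an illustration of a cache "keyed by prompt content hash", the sketch below stores responses in a JSON file under the SHA-256 digest of the prompt. The hash algorithm, the flat key-to-response layout, and the class name are all assumptions; the real claude-cache.json format may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


def prompt_key(prompt: str) -> str:
    """Derive a stable cache key from the prompt text (SHA-256 is an assumption)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


class ResponseCache:
    """Hypothetical JSON-backed cache mapping prompt hashes to responses."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries = json.loads(path.read_text()) if path.exists() else {}

    def get(self, prompt: str) -> Optional[str]:
        return self.entries.get(prompt_key(prompt))

    def put(self, prompt: str, response: str) -> None:
        self.entries[prompt_key(prompt)] = response
        self.path.write_text(json.dumps(self.entries, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "claude-cache.json"
    cache = ResponseCache(cache_file)
    cache.put("Decompile func_a", "int func_a(void) { return 0; }")
    # A fresh instance reads the same file, so identical prompts skip the API call.
    reloaded = ResponseCache(cache_file)
    cached_hit = reloaded.get("Decompile func_a")
    cached_miss = reloaded.get("Decompile func_b")

print(cached_hit is not None, cached_miss is None)
```

Keying on the prompt content means any change to the prompt, however small, produces a new key and therefore a fresh request.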

Built-in Plugins

| Plugin | Description |
| --- | --- |
| m2c | Optional: generates an initial C decompilation using m2c |
| decomp-permuter | Optional: brute-forces code mutations using decomp-permuter to improve match scores |
| Claude Runner | Sends prompts to Claude and processes responses |
| Compiler | Compiles generated C code using a configurable shell script template |
| Objdiff | Compares compiled object files against targets using objdiff |
| Integrator | Optional post-match: integrates matched C code into the decomp project (docs) |
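To show what a "configurable shell script template" might involve, the sketch below fills file paths into a command string. The placeholder names (`$c_file`, `$o_file`) and the `cc` command line are hypothetical; check mizuchi.example.yaml for the placeholders Mizuchi actually supports.

```python
import shlex
from string import Template


def render_compile_command(template: str, c_file: str, o_file: str) -> str:
    """Substitute shell-quoted paths into a compile command template."""
    return Template(template).substitute(
        c_file=shlex.quote(c_file),
        o_file=shlex.quote(o_file),
    )


# Hypothetical template; a real project would call its own toolchain here.
template = "cc -O2 -c $c_file -o $o_file"
command = render_compile_command(template, "build/my func.c", "build/func.o")
print(command)  # cc -O2 -c 'build/my func.c' -o build/func.o
```

Quoting each path before substitution keeps file names with spaces or shell metacharacters from breaking the command.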

Decomp Atlas

Decomp Atlas is a web UI for exploring your decompilation project and targeting the next functions to decompile. It includes a prompt builder that generates rich decompilation prompts.

Starting the server

# Build the CLI and UI
npm run build && npm run build:decomp-atlas

# Start the Decomp Atlas server
npm start -- atlas --config mizuchi.yaml

The server reads your mizuchi.yaml config and serves the Decomp Atlas UI at http://localhost:3000.

Note: Your project must have a mizuchi-db.json file in the root directory for the Decomp Atlas to work. Generate it with mizuchi index-codebase (see below).

Indexing Your Codebase

The index-codebase command scans your decompilation project and generates a mizuchi-db.json file containing all discovered functions, their assembly, C source (if decompiled), call graphs, and vector embeddings.
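Vector embeddings make it possible to rank functions by similarity. The sketch below uses cosine similarity over toy vectors; whether Decomp Atlas uses cosine similarity specifically is an assumption, and the function names and two-dimensional vectors are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: str, embeddings: Dict[str, List[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank every other function by similarity to the query function."""
    scored = [
        (name, cosine_similarity(embeddings[query], vector))
        for name, vector in embeddings.items()
        if name != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy embeddings; real ones have hundreds of dimensions.
embeddings = {
    "func_a": [1.0, 0.0],
    "func_b": [0.9, 0.1],
    "func_c": [0.0, 1.0],
}
ranking = most_similar("func_a", embeddings, top_k=2)
print([name for name, _ in ranking])  # ['func_b', 'func_c']
```

An already-decompiled function that ranks close to an unmatched one is a natural example to include in a prompt.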

1. Configure your mizuchi.yaml:

Add nonMatchingAsmFolders to the global section listing directories that contain non-matching assembly files (relative to projectPath):

global:
  projectPath: /path/to/decomp/project
  mapFilePath: /path/to/project.map
  target: gba # or n64, ps1, etc.
  nonMatchingAsmFolders:
    - asm/non_matching
    - asm
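Because the nonMatchingAsmFolders entries are relative to projectPath, they resolve as in the sketch below. The dictionary mirrors the YAML above; the helper name is hypothetical and the code does not read mizuchi.yaml itself.

```python
from pathlib import PurePosixPath
from typing import List


def resolve_asm_folders(global_config: dict) -> List[PurePosixPath]:
    """Join each nonMatchingAsmFolders entry onto projectPath."""
    root = PurePosixPath(global_config["projectPath"])
    return [root / folder for folder in global_config.get("nonMatchingAsmFolders", [])]


# Same values as the `global` section of the example config.
global_config = {
    "projectPath": "/path/to/decomp/project",
    "nonMatchingAsmFolders": ["asm/non_matching", "asm"],
}
folders = [str(path) for path in resolve_asm_folders(global_config)]
print(folders)
```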

2. Run the indexer:

# Build first (if not already done)
npm run build

# Index the codebase
npm start -- index-codebase --config mizuchi.yaml

# Or in development mode
npm run dev -- index-codebase --config mizuchi.yaml

The indexer performs three phases:

  1. Scan matched functions: finds C function definitions via ast-grep, resolves each to its compiled .o file using the map file, and extracts assembly via objdiff
  2. Scan unmatched functions: reads .s/.S/.asm files from nonMatchingAsmFolders and parses function boundaries
  3. Compute embeddings: generates vector embeddings using jina-embeddings-v2-base-code via a Python subprocess with MPS GPU acceleration (Apple Silicon) or CPU fallback
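The "parses function boundaries" step in phase 2 can be pictured with the sketch below. It assumes each function begins with a marker macro such as `glabel`, `thumb_func_start`, or `arm_func_start`; marker conventions vary between decomp projects, and this is not the indexer's real parser.

```python
import re
from typing import Dict, List, Optional

# Assumed function-start markers; adjust to the macros your project uses.
FUNC_START = re.compile(r"^\s*(?:glabel|thumb_func_start|arm_func_start)\s+(\w+)")


def split_functions(asm_text: str) -> Dict[str, List[str]]:
    """Group instruction lines under the most recent function-start marker."""
    functions: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in asm_text.splitlines():
        match = FUNC_START.match(line)
        if match:
            current = match.group(1)
            functions[current] = []
        elif current is not None and line.strip():
            functions[current].append(line.strip())
    return functions


# Illustrative assembly, not taken from a real project.
SAMPLE = """\
glabel func_a
    addiu $sp, $sp, -8
    jr $ra
glabel func_b
    jr $ra
"""

parsed = split_functions(SAMPLE)
print(list(parsed))  # ['func_a', 'func_b']
```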

Options:

| Flag | Description |
| --- | --- |
| -c, --config | Path to mizuchi.yaml (defaults to ./mizuchi.yaml) |
| -s, --skip-embeddings | Skip embedding generation (useful for quick re-indexing) |

Incremental indexing: Re-running the command only recomputes embeddings for new or changed functions. Unchanged functions preserve their existing embeddings.
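One way to decide which functions are "new or changed" is to compare a content hash against the previous index, as in the sketch below. The use of SHA-256 and the `hash`/`embedding` record layout are assumptions for illustration, not the actual mizuchi-db.json schema.

```python
import hashlib
from typing import Dict, List, Tuple


def content_hash(asm: str) -> str:
    return hashlib.sha256(asm.encode("utf-8")).hexdigest()


def plan_embedding_work(
    functions: Dict[str, str],
    previous: Dict[str, dict],
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Split functions into those needing new embeddings and those reused as-is."""
    stale: List[str] = []
    reused: Dict[str, List[float]] = {}
    for name, asm in functions.items():
        old = previous.get(name)
        if old is not None and old["hash"] == content_hash(asm):
            reused[name] = old["embedding"]  # unchanged: keep the old vector
        else:
            stale.append(name)  # new or modified: recompute
    return stale, reused


previous = {"func_a": {"hash": content_hash("jr $ra"), "embedding": [0.1, 0.2]}}
functions = {"func_a": "jr $ra", "func_b": "nop"}

stale, reused = plan_embedding_work(functions, previous)
print(stale)   # ['func_b']
print(reused)  # {'func_a': [0.1, 0.2]}
```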

Python requirements for embeddings: Python 3.10+ is required. On first run, the indexer automatically creates a virtual environment at ~/.cache/mizuchi/python-venv/ and installs torch and transformers (~2-3 GB). The model weights are cached at ~/.cache/huggingface/. Use --skip-embeddings to skip this entirely.

Development

See DEVELOPMENT.md for development setup, commands, and notes.
