An implementation of Recursive Language Models (RLM) — an interactive coding CLI where the LLM recursively writes and executes Python code to solve problems. Think Claude Code / Cursor — the LLM writes Python code in ```repl``` blocks to call filesystem tools, inspect results, iterate, and converge on an answer.
RLM Code provides 6 filesystem tools (bash, read_file, write_file, edit_file, glob_files, grep_files) and a recursive llm_query() sub-call in a persistent Python execution environment. The LLM doesn't call tools via a structured function-call API — instead it writes Python code that invokes the tool functions directly, reads print() output, and decides what to do next. This loop continues until the LLM emits FINAL(...) with a summary.
User prompt
↓
System prompt (role + tools + examples + guidelines + termination protocol)
↓
┌─ Iteration loop (up to 30 iterations) ────────────────────┐
│ 1. Stage hint injected (not persisted in history) │
│ ↓ │
│ 2. LLM generates text + ```repl``` code blocks │
│ ↓ │
│ 3. CodeExecutor runs code (AST-parsed, REPL-style) │
│ ↓ │
│ 4. stdout/stderr fed back as user message │
│ ↓ │
│ 5. Check for FINAL(...) → done, else continue loop │
└────────────────────────────────────────────────────────────┘
↓
Final answer returned to user
| Decision | Why |
|---|---|
| Python code blocks instead of JSON function-calling | LLM can compose tool calls, use conditionals, string processing, loops |
| Execution results fed back as user messages | Natural conversation flow — LLM writes code, "user" reports results |
| Persistent namespace across iterations | Variables survive across rounds, enabling incremental computation |
| Stage hints not persisted in history | Guide LLM behavior without polluting context |
| Streaming preview + final Markdown rendering | Stream with plain text (fast), final render with Markdown (polished) |
| Recursive sub-model queries via llm_query() | Delegate lightweight tasks (summarisation, analysis), keep main context clean |
| Tool-call markup fallback parsing | Gracefully handles fine-tuned models that emit raw function-call tokens |
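The fallback parsing idea in the last row can be sketched as follows. The `<tool_call>{...}</tool_call>` markup shown here is an assumption for illustration — different fine-tuned models emit different formats — and `tool_calls_to_python` is a hypothetical name, not the project's function:

```python
import json
import re

def tool_calls_to_python(text: str) -> list[str]:
    """Convert raw function-call markup into executable Python calls.

    The <tool_call>{"name": ..., "arguments": {...}}</tool_call> shape is an
    assumed example format, not necessarily what the real parser accepts.
    """
    calls = []
    for raw in re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, flags=re.DOTALL):
        obj = json.loads(raw)
        args = ", ".join(f"{k}={v!r}" for k, v in obj.get("arguments", {}).items())
        calls.append(f"print({obj['name']}({args}))")
    return calls
```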
```bash
# From the project directory
pip install -e .
```

Requires Python >= 3.11.
Set API keys, model, and base URL via environment variables or a .env file (auto-loaded via python-dotenv).
See .env.example for a template.
```bash
rlm-code
```

Launches a Textual TUI with a Tokyo Night color scheme. Type requests in the input bar and watch the LLM reason and call tools in real time with streaming output.
┌─ RLM Code ─────────────────── gpt-5 ─── ~/Projects ─┐
│ │
│ > fix the off-by-one in parser.py │
│ │
│ ◆ Iteration 1 │
│ Let me read the file first. │
│ ┌─ repl ─────────────────────────┐ │
│ │ content = read_file("parser.py")│ │
│ │ print(content) │ │
│ └────────────────────────────────┘ │
│ stdout: [file contents...] │
│ │
│ ◆ Iteration 2 │
│ Found the bug on line 42. Fixing... │
│ ┌─ repl ─────────────────────────┐ │
│ │ edit_file("parser.py", ...) │ │
│ └────────────────────────────────┘ │
│ stdout: Replaced 1 lines with 1 lines in parser.py │
│ │
│ ✓ Fixed the off-by-one error in parser.py line 42. │
│ │
│ > _ │
└────────────────────────────────────────────────────────┘
Keyboard shortcuts:
- Enter: Submit request
- Up/Down: Input history
- Ctrl+C: Interrupt current operation
- Ctrl+D or type quit/exit: Exit
Pass a prompt as a positional argument to skip the TUI and print results to stdout:
```bash
rlm-code "list all Python files in this project"
rlm-code "explain what src/utils.py does"
rlm-code "add type hints to the parse() function in parser.py"
```

```
rlm-code [prompt] [options]

positional arguments:
  prompt               One-shot prompt (omit for interactive TUI mode)

options:
  --backend {openai,anthropic}
                       LLM backend (default: openai, env: RLM_BACKEND)
  --model MODEL        Model name (default: gpt-5 / claude-sonnet-4-6, env: RLM_MODEL)
  --sub-model MODEL    Sub-model for llm_query() (default: same as --model, env: RLM_SUB_MODEL)
  --base-url URL       API base URL (env: RLM_BASE_URL). For OpenAI-compatible providers.
  --api-key API_KEY    API key (default: from env vars)
  --max-iterations N   Max iterations per request (default: 30)
  --cwd PATH           Working directory for tools (default: current dir)
```
Priority: CLI args > environment variables > defaults.
```bash
# Use Anthropic backend
rlm-code --backend anthropic --model claude-sonnet-4-20250514

# Use an OpenAI-compatible provider (e.g. local vLLM, OpenRouter, etc.)
rlm-code --base-url http://localhost:8000/v1 --model my-local-model

# Or configure entirely via .env
# RLM_BASE_URL=https://openrouter.ai/api/v1
# RLM_MODEL=deepseek/deepseek-chat
# OPENAI_API_KEY=sk-or-...
rlm-code

# Work on a specific directory
rlm-code --cwd ~/Projects/myapp

# Quick one-shot with a custom model
rlm-code --model gpt-4o-mini "what does this project do?"
```

The LLM has access to 6 filesystem tools (in rlm_code/tools.py) plus a recursive sub-call:
| Tool | Signature | Description |
|---|---|---|
| `bash` | `(command, timeout=120)` | Execute a shell command. Returns stdout + stderr + exit code. |
| `read_file` | `(path, offset=0, limit=2000)` | Read a file with line numbers. Supports pagination via offset/limit. |
| `write_file` | `(path, content)` | Write content to a file. Creates parent directories automatically. |
| `edit_file` | `(path, old_text, new_text)` | Replace the first exact occurrence of old_text with new_text. |
| `glob_files` | `(pattern, path=None)` | Find files matching a glob pattern. Skips .git, node_modules, __pycache__, etc. Max 200 results. |
| `grep_files` | `(pattern, path=None, glob=None, context=0)` | Search file contents with regex (grep -rnE). Max 500 results. 30s timeout. |
| `llm_query` | `(prompt)` | Call a sub-model to answer a question. Useful for summarisation, analysis, or decomposing complex tasks. Defaults to the same model as --model unless --sub-model is set. |
All paths are resolved relative to the working directory (--cwd). The tools run in the user's filesystem — there is no sandbox. The LLM can read, write, and execute anything you can.
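As an example of the tool contracts above, the `edit_file` behavior (replace only the first exact occurrence) could be sketched like this. This is a minimal stand-in for illustration, not the implementation in rlm_code/tools.py:

```python
from pathlib import Path

def edit_file(path: str, old_text: str, new_text: str) -> str:
    """Replace the first exact occurrence of old_text with new_text.

    A sketch of the documented contract; the real tool lives in rlm_code/tools.py.
    """
    p = Path(path)
    content = p.read_text()
    if old_text not in content:
        return f"Error: old_text not found in {path}"
    # str.replace with count=1 touches only the first occurrence
    p.write_text(content.replace(old_text, new_text, 1))
    return f"Replaced 1 occurrence in {path}"
```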
rlm_code/
├── __init__.py # exports main()
├── __main__.py # python -m rlm_code entry point
├── config.py # CLIConfig dataclass + argparse
├── engine.py # CodingEngine: iteration loop, CodeExecutor, parsing
├── llm.py # LLMClient: OpenAI + Anthropic backends (sync + streaming)
├── prompt.py # System prompt template + build_system_prompt()
├── tools.py # 6 tool functions + build_coding_tools()
└── tui.py # Textual TUI app + one-shot mode + main()
tests/
├── test_engine.py # 44 tests: engine loop, executor, parsing, streaming, hints
└── test_tools.py # 21 tests: all tools (no LLM, no network)
- `CodingEngine`: The core iteration loop. Calls the LLM with stage-aware hints (iteration 1: "understand the problem first"; subsequent: "continue working"), parses ```repl``` code blocks, executes them via `CodeExecutor`, feeds output back as user messages, and loops until `FINAL(...)` or max iterations. When max iterations is exceeded, a forced wrap-up prompt is sent.
- `CodeExecutor`: Persistent Python namespace. Code is AST-parsed: imports are separated and executed first; if the last statement is a bare expression, its value is auto-printed (REPL/Jupyter-style). Execution is protected by a `threading.Lock`. Dangerous builtins (`eval`, `exec`, `input`, `compile`, `globals`, `locals`, `exit`, `quit`) are blocked. Output is truncated at 20,000 characters.
- Parsing: `parse_code_blocks()` extracts ```repl``` fences. `parse_tool_calls()` is a fallback that converts raw function-calling markup from fine-tuned models into executable Python. `parse_final_answer()` detects `FINAL(...)` and `FINAL_VAR(...)` completion signals.
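The auto-print behavior of the executor can be sketched with the `ast` module. This simplified version (a hypothetical `run_repl_style`) omits the import separation, builtin blocking, locking, and output truncation that the real `CodeExecutor` performs:

```python
import ast

def run_repl_style(code: str, namespace: dict) -> None:
    """Execute code; if the last statement is a bare expression, print its repr.

    Simplified sketch of the REPL/Jupyter-style behavior described above.
    """
    tree = ast.parse(code)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        last = tree.body.pop()
        # Run everything before the trailing expression as ordinary statements
        exec(compile(tree, "<repl>", "exec"), namespace)
        # Evaluate the trailing expression and auto-print non-None values
        value = eval(compile(ast.Expression(body=last.value), "<repl>", "eval"), namespace)
        if value is not None:
            print(repr(value))
    else:
        exec(compile(tree, "<repl>", "exec"), namespace)
```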
- Unified `LLMClient` with `completion()` (sync) and `stream_completion()` (streaming with an `on_token` callback).
- OpenAI backend: Supports `base_url` for compatible providers (vLLM, OpenRouter, etc.). SDKs are lazily imported on first call.
- Anthropic backend: Automatically extracts the system message from the message list and passes it separately, as required by the Anthropic API.
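The system-message extraction for the Anthropic backend amounts to splitting an OpenAI-style message list, since the Anthropic Messages API takes the system prompt as a separate parameter rather than as a `"system"` role message. A minimal sketch (`split_system_message` is an illustrative helper, not the project's function):

```python
def split_system_message(messages: list[dict]) -> tuple[str, list[dict]]:
    """Separate the system message from an OpenAI-style message list.

    Returns (system_text, remaining_messages); illustrative helper only.
    """
    system = ""
    rest = []
    for msg in messages:
        if msg["role"] == "system":
            system = msg["content"]
        else:
            rest.append(msg)
    return system, rest
```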
- Built with Textual (full-terminal UI framework).
- Tokyo Night color scheme: `#1a1b26` background, `#7aa2f7` user text, `#bb9af7` iteration headers, `#9ece6a` final answers, `#f7768e` errors, `#565f89` status.
- Streaming: LLM tokens stream to a plain-text `MessageWidget` in real time via `call_from_thread()`. When each iteration completes, the streaming preview is removed and replaced with a properly rendered `Markdown` widget plus syntax-highlighted code blocks.
- The engine's `run()` executes in a background thread (`@work(thread=True)`).
```bash
pytest tests/ -v
```

65 tests total. All use `tmp_path` fixtures: no LLM calls, no network, no side effects on your filesystem. Engine tests use `MockLLM` / `MockStreamingLLM` / `RecordingLLM` classes.
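A tool test in that style might look like the following. The test name and the inline stand-in are hypothetical; the real suite imports the tools from `rlm_code.tools` and exercises them against `tmp_path` directly:

```python
from pathlib import Path

def write_file(path: str, content: str) -> str:
    # Stand-in matching the documented tool contract (creates parent dirs);
    # the real suite would import this from rlm_code.tools instead.
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(content)
    return f"Wrote {path}"

def test_write_file_creates_parents(tmp_path: Path) -> None:
    # pytest injects tmp_path, so nothing outside the temp dir is touched
    target = tmp_path / "pkg" / "mod.py"
    write_file(str(target), "x = 1\n")
    assert target.read_text() == "x = 1\n"
```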
MIT