GitHub - parcadei/ouros: A sandboxed Python runtime for AI agents, written in Rust.

A sandboxed Python runtime for AI agents, written in Rust.

Ouros is more than an interpreter and more than a REPL. It's a stateful, sandboxed Python runtime designed for AI agents that need to execute code across multiple turns, fork execution paths, rewind mistakes, and call out to external services — all without any access to the host system.

Run LLM-generated code safely with startup times under 1 microsecond. No containers, no VMs, no network access, no filesystem — just fast, isolated Python execution with full control over what the code can do.

Key Features

Sandboxed execution — No filesystem, network, subprocess, or environment access. The only way sandbox code communicates with the outside world is through external functions you explicitly provide.

Persistent REPL sessions — SessionManager keeps variables alive across multiple execute() calls. Fork sessions, rewind history, transfer variables between sessions, save/restore to disk.

Snapshot & resume — Execution pauses at external function calls and produces a serializable Snapshot. Store it in a database, send it over the network, resume in a different process.

Type checking included — Full Python type checking via ty (from Astral/Ruff), bundled in a single binary. Catch errors before execution.

72 stdlib modules — json, re, datetime, collections, dataclasses, math, itertools, decimal, pathlib, statistics, csv, hashlib, uuid, asyncio, and many more.

Resource limits — Cap memory, allocations, stack depth, and execution time. Kill runaway code before it causes problems.

Multi-language bindings — Call from Python, JavaScript/TypeScript, or Rust.

Usage

Python

uv add ouros

import ouros

# Basic execution
m = ouros.Sandbox('x + y', inputs=['x', 'y'])
result = m.run(inputs={'x': 10, 'y': 20})  # returns 30

External Functions

Sandbox code can call functions on the host — but only the ones you allow:

import ouros

code = """
data = fetch(url)
len(data)
"""

m = ouros.Sandbox(code, inputs=['url'], external_functions=['fetch'])

# start() pauses when fetch() is called
result = m.start(inputs={'url': 'https://example.com'})

print(type(result))  # <class 'ouros.Snapshot'>
#> <class 'ouros.Snapshot'>
print(result.function_name)  # 'fetch'
#> fetch
print(result.args)  # ('https://example.com',)
#> ('https://example.com',)

# Perform the real fetch, then resume
result = result.resume(return_value='hello world')
print(result.output)  # 11
#> 11

Async External Functions

import ouros

code = """
async def agent(prompt, messages):
    while True:
        output = await call_llm(prompt, messages)
        if isinstance(output, str):
            return output
        messages.extend(output)

await agent(prompt, [])
"""

m = ouros.Sandbox(
    code,
    inputs=['prompt'],
    external_functions=['call_llm'],
    type_check=True,
)


async def call_llm(prompt, messages):
    # Your LLM call here
    return f'Done after {len(messages)} messages'


output = await ouros.run_async(  # noqa: F704
    m,
    inputs={'prompt': 'testing'},
    external_functions={'call_llm': call_llm},
)

Persistent Sessions (REPL)

Variables survive across calls. Fork sessions. Rewind history. Transfer variables between sessions.

from ouros import SessionManager

mgr = SessionManager()
session = mgr.create_session('analysis', external_functions=['llm_query'])

# Execute code — variables persist
session.execute('data = [1, 2, 3, 4, 5]')
session.execute('total = sum(data)')

# Inspect state
session.get_variable('total')  # {'json_value': 15, 'repr': '15'}
session.list_variables()  # [{'name': 'data', ...}, {'name': 'total', ...}]

# Fork a session for exploration
branch = session.fork('experiment')
branch.execute('data.append(100)')
branch.get_variable('data')  # [1, 2, 3, 4, 5, 100]
session.get_variable('data')  # [1, 2, 3, 4, 5] — original unchanged

# Rewind mistakes
session.execute('data = "oops"')
session.rewind(steps=1)
session.get_variable('data')  # [1, 2, 3, 4, 5] — restored

# Save and restore
mgr.set_storage_dir('/tmp/ouros-sessions')
session.save(name='checkpoint-1')
# Later, or in another process:
mgr.load_session('checkpoint-1')

Serialization

Both Sandbox and Snapshot can be serialized to bytes, stored, and restored later:

import ouros

# Cache parsed code
m = ouros.Sandbox('x + 1', inputs=['x'])
data = m.dump()
m2 = ouros.Sandbox.load(data)
print(m2.run(inputs={'x': 41}))  # 42
#> 42

# Suspend execution mid-flight
m = ouros.Sandbox('fetch(url)', inputs=['url'], external_functions=['fetch'])
snapshot = m.start(inputs={'url': 'https://example.com'})
state = snapshot.dump()

# Resume later, even in a different process
snapshot2 = ouros.Snapshot.load(state)
result = snapshot2.resume(return_value='response data')

JavaScript / TypeScript

npm install ouros

import { Sandbox, Snapshot, runSandboxAsync } from 'ouros'

// Basic
const m = new Sandbox('x + y', { inputs: ['x', 'y'] })
const result = m.run({ inputs: { x: 10, y: 20 } }) // 30

// Iterative execution with external functions
const m2 = new Sandbox('a() + b()', { externalFunctions: ['a', 'b'] })
let progress = m2.start()
while (progress instanceof Snapshot) {
  progress = progress.resume({ returnValue: 10 })
}
console.log(progress.output) // 20

Rust

use ouros::{Runner, Object, NoLimitTracker, StdPrint};

let code = r#"
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

fib(x)
"#;

let runner = Runner::new(code.to_owned(), "fib.py", vec!["x".to_owned()], vec![]).unwrap();
let result = runner.run(vec![Object::Int(10)], NoLimitTracker, &mut StdPrint).unwrap();
assert_eq!(result, Object::Int(55));

MCP Server

Ouros ships an MCP (Model Context Protocol) server that exposes the full SessionManager API as tools. Any coding agent or IDE that supports MCP can use Ouros as a sandboxed Python runtime.

cargo install ouros-mcp

The server speaks JSON-RPC over stdin/stdout with Content-Length framing (standard MCP transport).

Available Tools

Category	Tools
Execution	`execute`, `resume`, `resume_as_pending`, `resume_futures`
Variables	`list_variables`, `get_variable`, `set_variable`, `delete_variable`, `eval_variable`, `transfer_variable`, `call_session`
Sessions	`create_session`, `destroy_session`, `list_sessions`, `fork_session`, `reset`
Persistence	`save_session`, `load_session`, `list_saved_sessions`
History	`rewind`, `history`, `set_history_depth`
Heap introspection	`heap_stats`, `snapshot_heap`, `diff_heap`

All tools accept an optional session_id parameter. When omitted, the "default" session is used.

Adding to Claude Code

Add to your project's .claude/settings.json:

{
  "mcpServers": {
    "ouros": {
      "command": "ouros-mcp",
      "args": ["--storage-dir", "/tmp/ouros-sessions"]
    }
  }
}

Adding to Cursor

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "ouros": {
      "command": "ouros-mcp",
      "args": ["--storage-dir", "/tmp/ouros-sessions"]
    }
  }
}

Adding to any MCP client

{
  "command": "ouros-mcp",
  "args": ["--storage-dir", "/tmp/ouros-sessions"]
}

The --storage-dir flag is optional. It defaults to $OUROS_STORAGE_DIR if set, otherwise ~/.ouros/sessions/. When configured, save_session and load_session tools persist session state to disk.

Example: Agent workflow over MCP

Agent → execute(code: "data = [1, 2, 3]\nsum(data)")
     ← {status: "ok", result: 6, variables: [{name: "data", type: "list"}]}

Agent → execute(code: "fetch(url)", session_id: "default")
     ← {status: "external_call", call_id: 1, function: "fetch", args: ["https://..."]}

Agent → resume(call_id: 1, result: "response body")
     ← {status: "ok", result: "response body"}

Agent → fork_session(session_id: "default", new_session_id: "experiment")
     ← {status: "ok"}

Agent → execute(code: "data.append(100)", session_id: "experiment")
     ← {status: "ok"}

Agent → get_variable(name: "data", session_id: "default")
     ← {name: "data", value: [1, 2, 3]}  // original unchanged

Use Cases

AI Agent Code Execution

The primary use case. LLMs generate Python code, Ouros runs it safely:

Tool calling via code — Instead of JSON tool schemas, let the model write Python that calls your functions
Data processing — Let agents write analysis code that runs on your data without exposing your filesystem
Multi-step reasoning — Persistent sessions let agents build up state across multiple code generations

RLM (Recursive Language Model) Runtime

Ouros's SessionManager is a natural fit for the RLM pattern — a root LLM generates code that runs in a persistent REPL, calling llm_query() for sub-tasks. Variables persist in Ouros's heap (not in the LLM's context window), enabling processing of arbitrarily large inputs with a bounded model context.

See examples/rlm_orchestrator/ for a working implementation.

Sandboxed Computation

Any scenario where you need to run untrusted Python safely: user-submitted code, plugin systems, educational platforms, competitive programming judges.

Examples

See examples/ for complete, runnable examples:

Example	Language	Description
`basic_js/`	TypeScript	Basic execution, external functions, async, serialization
`basic_rust/`	Rust	Basic execution, fibonacci, external functions
`expense_analysis/`	Python	Async external functions for team expense analysis
`rlm_orchestrator/`	Python	Recursive Language Model pattern with persistent REPL sessions
`sql_playground/`	Python	Cross-format data joining with SQL, JSON, and sentiment analysis

Stdlib Modules

72 modules implemented natively in Rust:

abc, argparse, array, asyncio, atexit, base64, binascii, bisect, builtins, cmath, codecs, collections, collections.abc, concurrent, concurrent.futures, contextlib, copy, csv, dataclasses, datetime, decimal, difflib, enum, errno, fnmatch, fractions, functools, gc, hashlib, heapq, html, inspect, io, ipaddress, itertools, json, linecache, logging, math, numbers, operator, os, os.path, pathlib, pickle, pprint, queue, random, re, secrets, shelve, shlex, statistics, string, struct, sys, textwrap, threading, time, token, tokenize, tomllib, traceback, types, typing, typing_extensions, urllib, urllib.parse, uuid, warnings, weakref, zlib

What Ouros Cannot Do

Full CPython compatibility — Ouros implements a substantial subset of Python, not all of it
Third-party libraries — No pip, no numpy, no pandas. This is by design
Direct I/O — No filesystem, network, or subprocess access from sandbox code. All external communication goes through external functions you control

Performance

Benchmarks comparing Ouros vs CPython 3.14 (make bench):

Benchmark	Ouros	CPython	Ratio
End-to-end (parse + run)	1.2 µs	8.2 µs	6.9x faster
Loop + modulo (1k iter)	38 µs	27 µs	1.4x slower
Kitchen sink (mixed ops)	5.0 µs	1.5 µs	3.4x slower
List append (100k ints)	3.9 ms	2.7 ms	1.4x slower
List append (100k strings)	13.3 ms	6.0 ms	2.2x slower
Fibonacci (recursive, n=25)	20.5 ms	9.7 ms	2.1x slower

End-to-end startup is where Ouros shines — no interpreter boot, no module imports, just parse and go. Runtime is typically 1.4–3.4x slower than CPython, which is fast enough for agent-generated code where the bottleneck is the LLM call, not the computation.

Concepts

Term	What it is
Sandbox	A compiled Python program. Parse once, run many times with different inputs. No access to the host.
External function	A host function that sandbox code can call by name. Execution pauses until the host provides a return value. This is how sandbox code talks to the outside world.
Snapshot	A frozen execution state, captured when an external function is called. Serializable to bytes — store in a database, send over the wire, resume in another process.
SessionManager	A persistent multi-session runtime. Variables survive across `execute()` calls. Think Jupyter kernel, but sandboxed.
Session	A single named environment within a SessionManager. Has its own variables, history, and heap.
Fork	Copy a session into an independent branch. The original is unchanged. Use for speculative execution or tree-of-thought.
Rewind	Undo the last N `execute()` calls in a session. Variables revert to their previous state.
Heap introspection	`heap_stats()`, `snapshot_heap()`, `diff_heap()` — inspect and compare memory state across executions.
Resource limits	Cap allocations, memory, stack depth, and wall-clock time. Execution terminates with `ResourceError` if exceeded.
Type checking	Static analysis via ty before execution. Optional, zero runtime cost.

Acknowledgments

Ouros is forked from Monty by Pydantic, originally created by Samuel Colvin.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
.cargo		.cargo
.github/workflows		.github/workflows
crates		crates
examples		examples
playground		playground
scripts		scripts
tools		tools
.gitignore		.gitignore
.python-version		.python-version
.rustfmt.toml		.rustfmt.toml
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
logo.jpeg		logo.jpeg
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

A sandboxed Python runtime for AI agents, written in Rust.

Key Features

Usage

Python

External Functions

Async External Functions

Persistent Sessions (REPL)

Serialization

JavaScript / TypeScript

Rust

MCP Server

Available Tools

Adding to Claude Code

Adding to Cursor

Adding to any MCP client

Example: Agent workflow over MCP

Use Cases

AI Agent Code Execution

RLM (Recursive Language Model) Runtime

Sandboxed Computation

Examples

Stdlib Modules

What Ouros Cannot Do

Performance

Concepts

Acknowledgments

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

A sandboxed Python runtime for AI agents, written in Rust.

Key Features

Usage

Python

External Functions

Async External Functions

Persistent Sessions (REPL)

Serialization

JavaScript / TypeScript

Rust

MCP Server

Available Tools

Adding to Claude Code

Adding to Cursor

Adding to any MCP client

Example: Agent workflow over MCP

Use Cases

AI Agent Code Execution

RLM (Recursive Language Model) Runtime

Sandboxed Computation

Examples

Stdlib Modules

What Ouros Cannot Do

Performance

Concepts

Acknowledgments

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages