sandbox-bash-mcp

MCP server for sandboxed bash execution inside an isolated Docker container. Zero host access. Seven layers of security. Full audit trail. Used by nl2shell-web to give the NL2Shell agent a safe environment to run generated shell commands.

Part of the nl2shell organization — the AI-powered natural language to shell command system.

Keywords: MCP server bash sandbox Docker isolated container AI agent tool-calling secure shell execution model context protocol

Organization Context

Repo	Role
nl2shell	Core ML — NL-to-shell model training and inference
nl2shell-web	Website — calls this sandbox via a relay server
vox	Voice CLI — terminal voice interface
sandbox-bash-mcp (this repo)	Docker sandbox — MCP server for safe bash execution
collab	Desktop app — collaborative shell session UI

Architecture

 ┌─────────────────────────────────────────────────────────┐
 │  AI Agent Hosts                                          │
 │  LM Studio / Claude Code / Codex / nl2shell-web relay   │
 └───────────────────┬─────────────────────────────────────┘
                     │  MCP protocol over stdio (JSON-RPC 2.0)
                     ▼
 ┌─────────────────────────────────────────────────────────┐
 │  sandbox-bash-mcp  (server.mjs)                          │
 │  ─────────────────────────────────────────────────────  │
 │  MCP tools: run_bash, write_file, read_file,             │
 │             list_files, audit_log                        │
 │                                                          │
 │  Security middleware:                                    │
 │    └── Command blocklist (15 patterns)                   │
 │    └── Path confinement (/agent/ only)                   │
 │    └── Output cap (1 MB stdout, 4 KB audit)              │
 │    └── JSONL audit trail (/agent/audit/toolcalls.jsonl)  │
 └───────────────────┬─────────────────────────────────────┘
                     │  execSync inside container
                     ▼
 ┌─────────────────────────────────────────────────────────┐
 │  Docker container (Ubuntu/node:22-slim)                  │
 │  ─────────────────────────────────────────────────────  │
 │  Layer 1: --network none          (no outbound traffic)  │
 │  Layer 2: --cap-drop ALL          (no Linux caps)        │
 │  Layer 3: --security-opt no-new-privileges               │
 │  Layer 4: --memory 512m --cpus 1 --pids-limit 100        │
 │  Layer 5: USER agent              (non-root, uid!=0)     │
 │  Layer 6: command blocklist       (in server.mjs)        │
 │  Layer 7: JSONL audit trail       (/agent/audit/)        │
 │                                                          │
 │  Available tools: bash, python3, node v22, jq,           │
 │                   git, tree, bc, file, procps            │
 │  Workspace: /agent/workspace  (persistent in session)    │
 └─────────────────────────────────────────────────────────┘

MCP Tools

All tools speak JSON-RPC 2.0 over stdio. Each tool takes a JSON arguments object matching its input schema and returns an MCP content array whose items have type: "text".

`run_bash`

Execute a bash command inside the sandboxed container.

Input schema

Field	Type	Required	Default	Description
`command`	string	yes	—	Bash command to execute
`timeout_ms`	number	no	30000	Timeout in ms (capped at 120000)
`working_dir`	string	no	`/agent/workspace`	CWD; must be under `/agent/`

Output (success)

$ echo hello
hello
[exit: 0 | 12ms | audit: 1718000000000]

Output (blocked)

BLOCKED: This command is not allowed in the sandbox.
Reason: matches security blocklist
Audit ID: 1718000000000

Output (error)

$ cat /etc/shadow
cat: /etc/shadow: Permission denied
[exit: 1 | 5ms | audit: 1718000000001]
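
On the wire, a `run_bash` invocation is a single JSON-RPC 2.0 `tools/call` request written to the server's stdin. The sketch below shows one way a client might build that message. The helper name `buildToolCall` is hypothetical, and the newline-delimited framing follows the MCP stdio transport convention rather than anything this README states explicitly.

```javascript
// Hypothetical helper: build one newline-delimited JSON-RPC 2.0 request
// asking an MCP server to invoke a tool with the given arguments.
function buildToolCall(id, name, args) {
  const request = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  // JSON.stringify escapes newlines inside strings, so the message stays on one line.
  return JSON.stringify(request) + "\n";
}

// Arguments mirror the run_bash input schema above.
const line = buildToolCall(2, "run_bash", {
  command: "echo hello",
  timeout_ms: 5000,
  working_dir: "/agent/workspace",
});

process.stdout.write(line);
```

The same helper would work for the other four tools by changing `name` and the arguments object.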

`write_file`

Write content to a file in the sandbox. Parent directories are created automatically.

Input schema

Field	Type	Required	Description
`path`	string	yes	File path; prefixed with `/agent/workspace/` if not under `/agent/`
`content`	string	yes	File content (UTF-8)

Output

Written 128 bytes to /agent/workspace/script.py

`read_file`

Read a file from the sandbox filesystem.

Input schema

Field	Type	Required	Description
`path`	string	yes	File path; prefixed with `/agent/workspace/` if not under `/agent/`

Output

<file contents as plain text>

`list_files`

List files and directories under a path in the sandbox.

Input schema

Field	Type	Required	Default	Description
`path`	string	no	`/agent/workspace`	Directory path; must be under `/agent/`

Output

total 8
drwxr-xr-x 2 agent agent 4096 Jan  1 00:00 .
drwxr-xr-x 4 agent agent 4096 Jan  1 00:00 ..
-rw-r--r-- 1 agent agent  128 Jan  1 00:00 script.py

`audit_log`

View the JSONL audit trail of all tool calls in the current container session.

Input schema

Field	Type	Required	Default	Description
`last_n`	number	no	20	Number of most recent entries to return

Output

[2024-01-01T00:00:00.000Z] run_bash -> success (45ms)
[2024-01-01T00:00:01.000Z] write_file -> success (3ms)
[2024-01-01T00:00:02.000Z] run_bash -> error (12ms)

Each entry in the underlying JSONL file at /agent/audit/toolcalls.jsonl has this shape:

{
  "id": 1718000000000,
  "tool": "run_bash",
  "input": { "command": "echo hello", "working_dir": "/agent/workspace" },
  "output": "hello\n",
  "status": "success",
  "duration_ms": 12,
  "timestamp": "2024-01-01T00:00:00.000Z"
}
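
To illustrate how the JSONL entries relate to the summary lines that `audit_log` returns, here is a minimal sketch that parses raw JSONL text and formats the most recent entries. The function name `summarizeAudit` is hypothetical, and the server's actual formatting code may differ.

```javascript
// Hypothetical helper: turn raw JSONL audit text into summary lines
// shaped like the audit_log output shown above.
function summarizeAudit(jsonl, lastN = 20) {
  const entries = jsonl
    .split("\n")
    .filter((l) => l.trim() !== "") // skip blank lines, e.g. a trailing newline
    .map((l) => JSON.parse(l));
  // slice(-0) would return everything, so handle lastN <= 0 explicitly.
  const recent = lastN > 0 ? entries.slice(-lastN) : [];
  return recent.map(
    (e) => `[${e.timestamp}] ${e.tool} -> ${e.status} (${e.duration_ms}ms)`
  );
}

// Two sample entries using the field names from the JSONL shape above.
const sample = [
  { id: 1, tool: "run_bash", status: "success", duration_ms: 45, timestamp: "2024-01-01T00:00:00.000Z" },
  { id: 2, tool: "write_file", status: "success", duration_ms: 3, timestamp: "2024-01-01T00:00:01.000Z" },
]
  .map((e) => JSON.stringify(e))
  .join("\n");

console.log(summarizeAudit(sample, 1).join("\n"));
```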

Security Model

7-Layer Defense

Layer	Mechanism	What it prevents
1	`--network none`	All outbound and inbound network traffic
2	`--cap-drop ALL`	Privilege escalation via Linux capabilities
3	`--security-opt no-new-privileges`	setuid/setgid privilege escalation
4	`--memory 512m --cpus 1 --pids-limit 100`	Resource exhaustion and fork bombs at OS level
5	Non-root `agent` user (home: `/agent`)	Host filesystem access, root-only operations
6	Command blocklist in `server.mjs`	Known destructive/escape patterns at application level
7	JSONL audit trail	Full visibility into every tool call for post-hoc review
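
The container-level flags in layers 1 to 4 are all passed to `docker run`. The contents of run.sh are not reproduced in this README, so the following is only a sketch of how a launcher might assemble them. The function name `sandboxRunArgs` is hypothetical, and `--rm -i` is taken from the verification command in the Setup section.

```javascript
// Hypothetical helper: assemble `docker run` arguments for layers 1-4.
// Layer 5 (USER agent) is set in the Dockerfile; layers 6-7 live in server.mjs.
function sandboxRunArgs(image = "sandbox-bash-mcp") {
  return [
    "run", "--rm", "-i",
    "--network", "none",                    // layer 1: no network
    "--cap-drop", "ALL",                    // layer 2: drop all Linux capabilities
    "--security-opt", "no-new-privileges",  // layer 3: block setuid escalation
    "--memory", "512m",                     // layer 4: resource limits
    "--cpus", "1",
    "--pids-limit", "100",
    image,
  ];
}

console.log(["docker", ...sandboxRunArgs()].join(" "));
```

Returning an argument array rather than a single string means it can be handed to a process spawner without shell quoting concerns.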

Blocked Command Patterns

The following 15 regex patterns are rejected before execution:

Pattern	What it blocks
`rm -rf /`	Recursive root deletion
`mkfs`	Filesystem formatting
`dd if=`	Raw disk writes
`:(){ :|:& };:`	Fork bomb
`curl ... | sh`	Piping a remote download into a shell
`wget ... | sh`	Piping a remote download into a shell
`chmod +s`	setuid/setgid bit manipulation
`chown root`	Ownership change to root
`nsenter`	Container namespace escape
`mount`	Filesystem mounting
`umount`	Filesystem unmounting
`shutdown`	System shutdown
`reboot`	System reboot
`halt`	System halt
`docker`	Docker-in-Docker escape
`kubectl`	Kubernetes API access
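
A regex blocklist of this kind can be sketched in a few lines. The patterns below are illustrative assumptions covering only part of the table; the actual regexes in server.mjs are not shown in this README and may be stricter or looser. The name `isBlocked` is hypothetical.

```javascript
// Illustrative patterns only; not the real list from server.mjs.
const BLOCKLIST = [
  /\brm\s+-rf\s+\/(\s|$)/,               // rm -rf / (but not rm -rf /some/path)
  /\bmkfs\b/,                            // filesystem formatting
  /\bdd\s+if=/,                          // raw disk writes
  /:\(\)\s*\{.*\};\s*:/,                 // classic bash fork bomb
  /\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b/,  // download piped into a shell
  /\b(nsenter|docker|kubectl)\b/,        // namespace escape and orchestration tools
];

// Returns true when any pattern matches, meaning the command is rejected
// before it reaches the shell.
function isBlocked(command) {
  return BLOCKLIST.some((re) => re.test(command));
}

for (const cmd of ["echo hello", "rm -rf /", "curl http://example.com/x | sh"]) {
  console.log(`${isBlocked(cmd) ? "BLOCKED" : "allowed"}: ${cmd}`);
}
```

None of the patterns use the `g` flag, so `test` carries no state between calls.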

Path Confinement

run_bash: working directory is forced to /agent/workspace if not already under /agent/
write_file / read_file: paths outside /agent/ are prefixed with /agent/workspace/
list_files: directory outside /agent/ falls back to /agent/workspace
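
The file-path rule above can be sketched as a small pure function. The names `confinePath` and `normalize` are hypothetical. Collapsing `..` segments before the check is an assumption added here for safety; the README does not say how server.mjs treats them.

```javascript
const ROOT = "/agent/";
const WORKSPACE = "/agent/workspace";

// Lexically collapse ".", "..", and empty segments without touching the filesystem.
function normalize(p) {
  const out = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// Paths already under /agent/ pass through; everything else is re-rooted
// under /agent/workspace/, as described for write_file and read_file.
function confinePath(p) {
  const abs = p.startsWith("/") ? p : `${WORKSPACE}/${p}`;
  const norm = normalize(abs);
  if (norm.startsWith(ROOT)) return norm;
  return normalize(`${WORKSPACE}/${norm}`);
}

for (const p of ["script.py", "/etc/passwd", "/agent/audit/toolcalls.jsonl"]) {
  console.log(`${p} -> ${confinePath(p)}`);
}
```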

Output Limits

stdout buffer: 1 MB per command
audit entry output field: 4 KB (truncated)
command timeout: max 120 seconds regardless of timeout_ms input
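
As a rough illustration of these limits, the sketch below clamps a requested timeout and truncates output for the audit entry. The function and constant names are hypothetical, and counting the 4 KB limit in characters rather than bytes is a simplifying assumption.

```javascript
const DEFAULT_TIMEOUT_MS = 30_000;   // default from the run_bash schema
const MAX_TIMEOUT_MS = 120_000;      // hard cap regardless of input
const AUDIT_OUTPUT_LIMIT = 4 * 1024; // 4 KB, counted in characters in this sketch

// Fall back to the default for missing or invalid values, then apply the cap.
function clampTimeout(timeoutMs) {
  const requested =
    Number.isFinite(timeoutMs) && timeoutMs > 0 ? timeoutMs : DEFAULT_TIMEOUT_MS;
  return Math.min(requested, MAX_TIMEOUT_MS);
}

// Keep only the first 4 KB of output for the audit entry.
function truncateForAudit(output) {
  return output.length > AUDIT_OUTPUT_LIMIT
    ? output.slice(0, AUDIT_OUTPUT_LIMIT)
    : output;
}

console.log(clampTimeout(undefined), clampTimeout(5000), clampTimeout(500000));
console.log(truncateForAudit("x".repeat(5000)).length);
```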

Setup

Prerequisites

Docker (any recent version)
Node.js 22+ (only needed for test-agent.mjs; not required if registering as an MCP server)

Build the Image

git clone https://github.com/nl2shell/sandbox-bash-mcp
cd sandbox-bash-mcp
docker build -t sandbox-bash-mcp .

Verify the Build

echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}' \
  | docker run --rm -i --network none sandbox-bash-mcp

You should receive an MCP initialize response with serverInfo.name: "sandbox-bash".
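
If you want to check that response programmatically, a minimal sketch might look like this. The function name `isSandboxInitResponse` is hypothetical, the field layout follows the standard MCP initialize result, and the sample line (including its version string) is made up for illustration.

```javascript
// Hypothetical helper: parse one line of server output and report whether
// it is the initialize response expected from this server.
function isSandboxInitResponse(line, expectedId = 1) {
  let msg;
  try {
    msg = JSON.parse(line);
  } catch {
    return false; // not JSON at all
  }
  return (
    msg !== null &&
    typeof msg === "object" &&
    msg.jsonrpc === "2.0" &&
    msg.id === expectedId &&
    msg.result?.serverInfo?.name === "sandbox-bash"
  );
}

// Illustrative response line; "0.0.0" is a placeholder version.
const sampleResponse =
  '{"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2024-11-05",' +
  '"capabilities":{"tools":{}},"serverInfo":{"name":"sandbox-bash","version":"0.0.0"}}}';

console.log(isSandboxInitResponse(sampleResponse));
```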

Registering as an MCP Server

The server communicates over stdio. Register it using the wrapper script run.sh, which applies all security flags automatically.

Claude Code (`~/.claude.json`)

{
  "mcpServers": {
    "sandbox-bash": {
      "command": "/path/to/sandbox-bash-mcp/run.sh",
      "args": []
    }
  }
}

LM Studio (`~/.lmstudio/mcp.json`)

{
  "mcpServers": {
    "sandbox-bash": {
      "command": "/path/to/sandbox-bash-mcp/run.sh",
      "args": []
    }
  }
}

Codex (`~/.codex/config.toml`)

[mcp_servers.sandbox-bash]
command = "/path/to/sandbox-bash-mcp/run.sh"
args = []

System Prompt for Best Results

Add this to your agent's system prompt to encourage immediate tool use:

You are a bash operator with a sandboxed Linux container.
When asked to do anything, USE your tools immediately. Do not explain first.
Tools: run_bash (execute commands), write_file (create files), read_file (read files).
The sandbox has: bash, python3, node v22, jq, git, tree, bc.
Working directory: /agent/workspace. Files persist between calls in a session.

Integration with nl2shell-web

nl2shell-web uses this sandbox as the execution backend for the web interface. The relay pattern works as follows:

Browser -> nl2shell-web server -> spawns sandbox-bash-mcp container
                               -> sends MCP tool calls over stdio
                               -> streams results back to browser

The web server spawns a fresh container per session using the same docker run flags as run.sh. Tool results are forwarded as SSE or WebSocket messages to the browser.

Each container is ephemeral: when the web session ends, the container and all files inside it are destroyed. The audit log at /agent/audit/toolcalls.jsonl exists only for the container's lifetime unless explicitly exported before teardown.

Agent Test Harness

test-agent.mjs is a self-contained agent loop that connects LM Studio local models to a live sandbox container. It implements a full tool-calling loop without any external agent framework.

Usage

# Default model (liquid/lfm2.5-1.2b), default prompt
node test-agent.mjs

# Custom prompt
node test-agent.mjs "create a python fibonacci script and run it"

# Different model
MODEL=gemma-3-270m-it-mlx node test-agent.mjs "write a shell sort in bash"

Requirements

LM Studio running on http://127.0.0.1:1234 with a tool-calling capable model loaded
Docker image built (docker build -t sandbox-bash-mcp .)

How It Works

1. Spawns a sandbox-bash-mcp Docker container with all security flags
2. Sends an MCP initialize handshake over the container's stdin
3. Calls LM Studio's OpenAI-compatible /v1/chat/completions with the tool schemas
4. On each turn: parses tool_calls from the model response, dispatches them to the container via MCP tools/call, feeds results back as tool role messages
5. Continues until finish_reason === "stop" or MAX_TURNS (15) is reached
6. Kills the container on exit
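
The core of the steps above, the tool-calling loop itself, can be sketched as follows. This is not the code from test-agent.mjs: `runAgentLoop`, `callModel`, and `callTool` are hypothetical names, and the model and sandbox are injected as functions so the sketch runs without LM Studio or Docker. Message shapes follow the OpenAI chat-completions format.

```javascript
// callModel(messages) resolves to { finish_reason, message } (one chat-completions choice).
// callTool(name, args) resolves to the text result of an MCP tools/call.
async function runAgentLoop(prompt, callModel, callTool, maxTurns = 15) {
  const messages = [{ role: "user", content: prompt }];
  for (let turn = 1; turn <= maxTurns; turn++) {
    const { finish_reason, message } = await callModel(messages);
    messages.push(message);
    const toolCalls = message.tool_calls ?? [];
    // Stopping when no tool calls are present is an extra guard assumed here.
    if (finish_reason === "stop" || toolCalls.length === 0) break;
    for (const call of toolCalls) {
      const args = JSON.parse(call.function.arguments); // arguments arrive as a JSON string
      const result = await callTool(call.function.name, args);
      messages.push({ role: "tool", tool_call_id: call.id, content: result });
    }
  }
  return messages;
}

// Scripted stand-ins for the model and the sandbox.
const scripted = [
  {
    finish_reason: "tool_calls",
    message: {
      role: "assistant",
      content: null,
      tool_calls: [
        { id: "call_1", type: "function",
          function: { name: "run_bash", arguments: '{"command":"echo hello"}' } },
      ],
    },
  },
  { finish_reason: "stop", message: { role: "assistant", content: "Done." } },
];
const fakeModel = async () => scripted.shift();
const fakeTool = async (name, args) => `$ ${args.command}\nhello`;

const transcript = runAgentLoop("say hello", fakeModel, fakeTool);
transcript.then((messages) => console.log(messages.map((m) => m.role).join(" -> ")));
```

Injecting the two callbacks keeps the loop logic separate from transport details, which also makes it straightforward to exercise with scripted responses as done here.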

Example Output

--- Sandbox Agent ---
Model: liquid/lfm2.5-1.2b | Max turns: 15
Prompt: create a python fibonacci script and run it

[Turn 1] write_file({"path":"/agent/workspace/fib.py","content":"..."})
[Output] Written 89 bytes to /agent/workspace/fib.py

[Turn 2] run_bash({"command":"python3 fib.py"})
[Output] $ python3 fib.py
0 1 1 2 3 5 8 13 21 34
[exit: 0 | 234ms | audit: 1718000000000]

[Model] Done. The script prints Fibonacci numbers up to index 10.

--- Done (2 turns, 412 tokens) ---

Development

# Build image
npm run docker:build

# Run server directly (inside container)
npm run docker:run

# Run agent test
node test-agent.mjs "list all files"

The server itself has no external runtime dependencies beyond @modelcontextprotocol/sdk and zod. All tool logic is in server.mjs (245 lines). The Dockerfile installs packages at build time only.

File Structure

sandbox-bash-mcp/
├── Dockerfile        # Ubuntu/node:22-slim, installs tools, drops to agent user
├── server.mjs        # MCP server: 5 tools, blocklist, audit (245 lines)
├── test-agent.mjs    # LM Studio integration test harness
├── run.sh            # Docker run wrapper with all security flags
├── package.json      # Dependencies: @modelcontextprotocol/sdk, agentfs-sdk
└── README.md
