sandbox-bash-mcp

MCP server for sandboxed bash execution inside an isolated Docker container. Zero host access. Seven layers of security. Full audit trail. Used by nl2shell-web to give the NL2Shell agent a safe environment to run generated shell commands.

Part of the nl2shell organization — the AI-powered natural language to shell command system.

Keywords: MCP server, bash sandbox, Docker, isolated container, AI agent, tool-calling, secure shell execution, model context protocol


Organization Context

| Repo | Role |
| --- | --- |
| nl2shell | Core ML: NL-to-shell model training and inference |
| nl2shell-web | Website: calls this sandbox via a relay server |
| vox | Voice CLI: terminal voice interface |
| sandbox-bash-mcp (this repo) | Docker sandbox: MCP server for safe bash execution |
| collab | Desktop app: collaborative shell session UI |

Architecture

 ┌─────────────────────────────────────────────────────────┐
 │  AI Agent Hosts                                          │
 │  LM Studio / Claude Code / Codex / nl2shell-web relay   │
 └───────────────────┬─────────────────────────────────────┘
                     │  MCP protocol over stdio (JSON-RPC 2.0)
                     ▼
 ┌─────────────────────────────────────────────────────────┐
 │  sandbox-bash-mcp  (server.mjs)                          │
 │  ─────────────────────────────────────────────────────  │
 │  MCP tools: run_bash, write_file, read_file,             │
 │             list_files, audit_log                        │
 │                                                          │
 │  Security middleware:                                    │
 │    └── Command blocklist (15 patterns)                   │
 │    └── Path confinement (/agent/ only)                   │
 │    └── Output cap (1 MB stdout, 4 KB audit)              │
 │    └── JSONL audit trail (/agent/audit/toolcalls.jsonl)  │
 └───────────────────┬─────────────────────────────────────┘
                     │  execSync inside container
                     ▼
 ┌─────────────────────────────────────────────────────────┐
 │  Docker container (Debian/node:22-slim)                  │
 │  ─────────────────────────────────────────────────────  │
 │  Layer 1: --network none          (no outbound traffic)  │
 │  Layer 2: --cap-drop ALL          (no Linux caps)        │
 │  Layer 3: --security-opt no-new-privileges               │
 │  Layer 4: --memory 512m --cpus 1 --pids-limit 100        │
 │  Layer 5: USER agent              (non-root, uid!=0)     │
 │  Layer 6: command blocklist       (in server.mjs)        │
 │  Layer 7: JSONL audit trail       (/agent/audit/)        │
 │                                                          │
 │  Available tools: bash, python3, node v22, jq,           │
 │                   git, tree, bc, file, procps            │
 │  Workspace: /agent/workspace  (persistent in session)    │
 └─────────────────────────────────────────────────────────┘

MCP Tools

All tools speak JSON-RPC 2.0 over stdio. Arguments are passed as a JSON object matching each tool's input schema; results come back as MCP content arrays with type: "text".

run_bash

Execute a bash command inside the sandboxed container.

Input schema

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| command | string | yes | | Bash command to execute |
| timeout_ms | number | no | 30000 | Timeout in ms (capped at 120000) |
| working_dir | string | no | /agent/workspace | CWD; must be under /agent/ |

Output (success)

$ echo hello
hello
[exit: 0 | 12ms | audit: 1718000000000]

Output (blocked)

BLOCKED: This command is not allowed in the sandbox.
Reason: matches security blocklist
Audit ID: 1718000000000

Output (error)

$ cat /etc/shadow
cat: /etc/shadow: Permission denied
[exit: 1 | 5ms | audit: 1718000000001]
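For clients that talk to the server directly, a run_bash call is a JSON-RPC 2.0 tools/call request written to the container's stdin as one line of JSON. The sketch below builds such a request and parses the status footer shown in the examples above. The helper names buildRunBashRequest and parseRunBashFooter are hypothetical, and the regex assumes the footer always has the exact `[exit: N | Mms | audit: ID]` layout.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request for run_bash.
// Method name and params shape follow the MCP specification;
// the helper itself is illustrative and not part of this repo.
function buildRunBashRequest(id, command, options = {}) {
  const args = { command };
  if (options.timeoutMs !== undefined) args.timeout_ms = options.timeoutMs;
  if (options.workingDir !== undefined) args.working_dir = options.workingDir;
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "run_bash", arguments: args },
  };
}

// Parse the trailing "[exit: N | Mms | audit: ID]" status line.
// Returns null when the text has no footer (for example a BLOCKED response).
function parseRunBashFooter(text) {
  const match = text.match(/\[exit: (-?\d+) \| (\d+)ms \| audit: (\d+)\]\s*$/);
  if (!match) return null;
  return {
    exitCode: Number(match[1]),
    durationMs: Number(match[2]),
    auditId: Number(match[3]),
  };
}

const request = buildRunBashRequest(2, "echo hello", { timeoutMs: 5000 });
console.log(JSON.stringify(request)); // one line, ready to write to stdin

const footer = parseRunBashFooter(
  "$ echo hello\nhello\n[exit: 0 | 12ms | audit: 1718000000000]"
);
console.log(footer);
```

Checking for a null footer is a cheap way for a client to tell blocked commands apart from commands that ran and failed.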

write_file

Write content to a file in the sandbox. Parent directories are created automatically.

Input schema

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| path | string | yes | File path; prefixed with /agent/workspace/ if not under /agent/ |
| content | string | yes | File content (UTF-8) |

Output

Written 128 bytes to /agent/workspace/script.py

read_file

Read a file from the sandbox filesystem.

Input schema

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| path | string | yes | File path; prefixed with /agent/workspace/ if not under /agent/ |

Output

<file contents as plain text>

list_files

List files and directories under a path in the sandbox.

Input schema

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| path | string | no | /agent/workspace | Directory path; must be under /agent/ |

Output

total 8
drwxr-xr-x 2 agent agent 4096 Jan  1 00:00 .
drwxr-xr-x 4 agent agent 4096 Jan  1 00:00 ..
-rw-r--r-- 1 agent agent  128 Jan  1 00:00 script.py

audit_log

View the JSONL audit trail of all tool calls in the current container session.

Input schema

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| last_n | number | no | 20 | Number of most recent entries to return |

Output

[2024-01-01T00:00:00.000Z] run_bash -> success (45ms)
[2024-01-01T00:00:01.000Z] write_file -> success (3ms)
[2024-01-01T00:00:02.000Z] run_bash -> error (12ms)

Each entry in the underlying JSONL file at /agent/audit/toolcalls.jsonl has this shape:

{
  "id": 1718000000000,
  "tool": "run_bash",
  "input": { "command": "echo hello", "working_dir": "/agent/workspace" },
  "output": "hello\n",
  "status": "success",
  "duration_ms": 12,
  "timestamp": "2024-01-01T00:00:00.000Z"
}
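If the JSONL file is exported from the container, the one-line summaries that audit_log prints can be rebuilt from the raw entries. This is a minimal sketch that assumes each line is a complete JSON object with the fields shown above; parseAuditLog and summarizeAuditLog are hypothetical names, not functions from server.mjs.

```javascript
// Parse a JSONL string: one JSON object per non-empty line.
function parseAuditLog(jsonl) {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Format the last N entries the way the audit_log tool output looks.
function summarizeAuditLog(entries, lastN = 20) {
  const recent = lastN > 0 ? entries.slice(-lastN) : [];
  return recent.map(
    (e) => `[${e.timestamp}] ${e.tool} -> ${e.status} (${e.duration_ms}ms)`
  );
}

// Sample data modeled on the entry shape documented above.
const sample =
  [
    JSON.stringify({ id: 1718000000000, tool: "run_bash", status: "success", duration_ms: 45, timestamp: "2024-01-01T00:00:00.000Z" }),
    JSON.stringify({ id: 1718000000001, tool: "write_file", status: "success", duration_ms: 3, timestamp: "2024-01-01T00:00:01.000Z" }),
  ].join("\n") + "\n";

const lines = summarizeAuditLog(parseAuditLog(sample), 1);
console.log(lines.join("\n"));
// [2024-01-01T00:00:01.000Z] write_file -> success (3ms)
```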

Security Model

7-Layer Defense

| Layer | Mechanism | What it prevents |
| --- | --- | --- |
| 1 | --network none | All outbound and inbound network traffic |
| 2 | --cap-drop ALL | Privilege escalation via Linux capabilities |
| 3 | --security-opt no-new-privileges | setuid/setgid privilege escalation |
| 4 | --memory 512m --cpus 1 --pids-limit 100 | Resource exhaustion and fork bombs at OS level |
| 5 | Non-root agent user (home: /agent) | Host filesystem access, root-only operations |
| 6 | Command blocklist in server.mjs | Known destructive/escape patterns at application level |
| 7 | JSONL audit trail | Full visibility into every tool call for post-hoc review |

Blocked Command Patterns

The following 15 regex patterns are rejected before execution:

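| Pattern | What it blocks |
| --- | --- |
| `rm -rf /` | Recursive root deletion |
| `mkfs` | Filesystem formatting |
| `dd if=` | Raw disk writes |
| `:(){ :\|:& };:` | Fork bomb |
| `curl ... \| sh` | Downloaded script piped into a shell |
| `wget ... \| sh` | Downloaded script piped into a shell |
| `chmod +s` | setuid/setgid bit manipulation |
| `chown root` | Ownership change to root |
| `nsenter` | Container namespace escape |
| `mount` | Filesystem mounting |
| `umount` | Filesystem unmounting |
| `shutdown` | System shutdown |
| `reboot` | System reboot |
| `halt` | System halt |
| `docker` | Docker-in-Docker escape |
| `kubectl` | Kubernetes API access |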
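An application-level blocklist of this kind usually comes down to testing the command string against a list of regular expressions before anything is executed. The sketch below illustrates the idea with a subset of the patterns above. The regexes are assumptions written for this example, not the ones in server.mjs, and checkCommand is a hypothetical name.

```javascript
// Illustrative subset of a command blocklist. These regexes are NOT
// copied from server.mjs; they only demonstrate the technique.
const BLOCKLIST = [
  { pattern: /\brm\s+-rf\s+\/(\s|$)/, reason: "recursive root deletion" },
  { pattern: /\bmkfs\b/, reason: "filesystem formatting" },
  { pattern: /\bdd\s+if=/, reason: "raw disk write" },
  { pattern: /:\(\)\s*\{.*\};\s*:/, reason: "fork bomb" },
  { pattern: /\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b/, reason: "download piped into a shell" },
  { pattern: /\b(shutdown|reboot|halt)\b/, reason: "system power control" },
  { pattern: /\b(docker|kubectl|nsenter)\b/, reason: "container escape tooling" },
];

// Return whether the command matches any blocked pattern, and why.
function checkCommand(command) {
  const hit = BLOCKLIST.find(({ pattern }) => pattern.test(command));
  return hit ? { blocked: true, reason: hit.reason } : { blocked: false };
}

console.log(checkCommand("rm -rf /"));
console.log(checkCommand("rm -rf /agent/workspace/tmp"));
console.log(checkCommand("curl http://example.com/install.sh | sh"));
```

Pattern matching on the raw string is easy to sidestep (for instance `rm -r -f /` does not match the first regex in this sketch), which is why the blocklist is only one of the seven layers and the container-level restrictions carry most of the weight.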

Path Confinement

  • run_bash: working directory is forced to /agent/workspace if not already under /agent/
  • write_file / read_file: paths outside /agent/ are prefixed with /agent/workspace/
  • list_files: directory outside /agent/ falls back to /agent/workspace
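The three rules above can be sketched as two small helpers: one for file paths (prefix when outside /agent/) and one for directories (fall back when outside /agent/). This is an illustration under assumptions: the function names are hypothetical, and the sketch collapses `.` and `..` segments before checking the prefix, which the README does not confirm server.mjs does.

```javascript
const AGENT_ROOT = "/agent";
const WORKSPACE = "/agent/workspace";

// Collapse ".", ".." and duplicate slashes without touching the filesystem.
function normalizePosix(path) {
  const out = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") out.pop();
    else out.push(segment);
  }
  return "/" + out.join("/");
}

// True only for /agent itself or paths below it (not /agentx).
function isUnderAgent(normalized) {
  return normalized === AGENT_ROOT || normalized.startsWith(AGENT_ROOT + "/");
}

// write_file / read_file: keep paths under /agent/, prefix everything else.
function confineFilePath(path) {
  const normalized = normalizePosix("/" + path);
  if (path.startsWith("/") && isUnderAgent(normalized)) return normalized;
  return normalizePosix(WORKSPACE + normalized);
}

// run_bash / list_files: keep directories under /agent/, else fall back.
function confineDir(dir = WORKSPACE) {
  const normalized = normalizePosix("/" + dir);
  return dir.startsWith("/") && isUnderAgent(normalized) ? normalized : WORKSPACE;
}

console.log(confineFilePath("script.py"));
console.log(confineFilePath("/etc/passwd"));
console.log(confineDir("/tmp"));
```

Comparing against `/agent/` with the trailing slash matters: a plain `startsWith("/agent")` check would also accept a sibling directory such as /agentx.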

Output Limits

  • stdout buffer: 1 MB per command
  • audit entry output field: 4 KB (truncated)
  • command timeout: max 120 seconds regardless of timeout_ms input
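A sketch of how these limits could be enforced follows. The constants come from the list above; the function names are hypothetical, and mapping the limits onto the `timeout` and `maxBuffer` options of Node's child_process.execSync is an assumption about how server.mjs works, based only on the architecture diagram mentioning execSync.

```javascript
const DEFAULT_TIMEOUT_MS = 30000;      // documented default for timeout_ms
const MAX_TIMEOUT_MS = 120000;         // documented hard cap
const STDOUT_LIMIT_BYTES = 1024 * 1024; // 1 MB stdout buffer
const AUDIT_OUTPUT_LIMIT = 4 * 1024;    // 4 KB stored per audit entry

// Use the default for missing or invalid values, then apply the cap.
function clampTimeout(timeoutMs) {
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) return DEFAULT_TIMEOUT_MS;
  return Math.min(timeoutMs, MAX_TIMEOUT_MS);
}

// Truncate the output stored in the audit entry.
// Assumption: counts UTF-16 code units, not bytes; the real server may differ.
function truncateForAudit(output, limit = AUDIT_OUTPUT_LIMIT) {
  return output.length <= limit ? output : output.slice(0, limit);
}

// Options object in the shape child_process.execSync accepts
// (cwd, timeout, maxBuffer and encoding are real execSync options).
function buildExecOptions(timeoutMs, cwd) {
  return {
    cwd,
    timeout: clampTimeout(timeoutMs),
    maxBuffer: STDOUT_LIMIT_BYTES,
    encoding: "utf8",
  };
}

console.log(buildExecOptions(600000, "/agent/workspace"));
console.log(truncateForAudit("x".repeat(5000)).length);
```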

Setup

Prerequisites

  • Docker (any recent version)
  • Node.js 22+ (only needed for test-agent.mjs; not required if registering as an MCP server)

Build the Image

git clone https://github.com/nl2shell/sandbox-bash-mcp
cd sandbox-bash-mcp
docker build -t sandbox-bash-mcp .

Verify the Build

echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}' \
  | docker run --rm -i --network none sandbox-bash-mcp

You should receive an MCP initialize response with serverInfo.name: "sandbox-bash".
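To automate this check, the first line the container prints can be parsed and compared against the expected server name. The sketch below assumes the response follows the standard MCP initialize result shape; the sample response and its version field are made up for the example, and isSandboxInitializeResponse is a hypothetical helper.

```javascript
// The same initialize request as in the shell command above.
function buildInitializeRequest(id = 1) {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05",
      capabilities: {},
      clientInfo: { name: "test", version: "1.0" },
    },
  };
}

// Check one line of server output for the expected serverInfo.name.
function isSandboxInitializeResponse(line) {
  let message;
  try {
    message = JSON.parse(line);
  } catch {
    return false; // not JSON, e.g. a log line
  }
  if (message === null || typeof message !== "object") return false;
  return (
    message.jsonrpc === "2.0" &&
    message.result?.serverInfo?.name === "sandbox-bash"
  );
}

// Hypothetical response; only serverInfo.name is documented in this README.
const sampleResponse = JSON.stringify({
  jsonrpc: "2.0",
  id: 1,
  result: {
    protocolVersion: "2024-11-05",
    capabilities: { tools: {} },
    serverInfo: { name: "sandbox-bash", version: "0.0.0" },
  },
});

console.log(JSON.stringify(buildInitializeRequest()));
console.log(isSandboxInitializeResponse(sampleResponse));
```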


Registering as an MCP Server

The server communicates over stdio. Register it through the wrapper script run.sh, which applies all security flags automatically.

Claude Code (~/.claude.json)

{
  "mcpServers": {
    "sandbox-bash": {
      "command": "/path/to/sandbox-bash-mcp/run.sh",
      "args": []
    }
  }
}

LM Studio (~/.lmstudio/mcp.json)

{
  "mcpServers": {
    "sandbox-bash": {
      "command": "/path/to/sandbox-bash-mcp/run.sh",
      "args": []
    }
  }
}

Codex (~/.codex/config.toml)

[mcp_servers.sandbox-bash]
command = "/path/to/sandbox-bash-mcp/run.sh"
args = []

System Prompt for Best Results

Add this to your agent's system prompt to encourage immediate tool use:

You are a bash operator with a sandboxed Linux container.
When asked to do anything, USE your tools immediately. Do not explain first.
Tools: run_bash (execute commands), write_file (create files), read_file (read files).
The sandbox has: bash, python3, node v22, jq, git, tree, bc.
Working directory: /agent/workspace. Files persist between calls in a session.

Integration with nl2shell-web

nl2shell-web uses this sandbox as the execution backend for the web interface. The relay pattern works as follows:

Browser -> nl2shell-web server -> spawns sandbox-bash-mcp container
                               -> sends MCP tool calls over stdio
                               -> streams results back to browser

The web server spawns a fresh container per session using the same docker run flags as run.sh. Tool results are forwarded as SSE or WebSocket messages to the browser.

Each container is ephemeral: when the web session ends, the container and all files inside it are destroyed. The audit log at /agent/audit/toolcalls.jsonl exists only for the container's lifetime unless explicitly exported before teardown.
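A relay that spawns its own containers needs to pass the same flags every time. The sketch below assembles the docker run argument list from the flags documented in the security model, plus `--rm -i` from the build verification command. The contents of run.sh are not shown in this README, so treat this as an approximation of it; buildDockerRunArgs is a hypothetical name.

```javascript
// Assemble `docker run` arguments with the documented security flags.
// Approximation of run.sh, which is not reproduced in this README.
function buildDockerRunArgs(image = "sandbox-bash-mcp") {
  return [
    "run",
    "--rm", // remove the container on exit (ephemeral session)
    "-i",   // keep stdin open for MCP over stdio
    "--network", "none",
    "--cap-drop", "ALL",
    "--security-opt", "no-new-privileges",
    "--memory", "512m",
    "--cpus", "1",
    "--pids-limit", "100",
    image,
  ];
}

const dockerArgs = buildDockerRunArgs();
console.log("docker " + dockerArgs.join(" "));
```

Passing the array to child_process.spawn("docker", dockerArgs) rather than building a shell string avoids quoting problems and keeps user-supplied values out of a shell.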


Agent Test Harness

test-agent.mjs is a self-contained agent loop that connects LM Studio local models to a live sandbox container. It implements a full tool-calling loop without any external agent framework.

Usage

# Default model (liquid/lfm2.5-1.2b), default prompt
node test-agent.mjs

# Custom prompt
node test-agent.mjs "create a python fibonacci script and run it"

# Different model
MODEL=gemma-3-270m-it-mlx node test-agent.mjs "write a shell sort in bash"

Requirements

  • LM Studio running on http://127.0.0.1:1234 with a tool-calling capable model loaded
  • Docker image built (docker build -t sandbox-bash-mcp .)

How It Works

  1. Spawns a sandbox-bash-mcp Docker container with all security flags
  2. Sends an MCP initialize handshake over the container's stdin
  3. Calls LM Studio's OpenAI-compatible /v1/chat/completions with the tool schemas
  4. On each turn: parses tool_calls from the model response, dispatches them to the container via MCP tools/call, feeds results back as tool role messages
  5. Continues until finish_reason === "stop" or MAX_TURNS (15) is reached
  6. Kills the container on exit
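Step 4 is the core of the loop: translating between the OpenAI-style tool_calls the model emits and MCP tools/call messages. A minimal sketch of that translation, assuming the standard OpenAI chat completions shapes (arguments arrive as a JSON string) and the MCP result shape; the function names are hypothetical and test-agent.mjs may be structured differently.

```javascript
// OpenAI-style tool call -> MCP "tools/call" request.
function toMcpRequest(toolCall, id) {
  let args;
  try {
    args = JSON.parse(toolCall.function.arguments || "{}");
  } catch {
    args = {}; // design choice in this sketch: tolerate malformed JSON
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolCall.function.name, arguments: args },
  };
}

// MCP response -> "tool" role message to append to the conversation.
function toToolMessage(toolCall, mcpResponse) {
  const parts = mcpResponse.result?.content ?? [];
  const text = parts
    .filter((part) => part.type === "text")
    .map((part) => part.text)
    .join("\n");
  return { role: "tool", tool_call_id: toolCall.id, content: text };
}

// Step 5: stop on finish_reason "stop" or when MAX_TURNS is reached.
function shouldContinue(finishReason, turn, maxTurns = 15) {
  return finishReason !== "stop" && turn < maxTurns;
}

const toolCall = {
  id: "call_1",
  type: "function",
  function: { name: "run_bash", arguments: '{"command":"python3 fib.py"}' },
};
const mcpRequest = toMcpRequest(toolCall, 3);
const toolMessage = toToolMessage(toolCall, {
  jsonrpc: "2.0",
  id: 3,
  result: { content: [{ type: "text", text: "$ python3 fib.py\n0 1 1 2 3 5" }] },
});
console.log(JSON.stringify(mcpRequest));
console.log(toolMessage);
```

Falling back to empty arguments on a parse failure lets the server's schema validation report the problem back to the model, which small local models can sometimes recover from on the next turn.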

Example Output

--- Sandbox Agent ---
Model: liquid/lfm2.5-1.2b | Max turns: 15
Prompt: create a python fibonacci script and run it

[Turn 1] write_file({"path":"/agent/workspace/fib.py","content":"..."})
[Output] Written 89 bytes to /agent/workspace/fib.py

[Turn 2] run_bash({"command":"python3 fib.py"})
[Output] $ python3 fib.py
0 1 1 2 3 5 8 13 21 34
[exit: 0 | 234ms | audit: 1718000000000]

[Model] Done. The script prints Fibonacci numbers up to index 10.

--- Done (2 turns, 412 tokens) ---

Development

# Build image
npm run docker:build

# Run server directly (inside container)
npm run docker:run

# Run agent test
node test-agent.mjs "list all files"

The server itself has no external runtime dependencies beyond @modelcontextprotocol/sdk and zod. All tool logic is in server.mjs (245 lines). The Dockerfile installs packages at build time only.


File Structure

sandbox-bash-mcp/
├── Dockerfile        # Debian-based node:22-slim, installs tools, drops to agent user
├── server.mjs        # MCP server: 5 tools, blocklist, audit (245 lines)
├── test-agent.mjs    # LM Studio integration test harness
├── run.sh            # Docker run wrapper with all security flags
├── package.json      # Dependencies: @modelcontextprotocol/sdk, agentfs-sdk
└── README.md
