MCP server for sandboxed bash execution inside an isolated Docker container. Zero host access. Seven layers of security. Full audit trail. Used by nl2shell-web to give the NL2Shell agent a safe environment to run generated shell commands.
Part of the nl2shell organization — the AI-powered natural language to shell command system.
Keywords: MCP server bash sandbox Docker isolated container AI agent tool-calling secure shell execution model context protocol
| Repo | Role |
|---|---|
| nl2shell | Core ML — NL-to-shell model training and inference |
| nl2shell-web | Website — calls this sandbox via a relay server |
| vox | Voice CLI — terminal voice interface |
| sandbox-bash-mcp (this repo) | Docker sandbox — MCP server for safe bash execution |
| collab | Desktop app — collaborative shell session UI |
┌─────────────────────────────────────────────────────────┐
│ AI Agent Hosts │
│ LM Studio / Claude Code / Codex / nl2shell-web relay │
└───────────────────┬─────────────────────────────────────┘
│ MCP protocol over stdio (JSON-RPC 2.0)
▼
┌─────────────────────────────────────────────────────────┐
│ sandbox-bash-mcp (server.mjs) │
│ ───────────────────────────────────────────────────── │
│ MCP tools: run_bash, write_file, read_file, │
│ list_files, audit_log │
│ │
│ Security middleware: │
│ └── Command blocklist (15 patterns) │
│ └── Path confinement (/agent/ only) │
│ └── Output cap (1 MB stdout, 4 KB audit) │
│ └── JSONL audit trail (/agent/audit/toolcalls.jsonl) │
└───────────────────┬─────────────────────────────────────┘
│ execSync inside container
▼
┌─────────────────────────────────────────────────────────┐
│ Docker container (Ubuntu/node:22-slim) │
│ ───────────────────────────────────────────────────── │
│ Layer 1: --network none (no outbound traffic) │
│ Layer 2: --cap-drop ALL (no Linux caps) │
│ Layer 3: --security-opt no-new-privileges │
│ Layer 4: --memory 512m --cpus 1 --pids-limit 100 │
│ Layer 5: USER agent (non-root, uid!=0) │
│ Layer 6: command blocklist (in server.mjs) │
│ Layer 7: JSONL audit trail (/agent/audit/) │
│ │
│ Available tools: bash, python3, node v22, jq, │
│ git, tree, bc, file, procps │
│ Workspace: /agent/workspace (persistent in session) │
└─────────────────────────────────────────────────────────┘
All tools speak JSON-RPC 2.0 over stdio. Inputs and outputs are MCP content arrays with type: "text".
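As a rough illustration of what travels over stdio, the sketch below builds an MCP `tools/call` request and pulls the text out of a response. The helper names `buildToolCall` and `extractText` and the sample response are hypothetical; only the JSON-RPC 2.0 envelope and the `type: "text"` content array come from the description above.

```javascript
// Build a JSON-RPC 2.0 request for the MCP "tools/call" method.
function buildToolCall(id, name, args) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Join every text item of an MCP result's content array into one string.
function extractText(response) {
  const content = response?.result?.content ?? [];
  return content.filter((c) => c.type === "text").map((c) => c.text).join("");
}

const request = buildToolCall(2, "run_bash", { command: "echo hello" });
// On the stdio transport each message is written as a single line of JSON.
const wireLine = JSON.stringify(request) + "\n";

// Hypothetical response, shaped like the run_bash success output in this README.
const sampleResponse = {
  jsonrpc: "2.0",
  id: 2,
  result: {
    content: [{ type: "text", text: "$ echo hello\nhello\n[exit: 0 | 12ms | audit: 1718000000000]" }],
  },
};
console.log(extractText(sampleResponse));
```

In practice `wireLine` would be written to the container's stdin and the response read back from its stdout.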
run_bash: execute a bash command inside the sandboxed container.
Input schema
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `command` | string | yes | - | Bash command to execute |
| `timeout_ms` | number | no | 30000 | Timeout in ms (capped at 120000) |
| `working_dir` | string | no | /agent/workspace | CWD; must be under /agent/ |
Output (success)
$ echo hello
hello
[exit: 0 | 12ms | audit: 1718000000000]
Output (blocked)
BLOCKED: This command is not allowed in the sandbox.
Reason: matches security blocklist
Audit ID: 1718000000000
Output (error)
$ cat /etc/shadow
cat: /etc/shadow: Permission denied
[exit: 1 | 5ms | audit: 1718000000001]
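A client that wants the exit code or audit ID as data has to parse them out of the text. The sketch below is one way to do that, assuming the footer and BLOCKED formats are exactly as shown in the three samples above; `parseRunBashOutput` is a hypothetical helper, not part of the server.

```javascript
// Matches the trailing "[exit: N | Nms | audit: N]" footer shown in the samples.
const FOOTER = /\[exit: (-?\d+) \| (\d+)ms \| audit: (\d+)\]\s*$/;

function parseRunBashOutput(text) {
  if (text.startsWith("BLOCKED:")) {
    const m = text.match(/Audit ID: (\d+)/);
    return { blocked: true, auditId: m ? Number(m[1]) : null };
  }
  const m = text.match(FOOTER);
  if (!m) return null; // unknown format: let the caller decide what to do
  return {
    blocked: false,
    exitCode: Number(m[1]),
    durationMs: Number(m[2]),
    auditId: Number(m[3]),
    body: text.slice(0, m.index).trimEnd(), // echoed command plus its output
  };
}

const errorSample =
  "$ cat /etc/shadow\ncat: /etc/shadow: Permission denied\n[exit: 1 | 5ms | audit: 1718000000001]";
console.log(parseRunBashOutput(errorSample));
```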
write_file: write content to a file in the sandbox. Parent directories are created automatically.
Input schema
| Field | Type | Required | Description |
|---|---|---|---|
| `path` | string | yes | File path; prefixed to /agent/workspace/ if not under /agent/ |
| `content` | string | yes | File content (UTF-8) |
Output
Written 128 bytes to /agent/workspace/script.py
read_file: read a file from the sandbox filesystem.
Input schema
| Field | Type | Required | Description |
|---|---|---|---|
| `path` | string | yes | File path; prefixed to /agent/workspace/ if not under /agent/ |
Output
<file contents as plain text>
list_files: list files and directories under a path in the sandbox.
Input schema
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `path` | string | no | /agent/workspace | Directory path; must be under /agent/ |
Output
total 8
drwxr-xr-x 2 agent agent 4096 Jan 1 00:00 .
drwxr-xr-x 4 agent agent 4096 Jan 1 00:00 ..
-rw-r--r-- 1 agent agent 128 Jan 1 00:00 script.py
audit_log: view the JSONL audit trail of all tool calls in the current container session.
Input schema
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `last_n` | number | no | 20 | Number of most recent entries to return |
Output
[2024-01-01T00:00:00.000Z] run_bash -> success (45ms)
[2024-01-01T00:00:01.000Z] write_file -> success (3ms)
[2024-01-01T00:00:02.000Z] run_bash -> error (12ms)
Each entry in the underlying JSONL file at /agent/audit/toolcalls.jsonl has this shape:
{
"id": 1718000000000,
"tool": "run_bash",
"input": { "command": "echo hello", "working_dir": "/agent/workspace" },
"output": "hello\n",
"status": "success",
"duration_ms": 12,
"timestamp": "2024-01-01T00:00:00.000Z"
}

| Layer | Mechanism | What it prevents |
|---|---|---|
| 1 | `--network none` | All outbound and inbound network traffic |
| 2 | `--cap-drop ALL` | Privilege escalation via Linux capabilities |
| 3 | `--security-opt no-new-privileges` | setuid/setgid privilege escalation |
| 4 | `--memory 512m --cpus 1 --pids-limit 100` | Resource exhaustion and fork bombs at OS level |
| 5 | Non-root agent user (home: /agent) | Host filesystem access, root-only operations |
| 6 | Command blocklist in server.mjs | Known destructive/escape patterns at application level |
| 7 | JSONL audit trail | Full visibility into every tool call for post-hoc review |
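For the post-hoc review that layer 7 enables, the sketch below turns raw JSONL text into the one-line summaries that audit_log prints. It assumes each line has the `tool`, `status`, `duration_ms`, and `timestamp` fields documented for /agent/audit/toolcalls.jsonl; `summarizeAudit` and the sample entries are hypothetical.

```javascript
// Parse JSONL text and format the last N entries as summary lines.
function summarizeAudit(jsonl, lastN = 20) {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "") // tolerate the trailing newline
    .map((line) => JSON.parse(line))
    .slice(-lastN)
    .map((e) => `[${e.timestamp}] ${e.tool} -> ${e.status} (${e.duration_ms}ms)`);
}

// Two hand-written entries in the documented shape, serialized as JSONL.
const sampleJsonl = [
  { id: 1718000000000, tool: "run_bash", input: { command: "echo hello" }, output: "hello\n",
    status: "success", duration_ms: 12, timestamp: "2024-01-01T00:00:00.000Z" },
  { id: 1718000000001, tool: "write_file", input: { path: "script.py" }, output: "",
    status: "success", duration_ms: 3, timestamp: "2024-01-01T00:00:01.000Z" },
].map((e) => JSON.stringify(e)).join("\n") + "\n";

console.log(summarizeAudit(sampleJsonl).join("\n"));
```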
The following 15 regex patterns are rejected before execution:
| Pattern | What it blocks |
|---|---|
| `rm -rf /` | Recursive root deletion |
| `mkfs` | Filesystem formatting |
| `dd if=` | Raw disk writes |
| `:(){ : \| :& };:` | Fork bomb |
| `curl ... \| sh` | Piping a downloaded script into a shell |
| `wget ... \| sh` | Piping a downloaded script into a shell |
| `chmod +s` | setuid/setgid bit manipulation |
| `chown root` | Ownership change to root |
| `nsenter` | Container namespace escape |
| `mount` | Filesystem mounting |
| `umount` | Filesystem unmounting |
| `shutdown` | System shutdown |
| `reboot` | System reboot |
| `halt` | System halt |
| `docker` | Docker-in-Docker escape |
| `kubectl` | Kubernetes API access |
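A pre-execution check of this kind can be sketched as follows. The regexes are illustrative guesses that cover a few rows of the table; the actual patterns live in server.mjs and may differ. `BLOCKLIST` and `checkCommand` are hypothetical names.

```javascript
// Illustrative patterns only; the real list in server.mjs may differ.
const BLOCKLIST = [
  { pattern: /\brm\s+-rf\s+\/(\s|$)/, reason: "recursive root deletion" },
  { pattern: /\bmkfs\b/, reason: "filesystem formatting" },
  { pattern: /\bdd\s+if=/, reason: "raw disk writes" },
  { pattern: /\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b/, reason: "piping a download into a shell" },
  { pattern: /\b(nsenter|docker|kubectl)\b/, reason: "container or cluster escape" },
];

// Return the first matching rule, or allow the command if nothing matches.
function checkCommand(command) {
  const hit = BLOCKLIST.find(({ pattern }) => pattern.test(command));
  return hit ? { allowed: false, reason: hit.reason } : { allowed: true };
}

console.log(checkCommand("curl https://example.com/install.sh | sh"));
console.log(checkCommand("rm -rf /agent/workspace/tmp"));
```

A regex blocklist catches only the patterns it knows about, so it complements the container-level layers rather than replacing them.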
- run_bash: working directory is forced to /agent/workspace if not already under /agent/
- write_file / read_file: paths outside /agent/ are prefixed with /agent/workspace/
- list_files: directory outside /agent/ falls back to /agent/workspace
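The prefixing rule for write_file and read_file could look roughly like the sketch below. It assumes that `..` segments are resolved before the /agent/ check and that relative paths are taken relative to the workspace; neither detail is stated here, and `normalizePosix` and `confinePath` are hypothetical helpers.

```javascript
const ROOT = "/agent";
const WORKSPACE = "/agent/workspace";

// Collapse ".", ".." and duplicate slashes without touching the filesystem.
function normalizePosix(p) {
  const out = [];
  for (const part of p.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") out.pop();
    else out.push(part);
  }
  return "/" + out.join("/");
}

// Keep paths under /agent/ as they are; prefix everything else with the workspace.
function confinePath(input) {
  const abs = input.startsWith("/") ? input : `${WORKSPACE}/${input}`;
  const norm = normalizePosix(abs);
  if (norm === ROOT || norm.startsWith(ROOT + "/")) return norm;
  return normalizePosix(`${WORKSPACE}/${norm}`);
}

console.log(confinePath("script.py"));
console.log(confinePath("/agent/../etc/shadow"));
```

Normalizing first matters: a plain prefix test on the raw string would accept `/agent/../etc/shadow` as being under /agent/.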
- stdout buffer: 1 MB per command
- audit entry output field: 4 KB (truncated)
- command timeout: max 120 seconds regardless of timeout_ms input
- Docker (any recent version)
- Node.js 22+ (only needed for test-agent.mjs; not required if registering as an MCP server)
git clone https://github.com/nl2shell/sandbox-bash-mcp
cd sandbox-bash-mcp
docker build -t sandbox-bash-mcp .

echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}' \
| docker run --rm -i --network none sandbox-bash-mcp

You should receive an MCP initialize response with serverInfo.name: "sandbox-bash".
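When the same check is done from code, stdout arrives in arbitrary chunks, so a client needs to buffer until a full line is available. The sketch below shows one way to do that and to verify the `serverInfo.name` mentioned above. `LineDecoder`, `isSandboxInit`, and the sample response (including its version string) are hypothetical.

```javascript
// Accumulates stdout chunks and returns one parsed message per complete line.
class LineDecoder {
  constructor() {
    this.buffer = "";
  }
  push(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop(); // keep the trailing partial line for the next chunk
    return lines.filter((l) => l.trim() !== "").map((l) => JSON.parse(l));
  }
}

// True when the message answers our initialize request and names the sandbox server.
function isSandboxInit(msg, expectedId) {
  return msg.id === expectedId && msg.result?.serverInfo?.name === "sandbox-bash";
}

// Hypothetical initialize response, split across two chunks.
const decoder = new LineDecoder();
const first = decoder.push('{"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2024-11-05",');
const second = decoder.push('"capabilities":{},"serverInfo":{"name":"sandbox-bash","version":"1.0.0"}}}\n');
console.log(first.length, isSandboxInit(second[0], 1)); // 0 true
```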
The server communicates over stdio. Register it using the wrapper script run.sh which applies all security flags automatically.
{
"mcpServers": {
"sandbox-bash": {
"command": "/path/to/sandbox-bash-mcp/run.sh",
"args": []
}
}
}

{
"servers": {
"sandbox-bash": {
"command": "/path/to/sandbox-bash-mcp/run.sh",
"args": []
}
}
}

[[mcp_servers]]
name = "sandbox-bash"
command = "/path/to/sandbox-bash-mcp/run.sh"
args = []

Add this to your agent's system prompt to encourage immediate tool use:
You are a bash operator with a sandboxed Linux container.
When asked to do anything, USE your tools immediately. Do not explain first.
Tools: run_bash (execute commands), write_file (create files), read_file (read files).
The sandbox has: bash, python3, node v22, jq, git, tree, bc.
Working directory: /agent/workspace. Files persist between calls in a session.
nl2shell-web uses this sandbox as the execution backend for the web interface. The relay pattern works as follows:
Browser -> nl2shell-web server -> spawns sandbox-bash-mcp container
-> sends MCP tool calls over stdio
-> streams results back to browser
The web server spawns a fresh container per session using the same docker run flags as run.sh. Tool results are forwarded as SSE or WebSocket messages to the browser.
Each container is ephemeral: when the web session ends, the container and all files inside it are destroyed. The audit log at /agent/audit/toolcalls.jsonl exists only for the container's lifetime unless explicitly exported before teardown.
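A relay that spawns one container per session needs the flag set in one place. The sketch below assembles it from the flags listed in the security layers table, plus the `--rm -i` used in the initialize test; the exact contents of run.sh are not shown here, so treat `buildDockerArgs` as a hypothetical approximation.

```javascript
// Assemble `docker run` arguments from the container-level security layers.
function buildDockerArgs(image = "sandbox-bash-mcp") {
  return [
    "run", "--rm", "-i",                   // ephemeral container, stdin kept open for MCP
    "--network", "none",                   // layer 1
    "--cap-drop", "ALL",                   // layer 2
    "--security-opt", "no-new-privileges", // layer 3
    "--memory", "512m", "--cpus", "1", "--pids-limit", "100", // layer 4
    image,
  ];
}

console.log(["docker", ...buildDockerArgs()].join(" "));
```

The array can be passed to Node's `child_process.spawn("docker", buildDockerArgs())`, with the child's stdin and stdout carrying the MCP messages.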
test-agent.mjs is a self-contained agent loop that connects LM Studio local models to a live sandbox container. It implements a full tool-calling loop without any external agent framework.
# Default model (liquid/lfm2.5-1.2b), default prompt
node test-agent.mjs
# Custom prompt
node test-agent.mjs "create a python fibonacci script and run it"
# Different model
MODEL=gemma-3-270m-it-mlx node test-agent.mjs "write a shell sort in bash"

Requirements:

- LM Studio running on http://127.0.0.1:1234 with a tool-calling capable model loaded
- Docker image built (docker build -t sandbox-bash-mcp .)

What the script does:

- Spawns a sandbox-bash-mcp Docker container with all security flags
- Sends an MCP initialize handshake over the container's stdin
- Calls LM Studio's OpenAI-compatible /v1/chat/completions with the tool schemas
- On each turn: parses tool_calls from the model response, dispatches them to the container via MCP tools/call, feeds results back as tool role messages
- Continues until finish_reason === "stop" or MAX_TURNS (15) is reached
- Kills the container on exit
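The per-turn translation between the two protocols can be sketched as below. It assumes the OpenAI-style chat completion shape (`tool_calls[].function.name`, JSON-encoded `arguments`, `tool_call_id` on the reply); the helper names are hypothetical and this is not the code in test-agent.mjs.

```javascript
// Translate one OpenAI-style tool call into MCP tools/call params.
function toMcpParams(toolCall) {
  return {
    name: toolCall.function.name,
    arguments: JSON.parse(toolCall.function.arguments || "{}"), // arguments arrive as a JSON string
  };
}

// Wrap an MCP result as a "tool" role message for the next completion request.
function toToolMessage(toolCall, mcpResult) {
  const text = (mcpResult.content ?? [])
    .filter((c) => c.type === "text")
    .map((c) => c.text)
    .join("");
  return { role: "tool", tool_call_id: toolCall.id, content: text };
}

// Loop condition from the list above: stop on finish_reason "stop" or at the turn cap.
function shouldContinue(finishReason, turn, maxTurns = 15) {
  return finishReason !== "stop" && turn < maxTurns;
}

// Hypothetical tool call and MCP result for one turn.
const toolCall = {
  id: "call_1",
  type: "function",
  function: { name: "run_bash", arguments: '{"command":"python3 fib.py"}' },
};
const mcpResult = { content: [{ type: "text", text: "$ python3 fib.py\n0 1 1 2 3 5 8 13 21 34" }] };
console.log(toMcpParams(toolCall), toToolMessage(toolCall, mcpResult));
```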
--- Sandbox Agent ---
Model: liquid/lfm2.5-1.2b | Max turns: 15
Prompt: create a python fibonacci script and run it
[Turn 1] write_file({"path":"/agent/workspace/fib.py","content":"..."})
[Output] Written 89 bytes to /agent/workspace/fib.py
[Turn 2] run_bash({"command":"python3 fib.py"})
[Output] $ python3 fib.py
0 1 1 2 3 5 8 13 21 34
[exit: 0 | 234ms | audit: 1718000000000]
[Model] Done. The script prints Fibonacci numbers up to index 10.
--- Done (2 turns, 412 tokens) ---
# Build image
npm run docker:build
# Run server directly (inside container)
npm run docker:run
# Run agent test
node test-agent.mjs "list all files"

The server itself has no external runtime dependencies beyond @modelcontextprotocol/sdk and zod. All tool logic is in server.mjs (245 lines). The Dockerfile installs packages at build time only.
sandbox-bash-mcp/
├── Dockerfile # Ubuntu/node:22-slim, installs tools, drops to agent user
├── server.mjs # MCP server: 5 tools, blocklist, audit (245 lines)
├── test-agent.mjs # LM Studio integration test harness
├── run.sh # Docker run wrapper with all security flags
├── package.json # Dependencies: @modelcontextprotocol/sdk, agentfs-sdk
└── README.md