Spiderweb Test Environment

Docker-based testing environment for Spiderweb. This creates a clean, disposable Debian container for testing the install script and Spiderweb functionality without affecting your main server.

Quick Start

# Build and start the test environment
docker-compose up --build

# Or run in detached mode
docker-compose up -d --build

# Enter the running container
docker exec -it spiderweb-test bash

# Inside the container, run the install script
./install.sh

Manual Testing

# Build the image
docker build -t spiderweb-test .

# Run interactively
docker run -it --rm --name spiderweb-test spiderweb-test

# Run with API key from environment (for automated testing)
docker run -it --rm \
  -e SPIDERWEB_PROVIDER=openai \
  -e SPIDERWEB_MODEL=gpt-4o-mini \
  -e SPIDERWEB_API_KEY=sk-xxx \
  spiderweb-test

# Run with port forwarding (to test from host)
docker run -it --rm \
  -p 18790:18790 \
  --name spiderweb-test \
  spiderweb-test

Testing the Install Script

# Test the full interactive install
curl -fsSL https://raw.githubusercontent.com/DeanoC/Spiderweb/main/install.sh | bash

# Or drive the default non-interactive path explicitly.
# On Linux x86_64, auto now prefers the latest published GitHub release.
SPIDERWEB_NON_INTERACTIVE=1 \
SPIDERWEB_INSTALL_ZSS=0 \
SPIDERWEB_INSTALL_SYSTEMD=0 \
SPIDERWEB_START_AFTER_INSTALL=0 \
bash ./install.sh

# Force a local source build instead of the default release install
SPIDERWEB_NON_INTERACTIVE=1 \
SPIDERWEB_INSTALL_SOURCE=source \
SPIDERWEB_INSTALL_ZSS=0 \
SPIDERWEB_INSTALL_SYSTEMD=0 \
SPIDERWEB_START_AFTER_INSTALL=0 \
bash ./install.sh

# Release-binary path
SPIDERWEB_NON_INTERACTIVE=1 \
SPIDERWEB_INSTALL_SOURCE=release \
SPIDERWEB_RELEASE_ARCHIVE_URL=https://github.com/DeanoC/Spiderweb/releases/download/vX.Y.Z/spiderweb-linux-x86_64.tar.gz \
SPIDERWEB_RELEASE_ARCHIVE_SHA256=<sha256> \
SPIDERWEB_INSTALL_ZSS=0 \
SPIDERWEB_INSTALL_SYSTEMD=0 \
SPIDERWEB_START_AFTER_INSTALL=0 \
bash ./install.sh
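The release-binary path pairs an archive URL with a SHA-256 checksum. As a hedged sketch of the verification step that `SPIDERWEB_RELEASE_ARCHIVE_SHA256` implies (the file contents and checksum below are stand-ins, not a real release, and this is not the installer's actual code):

```shell
# Sketch: verify a downloaded release archive against an expected SHA-256
# before unpacking. The archive here is a stand-in for the real download.
set -eu
ARCHIVE="$(mktemp)"
printf 'hello' > "$ARCHIVE"
# SHA-256 of the literal string "hello"; in a real run this value comes
# from SPIDERWEB_RELEASE_ARCHIVE_SHA256.
EXPECTED_SHA256="2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"

ACTUAL_SHA256="$(sha256sum "$ARCHIVE" | awk '{print $1}')"
if [ "$ACTUAL_SHA256" = "$EXPECTED_SHA256" ]; then
  echo "checksum ok: safe to unpack"
else
  echo "checksum mismatch; refusing to install" >&2
  exit 1
fi
rm -f "$ARCHIVE"
```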

External Codex Workspace E2E Harness

This harness documents and exercises the Linux-first external Codex operator path:

  • installer-first host flow (./install.sh on the Spiderweb host)
  • generic dev-template workspace baseline that can outlive any one agent session
  • isolated Spiderweb runtime root plus a clean standalone local workspace node
  • standalone spiderweb-fs-node as the remote filesystem node under test
  • namespace mount via spiderweb-fs-mount --namespace-url ...
  • plain Codex launch in live or manual-handoff mode
  • agent-driven in-workspace bootstrap, validation, and report artifact capture

The harness assumes one Spiderweb-owned mount model across macOS, Linux, and Windows: workers start in the mounted workspace directory, and .spiderweb is a server-projected part of that same namespace rather than a client overlay or direct endpoint shortcut.

Mounted namespace paths used by the harness:

  • local writable project tree: /nodes/local/fs
  • remote shared seed data: /shared_data
  • workspace metadata: /projects/<workspace_id>/meta/*
  • namespace metadata: /meta/*
  • generic workspace services: /services/*
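The path categories above can be sketched as a small classifier; this is purely illustrative (the real namespace layout is served by Spiderweb, not computed client-side):

```shell
# Sketch: classify a mounted namespace path into the categories listed
# above. Glob patterns mirror the documented layout; illustrative only.
classify_path() {
  case "$1" in
    /nodes/local/fs|/nodes/local/fs/*) echo "local writable project tree" ;;
    /shared_data|/shared_data/*)       echo "remote shared seed data" ;;
    /projects/*/meta/*)                echo "workspace metadata" ;;
    /meta/*)                           echo "namespace metadata" ;;
    /services/*)                       echo "workspace service" ;;
    *)                                 echo "unknown" ;;
  esac
}

classify_path /nodes/local/fs/src/main.zig   # -> local writable project tree
classify_path /projects/demo/meta/status     # -> workspace metadata
```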

Run the Linux harness directly from the repo root:

bash test-env/test-external-codex-workspace.sh

For a faster smoke iteration that still exercises bootstrap, mount, external Codex launch, and mounted writes, use the light scenario:

bash test-env/test-external-codex-workspace-light.sh

On macOS with OrbStack installed, use the Orb wrapper so the exact same Linux harness runs inside Orb:

bash test-env/test-external-codex-workspace-orb.sh

Light Orb smoke variant:

bash test-env/test-external-codex-workspace-orb-light.sh

Use ORB_MACHINE=<name> or ORB_USER=<user> if you need a non-default Orb target.

For the native macOS mount path, use the dedicated native harness after the current Spiderweb app build has been installed with spiderweb-config config install-fs-extension:

bash test-env/test-external-codex-workspace-macos.sh

Light native macOS smoke variant:

bash test-env/test-external-codex-workspace-macos-light.sh

Or through make:

cd test-env && make test-external-codex-workspace
cd test-env && make test-external-codex-workspace-light

Repeatability runner:

cd test-env && make test-external-codex-repeatability

Compatibility matrix runner:

cd test-env && make test-external-codex-cli-matrix

Repro bundle packager:

cd test-env && make package-external-codex-repro

Codex launch controls:

  • CODEX_MODE=auto: try a live Codex launch, then fall back to the dedicated handoff package if the launcher is unavailable or the live step cannot proceed
  • CODEX_MODE=live: require a real Codex launch; launch failure fails the harness
  • CODEX_MODE=manual: skip live launch and prepare the manual handoff package only
  • CODEX_BIN: override the detected Codex binary
  • CODEX_CLI_VERSION: pinned plain Codex CLI version the harness expects. Default: 0.111.0
  • CODEX_AUTH_MODE=auto|api_key|existing_login: choose isolated API-key auth or an existing login. auto prefers API-key auth when OPENAI_API_KEY is set
  • CODEX_API_KEY_ENV: environment variable name to read for api_key mode. Default: OPENAI_API_KEY
  • CODEX_LAUNCH_CMD: override the detected launcher when the default codex exec template is not correct for the machine
  • CODEX_TIMEOUT_SECONDS: maximum seconds to allow the live Codex phase before the harness fails with a diagnostic handoff/report. Default: 900
  • CODEX_IDLE_TIMEOUT_SECONDS: optional idle cutoff for the live Codex phase. Default: 0 (disabled), because codex exec --json can spend long periods silently reasoning before the next visible tool or file event.
  • CODEX_JSON_EVENTS=1: inject --json into common codex exec launch templates and preserve the raw Codex event stream in logs/codex.stdout.log
  • CODEX_USE_PTY=1: wrap the live Codex launch in script(1) so the run behaves like a real terminal session and preserves logs/codex.pty.log
  • CODEX_DISABLE_COLLABORATION_MODES=1: inject --disable collaboration_modes into common codex exec templates unless disabled
  • CODEX_DISABLE_APPS=1: inject --disable apps by default because the current live Spiderweb path is more reliable without the apps surface in non-interactive exec
  • CODEX_DISABLE_SHELL_SNAPSHOT=1: inject --disable shell_snapshot by default because the current live Spiderweb path is more reliable without shell snapshotting in non-interactive exec
  • CODEX_ALLOW_HOST_CODEX_HOME=1: temporarily allow writes under host ~/.codex for reliability while still reporting them as a codex_home machine-independence gap
  • SPIDERWEB_INSTALL_SOURCE=auto|source|release: choose whether the harness compiles Spiderweb locally or installs from a prebuilt archive. Default: auto, which delegates to install.sh defaults and retries with source if the selected release path cannot provide the current harness binary set
  • SPIDERWEB_RELEASE_ARCHIVE_URL: release asset URL to use when SPIDERWEB_INSTALL_SOURCE=release. Default: unset
  • SPIDERWEB_RELEASE_ARCHIVE_SHA256: optional checksum for the release archive
  • SPIDERWEB_RELEASE_VERSION: label recorded in installer output for the chosen release build. Default: unset
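Several of the controls above feed `{placeholder}` tokens into a `CODEX_LAUNCH_CMD` template. A minimal bash sketch of how that substitution could work, assuming plain string replacement (an assumption about form, not a statement about the harness internals):

```shell
# Sketch: expand a CODEX_LAUNCH_CMD-style template by substituting the
# documented {placeholder} tokens. Requires bash for ${var//pat/repl}.
expand_template() {
  local tmpl="$1"
  tmpl="${tmpl//\{codex_bin\}/${CODEX_BIN:-codex}}"
  tmpl="${tmpl//\{workspace_root\}/$WORKSPACE_ROOT}"
  tmpl="${tmpl//\{prompt_file\}/$PROMPT_FILE}"
  tmpl="${tmpl//\{artifact_dir\}/$ARTIFACT_DIR}"
  printf '%s\n' "$tmpl"
}

# Illustrative values; in the harness these come from the run environment.
WORKSPACE_ROOT=/mnt/ws/nodes/local/fs
PROMPT_FILE=/tmp/prompt.txt
ARTIFACT_DIR=/tmp/artifacts

expand_template 'cat {prompt_file} | {codex_bin} exec -C {workspace_root} -o {artifact_dir}/last.txt -'
# -> cat /tmp/prompt.txt | codex exec -C /mnt/ws/nodes/local/fs -o /tmp/artifacts/last.txt -
```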

Current note:

  • The standalone installer now defaults to the latest published GitHub release on supported Linux machines to avoid unnecessary rebuilds for normal users.
  • The external Codex harness now follows the installer defaults and automatically retries with source when a selected release path is missing binaries that the current checkout expects.

Expected output artifacts:

  • codex_exec_summary.json
  • codex_usage_report.json
  • codex_usage_report.md
  • bootstrap_provenance.json
  • codex_progress_timeline.json
  • game_validation.json
  • codex_handoff/

Repeatability artifacts:

  • repeatability_summary.json
  • repeatability_summary.md
  • one subdirectory per run, each containing the normal live harness artifacts

Repeatability interruption behavior:

  • if you intentionally stop the repeatability runner mid-batch, it now writes partial repeatability_summary.json and repeatability_summary.md files from whatever artifacts already exist

  • interrupted runs are marked with interrupted=true plus an interrupt_reason, so you can still see whether the run had already reached bootstrap, validation, or report generation

Matrix runner artifacts:

  • matrix_summary.json
  • matrix_summary.md
  • one subdirectory per case, each containing the normal live harness artifacts

Repro bundle artifacts:

  • README.md
  • BUG_REPORT.md
  • repro_manifest.json
  • source_summaries/
  • cases/
  • optional *.tar.gz bundle

Usage report result semantics:

  • reliability_ok: true only when the run stayed inside the mounted workspace plus harness-owned runtime roots, plus any explicit temporary host-write allowlists
  • workspace_bootstrap_ok: true only when the attached agent read the bootstrap metadata and performed the required in-workspace bootstrap actions
  • machine_independence_ok: true only when no host-runtime gaps were observed
  • workspace_bound_services: services bound under /services/* for the mounted workspace
  • namespace_visible_services: services visible somewhere in the namespace, even if not workspace-bound under /services/*
  • external_prereqs_observed: declared external prerequisites observed during the run, such as the operator-installed Codex runtime
  • candidate_venom_gaps: inferred local-runtime gaps such as codex_home, terminal_runtime, git_runtime, and search_code_bridge

Fallback behavior:

  • auto does not silently skip the Codex step
  • the harness should still preserve the namespace-mounted workspace context
  • codex_handoff/ is the dedicated resume package for manual continuation
  • codex_exec_summary.json captures the last observed Codex event, last completed item, and inferred stall stage from the live --json event stream
  • codex_progress_timeline.json records the observed timing of live-run milestones such as Codex launch, bootstrap completion, first workspace write, and validation start
  • validation and usage reports should still be written in fallback/manual mode

Operator notes:

  • prefer the installer-first Linux path for this harness; use ./install-fs-mount.sh only when the namespace mount happens on a separate Linux machine
  • the harness is about the standalone node + namespace story, not the older flow that relied on routed --workspace-url connections alone
  • the mounted workspace directory exposed at nodes/local/fs is the canonical external-agent entrypoint
  • the clean writable project tree is nodes/local/fs; Spiderweb’s own runtime root is kept separate from that workspace on purpose
  • the harness creates only a generic dev-template workspace baseline; after attach, Spiderweb must surface a real workspace-root AGENTS.md, and the external agent is responsible for reading that file first and then following the workspace-local ./.spiderweb/* bootstrap projection from inside the workspace
  • AGENTS.md is the human-facing workspace contract; ./.spiderweb/agent_bootstrap.json and ./.spiderweb/agent_bootstrap_quickref.json are the exact machine-readable bootstrap surface for discovery order, preferred ./.spiderweb/services/* usage, self-home provisioning, service verification/repair, and persistence semantics
  • the expected interactive user flow is: start codex in the mounted workspace directory, give a short prompt that tells it to read AGENTS.md, and let it work relative to that directory
  • shared workspace binds persist across agent detach/reattach, while worker-private loopback state is expected to be ephemeral
  • CODEX_AUTH_MODE=api_key is still the strict fresh-install path, but existing_login is temporarily acceptable for reliability because host ~/.codex writes are allowlisted by default while still reported as a codex_home machine-independence gap
  • CODEX_LAUNCH_CMD is optional; the harness can build a default launcher around the pinned codex exec flow
  • the default live launcher now preserves both logs/codex.stdout.log and logs/codex.pty.log, which makes it much easier to distinguish “still progressing” from “stopped after a tool result”
  • test-env/test-external-codex-cli-matrix.sh is the fast way to compare pinned Codex CLI versions and PTY/JSON launch modes against the same Spiderweb scenario
  • test-env/test-external-codex-repeatability.sh is the fast way to prove the new workspace_bootstrap_ok milestone stays green across multiple live runs on the same machine
  • test-env/package-external-codex-repro.sh collects the matrix outputs into a single upstream-ready repro pack with a generated bug report
  • custom launch templates may use {codex_bin}, {workspace_root}, {namespace_root}, {namespace_meta_dir}, {workspace_meta_dir}, {shared_data_dir}, {prompt_file}, and {artifact_dir}
  • the default artifact directory is now outside the repo checkout so the harness does not create false host-repo leakage by itself
  • the current milestone is workspace_bootstrap_ok; plain Codex still cannot fully clear codex_home, terminal_runtime, and git_runtime under the no-launch-hook rule, so machine_independence_ok remains the follow-on milestone
  • if you still want to override the launcher, a working template is:
CODEX_MODE=live \
CODEX_AUTH_MODE=api_key \
OPENAI_API_KEY=... \
CODEX_LAUNCH_CMD='cat {prompt_file} | {codex_bin} exec --skip-git-repo-check --dangerously-bypass-approvals-and-sandbox --ephemeral --add-dir {namespace_meta_dir} --add-dir {workspace_meta_dir} --add-dir {shared_data_dir} --add-dir {artifact_dir} -C {workspace_root} -o {artifact_dir}/codex_last_message.txt -' \
bash test-env/test-external-codex-workspace.sh

Native macOS runs default to TRACE_BACKEND=none because the Linux strace path is not available there; the bootstrap and usage report still infer required reads from the Codex event log.

Embedded Multi-Service Integration Test

This repo also includes a local CI-style integration test for the embeddable filesystem + health services example.

# Run directly
bash test-env/test-embed-multi-service.sh

# Or through make
cd test-env && make test-embed-multi-service

What it validates:

  • boots embed-multi-service-node with a temporary export
  • probes /fs via spiderweb-fs-mount (readdir + cat)
  • probes /v1/health with a raw WebSocket handshake and validates ok: true

Useful env vars:

  • PORT (default 21910)
  • BIND_ADDR (default 127.0.0.1)
  • SKIP_BUILD=1 to skip zig build if binaries are already built

Distributed Workspace Failover Test

This test exercises the control-plane + mount integration flow end-to-end:

  • starts spiderweb
  • starts two embed-multi-service-node filesystem nodes
  • negotiates control.version (spiderweb-control) then runs control.node_invite_create, control.node_join, control.workspace_create, control.workspace_mount_set, and control.workspace_activate with workspace mutation auth (workspace_token)
  • restarts spiderweb and verifies control-plane state is recovered from persisted LTM snapshot
  • updates mounts live (/src -> /live) and validates the mount client converges to the new path
  • mounts both nodes at the same project mount path (/src) as a failover group
  • verifies reads initially come from node A/B, kills the active node, and verifies failover
  • restarts the stopped node, rejoins/remounts it, then kills the surviving node to verify second failover convergence
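The read-failover pattern the test verifies can be simulated with two stand-in "nodes"; this sketch only models the client-side retry order, not the real mount client:

```shell
# Sketch: read from the first responsive node in a failover group.
# node_a/node_b are simulations; behavior and names are illustrative only.
node_a() { return 1; }                 # simulate the killed active node
node_b() { echo "data from node B"; }  # surviving failover peer

read_with_failover() {
  for node in node_a node_b; do
    if out="$("$node" 2>/dev/null)"; then
      echo "$out"
      return 0
    fi
  done
  echo "all nodes down" >&2
  return 1
}

read_with_failover   # -> data from node B
```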

Additional focused scenarios:

  • test-distributed-workspace-bootstrap.sh: validates control.workspace_up bootstrap output and workspace desired/actual/drift schema.
  • test-distributed-workspace-drift.sh: forces a desired/actual mismatch and verifies drift + reconcile diagnostics.
  • test-distributed-workspace-matrix.sh: runs failover/reconnect/bootstrap/drift as one matrix entrypoint.

# Run directly
bash test-env/test-distributed-workspace.sh

# Or through make
cd test-env && make test-distributed-workspace
cd test-env && make test-distributed-workspace-bootstrap
cd test-env && make test-distributed-workspace-drift
cd test-env && make test-distributed-workspace-matrix
cd test-env && make test-distributed-workspace-encrypted
cd test-env && make test-distributed-workspace-operator-token
cd test-env && make test-distributed-soak-chaos
cd test-env && make test-spiderweb-control-protocol

Useful env vars:

  • SPIDERWEB_PORT (default 28790)
  • NODE1_PORT (default 28911)
  • NODE2_PORT (default 28912)
  • BIND_ADDR (default 127.0.0.1)
  • SPIDERWEB_CONTROL_OPERATOR_TOKEN (optional; include operator_token in protected mutations if enabled)
  • SPIDERWEB_CONTROL_STATE_KEY_HEX (optional; enables encrypted control-plane snapshot storage)
  • ASSERT_OPERATOR_TOKEN_GATE=1 (optional; assert mutation deny/allow behavior before the main workflow)
  • SPIDERWEB_METRICS_PORT (optional; enables HTTP /livez, /readyz, /metrics (Prometheus), /metrics.json (JSON))
  • SKIP_BUILD=1 to skip zig build if binaries are already built

Unified v2 Protocol Validation

Validates protocol-level contract points used in release checks:

  • control negotiation order (control.version -> control.connect)
  • runtime Acheron negotiation order (acheron.t_version -> acheron.t_attach)
  • standalone FS routing order (acheron.t_fs_hello must come first)
  • standalone FS HELLO auth-token enforcement (--auth-token)
  • source-level envelope/type guard in core client code paths
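The negotiation-order checks above boil down to a small ordering rule: version/hello first, then attach. A sketch of that rule as a tiny state machine (message names come from the list above; the enforcement logic is illustrative, not Spiderweb's implementation):

```shell
# Sketch: enforce "acheron.t_version before acheron.t_attach" with a
# minimal state machine. Illustrative only.
state="start"
handle() {
  case "$state:$1" in
    start:acheron.t_version)     state="versioned"; echo "ok" ;;
    versioned:acheron.t_attach)  state="attached";  echo "ok" ;;
    *)                           echo "protocol-order violation: $1 in state $state" ;;
  esac
}

handle acheron.t_attach    # -> protocol-order violation: acheron.t_attach in state start
handle acheron.t_version   # -> ok
handle acheron.t_attach    # -> ok
```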

# Run directly
bash test-env/test-spiderweb-control-protocol.sh

# Or through make
cd test-env && make test-spiderweb-control-protocol

Useful env vars:

  • SPIDERWEB_PORT (default 28794)
  • FS_NODE_PORT (default 28931)
  • BIND_ADDR (default 127.0.0.1)
  • SKIP_BUILD=1 to skip zig build if binaries are already built

Soak / Chaos Suite

Runs the distributed workspace flow repeatedly with randomized ports and optional auth/encryption modes.

# Run directly
bash test-env/test-distributed-soak-chaos.sh

# Or through make
cd test-env && make test-distributed-soak-chaos

Useful env vars:

  • SOAK_ITERATIONS (default 10)
  • SOAK_ENABLE_OPERATOR_MODE=0|1 (default 1)
  • SOAK_ENABLE_ENCRYPTED_MODE=0|1 (default 1)

Wiping and Restarting

# Stop and remove container (data is lost - this is the point!)
docker-compose down

# Remove the image to force rebuild
docker-compose down --rmi local

# Start fresh
docker-compose up --build

Files

  • Dockerfile - Minimal Debian with dependencies pre-installed
  • docker-compose.yml - Container orchestration
  • test-install.sh - Automated test of the install script