This file provides context and guidelines for working with the bluebox codebase.
- `uv venv bluebox-env && source bluebox-env/bin/activate` - Create and activate virtual environment (recommended)
- `python3 -m venv bluebox-env && source bluebox-env/bin/activate` - Alternative venv creation
- `uv pip install -e .` - Install package in editable mode (faster with uv)
- `pip install -e .` - Install package in editable mode (standard)
- `pytest tests/ -v` - Run all tests with verbose output
- `pytest tests/unit/test_js_utils.py -v` - Run specific test file
- `pytest tests/unit/test_js_utils.py::test_function_name -v` - Run specific test
- `python scripts/dev/run_benchmarks.py` - Run routine discovery benchmarks
- `python scripts/dev/run_benchmarks.py -v` - Run benchmarks with verbose output
- `bluebox-monitor --host 127.0.0.1 --port 9222 --output-dir ./cdp_captures --url about:blank --incognito` - Start browser monitoring
- `bluebox-discover --task "your task description" --cdp-captures-dir ./cdp_captures --output-dir ./routine_discovery_output --llm-model gpt-5.2` - Discover routines from captures
- `bluebox-execute --routine-path example_data/example_routines/amtrak_one_way_train_search_routine.json --parameters-path example_data/example_routines/amtrak_one_way_train_search_input.json` - Execute a routine
- `bluebox-api-index --cdp-captures-dir ./cdp_captures --task "your task" --output-dir ./api_indexing_output --model gpt-5.2 --post-run-analysis` - Run the API indexing pipeline (exploration + routine construction)
- `bluebox-agent-adapter --agent NetworkSpecialist --cdp-captures-dir ./cdp_captures` - Start HTTP adapter for programmatic agent interaction (see Agent HTTP Adapter section below)
- `bluebox-agent-adapter --list-agents` - List all available agents and their required data
- macOS: `"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-address=127.0.0.1 --remote-debugging-port=9222 --user-data-dir="$HOME/tmp/chrome" --remote-allow-origins='*' --no-first-run --no-default-browser-check`
- Verify: `curl http://127.0.0.1:9222/json/version`
- `pylint bluebox/` - Run pylint (uses `.pylintrc` config)
- IMPORTANT: Every function and method MUST have type hints
- Use `-> ReturnType` for return types
- Use `param: Type` for parameters
- Use `Optional[Type]` or `Type | None` for nullable types
- Use `list[Type]` instead of `List[Type]` (Python 3.9+ style)
- IMPORTANT: NO lazy imports! All imports must be at the top of the file
- Use absolute imports from `bluebox.*`
- Group imports: stdlib, third-party, local (with blank lines between groups)
- Requires Python 3.12+ (specifically `>=3.12.3,<3.13`)
- Use modern Python features (type hints, f-strings, dataclasses, etc.)
- Use Pydantic `BaseModel` for all data models (see `bluebox/data_models/`)
- Use `Field()` for field descriptions and defaults
- Use `model_validator` for custom validation logic
- All models should be in `bluebox/data_models/` directory
- Use custom exceptions from `bluebox.utils.exceptions`
- Return `RoutineExecutionResult` for routine execution results
- Log errors using `bluebox.utils.logger.get_logger()`
- All JavaScript code should be generated through functions in `bluebox/utils/js_utils.py`
- JavaScript code must be wrapped in IIFE format: `(function() { ... })()`
- Use helper functions from `_get_placeholder_resolution_js_helpers()` for placeholder resolution
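To make the IIFE requirement above concrete, the following is a minimal sketch of a wrapper. The name `wrap_in_iife` is hypothetical and is not claimed to exist in `bluebox/utils/js_utils.py`; it only shows the shape of the generated string.

```python
def wrap_in_iife(js_body: str) -> str:
    """Wrap a JavaScript snippet in an immediately invoked function expression."""
    # Indent the body so the generated code stays readable in logs.
    indented = "\n".join(f"  {line}" for line in js_body.strip().splitlines())
    return f"(function() {{\n{indented}\n}})()"


print(wrap_in_iife("return document.title;"))
```

Wrapping keeps any variables declared by the snippet out of the page's global scope.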
- Explore: Read relevant files before coding
- Plan: Make a plan before implementing (use "think" for complex problems)
- Code: Implement with type hints and proper error handling
- Test: Write and run tests
- Commit: Use descriptive commit messages
- Launch Chrome in debug mode (or use `quickstart.py`)
- Run `bluebox-monitor` and perform actions manually
- Run `bluebox-discover` with task description
- Review generated `routine.json`
- Test with `bluebox-execute`
- Review generated `routine.json` for correct parameter types and placeholder usage
- `bluebox/data_models/routine/routine.py` - Main Routine model
- `bluebox/data_models/routine/operation.py` - Operation types and execution
- `bluebox/data_models/routine/parameter.py` - Parameter definitions
- `bluebox/data_models/routine/placeholder.py` - Placeholder resolution
- `bluebox/cdp/connection.py` - Chrome DevTools Protocol connection
- `bluebox/utils/js_utils.py` - JavaScript code generation
- `bluebox/utils/web_socket_utils.py` - WebSocket utilities for CDP
- `bluebox/sdk/client.py` - Main SDK client
- `bluebox/workspace.py` - Agent workspace (artifact-oriented file I/O with provenance tracking)
AI agents that power routine discovery, API indexing, and conversational interactions. All agents inherit from AbstractAgent (bluebox/agents/abstract_agent.py).
Core agents:
- `bluebox/agents/routine_discovery_agent.py` - Analyzes CDP captures to generate routines (identifies transactions, extracts/resolves variables, constructs operations)
- `bluebox/agents/guide_agent.py` - Conversational agent for guiding users through routine creation/editing (maintains chat history, dynamic tool registration)
- `bluebox/agents/bluebox_agent.py` - General-purpose conversational agent
API Indexing Pipeline agents:
- `bluebox/agents/principal_investigator.py` - Orchestrator: plans routine catalog, dispatches experiments to workers, reviews results, assembles and ships routines
- `bluebox/agents/workers/experiment_worker.py` - Browser-capable execution agent: live browser tools + recorded capture lookup tools, executes experiments
- `bluebox/agents/routine_inspector.py` - Independent quality gate: scores routines on 6 dimensions, hard-fails on 4xx/5xx or unresolved placeholders
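The inspector's hard-fail rule can be pictured roughly as follows. This is only a sketch: `hard_fail_reasons`, its arguments, and the placeholder regex are assumptions made for illustration, not the actual `RoutineInspector` logic, and the six scoring dimensions are not modeled.

```python
import re

# Assumption: an unresolved placeholder still looks like {{something}} in the rendered request.
PLACEHOLDER_PATTERN = re.compile(r"\{\{[^{}]+\}\}")


def hard_fail_reasons(status_codes: list[int], rendered_request: str) -> list[str]:
    """Collect the reasons a routine run would be rejected outright."""
    reasons: list[str] = []
    for code in status_codes:
        if 400 <= code <= 599:  # any 4xx or 5xx response is a hard failure
            reasons.append(f"HTTP {code} response")
    for match in PLACEHOLDER_PATTERN.finditer(rendered_request):
        reasons.append(f"unresolved placeholder {match.group(0)}")
    return reasons
```

An empty list means no hard-fail condition was hit; scoring would then proceed separately.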
Specialists (domain-specific agents for exploration):
- `bluebox/agents/specialists/network_specialist.py` - Network traffic analysis
- `bluebox/agents/specialists/dom_specialist.py` - DOM structure analysis
- `bluebox/agents/specialists/interaction_specialist.py` - UI interaction analysis
- `bluebox/agents/specialists/js_specialist.py` - JavaScript file analysis
- `bluebox/agents/specialists/value_trace_resolver_specialist.py` - Storage & window property analysis
Agent HTTP Adapter (bluebox/scripts/agent_http_adapter.py):
HTTP wrapper that exposes any AbstractAgent subclass as a JSON API, enabling programmatic interaction via curl. Agents are auto-discovered at runtime — adding a new AbstractAgent subclass makes it available with zero adapter changes.
# Start adapter with a specific agent
bluebox-agent-adapter --agent NetworkSpecialist --cdp-captures-dir ./cdp_captures
# Agents with no data requirements (e.g. BlueBoxAgent) don't need --cdp-captures-dir
bluebox-agent-adapter --agent BlueBoxAgent

Endpoints:
- `GET /health` - liveness check
- `GET /status` - agent type, chat state, discovery support
- `POST /chat {"message": "..."}` - send a chat message (all agents)
- `POST /discover {"task": "..."}` - run discovery/autonomous mode
- `GET /routine` - retrieve discovered routine JSON
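For callers that prefer Python over curl, a minimal standard-library client for two of the endpoints above might look like this. It is a sketch under assumptions: the adapter's base URL (host and port) is not stated here, so it is passed in, and the `/chat` response is assumed to be a JSON object. The helper names are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def wait_until_healthy(base_url: str, timeout_s: float = 300.0, interval_s: float = 1.0) -> bool:
    """Poll GET /health until it answers 200 or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # adapter is not accepting connections yet
        time.sleep(interval_s)
    return False


def send_chat(base_url: str, message: str, timeout_s: float = 300.0) -> dict:
    """POST {"message": ...} to /chat and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url}/chat",
        data=json.dumps({"message": message}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout: the first request may include agent construction.
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `wait_until_healthy` before `send_chat` avoids sending the first message to an adapter that is still starting up.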
Best practices when calling from Claude Code or scripts:
- Use `--max-time 300` (5 min) on curl calls. The first `/chat` or `/discover` request triggers a cold start (agent construction + first LLM round-trip) that can take 2+ minutes. Subsequent requests are fast since the agent stays in memory.
- Start the adapter in the background and poll `/health` until ready before sending requests.
- Use `-q` (quiet mode) to suppress Bluebox logging noise from the adapter process.
- Save responses to files (`-o /tmp/response.json`) rather than piping directly, to avoid losing data on timeout.
- Constructor params are auto-wired via `inspect.signature`; the adapter maps data loader param names (handling the `_loader`/`_store` naming split) to canonical keys automatically.
LLM Infrastructure:
- `bluebox/llms/data_loaders/` - Specialized data loaders for CDP capture analysis:
  - `NetworkDataLoader` - HTTP request/response transactions
  - `DOMDataLoader` - DOM snapshots (string-interning tables, element classification by tag family)
  - `JSDataLoader` - JavaScript files
  - `StorageDataLoader` - Cookies, localStorage, sessionStorage, IndexedDB
  - `WindowPropertyDataLoader` - Window property changes
  - `InteractionsDataLoader` - UI interaction events
  - `DocumentationDataLoader` - Documentation files
- `bluebox/llms/infra/data_store.py` - Legacy data stores (soon to be deprecated)
Import patterns:
from bluebox.agents.abstract_agent import AbstractAgent, agent_tool, AgentCard
from bluebox.agents.guide_agent import GuideAgent
from bluebox.agents.routine_discovery_agent import RoutineDiscoveryAgent
from bluebox.agents.principal_investigator import PrincipalInvestigator
from bluebox.agents.workers.experiment_worker import ExperimentWorker
from bluebox.agents.routine_inspector import RoutineInspector
from bluebox.workspace import AgentWorkspace, LocalAgentWorkspace
from bluebox.llms.data_loaders.network_data_loader import NetworkDataLoader
from bluebox.llms.data_loaders.dom_data_loader import DOMDataLoader
from bluebox.llms.data_loaders.js_data_loader import JSDataLoader

The workspace (`bluebox/workspace.py`) is an artifact-oriented file I/O system attached to agents. Each workspace has a strict directory layout:
- `raw/` (read-only): tool result artifacts and mounted external files
- `output/`: agent-generated deliverables
- `context/`: reusable notes/context saved for later use in the same run
- `meta/`: system-managed metadata (`manifest.jsonl`, `input_mounts.jsonl`); not editable
- `scratch/`: ephemeral scratch space
External files (e.g. CDP capture JSONL) can be mounted into raw/ via hardlinks using attach_input_file(). The save_artifact() API records provenance in meta/manifest.jsonl (SHA-256, size, content type, timestamp).
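The provenance record described above can be approximated with the standard library. The sketch below is not the real `save_artifact()` signature; `record_artifact` and the manifest field names are assumptions chosen to mirror the listed attributes (SHA-256, size, content type, timestamp).

```python
import hashlib
import json
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def record_artifact(workspace_root: Path, relative_path: str, content: bytes) -> dict[str, object]:
    """Write `content` into the workspace and append one line to meta/manifest.jsonl."""
    target = workspace_root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)

    entry: dict[str, object] = {
        "path": relative_path,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size": len(content),
        "content_type": mimetypes.guess_type(relative_path)[0] or "application/octet-stream",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    manifest = workspace_root / "meta" / "manifest.jsonl"
    manifest.parent.mkdir(parents=True, exist_ok=True)
    # JSONL: one JSON object per line, appended so earlier records are never rewritten.
    with manifest.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry
```

An append-only manifest makes it possible to verify later that a file on disk still matches the hash recorded when it was saved.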
End-to-end pipeline (bluebox-api-index) that turns raw CDP captures into a catalog of executable routines.
Phase 1 — Exploration (4 specialists in parallel): Network, Storage, DOM, and UI specialists each produce a structured exploration summary.
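The Phase 1 fan-out can be pictured as running independent callables concurrently and collecting one summary per specialist. The sketch below uses a thread pool purely for illustration; how the real pipeline schedules its specialists is not stated here, and `run_exploration` and its argument shapes are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Summary = dict[str, Any]


def run_exploration(
    specialists: dict[str, Callable[[str], Summary]],
    captures_dir: str,
) -> dict[str, Summary]:
    """Run every specialist against the same captures and gather their summaries."""
    if not specialists:
        return {}
    with ThreadPoolExecutor(max_workers=len(specialists)) as pool:
        futures = {name: pool.submit(run, captures_dir) for name, run in specialists.items()}
        # result() re-raises any exception from a specialist, so failures are not silent.
        return {name: future.result() for name, future in futures.items()}
```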
Phase 2 — Routine Construction: PrincipalInvestigator reads summaries, dispatches ExperimentWorker agents, reviews results, assembles routines, submits to RoutineInspector for quality gating. Incremental persistence to disk. PI crash recovery via DiscoveryLedger.
Data models:
- `bluebox/data_models/orchestration/` - `DiscoveryLedger`, `ExperimentEntry`, `RoutineSpec`, `RoutineAttempt`, `RoutineCatalog`, `RoutineInspectionResult`
- `bluebox/data_models/api_indexing/` - `NetworkExplorationSummary`, `StorageExplorationSummary`, `DOMExplorationSummary`, `UIExplorationSummary`
- Routine Execution: Operations execute sequentially, maintaining state via `RoutineExecutionContext`
- Placeholder Resolution: All parameters use `{{paramName}}` format; `Parameter.type` drives coercion at runtime
- Session Storage: Use `session_storage_key` to store and retrieve data between operations
- CDP Sessions: Use flattened sessions for multiplexing via `session_id`
- Agent Tools: Decorate with `@agent_tool()`. Supports `persist` (`NEVER`/`ALWAYS`/`OVERFLOW`), `max_characters`, and `token_optimized` parameters
- Agent Card: Every concrete `AbstractAgent` subclass must declare an `AGENT_CARD`
- All parameters use uniform `"field": "{{param}}"` format; `Parameter.type` handles coercion
- Chrome must be running in debug mode on `127.0.0.1:9222` before executing routines
- Runtime placeholders (`sessionStorage`, `localStorage`, `cookie`, `meta`) are resolved in the browser, not Python-side
- All parameters must be used in the routine (validation enforces this)
- Builtin parameters (`epoch_milliseconds`, `uuid`) don't need to be defined in the parameters list
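A rough sketch of how `{{param}}` substitution with type-driven coercion, builtin parameters, and the "every parameter must be used" check could fit together is shown below. It is not the bluebox implementation: the type names (`string`, `integer`, `number`, `boolean`), the regex, and the function names are assumptions. Anything inside braces that is not a plain identifier is left untouched.

```python
import re
import time
import uuid
from typing import Any, Callable

PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")

# Assumed type names; the real Parameter.type values may differ.
COERCERS: dict[str, Callable[[Any], Any]] = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": lambda value: str(value).strip().lower() in ("true", "1", "yes"),
}

# Builtins are generated on demand and need no declaration.
BUILTINS: dict[str, Callable[[], Any]] = {
    "epoch_milliseconds": lambda: int(time.time() * 1000),
    "uuid": lambda: str(uuid.uuid4()),
}


def resolve(template: str, values: dict[str, Any], types: dict[str, str]) -> Any:
    """Substitute placeholders; a template that is exactly one placeholder keeps its coerced type."""

    def lookup(name: str) -> Any:
        if name in BUILTINS:
            return BUILTINS[name]()
        return COERCERS[types[name]](values[name])  # KeyError if undeclared

    whole = PLACEHOLDER.fullmatch(template)
    if whole:
        return lookup(whole.group(1))
    return PLACEHOLDER.sub(lambda match: str(lookup(match.group(1))), template)


def unused_parameters(routine_text: str, declared: list[str]) -> list[str]:
    """Return declared parameters that never appear as a placeholder."""
    used = set(PLACEHOLDER.findall(routine_text))
    return [name for name in declared if name not in used]
```

Keeping the coerced type for whole-value placeholders means `"passengers": "{{passengers}}"` can become a JSON number rather than a string.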
- Unit tests in `tests/unit/`
- Test data in `tests/data/input/` and `tests/data/expected_output/`
- Use `pytest` fixtures from `tests/conftest.py`
- Test files should start with `test_`
- Test functions should start with `test_`
- Use descriptive test names
- Test both success and failure cases
- Test edge cases and validation
- Prefer running single tests or test files for faster iteration
- Run full test suite before committing: `pytest tests/ -v`
- Benchmarks validate routine discovery pipeline: `python scripts/dev/run_benchmarks.py`
- Python 3.12.3+ (use `pyenv install 3.12.3` if needed)
- Google Chrome (stable)
- uv (recommended) or pip
- OpenAI API key (set `OPENAI_API_KEY` environment variable)
- Always use a virtual environment
- Activate before working: `source bluebox-env/bin/activate`
- Install dependencies: `uv pip install -e .` or `pip install -e .`
- `OPENAI_API_KEY` - Required for routine discovery
- Can use `.env` file with `python-dotenv` (loaded automatically with `uv run`)
- Use descriptive branch names
- Examples: `feature/add-new-operation`, `fix/placeholder-resolution`, `refactor/cdp-connection`
- Write clear, descriptive commit messages
- Reference issues when applicable
- Separate logical changes into different commits
- Add tests for new features
- Ensure all tests pass: `pytest tests/ -v`
- Update documentation if needed
- Follow existing code style
- `example_data/example_routines/amtrak_one_way_train_search_routine.json` - Train search example
- `example_data/example_routines/download_arxive_paper_routine.json` - Paper download example
- `example_data/example_routines/massachusetts_corp_search_routine.json` - Corporate search example
Use these as references when creating new routines or understanding the routine format.