The Agent is in active development (Alpha)
A comprehensive, AI-powered browser automation framework that understands natural language instructions and executes web automation tasks intelligently. the agent provides multi-adapter support for different browser engines with real-time monitor and monitoring capabilities.
Overall Progress: ~30% οΏ½
- Core framework stabilization and testing
- Basic error handling and recovery mechanisms
- Essential documentation and setup guides
- Fundamental feature implementation
-
π€ AI-Powered Automation: Natural language instruction processing using multiple AI providers (Ollama, OpenAI, Mistral)
-
π Multi-Browser Support: Works with Playwright, Puppeteer, and Selenium adapters
-
π― Intelligent Element Detection: AI-driven element identification and interaction
-
π Real-Time Dashboard: Modern React-based web UI for monitoring and control
-
π WebSocket Integration: Live automation streaming and status updates
-
π οΈ CLI Interface: Command-line tools for scripting and automation
-
π§ͺ Unit Testing: Comprehensive unit test coverage with CI/CD integration
-
π LLM Observability: OpenTelemetry and Langfuse integration for monitoring AI interactions
-
Live Execution Monitoring - Watch your automation tasks execute in real-time
-
Plan Monitor - See complex tasks broken down into manageable sub-plans
-
Action Detail Display - View target selectors, action types, and execution values
-
Browser Preview - Live browser screenshots and page state
-
Event Stream Integration - Real-time status updates and progress tracking
This project is organized as a TypeScript monorepo with the following packages:
@theagent/core- Core automation framework with multi-adapter support and AI integration@theagent/api- HTTP API server with WebSocket support for real-time communication@theagent/web-ui- Modern Remix-based dashboard for monitoring and control@theagent/cli- Command-line interface for automation scripting@theagent/mcp-server- Model Context Protocol server for AI assistant integration
tools/test-server/- Local test server for automation testing (port 3005)tools/scripts/- Development and build scriptstools/config/- Shared configuration files
- Node.js 18+ (recommended: use latest LTS)
- npm or yarn package manager
- Git for version control
# Clone the repository
git clone https://github.com/Nuralyio/the-agent.git
cd the-agent
# Install dependencies for all packages
npm install
# Install browser dependencies (Playwright browsers)
npm run install:browsersThe Agent uses a unified configuration system that discovers settings from multiple sources in order of precedence:
- Environment variables (highest precedence)
- Configuration files discovered hierarchically
- Default values (lowest precedence)
Create a theagent.config.js file in your project root or any parent directory:
# Copy the template and customize
cp theagent.config.template.js theagent.config.jsExample configuration:
module.exports = {
browser: {
adapter: 'playwright',
type: 'chrome',
headless: false,
timeout: 30000,
retries: 3,
},
llm: {
// Active profile selection
active: 'local',
// Multiple LLM profiles
profiles: {
local: {
provider: 'ollama',
model: 'llama3:8b',
baseUrl: 'http://localhost:11434',
description: 'Local Ollama setup',
},
openai: {
provider: 'openai',
model: 'gpt-4o',
baseUrl: 'https://api.openai.com/v1',
// apiKey: process.env.OPENAI_API_KEY
description: 'OpenAI GPT-4o',
},
claude: {
provider: 'anthropic',
model: 'claude-3-sonnet',
// apiKey: process.env.ANTHROPIC_API_KEY
description: 'Anthropic Claude',
},
},
},
execution: {
logsDir: './execution-logs',
screenshotsDir: './screenshots',
screenshotOnError: true,
},
};Alternatively, use environment variables:
# Copy the template and customize
cp .env.template .envKey environment variables:
# LLM Configuration with multiple profiles
THEAGENT_LLM_ACTIVE=local # Select which profile to use
# Single profile from environment (backward compatibility)
THEAGENT_LLM_PROFILE=env # Profile name for env-based config
THEAGENT_LLM_PROVIDER=ollama
THEAGENT_LLM_MODEL=llama3:8b
THEAGENT_LLM_BASE_URL=http://localhost:11434
THEAGENT_LLM_API_KEY=your-api-key # for cloud providers
# Browser Configuration
THEAGENT_ADAPTER=playwright
THEAGENT_BROWSER=chrome
THEAGENT_HEADLESS=false
# Legacy AI format (still supported)
THEAGENT_AI_PROVIDER=ollama
THEAGENT_AI_MODEL=llama3:8b
# Provider-specific keys
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
β οΈ Security Note: Configuration files and.envfiles are excluded from git to prevent accidental exposure of API keys and sensitive data.
The new configuration system supports multiple LLM profiles, allowing you to easily switch between different providers and models:
Benefits:
- π Easy Switching: Switch between local and cloud models instantly
- π·οΈ Named Profiles: Descriptive names for different use cases
- βοΈ Per-Profile Settings: Different temperature/token settings per profile
- π Secure API Keys: Keep sensitive keys in environment variables
Profile Management:
// In your application code
const configManager = ConfigManager.getInstance();
// List all profiles
const profiles = configManager.listLLMProfiles();
console.log('Available profiles:', profiles);
// Switch profiles programmatically
configManager.switchLLMProfile('openai');
// Get current active profile
const activeProfile = configManager.getActiveLLMProfile();Use Cases:
- Development: Use fast local models for development
- Production: Switch to high-quality cloud models for production
- Testing: Use specific models for different test scenarios
- Cost Optimization: Use cheaper models for simple tasks, premium for complex ones
# Build core package first (required for development)
npm run build -w packages/core
# Start all services simultaneously
npm run dev
# Or start individual services
npm run dev:core # Core package development
npm run dev:api # API server only (port 3002)
npm run dev:ui # Web UI only (port 3003)
npm run dev:cli # CLI developmentImportant: For initial setup or after clean install, build the core package first before running development servers.
- API Server: http://localhost:3002
- Web Dashboard: http://localhost:3003
- Test Server: http://localhost:3005
import { TheAgent } from '@theagent/core';
const automation = new TheAgent({
adapter: 'playwright',
browserType: 'chromium',
headless: false,
ai: {
provider: 'ollama',
model: 'llama3.2',
},
});
await automation.initialize();
const result = await automation.executeTask(
"Navigate to google.com and search for 'TypeScript automation'",
);# Install CLI globally
npm install -g @theagent/cli
# Run automation from command line
theagent execute "Take a screenshot of github.com"
theagent navigate "https://example.com" --adapter playwright# Start automation task via REST API
curl -X POST http://localhost:3002/api/execute \
-H "Content-Type: application/json" \
-d '{"instruction": "Click the login button", "url": "https://example.com"}'# Run unit tests (recommended for CI/CD)
npm run test:unit
# Run all tests (unit + integration locally)
npm test
# Run integration tests (local development only)
npm run test:integration
# Test specific package
npm run test -w packages/core
# Watch mode for development
npm run test:watch
# Generate coverage report
npm run test:coverage- Unit Tests: Located in
src/**/*.test.tsfiles - Integration Tests: Located in
src/tests/integration/(local development only) - Test Environment: Node.js with Jest and ts-jest
- CI/CD: Only unit tests run in GitHub Actions for reliability
# Build all packages (builds in dependency order)
npm run build
# Build core package first (required for other packages)
npm run build -w packages/core
# Build specific package
npm run build -w packages/api
npm run build -w packages/cli
# Clean build artifacts
npm run cleanNote: Always build @theagent/core first as other packages depend on it.
| File | Purpose | Audience |
|---|---|---|
README.md |
Main project overview and setup guide | All users |
CONTRIBUTING.md |
Development guidelines and contribution process | Contributors |
CHANGELOG.md |
Version history and release notes | All users |
LICENSE |
MIT license terms | All users |
| File | Purpose | Audience |
|---|---|---|
.github/WORKFLOWS.md |
CI/CD workflows and testing strategy | Contributors |
mainprompt.md |
Project architecture and design specifications | Developers |
| Package | README | Purpose |
|---|---|---|
| Core | packages/core/README.md |
Browser automation framework API |
| API Server | packages/api/README.md |
HTTP server and WebSocket documentation |
| Web UI | packages/web-ui/README.md |
Dashboard setup and customization |
| CLI | packages/cli/README.md |
Command-line interface usage |
| MCP Server | packages/mcp-server/README.md |
Model Context Protocol server integration |
| Tool | README | Purpose |
|---|---|---|
| Test Server | tools/test-server/README.md |
Local test server for development |
Key configuration options available in .env:
# AI Provider Settings
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama2
OPENAI_API_KEY=your-key
MISTRAL_API_KEY=your-key
# Server Ports
PORT=3002 # API server port
TEST_SERVER_PORT=3005 # Test server port
# Browser Settings
DEFAULT_BROWSER=chromium
HEADLESS=true
VIEWPORT_WIDTH=1280
VIEWPORT_HEIGHT=720Supported browser automation adapters:
- Playwright (recommended): Modern, fast, reliable
- Puppeteer: Chrome/Chromium focused
- Selenium: Legacy support, broad compatibility
# Lint all packages
npm run lint
# Fix linting issues
npm run lint:fix
# Format code
npm run format
# Check formatting
npm run format:check
# Type checking
npm run typecheckHusky is configured to run:
- ESLint with auto-fix
- Prettier formatting
- TypeScript type checking
- Test suite
We welcome contributions! Please read our Contributing Guide for details on:
- π Quick Start: Setting up your development environment
- π Development Guidelines: Code standards and testing requirements
- π§ Development Workflow: Step-by-step contribution process
- π Pull Request Process: How to submit changes
- π Bug Reports: How to report issues effectively
- π‘ Feature Requests: How to suggest new features
This project is licensed under the MIT License. See the LICENSE file for
details.
- Main Guide: README.md - Setup and usage
- Contributing: CONTRIBUTING.md - Development guide
- Workflows: .github/WORKFLOWS.md - CI/CD documentation
- Package Docs: Individual README files in each package
- Changelog: CHANGELOG.md - Version history
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Issues
- Questions: GitHub Discussions
- Security Issues: Email [email protected]
- Discussions: GitHub Discussions
- Contributing: See CONTRIBUTING.md
- Code of Conduct: Be respectful and inclusive
- Project Lead: @maintainer
- Core Team: @team
We're grateful for the support from our sponsors who help make this project possible!
AI platform for building apps
Support the development of the agent and help us build the future of AI-powered browser automation!
Why Sponsor?
- π Accelerate feature development
- π οΈ Priority support and feature requests
- π Your logo featured here and in our documentation
- π― Help shape the project roadmap
How to Sponsor:
- GitHub Sponsors - Preferred method
- LLM Observability - OpenTelemetry and Langfuse integration β
- Additional AI provider integrations
- Enhanced element detection algorithms
- Mobile browser automation support
- Cloud deployment templates
- Performance benchmarking tools
- Plugin system architecture
Note: This is an active development project. Features and APIs may change. Please check the documentation and releases for the latest updates.