
The Agent - AI-Powered Browser Automation Framework


The Agent Web Dashboard
Real-time browser automation

🚧 Project Status

The Agent is in active development (Alpha)

A comprehensive, AI-powered browser automation framework that understands natural language instructions and executes web automation tasks intelligently. The Agent provides multi-adapter support for different browser engines with real-time monitoring and control capabilities.

📊 Project Completion Status

Overall Progress: ~30%

🔄 In Progress (Alpha Development)

  • Core framework stabilization and testing
  • Basic error handling and recovery mechanisms
  • Essential documentation and setup guides
  • Fundamental feature implementation

✨ Key Features

  • 🤖 AI-Powered Automation: Natural language instruction processing using multiple AI providers (Ollama, OpenAI, Mistral)

  • 🌐 Multi-Browser Support: Works with Playwright, Puppeteer, and Selenium adapters

  • 🎯 Intelligent Element Detection: AI-driven element identification and interaction

  • 📊 Real-Time Dashboard: Modern React-based web UI for monitoring and control

  • 🔄 WebSocket Integration: Live automation streaming and status updates

  • 🛠️ CLI Interface: Command-line tools for scripting and automation

  • 🧪 Unit Testing: Comprehensive unit test coverage with CI/CD integration

  • 📈 LLM Observability: OpenTelemetry and Langfuse integration for monitoring AI interactions

  • Live Execution Monitoring - Watch your automation tasks execute in real-time

  • Plan Monitor - See complex tasks broken down into manageable sub-plans

  • Action Detail Display - View target selectors, action types, and execution values

  • Browser Preview - Live browser screenshots and page state

  • Event Stream Integration - Real-time status updates and progress tracking

οΏ½πŸ—οΈ Architecture

This project is organized as a TypeScript monorepo with the following packages:

📦 Core Packages

  • @theagent/core - Core automation framework with multi-adapter support and AI integration
  • @theagent/api - HTTP API server with WebSocket support for real-time communication
  • @theagent/web-ui - Modern Remix-based dashboard for monitoring and control
  • @theagent/cli - Command-line interface for automation scripting
  • @theagent/mcp-server - Model Context Protocol server for AI assistant integration

🔧 Development Tools

  • tools/test-server/ - Local test server for automation testing (port 3005)
  • tools/scripts/ - Development and build scripts
  • tools/config/ - Shared configuration files

🚀 Quick Start

Prerequisites

  • Node.js 18+ (recommended: use latest LTS)
  • npm or yarn package manager
  • Git for version control

Installation

# Clone the repository
git clone https://github.com/Nuralyio/the-agent.git
cd the-agent

# Install dependencies for all packages
npm install

# Install browser dependencies (Playwright browsers)
npm run install:browsers

Configuration

The Agent uses a unified configuration system that discovers settings from multiple sources in order of precedence:

  1. Environment variables (highest precedence)
  2. Configuration files discovered hierarchically
  3. Default values (lowest precedence)
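
The precedence rule can be sketched as a plain object merge. This is an illustration only: the setting keys and environment-variable names mirror examples elsewhere in this README, not the framework's full schema.

```typescript
// Sketch of the precedence rule: environment > config file > defaults.
// Keys and variable names are illustrative, not the full schema.
type Config = { adapter: string; headless: boolean };

const defaults: Config = { adapter: 'playwright', headless: true };

function resolveConfig(
  fileConfig: Partial<Config>,
  env: Record<string, string | undefined>,
): Config {
  const fromEnv: Partial<Config> = {};
  if (env.THEAGENT_ADAPTER) fromEnv.adapter = env.THEAGENT_ADAPTER;
  if (env.THEAGENT_HEADLESS) fromEnv.headless = env.THEAGENT_HEADLESS === 'true';
  // Later spreads win: defaults < config file < environment.
  return { ...defaults, ...fileConfig, ...fromEnv };
}
```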

Configuration Files

Create a theagent.config.js file in your project root or any parent directory:

# Copy the template and customize
cp theagent.config.template.js theagent.config.js

Example configuration:

module.exports = {
  browser: {
    adapter: 'playwright',
    type: 'chrome',
    headless: false,
    timeout: 30000,
    retries: 3,
  },
  llm: {
    // Active profile selection
    active: 'local',

    // Multiple LLM profiles
    profiles: {
      local: {
        provider: 'ollama',
        model: 'llama3:8b',
        baseUrl: 'http://localhost:11434',
        description: 'Local Ollama setup',
      },
      openai: {
        provider: 'openai',
        model: 'gpt-4o',
        baseUrl: 'https://api.openai.com/v1',
        // apiKey: process.env.OPENAI_API_KEY
        description: 'OpenAI GPT-4o',
      },
      claude: {
        provider: 'anthropic',
        model: 'claude-3-sonnet',
        // apiKey: process.env.ANTHROPIC_API_KEY
        description: 'Anthropic Claude',
      },
    },
  },
  execution: {
    logsDir: './execution-logs',
    screenshotsDir: './screenshots',
    screenshotOnError: true,
  },
};

Environment Variables

Alternatively, use environment variables:

# Copy the template and customize
cp .env.template .env

Key environment variables:

# LLM Configuration with multiple profiles
THEAGENT_LLM_ACTIVE=local  # Select which profile to use

# Single profile from environment (backward compatibility)
THEAGENT_LLM_PROFILE=env   # Profile name for env-based config
THEAGENT_LLM_PROVIDER=ollama
THEAGENT_LLM_MODEL=llama3:8b
THEAGENT_LLM_BASE_URL=http://localhost:11434
THEAGENT_LLM_API_KEY=your-api-key  # for cloud providers

# Browser Configuration
THEAGENT_ADAPTER=playwright
THEAGENT_BROWSER=chrome
THEAGENT_HEADLESS=false

# Legacy AI format (still supported)
THEAGENT_AI_PROVIDER=ollama
THEAGENT_AI_MODEL=llama3:8b

# Provider-specific keys
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key

⚠️ Security Note: Configuration files and .env files are excluded from git to prevent accidental exposure of API keys and sensitive data.

Multiple LLM Profiles

The new configuration system supports multiple LLM profiles, allowing you to easily switch between different providers and models:

Benefits:

  • 🔄 Easy Switching: Switch between local and cloud models instantly
  • 🏷️ Named Profiles: Descriptive names for different use cases
  • ⚙️ Per-Profile Settings: Different temperature/token settings per profile
  • 🔐 Secure API Keys: Keep sensitive keys in environment variables

Profile Management:

// In your application code (assuming ConfigManager is exported from @theagent/core)
import { ConfigManager } from '@theagent/core';

const configManager = ConfigManager.getInstance();

// List all profiles
const profiles = configManager.listLLMProfiles();
console.log('Available profiles:', profiles);

// Switch profiles programmatically
configManager.switchLLMProfile('openai');

// Get current active profile
const activeProfile = configManager.getActiveLLMProfile();

Use Cases:

  • Development: Use fast local models for development
  • Production: Switch to high-quality cloud models for production
  • Testing: Use specific models for different test scenarios
  • Cost Optimization: Use cheaper models for simple tasks, premium for complex ones
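
The profile-management calls above can be pictured with a small, self-contained registry. This sketch only mirrors the method names used in the snippet; the real ConfigManager in @theagent/core is richer.

```typescript
// Minimal profile registry with the same shape as the ConfigManager
// calls shown above. Illustrative only; the real implementation differs.
interface LLMProfile {
  provider: string;
  model: string;
  baseUrl?: string;
  description?: string;
}

class ProfileRegistry {
  private active = 'local';
  constructor(private profiles: Record<string, LLMProfile>) {}

  listLLMProfiles(): string[] {
    return Object.keys(this.profiles);
  }

  switchLLMProfile(name: string): void {
    if (!this.profiles[name]) throw new Error(`Unknown profile: ${name}`);
    this.active = name;
  }

  getActiveLLMProfile(): LLMProfile {
    return this.profiles[this.active];
  }
}
```

Switching only changes which named entry subsequent lookups resolve, so the same application code can run against a local Ollama model in development and a cloud model in production.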

Development

# Build core package first (required for development)
npm run build -w packages/core

# Start all services simultaneously
npm run dev

# Or start individual services
npm run dev:core    # Core package development
npm run dev:api     # API server only (port 3002)
npm run dev:ui      # Web UI only (port 3003)
npm run dev:cli     # CLI development

Important: For initial setup or after a clean install, build the core package before running the development servers.

Access Points

πŸ› οΈ Usage Examples

Basic Automation

import { TheAgent } from '@theagent/core';

const automation = new TheAgent({
  adapter: 'playwright',
  browserType: 'chromium',
  headless: false,
  ai: {
    provider: 'ollama',
    model: 'llama3.2',
  },
});

await automation.initialize();
const result = await automation.executeTask(
  "Navigate to google.com and search for 'TypeScript automation'",
);

CLI Usage

# Install CLI globally
npm install -g @theagent/cli

# Run automation from command line
theagent execute "Take a screenshot of github.com"
theagent navigate "https://example.com" --adapter playwright

API Integration

# Start automation task via REST API
curl -X POST http://localhost:3002/api/execute \
  -H "Content-Type: application/json" \
  -d '{"instruction": "Click the login button", "url": "https://example.com"}'
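
The same call can be made from Node. The helper below only assembles the request; the endpoint URL and payload shape are copied from the curl example above.

```typescript
// Assemble the POST options the curl example sends to /api/execute.
function buildExecuteRequest(instruction: string, url: string) {
  return {
    method: 'POST' as const,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ instruction, url }),
  };
}

// With a running API server (Node 18+ ships a global fetch):
// const res = await fetch(
//   'http://localhost:3002/api/execute',
//   buildExecuteRequest('Click the login button', 'https://example.com'),
// );
```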

🧪 Testing

Running Tests

# Run unit tests (recommended for CI/CD)
npm run test:unit

# Run all tests (unit + integration locally)
npm test

# Run integration tests (local development only)
npm run test:integration

# Test specific package
npm run test -w packages/core

# Watch mode for development
npm run test:watch

# Generate coverage report
npm run test:coverage

Test Structure

  • Unit Tests: Located in src/**/*.test.ts files
  • Integration Tests: Located in src/tests/integration/ (local development only)
  • Test Environment: Node.js with Jest and ts-jest
  • CI/CD: Only unit tests run in GitHub Actions for reliability

🔨 Building

# Build all packages (builds in dependency order)
npm run build

# Build core package first (required for other packages)
npm run build -w packages/core

# Build specific package
npm run build -w packages/api
npm run build -w packages/cli

# Clean build artifacts
npm run clean

Note: Always build @theagent/core first as other packages depend on it.

📚 Documentation

Core Documentation

| File | Purpose | Audience |
| --- | --- | --- |
| README.md | Main project overview and setup guide | All users |
| CONTRIBUTING.md | Development guidelines and contribution process | Contributors |
| CHANGELOG.md | Version history and release notes | All users |
| LICENSE | MIT license terms | All users |

Framework Documentation

| File | Purpose | Audience |
| --- | --- | --- |
| .github/WORKFLOWS.md | CI/CD workflows and testing strategy | Contributors |
| mainprompt.md | Project architecture and design specifications | Developers |

Package Documentation

| Package | README | Purpose |
| --- | --- | --- |
| Core | packages/core/README.md | Browser automation framework API |
| API Server | packages/api/README.md | HTTP server and WebSocket documentation |
| Web UI | packages/web-ui/README.md | Dashboard setup and customization |
| CLI | packages/cli/README.md | Command-line interface usage |
| MCP Server | packages/mcp-server/README.md | Model Context Protocol server integration |

Development Tools Documentation

| Tool | README | Purpose |
| --- | --- | --- |
| Test Server | tools/test-server/README.md | Local test server for development |

βš™οΈ Configuration

Environment Variables

Key configuration options available in .env:

# AI Provider Settings
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama2
OPENAI_API_KEY=your-key
MISTRAL_API_KEY=your-key

# Server Ports
PORT=3002                    # API server port
TEST_SERVER_PORT=3005        # Test server port

# Browser Settings
DEFAULT_BROWSER=chromium
HEADLESS=true
VIEWPORT_WIDTH=1280
VIEWPORT_HEIGHT=720
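
Environment variables always arrive as strings, so boolean and numeric settings need explicit parsing. A minimal sketch using the variable names above (the parsing logic itself is illustrative, not the framework's):

```typescript
// Parse string-valued environment variables into typed browser settings.
// Variable names follow the list above; defaults match the template values.
function parseBrowserSettings(env: Record<string, string | undefined>) {
  return {
    browser: env.DEFAULT_BROWSER ?? 'chromium',
    headless: (env.HEADLESS ?? 'true') === 'true',
    viewport: {
      width: Number(env.VIEWPORT_WIDTH ?? 1280),
      height: Number(env.VIEWPORT_HEIGHT ?? 720),
    },
  };
}
```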

Browser Adapters

Supported browser automation adapters:

  • Playwright (recommended): Modern, fast, reliable
  • Puppeteer: Chrome/Chromium focused
  • Selenium: Legacy support, broad compatibility
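
Multi-adapter support of this kind typically hinges on a common interface that each engine implements. The sketch below is hypothetical (not the actual @theagent/core types) and uses an in-memory fake in place of a real browser:

```typescript
// Hypothetical common contract a multi-adapter design might use;
// the real adapter interface in @theagent/core may differ.
interface BrowserAdapter {
  readonly name: string;
  launch(headless: boolean): Promise<void>;
  navigate(url: string): Promise<void>;
  close(): Promise<void>;
}

// Test double that records calls instead of driving a browser.
class FakeAdapter implements BrowserAdapter {
  log: string[] = [];
  constructor(readonly name: string) {}
  async launch(headless: boolean) { this.log.push(`launch:${headless}`); }
  async navigate(url: string) { this.log.push(`goto:${url}`); }
  async close() { this.log.push('close'); }
}

function createAdapter(kind: 'playwright' | 'puppeteer' | 'selenium'): BrowserAdapter {
  // In the real framework each kind would map to a concrete adapter
  // package; here every kind returns the same in-memory fake.
  return new FakeAdapter(kind);
}
```

Because callers only see BrowserAdapter, swapping Playwright for Selenium becomes a configuration change rather than a code change.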

Code Quality

Linting and Formatting

# Lint all packages
npm run lint

# Fix linting issues
npm run lint:fix

# Format code
npm run format

# Check formatting
npm run format:check

# Type checking
npm run typecheck

Pre-commit Hooks

Husky is configured to run:

  • ESLint with auto-fix
  • Prettier formatting
  • TypeScript type checking
  • Test suite

🤝 Contributing

We welcome contributions! Please read our Contributing Guide for details on:

  • 🚀 Quick Start: Setting up your development environment
  • 📋 Development Guidelines: Code standards and testing requirements
  • 🔧 Development Workflow: Step-by-step contribution process
  • 📝 Pull Request Process: How to submit changes
  • 🐛 Bug Reports: How to report issues effectively
  • 💡 Feature Requests: How to suggest new features

License

This project is licensed under the MIT License. See the LICENSE file for details.

🆘 Support & Contact

📖 Documentation

🐛 Issues & Support

🚀 Community

Maintainers

πŸ’ Sponsors

We're grateful for the support from our sponsors who help make this project possible!

🌟 Main Sponsor

Nuraly
AI platform for building apps

🤝 Become a Sponsor

Support the development of The Agent and help us build the future of AI-powered browser automation!

Why Sponsor?

  • 🚀 Accelerate feature development
  • 🛠️ Priority support and feature requests
  • 📈 Your logo featured here and in our documentation
  • 🎯 Help shape the project roadmap

How to Sponsor:

🎯 Roadmap

  • LLM Observability - OpenTelemetry and Langfuse integration ✅
  • Additional AI provider integrations
  • Enhanced element detection algorithms
  • Mobile browser automation support
  • Cloud deployment templates
  • Performance benchmarking tools
  • Plugin system architecture

Note: This is an active development project. Features and APIs may change. Please check the documentation and releases for the latest updates.
