A web automation framework that uses Large Language Models (LLMs) and computer vision to adapt to dynamic websites. It assigns specialized agents to individual subtasks such as element detection, action selection, and outcome monitoring.
- Agents handle new websites without customized scripts
- Intelligent element detection using both DOM parsing and computer vision
- Dynamic adaptation to layout changes and website updates
- Layout changes don't hinder workflow execution
- Multiple fallback strategies for element detection
- Robust error handling and recovery mechanisms
- Multi-agent architecture with specialized roles
- Seamless agent communication and coordination
- Scalable and maintainable system design
- Agents leverage LLMs to manage sophisticated interactions
- Advanced decision-making capabilities
- Context-aware action planning
The system employs a multi-agent architecture where each agent specializes in specific tasks:
graph TD
A[Task Input] --> B[Element Selector Agent]
B --> C[Action Selector Agent]
C --> D[Action Formatter Agent]
D --> E[Monitor Agent]
F[Vision Agent] --> G[Hybrid Element Selector]
B --> G
G --> C
E --> H[Execution Results]
- Element Selector Agent: Extracts relevant interactive elements from web pages using DOM parsing
- Vision Agent: Performs visual element detection and analysis using computer vision
- Hybrid Element Selector: Combines DOM parsing and computer vision for robust element detection
- Action Selector Agent: Determines the most appropriate actions based on objectives
- Action Formatter Agent: Formats actions into executable structured formats
- Monitor Agent: Monitors execution process and handles errors
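The hand-off between these roles can be pictured as a chain of functions sharing one context object. The sketch below is illustrative only: `Context`, `run_pipeline`, and the stub agent functions are hypothetical names, and the stubs return canned values where the real agents would call DOM parsing or an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Context:
    """State passed from agent to agent (hypothetical structure)."""
    goal: str
    elements: List[str] = field(default_factory=list)
    action: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


Agent = Callable[[Context], Context]


def element_selector(ctx: Context) -> Context:
    # Stand-in for DOM parsing: pretend the page exposes these elements.
    ctx.elements = ["input#zip", "button#submit"]
    ctx.log.append("element_selector")
    return ctx


def action_selector(ctx: Context) -> Context:
    # Stand-in for the LLM decision: pick the first candidate element.
    ctx.action = {"type": "click", "target": ctx.elements[0]}
    ctx.log.append("action_selector")
    return ctx


def action_formatter(ctx: Context) -> Context:
    # Normalize the chosen action into a fixed, executable shape.
    ctx.action = {
        "type": ctx.action["type"].upper(),
        "target": ctx.action["target"],
    }
    ctx.log.append("action_formatter")
    return ctx


def monitor(ctx: Context) -> Context:
    # Record that the step was observed; a real monitor would check outcomes.
    ctx.log.append("monitor")
    return ctx


def run_pipeline(goal: str, agents: List[Agent]) -> Context:
    """Run each agent in order, threading the shared context through."""
    ctx = Context(goal=goal)
    for agent in agents:
        ctx = agent(ctx)
    return ctx


PIPELINE = [element_selector, action_selector, action_formatter, monitor]
result = run_pipeline("Get auto insurance quote", PIPELINE)
print(result.action, result.log)
```

Keeping every agent behind the same `Context -> Context` signature is one way to let roles be swapped or reordered without touching the runner.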
- Node.js 16+
- Python 3.8+
- Chrome Browser (for web automation)
git clone <repository-url>
cd multi-agent-web-automation
python scripts/start.py
# Install dependencies
npm run install:all
# Start both frontend and backend
npm run dev
Access Points:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
For detailed setup instructions, see QUICK_START.md.
- Insurance Quotes: Automated form filling across multiple insurance websites
- Competitive Analysis: Systematic data collection from competitor websites
- Job Applications: Streamlined application submission across job portals
- Price Monitoring: Real-time tracking of product prices across e-commerce sites
- Website Compatibility Testing: Automated testing across different browsers and devices
- User Experience Analysis: Systematic evaluation of user interaction flows
- A/B Testing: Automated testing of different website variants
- Market Research: Automated survey and data collection
- Content Aggregation: Systematic gathering of information from multiple sources
- Social Media Monitoring: Automated tracking of brand mentions and engagement
- React 18+ with TypeScript
- Material-UI for component library
- React Flow for visual workflow building
- Vite for fast development and building
- FastAPI for high-performance API
- Python with async/await support
- Selenium for browser automation
- OpenAI API for LLM integration
- OpenCV for image processing
- Tesseract OCR for text recognition
- EasyOCR for advanced text detection
- NumPy/SciPy for numerical computations
- Create Task Node: Set target URL and automation goal
- Add Agent Nodes: Configure specialized agents for your workflow
- Connect Nodes: Define execution flow with visual connections
- Execute Workflow: Monitor real-time execution and results
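One way to turn connected nodes into an execution sequence is a topological sort over the edges. The sketch below assumes a workflow is a dictionary with `nodes` (each carrying an `id`) and `edges` (each with `source` and `target`); the `execution_order` helper and the `agent2` node are hypothetical, and the project's actual scheduler may work differently.

```python
from collections import deque
from typing import Dict, List


def execution_order(workflow: dict) -> List[str]:
    """Return node ids in an order that respects every edge (Kahn's algorithm)."""
    indegree: Dict[str, int] = {node["id"]: 0 for node in workflow["nodes"]}
    successors: Dict[str, List[str]] = {node_id: [] for node_id in indegree}

    for edge in workflow["edges"]:
        successors[edge["source"]].append(edge["target"])
        indegree[edge["target"]] += 1

    # Nodes with no incoming edges can run immediately.
    ready = deque(node_id for node_id, degree in indegree.items() if degree == 0)
    order: List[str] = []

    while ready:
        node_id = ready.popleft()
        order.append(node_id)
        for nxt in successors[node_id]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any node left unscheduled sits on a cycle.
    if len(order) != len(indegree):
        raise ValueError("workflow contains a cycle")
    return order


workflow = {
    "nodes": [
        {"id": "task1", "type": "taskNode"},
        {"id": "agent1", "type": "agentNode"},
        {"id": "agent2", "type": "agentNode"},  # hypothetical second agent
    ],
    "edges": [
        {"source": "task1", "target": "agent1"},
        {"source": "agent1", "target": "agent2"},
    ],
}
print(execution_order(workflow))
```

Rejecting cycles up front means a misconnected workflow fails before any browser session is started.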
// Example workflow configuration
const workflow = {
  nodes: [
    {
      id: 'task1',
      type: 'taskNode',
      data: {
        url: 'https://example-insurance.com',
        goal: 'Get auto insurance quote',
        payload: { vehicleYear: 2020, zipCode: '12345' }
      }
    },
    {
      id: 'agent1',
      type: 'agentNode',
      data: {
        agentType: 'hybrid_element_selector',
        config: { model: 'gpt-4o-mini', temperature: 0.7 }
      }
    }
  ],
  edges: [
    { source: 'task1', target: 'agent1' }
  ]
};
The system includes advanced computer vision features:
- Screenshot-based element identification
- Layout analysis and spatial reasoning
- Fallback detection when DOM parsing fails
- Tesseract OCR for traditional text recognition
- EasyOCR for advanced multi-language support
- Dynamic text extraction from images and canvas elements
- Combines DOM parsing with visual detection
- Cross-validation of element identification
- Improved accuracy and reliability
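Cross-validation between the two detectors could, for example, compare bounding boxes and accept an element only when they overlap enough. The sketch below uses intersection-over-union (IoU) for that comparison; the `iou` and `cross_validate` helpers, the status labels, and the 0.5 threshold are all assumptions made for illustration, not the project's documented method.

```python
from typing import Optional, Tuple

# (left, top, right, bottom) in page pixels
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    if right <= left or bottom <= top:
        return 0.0  # boxes do not overlap
    intersection = (right - left) * (bottom - top)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)


def cross_validate(
    dom_box: Optional[Box],
    vision_box: Optional[Box],
    threshold: float = 0.5,  # assumed cut-off, tune per site
) -> Tuple[Optional[Box], str]:
    """Combine DOM and vision results into one box plus a status label."""
    if dom_box is not None and vision_box is not None:
        status = "confirmed" if iou(dom_box, vision_box) >= threshold else "conflict"
        return dom_box, status
    if dom_box is not None:
        return dom_box, "dom_only"
    if vision_box is not None:
        return vision_box, "vision_only"
    return None, "not_found"


print(cross_validate((0, 0, 10, 10), (1, 1, 11, 11)))
```

A `conflict` result is a natural point to escalate, for instance by taking a fresh screenshot or asking the LLM to choose between the two candidates.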
- High Accuracy: 90%+ success rate across diverse websites
- Fast Execution: Average workflow completion in 30-60 seconds
- Scalability: Handles 100+ concurrent workflow executions
- Error Recovery: Automatic fallback strategies
- Monitoring: Real-time execution tracking and logging
- Validation: Multi-layer verification of actions and results
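Automatic fallback can be sketched as an ordered list of strategies tried until one succeeds, with every failure logged along the way. The names below (`locate_with_fallback`, `by_dom`, `by_ocr`, `by_vision`) and the locator string format are hypothetical stand-ins; the stubs simulate failures rather than touching a real browser.

```python
import logging
from typing import Callable, List, Optional, Sequence, Tuple

logger = logging.getLogger("automation")

# A strategy is a (name, function) pair; the function returns a locator or None.
Strategy = Tuple[str, Callable[[str], Optional[str]]]


def locate_with_fallback(description: str, strategies: Sequence[Strategy]) -> Tuple[str, str]:
    """Try each strategy in order; return (strategy_name, locator) for the first hit."""
    errors: List[str] = []
    for name, strategy in strategies:
        try:
            locator = strategy(description)
        except Exception as exc:  # a failing strategy must not abort the chain
            logger.warning("strategy %s raised: %s", name, exc)
            errors.append(f"{name}: {exc}")
            continue
        if locator is not None:
            return name, locator
        errors.append(f"{name}: no match")
    raise LookupError(f"all strategies failed for {description!r}: " + "; ".join(errors))


def by_dom(description: str) -> Optional[str]:
    raise RuntimeError("DOM not ready")  # simulated failure


def by_ocr(description: str) -> Optional[str]:
    return None  # simulated "nothing found"


def by_vision(description: str) -> Optional[str]:
    return "bbox:120,40,200,72"  # simulated visual match


result = locate_with_fallback(
    "Submit button",
    [("dom", by_dom), ("ocr", by_ocr), ("vision", by_vision)],
)
print(result)
```

Collecting the per-strategy errors into the final exception keeps the diagnosis in one place when every strategy fails.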
# OpenAI API (recommended)
export OPENAI_API_KEY="your-api-key"
# Custom browser settings
export CHROME_BINARY_PATH="/path/to/chrome"
export CHROMEDRIVER_PATH="/path/to/chromedriver"
# API configuration
export BACKEND_PORT=8000
export FRONTEND_PORT=3000
Each agent can be customized with specific parameters:
agent_config = {
    "model": "gpt-4o-mini",
    "temperature": 0.7,
    "max_tokens": 2000,
    "description": "Custom agent behavior"
}
- Execution progress tracking
- Error reporting and diagnosis
- Performance metrics collection
- Step-by-step execution mode
- Screenshot capture at each step
- Detailed agent communication logs
We welcome contributions! Please see our contributing guidelines:
- Fork the repository
- Create feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open Pull Request
# Clone for development
git clone <repository-url>
cd multi-agent-web-automation
# Install development dependencies
npm install
cd frontend && npm install
cd ../backend && pip install -r requirements.txt
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for GPT model access
- Selenium team for web automation framework
- React Flow for visual workflow capabilities
- FastAPI for high-performance backend framework
- Documentation: See QUICK_START.md for detailed setup
- Issues: Report bugs and feature requests via GitHub issues
- Discussions: Join community discussions for questions and ideas
Built with ❤️ for the automation community