A DIY "Bright Data-class" proxy platform that provides AI agents with reliable web search capabilities at approximately 10x lower cost than commercial solutions.
This open-source platform combines cheap datacenter proxies (OVH + Hetzner) with on-demand residential proxies (DataImpulse) to create a cost-effective, scalable proxy infrastructure optimized for AI agent workloads.
- Smart Routing: Cache-first → Search API fallback → Datacenter proxies → Residential backup
- Cost Optimization: ~$0.00026 per page, roughly 10x below commercial solutions
- High Availability: 96%+ success rate with automatic failover
- Scalable Architecture: Start with 16 nodes, scale as needed
- WireGuard Mesh: Secure inter-node communication
- Real-time Monitoring: Prometheus + Grafana dashboards
- MCP Integration: Model Context Protocol server for AI agent integration
- Ban Detection: Automatic proxy rotation and health checking
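The smart-routing order in the list above can be read as a chain of tiers tried in sequence until one returns a page. The following is a minimal sketch of that idea, not the Rota router's actual code: the `route` function, the `Tier` type, and the stand-in fetch functions are all hypothetical names introduced for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A tier is a (name, fetch function) pair. Each fetch function returns the
# page body on success or None on a miss. All names here are hypothetical.
Tier = Tuple[str, Callable[[str], Optional[str]]]


def route(url: str, tiers: List[Tier]) -> Tuple[Optional[str], Optional[str]]:
    """Try tiers in order; return (tier_name, body) for the first success."""
    for name, fetch in tiers:
        try:
            body = fetch(url)
        except Exception:
            # Treat an error like a miss and fall through to the next tier.
            body = None
        if body is not None:
            return name, body
    return None, None


# Stand-in tiers: the cache misses, the search API errors out, and the
# datacenter proxy succeeds, so the residential tier is never reached.
cache = {}


def from_cache(url):
    return cache.get(url)


def from_search_api(url):
    raise TimeoutError("search API unavailable")


def from_datacenter(url):
    return "<html>datacenter</html>"


def from_residential(url):
    return "<html>residential</html>"


TIERS = [
    ("cache", from_cache),
    ("search_api", from_search_api),
    ("datacenter", from_datacenter),
    ("residential", from_residential),
]

tier, body = route("https://example.com", TIERS)
print(tier)  # datacenter
```

Ordering the tiers from cheapest to most expensive means paid residential bandwidth is used only when every cheaper tier has failed.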
| Metric | Target |
|---|---|
| Google/Bing success rate | ≥96% with ≤3 retries |
| Median SERP latency | ≤3 seconds |
| Cost per fetched page | ≤$0.003 |
| Operational overhead | 1 DevOps + 0.2 FTE on-call |
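To get a feel for the success-rate target, it helps to relate it to the success rate of a single attempt. The sketch below rests on two assumptions that the table does not state: attempts fail independently (optimistic, since bans tend to be correlated), and "≤3 retries" means one initial attempt plus up to three retries. The function names are illustrative.

```python
def success_within(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


def min_per_attempt_rate(target: float, attempts: int) -> float:
    """Smallest per-attempt success rate that reaches `target` overall."""
    return 1.0 - (1.0 - target) ** (1.0 / attempts)


ATTEMPTS = 1 + 3  # assumed: initial attempt plus up to three retries

rate = min_per_attempt_rate(0.96, ATTEMPTS)
print(round(rate, 3))  # 0.553
```

Under these assumptions a per-attempt success rate of about 55% would be enough to reach 96% overall. Real traffic will likely need a higher rate because failures cluster.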
| Component | Cost |
|---|---|
| 8x OVH VLE-2 instances | $44/month |
| 8x Hetzner CX11 instances | $34.60/month |
| DataImpulse residential (10GB) | $10/month |
| Total | ~$104/month |
| Cost per page | $0.00026 |
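The per-page figure only makes sense together with a monthly page volume, which the table does not state. Assuming cost per page is simply the monthly total divided by pages fetched, the volume can be back-computed as sketched below. The three itemized rows sum to $88.60, so the sketch uses the stated ~$104 total, which presumably includes costs not itemized here.

```python
MONTHLY_TOTAL_USD = 104.0          # "~$104/month" from the cost table
COST_PER_PAGE_USD = 0.00026        # "Cost per page" from the cost table
TARGET_COST_PER_PAGE_USD = 0.003   # "Cost per fetched page" target


def implied_pages_per_month(monthly_total: float, cost_per_page: float) -> float:
    """Monthly page volume implied by a total cost and a per-page cost."""
    return monthly_total / cost_per_page


pages = implied_pages_per_month(MONTHLY_TOTAL_USD, COST_PER_PAGE_USD)
headroom = TARGET_COST_PER_PAGE_USD / COST_PER_PAGE_USD

print(round(pages))        # 400000
print(round(headroom, 1))  # 11.5
```

In other words, the quoted figure corresponds to roughly 400,000 pages per month, and it sits about 11.5x below the ≤$0.003 target.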
graph TD
subgraph "Edge Proxies (OVH & Hetzner)"
P1((Squid/3proxy)) -->|WireGuard| Core
P2((Squid/3proxy)) -->|WireGuard| Core
Pn((...))
end
subgraph "Core Services"
Rota[Rota Router] --> Redis[(Cache)]
Rota --> Prom[Prometheus]
Rota --> Queue[Request Queue]
Crawlee{{Worker pods}} --> Target[Web]
end
Target -.->|Blocked| Rota
Rota -.->|Retry via residential| DataImpulse[(DataImpulse)]
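The dashed `Blocked` edge in the diagram implies some test for whether a response counts as blocked. The source does not say how Rota makes that decision, so the sketch below is only one plausible heuristic: the status codes, the body markers, and the names `looks_blocked` and `next_pool` are all assumptions made for illustration.

```python
# Assumed signals of a block; tune these for the targets being fetched.
BLOCK_STATUS_CODES = {403, 429}                  # forbidden / rate limited
BLOCK_MARKERS = ("captcha", "unusual traffic")   # case-insensitive body markers


def looks_blocked(status_code: int, body: str) -> bool:
    """Heuristically decide whether a response indicates a ban or challenge."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def next_pool(current_pool: str, blocked: bool) -> str:
    """Send blocked datacenter requests to the residential pool for a retry."""
    if blocked and current_pool == "datacenter":
        return "residential"
    return current_pool


print(looks_blocked(200, "<html>results</html>"))       # False
print(looks_blocked(429, ""))                           # True
print(looks_blocked(200, "Please solve this CAPTCHA"))  # True
print(next_pool("datacenter", True))                    # residential
```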
- Cloud Accounts: OVH Public Cloud + Hetzner Cloud accounts
- Tools: Terraform, Ansible, Go 1.21+, Python 3.9+
- DataImpulse: Account for residential proxy backup
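Before deploying, it can be useful to confirm that the tools listed above are on the `PATH`. The following is a small sketch of such a check. The executable names are assumptions about a typical install, and the check does not verify versions such as Go 1.21+ or Python 3.9+.

```python
import shutil
from typing import Iterable, List

# Assumed executable names for the tools in the prerequisites list.
REQUIRED_TOOLS = ("terraform", "ansible-playbook", "go", "python3")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing tools: " + ", ".join(missing))
else:
    print("All required tools found")
```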
git clone <repository-url>
cd search-infra
# Install dependencies
go mod tidy
pip install -r mcp-server/requirements.txt
# Copy and configure Terraform variables
cp terraform/environments/prod/terraform.tfvars.example terraform/environments/prod/terraform.tfvars
# Edit with your credentials
vim terraform/environments/prod/terraform.tfvars
Required configuration:
# OVH Configuration
ovh_application_key = "your-ovh-application-key"
ovh_application_secret = "your-ovh-application-secret"
ovh_consumer_key = "your-ovh-consumer-key"
ovh_project_service_name = "your-ovh-project-service-name"
# Hetzner Configuration
hcloud_token = "your-hetzner-cloud-token"
# SSH Configuration (will be auto-generated if not provided)
ssh_public_key = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQ..."
# Deploy to production
./scripts/deploy.sh prod
# Or deploy to staging
./scripts/deploy.sh staging
The deployment script will:
- Check prerequisites
- Generate SSH keys
- Build Go applications
- Deploy infrastructure with Terraform
- Configure services with Ansible
- Deploy applications
- Verify deployment
Update the residential proxy configuration:
# config/rota.yaml
residential_proxy:
  enabled: true
  endpoint: "residential.dataimpulse.com:823"
  username: "your-dataimpulse-username"
  password: "your-dataimpulse-password"
# Fetch a URL through the proxy infrastructure
curl -X POST http://your-rota-ip:8080/api/v1/fetch \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.google.com/search?q=test",
"method": "GET",
"timeout": 30
}'
# Get proxy health status
curl http://your-rota-ip:8080/api/v1/proxies
# Get performance statistics
curl http://your-rota-ip:8080/api/v1/stats
The platform includes a Model Context Protocol (MCP) server for seamless AI agent integration:
# Example: Using the MCP server with an AI agent
import asyncio

from mcp import Client


async def main():
    client = Client("search-infra-proxy")
    # Fetch a URL through the proxy infrastructure
    result = await client.call_tool("fetch_url", {
        "url": "https://example.com",
        "method": "GET"
    })
    # Get proxy health status
    health = await client.call_tool("get_proxy_health", {})
    # Scale the proxy pool
    await client.call_tool("scale_proxy_pool", {
        "action": "scale_up",
        "count": 2,
        "provider": "hetzner"
    })


# `await` is only valid inside a coroutine, so run the calls through asyncio
asyncio.run(main())
package main
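import (
	"context"
	"fmt"
	"log"

	"search-infra/pkg/router"
)

func main() {
	// Load the router configuration and stop early if it cannot be read.
	config, err := router.LoadConfig("config/rota.yaml")
	if err != nil {
		log.Fatalf("load config: %v", err)
	}
	r, err := router.NewRouter(config)
	if err != nil {
		log.Fatalf("create router: %v", err)
	}
	// Fetch a URL through the proxy infrastructure.
	result, err := r.Fetch(context.Background(), router.FetchRequest{
		URL:    "https://www.google.com/search?q=test",
		Method: "GET",
	})
	if err != nil {
		log.Fatalf("fetch: %v", err)
	}
	fmt.Printf("Status: %d, Proxy: %s\n", result.StatusCode, result.ProxyUsed)
}
Access the monitoring dashboards: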
- Rota API: http://your-core-ip:8080
- Prometheus: http://your-core-ip:9090
- Grafana: http://your-core-ip:3000
Key metrics to monitor:
- Proxy success rates
- Cache hit ratios
- Request latency
- Ban rates by provider
- Residential proxy usage
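In a running deployment these metrics would normally come from Prometheus. To make their definitions concrete, the sketch below computes them from a plain list of request records. The `RequestRecord` fields and the `summarize` function are hypothetical and do not reflect the platform's actual metric names or schema.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class RequestRecord:
    provider: str      # e.g. "cache", "ovh", "hetzner", "residential"
    ok: bool           # request ultimately succeeded
    cached: bool       # served from cache
    banned: bool       # response was classified as a ban
    latency_ms: float


def summarize(records: List[RequestRecord]) -> Dict[str, object]:
    """Compute the key metrics over a batch of request records."""
    total = len(records)
    if total == 0:
        return {}
    by_provider: Dict[str, List[RequestRecord]] = {}
    for record in records:
        by_provider.setdefault(record.provider, []).append(record)
    return {
        "success_rate": sum(r.ok for r in records) / total,
        "cache_hit_ratio": sum(r.cached for r in records) / total,
        "median_latency_ms": median(r.latency_ms for r in records),
        "ban_rate_by_provider": {
            provider: sum(r.banned for r in rs) / len(rs)
            for provider, rs in by_provider.items()
        },
        "residential_share": len(by_provider.get("residential", [])) / total,
    }


sample = [
    RequestRecord("cache", True, True, False, 5.0),
    RequestRecord("ovh", True, False, False, 800.0),
    RequestRecord("hetzner", False, False, True, 1200.0),
    RequestRecord("residential", True, False, False, 2000.0),
]
summary = summarize(sample)
print(summary["success_rate"])       # 0.75
print(summary["median_latency_ms"])  # 1000.0
```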
# Scale up (add 2 Hetzner instances)
./scripts/scale.sh up 2 hetzner
# Scale down (remove 1 OVH instance)
./scripts/scale.sh down 1 ovh
# Check current status
./scripts/deploy.sh status
# Ban a specific proxy
curl -X POST http://your-rota-ip:8080/api/v1/proxies/ovh-proxy-prod-1/ban \
-H "Content-Type: application/json" \
-d '{"duration": "1h", "reason": "High error rate"}'
# Unban a proxy
curl -X POST http://your-rota-ip:8080/api/v1/proxies/ovh-proxy-prod-1/unban
# Destroy infrastructure (be careful!)
./scripts/deploy.sh destroy prod
search-infra/
├── terraform/                 # Infrastructure as Code
│   ├── modules/               # Reusable Terraform modules
│   │   ├── ovh-vps/           # OVH VPS module
│   │   ├── hetzner-vps/       # Hetzner VPS module
│   │   └── wireguard-mesh/    # WireGuard mesh networking
│   └── environments/          # Environment-specific configs
│       ├── dev/
│       ├── staging/
│       └── prod/
├── ansible/                   # Configuration management
│   ├── playbooks/             # Ansible playbooks
│   ├── roles/                 # Reusable roles
│   └── inventory/             # Dynamic inventory
├── cmd/                       # Go applications
│   └── rota/                  # Main proxy router service
├── pkg/                       # Go packages
│   ├── router/                # Smart routing logic
│   ├── health/                # Health checking
│   └── metrics/               # Prometheus metrics
├── mcp-server/                # Model Context Protocol server
├── config/                    # Configuration files
├── scripts/                   # Deployment and management scripts
├── monitoring/                # Grafana dashboards and alerts
└── docs/                      # Additional documentation
- WireGuard Mesh: All inter-node communication encrypted
- Firewall Rules: Minimal attack surface with UFW
- Fail2Ban: Automatic IP blocking for suspicious activity
- SSH Keys: Key-based authentication only
- Regular Updates: Automated security updates
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Documentation: Wiki
- Discussions: GitHub Discussions
- OVH Cloud: Affordable datacenter infrastructure
- Hetzner: Reliable European hosting
- DataImpulse: Residential proxy services
- WireGuard: Secure VPN technology
- Prometheus: Monitoring and alerting
Built with ❤️ for the AI agent community
Achieve enterprise-grade proxy infrastructure at a fraction of the cost.