Search Infrastructure Proxy Platform

A DIY "Bright Data-class" proxy platform that provides AI agents with reliable web search capabilities at approximately 10x lower cost than commercial solutions.

🎯 Overview

This open-source platform combines cheap datacenter proxies (OVH + Hetzner) with on-demand residential proxies (DataImpulse) to create a cost-effective, scalable proxy infrastructure optimized for AI agent workloads.

Key Features

Smart Routing: Cache-first → Search API fallback → Datacenter proxies → Residential backup
Cost Optimization: ~$0.00026 per request at the reference workload (100k SERPs + 300k pages/month), roughly 10x cheaper than commercial solutions
High Availability: 96%+ success rate with automatic failover
Scalable Architecture: Start with 16 nodes, scale as needed
WireGuard Mesh: Secure inter-node communication
Real-time Monitoring: Prometheus + Grafana dashboards
MCP Integration: Model Context Protocol server for AI agent integration
Ban Detection: Automatic proxy rotation and health checking
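
The Smart Routing order above can be read as a chain of tiers, where the first tier that returns a result wins. The following is a minimal sketch of that idea, not the router's actual code; `route`, `TIERS`, and the four stand-in tier functions are hypothetical names used only for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A tier takes a URL and returns the page body, or None if it cannot serve it.
Tier = Callable[[str], Optional[str]]


def route(url: str, tiers: List[Tuple[str, Tier]]) -> Tuple[str, str]:
    """Try each tier in order; return (tier_name, body) from the first success."""
    for name, fetch in tiers:
        try:
            body = fetch(url)
        except Exception:
            # A failing tier is treated like a miss, so the next tier gets a turn.
            body = None
        if body is not None:
            return name, body
    raise RuntimeError(f"all tiers failed for {url}")


# Hypothetical stand-ins; real tiers would talk to Redis, a search API, and proxies.
cache = {"https://example.com/": "<html>cached</html>"}


def from_cache(url):
    return cache.get(url)


def from_search_api(url):
    return None  # pretend the search API has no answer


def via_datacenter(url):
    raise TimeoutError("proxy blocked")  # pretend the datacenter proxy failed


def via_residential(url):
    return "<html>fresh</html>"


TIERS = [
    ("cache", from_cache),
    ("search_api", from_search_api),
    ("datacenter", via_datacenter),
    ("residential", via_residential),
]

print(route("https://example.com/", TIERS))  # served by the cache tier
print(route("https://example.org/", TIERS))  # falls through to residential
```

Ordering the tiers from cheapest to most expensive means the paid residential tier is only reached when everything before it has missed or failed.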

📊 Performance Targets

Metric	Target
Google/Bing success rate	≥96% with ≤3 retries
Median SERP latency	≤3 seconds
Cost per fetched page	≤$0.003
Operational overhead	1 DevOps + 0.2 FTE on-call
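
To make the "≥96% with ≤3 retries" target concrete, it helps to relate the overall success rate to the per-attempt success rate. The sketch below assumes attempts are independent, which is an optimistic simplification (a ban tends to affect consecutive attempts); the function names are illustrative.

```python
def success_within(p: float, retries: int) -> float:
    """Probability that at least one of (1 + retries) independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** (retries + 1)


def required_per_attempt(target: float, retries: int) -> float:
    """Smallest per-attempt success rate that still meets the overall target."""
    return 1.0 - (1.0 - target) ** (1.0 / (retries + 1))


print(round(success_within(0.60, 3), 4))        # 0.9744
print(round(required_per_attempt(0.96, 3), 4))  # 0.5528
```

Under that assumption, a per-attempt success rate of roughly 55% is already enough to reach 96% overall when up to 3 retries are allowed.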

💰 Cost Breakdown (100k SERPs + 300k pages/month)

Component	Cost
8x OVH VLE-2 instances	$44/month
8x Hetzner CX11 instances	$34.60/month
DataImpulse residential (10GB)	$10/month
Total	~$104/month
Cost per request (400k requests)	~$0.00026
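
The per-request figure follows from dividing the stated monthly total by the reference workload. The short calculation below reproduces it from the numbers in the table above; it also shows that the three listed line items sum to less than the stated total, and the table does not say what the remainder covers.

```python
# Figures copied from the cost table above (USD per month).
components = {
    "8x OVH VLE-2": 44.00,
    "8x Hetzner CX11": 34.60,
    "DataImpulse residential (10GB)": 10.00,
}
stated_total = 104.00          # the "~$104/month" total row
requests = 100_000 + 300_000   # SERPs + pages per month

itemized = sum(components.values())
print(f"itemized subtotal: ${itemized:.2f}")                  # $88.60
print(f"not itemized:      ${stated_total - itemized:.2f}")   # $15.40
print(f"cost per request:  ${stated_total / requests:.5f}")   # $0.00026
```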

🏗️ Architecture

graph TD
    subgraph "Edge Proxies (OVH & Hetzner)"
        P1((Squid/3proxy)) -->|WireGuard| Core
        P2((Squid/3proxy)) -->|WireGuard| Core
        Pn((...))
    end
    subgraph Core["Core Services"]
        Rota[Rota Router] --> Redis[(Cache)]
        Rota --> Prom[Prometheus]
        Rota --> Queue[Request Queue]
        Crawlee{{Worker pods}} --> Target[Web]
    end
    Target -.->|Blocked| Rota
    Rota -.->|Retry via residential| DataImpulse[(DataImpulse)]
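
The dotted "Blocked" and "Retry via residential" edges in the diagram can be sketched as a small fallback routine. This is an illustration under stated assumptions, not the Rota implementation: the block signals (status 403/429, challenge-page markers), the `Response` type, and the `datacenter_attempts` default are all hypothetical.

```python
from dataclasses import dataclass

# Assumed block signals; the real router may use different ones.
BLOCK_STATUSES = {403, 429}
BLOCK_MARKERS = ("captcha", "unusual traffic")


@dataclass
class Response:
    status: int
    body: str


def looks_blocked(resp: Response) -> bool:
    """Heuristic: a blocking status code or a challenge page in the body."""
    if resp.status in BLOCK_STATUSES:
        return True
    text = resp.body.lower()
    return any(marker in text for marker in BLOCK_MARKERS)


def fetch_with_fallback(url, datacenter_fetch, residential_fetch, datacenter_attempts=2):
    """Try datacenter proxies first; fall back to residential if every attempt is blocked."""
    for _ in range(datacenter_attempts):
        resp = datacenter_fetch(url)
        if not looks_blocked(resp):
            return "datacenter", resp
    return "residential", residential_fetch(url)


# Stub fetchers standing in for real proxy calls.
def blocked_datacenter(url):
    return Response(429, "")


def working_residential(url):
    return Response(200, "<html>results</html>")


tier, resp = fetch_with_fallback("https://example.com", blocked_datacenter, working_residential)
print(tier, resp.status)  # residential 200
```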

🚀 Quick Start

Prerequisites

Cloud Accounts: OVH Public Cloud and Hetzner Cloud
Tools: Terraform, Ansible, Go 1.21+, Python 3.9+
DataImpulse: Account for residential proxy backup

1. Clone and Setup

git clone <repository-url>
cd search-infra

# Install dependencies
go mod tidy
pip install -r mcp-server/requirements.txt

2. Configure Credentials

# Copy and configure Terraform variables
cp terraform/environments/prod/terraform.tfvars.example terraform/environments/prod/terraform.tfvars

# Edit with your credentials
vim terraform/environments/prod/terraform.tfvars

Required configuration:

# OVH Configuration
ovh_application_key      = "your-ovh-application-key"
ovh_application_secret   = "your-ovh-application-secret"
ovh_consumer_key         = "your-ovh-consumer-key"
ovh_project_service_name = "your-ovh-project-service-name"

# Hetzner Configuration
hcloud_token = "your-hetzner-cloud-token"

# SSH Configuration (will be auto-generated if not provided)
ssh_public_key = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQ..."

3. Deploy Infrastructure

# Deploy to production
./scripts/deploy.sh prod

# Or deploy to staging
./scripts/deploy.sh staging

The deployment script will:

✅ Check prerequisites
🔑 Generate SSH keys
🏗️ Build Go applications
☁️ Deploy infrastructure with Terraform
⚙️ Configure services with Ansible
🚀 Deploy applications
✅ Verify deployment

4. Configure DataImpulse

Update the residential proxy configuration:

# config/rota.yaml
residential_proxy:
  enabled: true
  endpoint: "residential.dataimpulse.com:823"
  username: "your-dataimpulse-username"
  password: "your-dataimpulse-password"
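
For clients that need these settings as a single proxy URL, the fields can be combined as shown below. This is a sketch: the `http` scheme is an assumption (check the DataImpulse documentation for the protocols the endpoint accepts), and `proxy_url` is a hypothetical helper. Credentials are percent-encoded so that characters such as `:` or `@` in a password do not break URL parsing.

```python
from urllib.parse import quote


def proxy_url(endpoint: str, username: str, password: str, scheme: str = "http") -> str:
    """Build a proxy URL with percent-encoded credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{endpoint}"


# Values mirror the residential_proxy block in config/rota.yaml;
# the password here is a made-up example containing special characters.
residential = {
    "enabled": True,
    "endpoint": "residential.dataimpulse.com:823",
    "username": "your-dataimpulse-username",
    "password": "p@ss:word",
}

url = proxy_url(residential["endpoint"], residential["username"], residential["password"])
proxies = {"http": url, "https": url}  # mapping shape accepted by urllib and requests
print(url)
# http://your-dataimpulse-username:p%40ss%3Aword@residential.dataimpulse.com:823
```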

🔧 Usage

REST API

# Fetch a URL through the proxy infrastructure
curl -X POST http://your-rota-ip:8080/api/v1/fetch \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.google.com/search?q=test",
    "method": "GET",
    "timeout": 30
  }'

# Get proxy health status
curl http://your-rota-ip:8080/api/v1/proxies

# Get performance statistics
curl http://your-rota-ip:8080/api/v1/stats
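
The same fetch call can be made from Python with only the standard library. The sketch below mirrors the curl example's endpoint and JSON body; it assumes the API replies with JSON, which the curl examples do not confirm, and the helper names are illustrative.

```python
import json
import urllib.request


def build_fetch_request(base_url: str, url: str, method: str = "GET", timeout: int = 30):
    """Build the POST /api/v1/fetch request with the same JSON body as the curl example."""
    payload = json.dumps({"url": url, "method": method, "timeout": timeout}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/fetch",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch(base_url: str, url: str, timeout: int = 30) -> dict:
    """Send the request and decode the response, assuming a JSON reply."""
    req = build_fetch_request(base_url, url, timeout=timeout)
    # Give the client a little more time than the server-side timeout.
    with urllib.request.urlopen(req, timeout=timeout + 5) as resp:
        return json.load(resp)


# Building the request does not touch the network, so it is safe to inspect.
req = build_fetch_request("http://your-rota-ip:8080", "https://www.google.com/search?q=test")
print(req.get_method(), req.full_url)
# POST http://your-rota-ip:8080/api/v1/fetch
```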

MCP Integration

The platform includes a Model Context Protocol (MCP) server for seamless AI agent integration:

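# Example: calling the MCP server's tools from a Python client.
# Uses the official `mcp` Python SDK over stdio. The launch command below is an
# assumption; point it at the actual entry point inside mcp-server/.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(command="python", args=["mcp-server/server.py"])


async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Fetch a URL through the proxy infrastructure
            result = await session.call_tool("fetch_url", {
                "url": "https://example.com",
                "method": "GET",
            })

            # Get proxy health status
            health = await session.call_tool("get_proxy_health", {})

            # Scale the proxy pool
            await session.call_tool("scale_proxy_pool", {
                "action": "scale_up",
                "count": 2,
                "provider": "hetzner",
            })


asyncio.run(main())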

Go SDK

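package main

import (
    "context"
    "fmt"
    "log"

    "search-infra/pkg/router"
)

func main() {
    // Load the router configuration; stop early if it cannot be read.
    config, err := router.LoadConfig("config/rota.yaml")
    if err != nil {
        log.Fatalf("load config: %v", err)
    }

    r, err := router.NewRouter(config)
    if err != nil {
        log.Fatalf("create router: %v", err)
    }

    // Fetch a URL through the routing tiers.
    result, err := r.Fetch(context.Background(), router.FetchRequest{
        URL:    "https://www.google.com/search?q=test",
        Method: "GET",
    })
    if err != nil {
        log.Fatalf("fetch: %v", err)
    }

    fmt.Printf("Status: %d, Proxy: %s\n", result.StatusCode, result.ProxyUsed)
}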

📊 Monitoring

Access the monitoring dashboards:

Rota API: http://your-core-ip:8080
Prometheus: http://your-core-ip:9090
Grafana: http://your-core-ip:3000

Key metrics to monitor:

Proxy success rates
Cache hit ratios
Request latency
Ban rates by provider
Residential proxy usage
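
Metrics like these are typically exposed to Prometheus as labeled counters, with rates and ratios derived at query time. The sketch below renders one counter family in the Prometheus text exposition format; the metric name `rota_requests_total`, its labels, and the sample values are hypothetical, not the names the Rota service actually exports.

```python
def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render one counter family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines)


def ratio(part: float, whole: float) -> float:
    """Safe ratio that returns 0.0 when there is no traffic yet."""
    return part / whole if whole else 0.0


# Hypothetical counters keyed by (label, value) pairs.
requests_total = {
    (("provider", "hetzner"), ("outcome", "success")): 960,
    (("provider", "hetzner"), ("outcome", "banned")): 40,
}

print(render_counter("rota_requests_total", "Requests by provider and outcome.", requests_total))
print("ban rate:", ratio(40, 960 + 40))  # ban rate: 0.04
```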

🔧 Management

Scaling Operations

# Scale up (add 2 Hetzner instances)
./scripts/scale.sh up 2 hetzner

# Scale down (remove 1 OVH instance)
./scripts/scale.sh down 1 ovh

# Check current status
./scripts/deploy.sh status

Proxy Management

# Ban a specific proxy
curl -X POST http://your-rota-ip:8080/api/v1/proxies/ovh-proxy-prod-1/ban \
  -H "Content-Type: application/json" \
  -d '{"duration": "1h", "reason": "High error rate"}'

# Unban a proxy
curl -X POST http://your-rota-ip:8080/api/v1/proxies/ovh-proxy-prod-1/unban
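
Behind endpoints like these, a router needs to remember which proxies are banned and for how long. The following is one possible in-memory sketch, not the service's actual bookkeeping; `parse_duration`, `BanList`, and the supported duration units (`s`, `m`, `h`) are assumptions made for illustration.

```python
import re
import time

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert strings like '30s', '15m' or '1h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


class BanList:
    """Tracks which proxies are banned and until when."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._until = {}  # proxy_id -> (expiry_time, reason)

    def ban(self, proxy_id: str, duration: str, reason: str = "") -> None:
        self._until[proxy_id] = (self._clock() + parse_duration(duration), reason)

    def unban(self, proxy_id: str) -> None:
        self._until.pop(proxy_id, None)

    def is_banned(self, proxy_id: str) -> bool:
        entry = self._until.get(proxy_id)
        if entry is None:
            return False
        if self._clock() >= entry[0]:
            del self._until[proxy_id]  # the ban has expired
            return False
        return True


# A fake clock makes the expiry behaviour easy to demonstrate.
now = [0.0]
bans = BanList(clock=lambda: now[0])
bans.ban("ovh-proxy-prod-1", "1h", "High error rate")
print(bans.is_banned("ovh-proxy-prod-1"))  # True
now[0] = 3600.0
print(bans.is_banned("ovh-proxy-prod-1"))  # False
```

Injecting the clock keeps the class testable without sleeping; in production the default `time.monotonic` avoids surprises from wall-clock adjustments.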

Infrastructure Destruction

# Destroy infrastructure (be careful!)
./scripts/deploy.sh destroy prod

🏗️ Project Structure

search-infra/
├── terraform/                 # Infrastructure as Code
│   ├── modules/               # Reusable Terraform modules
│   │   ├── ovh-vps/          # OVH VPS module
│   │   ├── hetzner-vps/      # Hetzner VPS module
│   │   └── wireguard-mesh/   # WireGuard mesh networking
│   └── environments/         # Environment-specific configs
│       ├── dev/
│       ├── staging/
│       └── prod/
├── ansible/                   # Configuration management
│   ├── playbooks/            # Ansible playbooks
│   ├── roles/                # Reusable roles
│   └── inventory/            # Dynamic inventory
├── cmd/                      # Go applications
│   └── rota/                 # Main proxy router service
├── pkg/                      # Go packages
│   ├── router/               # Smart routing logic
│   ├── health/               # Health checking
│   └── metrics/              # Prometheus metrics
├── mcp-server/               # Model Context Protocol server
├── config/                   # Configuration files
├── scripts/                  # Deployment and management scripts
├── monitoring/               # Grafana dashboards and alerts
└── docs/                     # Additional documentation

🔒 Security

WireGuard Mesh: All inter-node communication encrypted
Firewall Rules: Minimal attack surface with UFW
Fail2Ban: Automatic IP blocking for suspicious activity
SSH Keys: Key-based authentication only
Regular Updates: Automated security updates

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

Issues: GitHub Issues
Documentation: Wiki
Discussions: GitHub Discussions

🙏 Acknowledgments

OVH Cloud: Affordable datacenter infrastructure
Hetzner: Reliable European hosting
DataImpulse: Residential proxy services
WireGuard: Secure VPN technology
Prometheus: Monitoring and alerting

Built with ❤️ for the AI agent community

Achieve enterprise-grade proxy infrastructure at a fraction of the cost.
