
gregcmartin/agentsearch_infra

Search Infrastructure Proxy Platform

A DIY "Bright Data-class" proxy platform that provides AI agents with reliable web search capabilities at approximately 10x lower cost than commercial solutions.

🎯 Overview

This open-source platform combines cheap datacenter proxies (OVH + Hetzner) with on-demand residential proxies (DataImpulse) to create a cost-effective, scalable proxy infrastructure optimized for AI agent workloads.

Key Features

  • Smart Routing: Cache-first → Search API fallback → Datacenter proxies → Residential backup
  • Cost Optimization: ~$0.00026 per page, roughly 10x cheaper than commercial solutions
  • High Availability: 96%+ success rate with automatic failover
  • Scalable Architecture: Start with 16 nodes, scale as needed
  • WireGuard Mesh: Secure inter-node communication
  • Real-time Monitoring: Prometheus + Grafana dashboards
  • MCP Integration: Model Context Protocol server for AI agent integration
  • Ban Detection: Automatic proxy rotation and health checking
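The Smart Routing order above can be read as a chain of tiers tried in sequence, stopping at the first one that returns a usable response. The Go sketch below illustrates only that idea; `Tier`, `Route`, `errMiss` and the stub tiers are hypothetical names, not the API of `pkg/router`.

```go
package main

import (
	"errors"
	"fmt"
)

// Tier is a hypothetical routing stage (cache, search API, datacenter
// proxy, residential proxy); it is not the actual pkg/router type.
type Tier struct {
	Name  string
	Fetch func(url string) (string, error)
}

// errMiss signals that a tier could not serve the request, so the
// router should fall through to the next tier.
var errMiss = errors.New("tier miss")

// Route tries each tier in order and returns the first successful body
// plus the name of the tier that served it. If every tier fails, the
// individual errors are joined so the caller can see why each one failed.
func Route(tiers []Tier, url string) (body, servedBy string, err error) {
	if len(tiers) == 0 {
		return "", "", errors.New("no routing tiers configured")
	}
	var errs []error
	for _, t := range tiers {
		b, e := t.Fetch(url)
		if e == nil {
			return b, t.Name, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", t.Name, e))
	}
	return "", "", errors.Join(errs...)
}

func main() {
	cache := map[string]string{} // empty, so the first request misses
	tiers := []Tier{
		{"cache", func(u string) (string, error) {
			if b, ok := cache[u]; ok {
				return b, nil
			}
			return "", errMiss
		}},
		{"search-api", func(string) (string, error) { return "", errMiss }},
		{"datacenter", func(string) (string, error) { return "", errors.New("HTTP 429") }},
		{"residential", func(u string) (string, error) { return "<html>" + u + "</html>", nil }},
	}

	body, tier, err := Route(tiers, "https://example.com")
	if err != nil {
		fmt.Println("all tiers failed:", err)
		return
	}
	cache["https://example.com"] = body // write back so the next call is a cache hit
	fmt.Println("served by", tier)
}
```

Keeping every tier behind the same function signature means the cheap tiers (cache, datacenter) are always tried first and the metered residential tier is reached only when everything before it fails, which is where the cost savings come from.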

📊 Performance Targets

| Metric | Target |
|---|---|
| Google/Bing success rate | ≥96% with ≤3 retries |
| Median SERP latency | ≤3 seconds |
| Cost per fetched page | ≤$0.003 |
| Operational overhead | 1 DevOps engineer + 0.2 FTE on-call |

💰 Cost Breakdown (100k SERPs + 300k pages/month)

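| Component | Cost |
|---|---|
| 8x OVH VLE-2 instances | $44/month |
| 8x Hetzner CX11 instances | $34.60/month |
| DataImpulse residential (10GB) | $10/month |
| Itemized subtotal | $88.60/month |
| Total | ~$104/month |
| Cost per page | $0.00026 ($104 ÷ 400k fetches) |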
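The per-page figure is the monthly total divided by the monthly fetch volume (100k SERPs + 300k pages = 400k fetches). The small Go sketch below reproduces that arithmetic from the table rows. Note that the three itemized rows sum to $88.60, so the ~$104 total is taken as stated rather than derived; `lineItem`, `itemizedSum` and `costPerPage` are illustrative names only.

```go
package main

import "fmt"

// lineItem is one row of the monthly cost table.
type lineItem struct {
	Name string
	USD  float64
}

// itemizedSum adds up the monthly cost of all listed components.
func itemizedSum(items []lineItem) float64 {
	var sum float64
	for _, it := range items {
		sum += it.USD
	}
	return sum
}

// costPerPage divides a monthly budget by the number of fetches per month.
func costPerPage(monthlyUSD float64, fetches int) float64 {
	return monthlyUSD / float64(fetches)
}

func main() {
	items := []lineItem{
		{"8x OVH VLE-2", 44.00},
		{"8x Hetzner CX11", 34.60},
		{"DataImpulse residential (10GB)", 10.00},
	}
	const fetches = 100_000 + 300_000 // SERPs + pages per month

	sum := itemizedSum(items)
	fmt.Printf("itemized: $%.2f/month -> $%.5f/page\n", sum, costPerPage(sum, fetches))
	fmt.Printf("quoted:   $%.2f/month -> $%.5f/page\n", 104.0, costPerPage(104, fetches))
}
```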

🏗️ Architecture

graph TD
    subgraph "Edge Proxies (OVH & Hetzner)"
        P1((Squid/3proxy)) -->|WireGuard| Core
        P2((Squid/3proxy)) -->|WireGuard| Core
        Pn((...))
    end
    subgraph "Core Services"
        Rota[Rota Router] --> Redis[(Cache)]
        Rota --> Prom[Prometheus]
        Rota --> Queue[Request Queue]
        Crawlee{{Worker pods}} --> Target[Web]
    end
    Target -.->|Blocked| Rota
    Rota -.->|Retry via residential| DataImpulse[(DataImpulse)]
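The dashed edges in the diagram imply that Rota must classify a response as blocked before retrying it through DataImpulse. Below is a minimal sketch of such a classifier. It assumes HTTP 403/429/503 and CAPTCHA-style interstitial text are the block signals, and a budget of 3 datacenter attempts (matching the ≤3 retries target). The real rules in `pkg/health` and `pkg/router` may differ, and `isBlocked` and `nextTier` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// isBlocked applies assumed heuristics: hard blocks via status codes and
// soft blocks where a search engine answers 200 with an interstitial page.
func isBlocked(status int, body string) bool {
	switch status {
	case http.StatusForbidden, http.StatusTooManyRequests, http.StatusServiceUnavailable:
		return true
	}
	lower := strings.ToLower(body)
	for _, marker := range []string{"captcha", "unusual traffic"} {
		if strings.Contains(lower, marker) {
			return true
		}
	}
	return false
}

// nextTier decides where a request goes after an attempt: stop if it
// succeeded, retry on another datacenter proxy while the retry budget
// lasts, otherwise escalate to the residential pool.
func nextTier(blocked bool, attempt, maxDatacenterRetries int) string {
	switch {
	case !blocked:
		return "done"
	case attempt < maxDatacenterRetries:
		return "datacenter"
	default:
		return "residential"
	}
}

func main() {
	fmt.Println(isBlocked(429, ""))                                          // true
	fmt.Println(isBlocked(200, "Our systems have detected unusual traffic")) // true
	fmt.Println(nextTier(true, 3, 3))                                        // residential
}
```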

🚀 Quick Start

Prerequisites

  • Cloud Accounts: OVH Public Cloud + Hetzner Cloud accounts
  • Tools: Terraform, Ansible, Go 1.21+, Python 3.9+
  • DataImpulse: Account for residential proxy backup

1. Clone and Setup

git clone <repository-url> search-infra
cd search-infra

# Install dependencies
go mod tidy
pip install -r mcp-server/requirements.txt

2. Configure Credentials

# Copy and configure Terraform variables
cp terraform/environments/prod/terraform.tfvars.example terraform/environments/prod/terraform.tfvars

# Edit with your credentials
vim terraform/environments/prod/terraform.tfvars

Required configuration:

# OVH Configuration
ovh_application_key      = "your-ovh-application-key"
ovh_application_secret   = "your-ovh-application-secret"
ovh_consumer_key         = "your-ovh-consumer-key"
ovh_project_service_name = "your-ovh-project-service-name"

# Hetzner Configuration
hcloud_token = "your-hetzner-cloud-token"

# SSH Configuration (will be auto-generated if not provided)
ssh_public_key = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQ..."

3. Deploy Infrastructure

# Deploy to production
./scripts/deploy.sh prod

# Or deploy to staging
./scripts/deploy.sh staging

The deployment script will:

  1. ✅ Check prerequisites
  2. 🔑 Generate SSH keys
  3. 🏗️ Build Go applications
  4. ☁️ Deploy infrastructure with Terraform
  5. ⚙️ Configure services with Ansible
  6. 🚀 Deploy applications
  7. ✅ Verify deployment
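Step 1 boils down to confirming that the tools from the prerequisites list are on `PATH`. `scripts/deploy.sh` does this in shell; the Go equivalent below is only a sketch and assumes the binaries are named `terraform`, `ansible-playbook`, `go` and `python3`.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// requiredTools lists the binaries assumed to be needed by the deploy flow.
var requiredTools = []string{"terraform", "ansible-playbook", "go", "python3"}

// missingTools returns the subset of tools that cannot be found on PATH.
func missingTools(tools []string) []string {
	var missing []string
	for _, t := range tools {
		if _, err := exec.LookPath(t); err != nil {
			missing = append(missing, t)
		}
	}
	return missing
}

func main() {
	if m := missingTools(requiredTools); len(m) > 0 {
		fmt.Fprintf(os.Stderr, "missing prerequisites: %v\n", m)
		os.Exit(1)
	}
	fmt.Println("all prerequisites found")
}
```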

4. Configure DataImpulse

Update the residential proxy configuration:

# config/rota.yaml
residential_proxy:
  enabled: true
  endpoint: "residential.dataimpulse.com:823"
  username: "your-dataimpulse-username"
  password: "your-dataimpulse-password"
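The `residential_proxy` block describes an authenticated upstream proxy. Here is a sketch of how a Go client could send traffic through it using only the standard library. It assumes the endpoint accepts plain HTTP proxying with credentials in the proxy URL; Go's `http.Transport` turns the URL's user info into a `Proxy-Authorization` header. The environment variable names `DATAIMPULSE_USER` and `DATAIMPULSE_PASS` are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
	"time"
)

// residentialProxyURL builds an http://user:pass@host:port proxy URL;
// url.UserPassword takes care of escaping special characters.
func residentialProxyURL(endpoint, user, pass string) *url.URL {
	return &url.URL{
		Scheme: "http",
		User:   url.UserPassword(user, pass),
		Host:   endpoint,
	}
}

// newResidentialClient returns an http.Client whose requests go via the proxy.
func newResidentialClient(proxy *url.URL, timeout time.Duration) *http.Client {
	return &http.Client{
		Timeout:   timeout,
		Transport: &http.Transport{Proxy: http.ProxyURL(proxy)},
	}
}

func main() {
	proxy := residentialProxyURL("residential.dataimpulse.com:823",
		os.Getenv("DATAIMPULSE_USER"), os.Getenv("DATAIMPULSE_PASS"))
	client := newResidentialClient(proxy, 30*time.Second)

	resp, err := client.Get("https://example.com")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```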

🔧 Usage

REST API

# Fetch a URL through the proxy infrastructure
curl -X POST http://your-rota-ip:8080/api/v1/fetch \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.google.com/search?q=test",
    "method": "GET",
    "timeout": 30
  }'

# Get proxy health status
curl http://your-rota-ip:8080/api/v1/proxies

# Get performance statistics
curl http://your-rota-ip:8080/api/v1/stats

MCP Integration

The platform includes a Model Context Protocol (MCP) server for seamless AI agent integration:

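# Example: using the MCP server from a Python client (official `mcp` SDK).
# The server launch command below is an assumption; point it at the
# actual entry point in mcp-server/.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(command="python", args=["mcp-server/server.py"])


async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Fetch a URL through the proxy infrastructure
            result = await session.call_tool("fetch_url", {
                "url": "https://example.com",
                "method": "GET",
            })

            # Get proxy health status
            health = await session.call_tool("get_proxy_health", {})

            # Scale the proxy pool
            await session.call_tool("scale_proxy_pool", {
                "action": "scale_up",
                "count": 2,
                "provider": "hetzner",
            })
            print(result, health)


asyncio.run(main())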

Go SDK

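package main

import (
    "context"
    "fmt"
    "log"

    "search-infra/pkg/router"
)

func main() {
    config, err := router.LoadConfig("config/rota.yaml")
    if err != nil {
        log.Fatalf("load config: %v", err)
    }
    r, err := router.NewRouter(config)
    if err != nil {
        log.Fatalf("create router: %v", err)
    }

    // Fetch a URL through the proxy infrastructure
    result, err := r.Fetch(context.Background(), router.FetchRequest{
        URL:    "https://www.google.com/search?q=test",
        Method: "GET",
    })
    if err != nil {
        log.Fatalf("fetch: %v", err)
    }
    fmt.Printf("Status: %d, Proxy: %s\n", result.StatusCode, result.ProxyUsed)
}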

📊 Monitoring

Access the monitoring dashboards:

  • Rota API: http://your-core-ip:8080
  • Prometheus: http://your-core-ip:9090
  • Grafana: http://your-core-ip:3000

Key metrics to monitor:

  • Proxy success rates
  • Cache hit ratios
  • Request latency
  • Ban rates by provider
  • Residential proxy usage
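These metrics live in Prometheus, so they can also be pulled programmatically through its HTTP API (`/api/v1/query`). The sketch below computes a per-provider success rate. The metric name `rota_requests_total` and its `result` and `provider` labels are assumptions, so check what `pkg/metrics` actually exports before relying on the query.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strconv"
)

// promResponse models the subset of a Prometheus instant-query response used here.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]any            `json:"value"` // [unix timestamp, "sample value"]
		} `json:"result"`
	} `json:"data"`
}

// parseVector maps each series' value of the given label to its sample value.
func parseVector(raw []byte, label string) (map[string]float64, error) {
	var r promResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	if r.Status != "success" {
		return nil, fmt.Errorf("prometheus status %q", r.Status)
	}
	out := make(map[string]float64, len(r.Data.Result))
	for _, s := range r.Data.Result {
		str, ok := s.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("unexpected sample value %v", s.Value[1])
		}
		v, err := strconv.ParseFloat(str, 64)
		if err != nil {
			return nil, err
		}
		out[s.Metric[label]] = v
	}
	return out, nil
}

// queryProm runs an instant query against the Prometheus HTTP API.
func queryProm(baseURL, promQL, label string) (map[string]float64, error) {
	resp, err := http.Get(baseURL + "/api/v1/query?" + url.Values{"query": {promQL}}.Encode())
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseVector(raw, label)
}

func main() {
	// Hypothetical metric and label names.
	q := `sum by (provider) (rate(rota_requests_total{result="success"}[5m]))
	      / sum by (provider) (rate(rota_requests_total[5m]))`
	rates, err := queryProm("http://your-core-ip:9090", q, "provider")
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	for provider, r := range rates {
		fmt.Printf("%-12s success rate %.1f%%\n", provider, 100*r)
	}
}
```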

🔧 Management

Scaling Operations

# Scale up (add 2 Hetzner instances)
./scripts/scale.sh up 2 hetzner

# Scale down (remove 1 OVH instance)
./scripts/scale.sh down 1 ovh

# Check current status
./scripts/deploy.sh status

Proxy Management

# Ban a specific proxy
curl -X POST http://your-rota-ip:8080/api/v1/proxies/ovh-proxy-prod-1/ban \
  -H "Content-Type: application/json" \
  -d '{"duration": "1h", "reason": "High error rate"}'

# Unban a proxy
curl -X POST http://your-rota-ip:8080/api/v1/proxies/ovh-proxy-prod-1/unban

Infrastructure Destruction

# Destroy infrastructure (be careful!)
./scripts/deploy.sh destroy prod

🏗️ Project Structure

search-infra/
├── terraform/                # Infrastructure as Code
│   ├── modules/              # Reusable Terraform modules
│   │   ├── ovh-vps/          # OVH VPS module
│   │   ├── hetzner-vps/      # Hetzner VPS module
│   │   └── wireguard-mesh/   # WireGuard mesh networking
│   └── environments/         # Environment-specific configs
│       ├── dev/
│       ├── staging/
│       └── prod/
├── ansible/                  # Configuration management
│   ├── playbooks/            # Ansible playbooks
│   ├── roles/                # Reusable roles
│   └── inventory/            # Dynamic inventory
├── cmd/                      # Go applications
│   └── rota/                 # Main proxy router service
├── pkg/                      # Go packages
│   ├── router/               # Smart routing logic
│   ├── health/               # Health checking
│   └── metrics/              # Prometheus metrics
├── mcp-server/               # Model Context Protocol server
├── config/                   # Configuration files
├── scripts/                  # Deployment and management scripts
├── monitoring/               # Grafana dashboards and alerts
└── docs/                     # Additional documentation

🔒 Security

  • WireGuard Mesh: All inter-node communication encrypted
  • Firewall Rules: Minimal attack surface with UFW
  • Fail2Ban: Automatic IP blocking for suspicious activity
  • SSH Keys: Key-based authentication only
  • Regular Updates: Automated security updates

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

🙏 Acknowledgments

  • OVH Cloud: Affordable datacenter infrastructure
  • Hetzner: Reliable European hosting
  • DataImpulse: Residential proxy services
  • WireGuard: Secure VPN technology
  • Prometheus: Monitoring and alerting

Built with ❤️ for the AI agent community

Achieve enterprise-grade proxy infrastructure at a fraction of the cost.

About

Host your own Bright Data-like service at roughly 10x lower cost.
