A Precision Multi-Site Search & LLM-Powered Resource Refinement Agent.
SiteScout is a specialized AI Agent designed for "Digital Archeology." It solves the pain points of searching for niche resources in specific forums or vertical sites where information density is low, ads are rampant, and links are often dead or hidden.
Unlike blind global searches, SiteScout follows a "Targeted Penetration + Funnel Refinement" architecture. It lets users define a "Trusted Site Pool," scrapes those sites concurrently, and uses an LLM to automatically identify, extract, and verify authentic download links.
- 🎯 Domain-Specific "Sniper" Search: Supports custom `site:` operator lists. Dig deep into specific niche forums (e.g., Reddit, V2EX, GitHub) rather than the noisy open web.
- 🌪️ The Resource Funnel (Map-Reduce):
  - Raw Data: Retains up to 20 raw search results per site for manual verification.
  - Refined Summary: An LLM-distilled "Quick List" generated from the top $K$ results across all sites.
- 🧠 Intelligent De-noising: Automatically filters out "Join group for link," "Reply to see," and fake redirection ads.
- ⚡ High-Performance Concurrency: Built with `httpx` and `asyncio`, performing multi-site lookups simultaneously, reducing response latency by up to 70%.
- 🌐 Developer Friendly: Native support for DeepSeek, Bocha AI, and Exa AI APIs.
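
The Resource Funnel described above can be sketched roughly as follows. This is a minimal illustration, not SiteScout's actual code: the `funnel` function, the `score` field, and the sample data are all hypothetical, and it assumes each search hit already carries a numeric relevance score.

```python
from typing import Dict, List, Tuple


def funnel(
    results_by_site: Dict[str, List[dict]], raw_n: int = 20, llm_k: int = 5
) -> Tuple[Dict[str, List[dict]], List[dict]]:
    """Return (raw, shortlist).

    raw keeps up to raw_n hits per site; shortlist holds the llm_k
    highest-scoring hits across all sites (the LLM's input).
    """
    # Map step: cap each site's results independently.
    raw = {site: hits[:raw_n] for site, hits in results_by_site.items()}
    # Reduce step: merge the capped hits and keep the best llm_k overall.
    merged = [dict(hit, site=site) for site, hits in raw.items() for hit in hits]
    shortlist = sorted(merged, key=lambda h: h["score"], reverse=True)[:llm_k]
    return raw, shortlist


# Hypothetical sample data; "score" is an assumed field, not a real API field.
sample = {
    "github.com": [
        {"title": "Release v1.0.8", "score": 0.9},
        {"title": "Issue thread", "score": 0.4},
    ],
    "v2ex.com": [{"title": "Patch discussion", "score": 0.7}],
}
raw, shortlist = funnel(sample, raw_n=1, llm_k=2)
print([h["title"] for h in shortlist])
```

Keeping the per-site cap separate from the cross-site cut means the raw list stays available for manual checking even when the LLM only sees a handful of candidates.
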
SiteScout utilizes a decoupled modular design:
- Input Layer: Receives search queries, a target domain list, `raw_n` (display count), and `llm_k` (refinement count).
- Dispatcher: Orchestrates the workflow into parallel asynchronous retrieval tasks.
- Retrieval Layer: Uses Search Engine APIs (Bocha/Exa/Serper) with specialized `site:` operators.
- Pre-filter: Sorts results based on keyword relevance (e.g., "pan", "magnet", "download", "release").
- LLM Refiner: DeepSeek-R1 extracts download links, passwords, and resource integrity status.
- Presentation Layer: Structurally renders a Markdown table summary followed by the raw data logs.
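
As a rough sketch of how the Dispatcher, Retrieval Layer, and Pre-filter might fit together, the example below fans out one task per domain and then ranks hits by keyword matches. Everything here is an assumption for illustration: `search_site` is a hypothetical stand-in that fakes a search API response (no real Bocha/Exa/Serper call is made), and `build_query`, `keyword_score`, and `dispatch` are invented names.

```python
import asyncio
from typing import List

# Example pre-filter keywords taken from the list above.
KEYWORDS = ("pan", "magnet", "download", "release")


def build_query(query: str, domain: str) -> str:
    """Restrict a query to one domain with the site: operator."""
    return f"{query} site:{domain}"


async def search_site(query: str, domain: str) -> List[dict]:
    """Hypothetical stand-in for a search API call; returns fake hits."""
    await asyncio.sleep(0.01)  # simulates network latency
    scoped = build_query(query, domain)
    return [
        {"site": domain, "title": f"{domain} download thread", "query": scoped},
        {"site": domain, "title": f"{domain} general chat", "query": scoped},
    ]


def keyword_score(hit: dict) -> int:
    """Count how many pre-filter keywords appear in the title."""
    title = hit["title"].lower()
    return sum(kw in title for kw in KEYWORDS)


async def dispatch(query: str, domains: List[str]) -> List[dict]:
    # One coroutine per domain; gather runs them concurrently.
    batches = await asyncio.gather(*(search_site(query, d) for d in domains))
    hits = [hit for batch in batches for hit in batch]
    # Pre-filter: hits with more keyword matches come first.
    return sorted(hits, key=keyword_score, reverse=True)


ranked = asyncio.run(
    dispatch("Black Myth Wukong Optimization Patch", ["v2ex.com", "github.com"])
)
for hit in ranked:
    print(keyword_score(hit), hit["title"])
```

Plain substring matching is used here for brevity; a short keyword such as "pan" would also match unrelated words, so a real pre-filter would likely need word-boundary or token-based matching.
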
    git clone https://github.com/YourName/SiteScout.git
    cd SiteScout

Create a .env file and add your API keys:

    DEEPSEEK_API_KEY=your_key_here
    BOCHA_API_KEY=your_key_here # Or TAVILY_API_KEY / EXA_API_KEY

Install the dependencies and run the agent:

    pip install -r requirements.txt
    python main.py

Input:
- Query: `Black Myth Wukong Optimization Patch`
- Sites: `v2ex.com, github.com, reddit.com`
- Params: `raw_n=20, llm_k=5`
Output:

💎 AI Refined Summary

| Resource | Source | Download Link | Password | Status |
| --- | --- | --- | --- | --- |
| v1.0.8 Performance Fix | GitHub | https://github.com/.../releases | N/A | ✅ Verified |
| Community Mod | | https://mega.nz/... | scout2026 | ⚠️ Use with caution |

🔍 Raw Search Details (60 total results)
- [GitHub] Update on performance optimization...
- [V2EX] Does anyone have the new patch...
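
A summary table like the one above could be produced from the refiner's structured output with a small helper. The sketch below is illustrative only: `render_table` and the record keys are assumptions that mirror the column names in the sample output, not SiteScout's real data model.

```python
from typing import List

COLUMNS = ["Resource", "Source", "Download Link", "Password", "Status"]


def render_table(rows: List[dict]) -> str:
    """Render refined records as a Markdown table; missing fields become N/A."""
    header = "| " + " | ".join(COLUMNS) + " |"
    divider = "| " + " | ".join("---" for _ in COLUMNS) + " |"
    body = [
        "| " + " | ".join(str(row.get(col, "N/A")) for col in COLUMNS) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])


# Hypothetical record; the elided URL is copied as-is from the sample output.
rows = [
    {
        "Resource": "v1.0.8 Performance Fix",
        "Source": "GitHub",
        "Download Link": "https://github.com/.../releases",
        "Status": "Verified",
    }
]
table = render_table(rows)
print(table)
```
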
- Brain: DeepSeek-R1 / V3
- Search Engine: Bocha AI / Exa AI
- Framework: FastAPI / Antigravity / Dify
- Concurrency: Python `asyncio` & `httpx`
Contributions are welcome! If you find this tool helpful, please give it a Star 🌟. It means a lot to a growing AI Engineer!
