Build your agent from 200,000+ skills via skill retrieval & orchestration
News
- [2026/03] Our new project homepage is now live!
- [2026/03] Benchmark released: 30 multi-format creative tasks across 5 categories with pairwise Bradley-Terry evaluation.
- [2026/03] Modular architecture released: pluggable retrieval/orchestration modules. See ARCHITECTURE.md for details.
- [2026/03] Batch CLI released: headless parallel execution with YAML configs, resume support, and Rich progress UI.
The agent skill ecosystem is exploding: more than 200,000 skills are now publicly available.
But with so many options, how do you find the right skills for your task? And when one skill isn't enough, how do you compose and orchestrate multiple skills into a working pipeline?
AgentSkillOS is the operating system for agent skills, helping you discover, compose, and run skill pipelines end-to-end.
WEB UI · Visual workflow overview in the browser
CLI · Headless execution with terminal progress and logs
- Skill Search & Discovery: Creatively discover task-relevant skills with a skill tree that organizes skills into a hierarchy based on their capabilities.
- Skill Orchestration: Compose and orchestrate multiple skills into a single workflow with a directed acyclic graph, automatically managing execution order, dependencies, and data flow across steps.
- GUI (Human-in-the-Loop): A built-in GUI enables human intervention at every step, making workflows controllable, auditable, and easy to steer.
- High-Quality Skill Pool: A curated collection of high-quality skills, selected based on Claude's implementation, GitHub stars, and download volume.
- Observability & Debugging: Trace each step with logs and metadata to debug faster and iterate on workflows with confidence.
- Extensible Skill Registry: Plug in new skills or bring your own via a flexible registry.
- Benchmark: 30 multi-format creative tasks across 5 categories, evaluated with pairwise comparison and Bradley-Terry aggregation.
View detailed workflows on the landing page.
Check out the comparison report: AgentSkillOS vs. without skills.
Qualitative comparison between the vanilla baseline and AgentSkillOS Quality-First outputs.
- Skill tree construction: Organizes more than 200,000 skills into a capability tree, providing structured, coarse-to-fine access for efficient and creative skill discovery.
- Skill retrieval: Automatically selects a task-relevant subset of usable skills given a user's request.
- Skill orchestration: Composes the selected skills into a coordinated plan (e.g., a DAG-based workflow) to solve tasks beyond the reach of any single skill. A freestyle mode (i.e., Claude Code) is also supported.
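To make the DAG-based plan concrete, here is a minimal sketch of dependency-ordered execution using Python's standard `graphlib`. The workflow, the step names, and `run_step` are hypothetical placeholders, not AgentSkillOS's actual orchestrator; the sketch only illustrates how execution order and data flow can be derived from a graph of dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the steps whose outputs it consumes.
workflow = {
    "research": [],
    "outline": ["research"],
    "charts": ["research"],
    "slides": ["outline", "charts"],
}

def run_step(name, inputs):
    """Placeholder for invoking a skill; it only records which inputs it received."""
    return f"{name}({', '.join(sorted(inputs))})"

def execute(dag):
    """Run steps in dependency order, passing upstream outputs downstream."""
    outputs = {}
    order = []
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for step in TopologicalSorter(dag).static_order():
        upstream = {dep: outputs[dep] for dep in dag[step]}
        outputs[step] = run_step(step, upstream)
        order.append(step)
    return order, outputs

order, outputs = execute(workflow)
print(order)
print(outputs["slides"])
```

In this toy graph, `outline` and `charts` depend only on `research`, so a real scheduler could run them in parallel; the sketch runs them sequentially for simplicity.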
Left: Pure semantic retrieval prioritizes textual similarity, often missing skills that look unrelated in embedding space but are crucial for actually solving the task, which leads to narrow, myopic skill usage.
Right: Our LLM + Skill Tree navigates the capability hierarchy to surface non-obvious but functionally relevant skills, enabling broader, more creative, and more effective skill composition.
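The coarse-to-fine idea can be sketched as a recursive descent over a capability tree. Everything below is illustrative: the `Node` class, the toy tree, and especially `overlap_score`, which is a crude keyword-overlap stand-in for the LLM's relevance judgment (an LLM could pick branches that share no words with the task, which this stand-in cannot).

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    description: str
    children: list["Node"] = field(default_factory=list)  # empty for leaf skills

def overlap_score(task, node):
    """Stand-in for an LLM relevance judgment: count shared lowercase words."""
    return len(set(task.lower().split()) & set(node.description.lower().split()))

def navigate(task, node, score=overlap_score, beam=2):
    """Descend from coarse categories to leaf skills, keeping the top `beam` branches."""
    if not node.children:
        return [node.name]
    ranked = sorted(node.children, key=lambda c: score(task, c), reverse=True)
    kept = [c for c in ranked[:beam] if score(task, c) > 0]
    found = []
    for child in kept:
        found.extend(navigate(task, child, score, beam))
    return found

# Hypothetical three-branch tree with a handful of leaf skills.
root = Node("root", "", [
    Node("documents", "write report pdf document slides", [
        Node("pdf-report", "generate pdf report"),
        Node("slide-deck", "build presentation slides"),
    ]),
    Node("data", "analyze data charts statistics", [
        Node("chart-maker", "plot charts from data"),
        Node("sql-runner", "query sql database"),
    ]),
    Node("audio", "music speech audio", [
        Node("tts", "speech synthesis audio"),
    ]),
])

selected = navigate("make a pdf report with charts from sales data", root)
print(selected)
```

Only the branches judged relevant are expanded, so the number of candidates inspected grows with tree depth and beam width rather than with the total number of skills.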
[Figure: skill trees built from 200, 1,000, and 10,000 skills]
We propose a benchmark of 30 multi-format creative tasks spanning 5 categories, evaluated via pairwise comparison with Bradley-Terry aggregation.
Three key properties:
- Multi-format creative tasks: Tasks require end-user artifacts in formats such as PDF, PPTX, DOCX, HTML, video, and generated images.
- Pairwise evaluation: Outputs are compared in both orders to reduce position bias and capture reliable preference signals.
- Bradley-Terry scores: Pairwise preferences are aggregated into continuous ranking scores for fine-grained system comparisons.
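As an illustration of how pairwise preferences can be turned into continuous scores, the sketch below fits a Bradley-Terry model with the standard minorization-maximization update. The win counts are made up, and the benchmark's actual fitting code may differ (for example in normalization or tie handling); when each pair is judged in both presentation orders, both verdicts would simply be added to the counts.

```python
def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths with the standard MM update.

    wins[(a, b)] is the number of times system a was preferred over system b.
    Returns strengths normalized to sum to 1.
    """
    systems = sorted({s for pair in wins for s in pair})
    p = {s: 1.0 for s in systems}
    for _ in range(iters):
        new_p = {}
        for i in systems:
            total_wins = sum(wins.get((i, j), 0) for j in systems if j != i)
            denom = sum(
                (wins.get((i, j), 0) + wins.get((j, i), 0)) / (p[i] + p[j])
                for j in systems if j != i
            )
            new_p[i] = total_wins / denom if denom else p[i]
        norm = sum(new_p.values())
        p = {s: v / norm for s, v in new_p.items()}
    return p

# Made-up preference counts for three hypothetical systems.
wins = {
    ("A", "B"): 8, ("B", "A"): 2,
    ("A", "C"): 9, ("C", "A"): 1,
    ("B", "C"): 7, ("C", "B"): 3,
}
scores = bradley_terry(wins)
print(sorted(scores, key=scores.get, reverse=True))
```

Under the model, the probability that system `i` is preferred over `j` is `p[i] / (p[i] + p[j])`, so the fitted strengths express how decisively one system beats another, not just the ranking.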
Evaluated across 200 / 1K / 200K skill ecosystems, AgentSkillOS demonstrates consistent superiority over baselines, with ablation confirming that both retrieval and orchestration are indispensable, and strategy selection producing structurally distinct execution graphs.
Key findings:
- Substantial Gains over Baselines at Every Scale: All three AgentSkillOS variants achieve the highest Bradley-Terry scores across 200 / 1K / 200K ecosystems. The w/ Full Pool baseline scores poorly because a growing fraction of skills becomes invisible; structured retrieval and orchestration overcome this scalability bottleneck.
- Ablation: Both Retrieval and Orchestration Are Essential. Removing components reveals a clear degradation gradient: without DAG orchestration, retrieval alone is insufficient; without retrieval, even oracle skills cannot close the gap. Quality-First shows only a modest deficit versus the oracle upper bound, and the gap narrows as the ecosystem grows.
- Strategy Choice Shapes Execution Structure: Each orchestration strategy faithfully translates its design intent into a distinct DAG topology. Quality-First builds deep, multi-stage pipelines; Efficiency-First trades depth for width to maximize parallelism; Simplicity-First retains only essential steps.
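One way to quantify "deep" versus "wide" is to assign each step to the earliest level its dependencies allow, then count levels (depth) and the largest level (width). The sketch below does this with `graphlib`; the two example graphs are hypothetical shapes chosen for contrast, not actual outputs of the Quality-First or Efficiency-First strategies, and the metric definitions are an assumption rather than the paper's.

```python
from graphlib import TopologicalSorter

def depth_and_width(dag):
    """Return (depth, width): number of dependency levels and the largest level size."""
    level = {}
    for node in TopologicalSorter(dag).static_order():
        # A step sits one level below its deepest dependency (level 1 if it has none).
        level[node] = 1 + max((level[d] for d in dag.get(node, ())), default=0)
    depth = max(level.values(), default=0)
    sizes = [0] * depth
    for lvl in level.values():
        sizes[lvl - 1] += 1
    return depth, max(sizes, default=0)

# Hypothetical deep pipeline: every step waits for the previous one.
deep = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
# Hypothetical wide plan: three independent steps fan out, then merge.
wide = {"a": [], "b": ["a"], "c": ["a"], "d": ["a"], "e": ["b", "c", "d"]}

print(depth_and_width(deep))
print(depth_and_width(wide))
```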
Installation & Configuration
- Python 3.10+
- Claude Code (must be installed and available in PATH)
- Use cc-switch to switch to other LLM providers

    git clone https://github.com/ynulihao/AgentSkillOS.git
    cd AgentSkillOS
    pip install -e .
    cp .env.example .env  # Edit with your API keys
    python run.py --port 8765

| Tree | Skills | Description |
|---|---|---|
| `skill_seeds` | ~50 | Curated skill set (default) |
| `skill_200` | 200 | 200 skills |
| `skill_1000` | ~1,000 | 1,000 skills |
| `skill_10000` | ~10,000 | 10,000 active + layered dormant skills |


    # .env
    LLM_MODEL=openai/anthropic/claude-opus-4.5
    LLM_BASE_URL=https://openrouter.ai/api/v1
    LLM_API_KEY=your-key
    EMBEDDING_MODEL=openai/text-embedding-3-large
    EMBEDDING_BASE_URL=https://api.openai.com/v1
    EMBEDDING_API_KEY=your-key

- Create `data/my_skills/skill-name/SKILL.md`
- Register in `src/config.py` → `SKILL_GROUPS`
- Build: `python run.py build -g my_skills -v`
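The "Create" step can be scripted. The sketch below scaffolds a skill directory with a skeleton `SKILL.md`; the helper name `scaffold_skill` is hypothetical, and the `name`/`description` frontmatter follows the common Agent Skills convention, which is an assumption here, so check an existing skill in the repository for the fields AgentSkillOS actually expects. Registration in `src/config.py` and the build command still need to be done as listed above.

```python
from pathlib import Path

def scaffold_skill(root, group, name, description):
    """Create <root>/data/<group>/<name>/SKILL.md with a minimal frontmatter header."""
    skill_dir = Path(root) / "data" / group / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    if not skill_file.exists():  # never overwrite an existing skill
        skill_file.write_text(
            # Assumed frontmatter fields; adjust to the project's real schema.
            f"---\nname: {name}\ndescription: {description}\n---\n\n"
            f"# {name}\n\nDescribe when and how the agent should use this skill.\n",
            encoding="utf-8",
        )
    return skill_file

if __name__ == "__main__":
    path = scaffold_skill(".", "my_skills", "skill-name", "One-line summary of the skill")
    print(path)
```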
Batch Execution (Headless CLI)
Run multiple tasks in parallel without the Web UI:

    python run.py cli --task config/batch.yaml

See config/eval/ for ready-made batch configs covering different skill managers (tree, vector), orchestrators (dag, free-style), and skill pool sizes.

    batch_id: my_batch
    defaults:
      skill_mode: auto          # "auto" (discover) or "specified"
      skill_group: skill_200    # Which skill pool to use
      output_dir: ./runs
      continue_on_error: true
    execution:
      parallel: 2               # Max concurrent tasks
      retry_failed: 0
    tasks:
      - file: path/to/task1.json
      - file: path/to/task2.json
      - dir: path/to/tasks/     # Scan directory
        pattern: "*.json"

| Flag | Description |
|---|---|
| `--task PATH`, `-T` | Path to batch YAML config (required) |
| `--parallel N`, `-p` | Override parallel task count |
| `--resume PATH`, `-R` | Resume an interrupted batch run |
| `--output-dir PATH`, `-o` | Override output directory |
| `--dry-run` | Preview tasks without execution |
| `--verbose`, `-v` | Show detailed logs |
| `--manager PLUGIN`, `-m` | Override skill manager (e.g., tree, vector) |
| `--orchestrator PLUGIN` | Override orchestrator (e.g., dag, free-style) |


    python run.py cli -T config/batch.yaml --resume ./runs/my_batch_20260306_120000

Completed tasks are skipped; only remaining tasks are re-executed.

    ./runs/{batch_id}/
    ├── batch_result.json        # Batch summary (metrics, costs, eval scores)
    └── {task_id}__{run_id}/     # Per-task directory
        ├── meta.json
        ├── result.json
        ├── evaluation.json
        └── artifacts/           # Task outputs (PDF, HTML, video, etc.)

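Given this layout, a small script can report which per-task directories are incomplete, for example before deciding whether to resume a batch. The helper below is a hypothetical sketch that relies only on the file and directory names shown above; it does not read the JSON contents, whose schema is not documented here, and it assumes the last `__` in a directory name separates `task_id` from `run_id`.

```python
from pathlib import Path

EXPECTED_FILES = ("meta.json", "result.json", "evaluation.json")

def summarize_batch(batch_dir):
    """Map (task_id, run_id) to the expected files missing from that run directory."""
    missing = {}
    for run_dir in sorted(Path(batch_dir).iterdir()):
        if not run_dir.is_dir() or "__" not in run_dir.name:
            continue  # skip batch_result.json and unrelated entries
        task_id, _, run_id = run_dir.name.rpartition("__")
        missing[(task_id, run_id)] = [
            name for name in EXPECTED_FILES if not (run_dir / name).is_file()
        ]
    return missing

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        for (task_id, run_id), absent in summarize_batch(sys.argv[1]).items():
            status = "complete" if not absent else "missing " + ", ".join(absent)
            print(f"{task_id} [{run_id}]: {status}")
```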
- Recipe Generation & Storage
- Interactive Agent Execution
- Plan Refinement
- Auto Skill Import
- Dependency Detection
- History Management
- Multi-CLI Support (Codex, Gemini CLI, Cursor)
If you find AgentSkillOS useful, consider citing our paper:
@article{li2026organizing,
title={Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale},
author={Li, Hao and Mu, Chunjiang and Chen, Jianhao and Ren, Siyue and Cui, Zhiyao and Zhang, Yiqun and Bai, Lei and Hu, Shuyue},
journal={arXiv preprint arXiv:2603.02176},
year={2026}
}












