ynulihao/AgentSkillOS

AgentSkillOS

English | 简体中文

Build your agent from 200,000+ skills via skill
RETRIEVAL & ORCHESTRATION

Main Page | Python 3.10+ | License: MIT | arXiv | Hugging Face Dataset

Method | Benchmark | Examples | How to Use

News

  • [2026/03] Our new project homepage is now live!
  • [2026/03] Benchmark released: 30 multi-format creative tasks across 5 categories with pairwise Bradley-Terry evaluation.
  • [2026/03] Modular Architecture released: pluggable retrieval/orchestration modules. See ARCHITECTURE.md for details.
  • [2026/03] Batch CLI released: headless parallel execution with YAML configs, resume support, and Rich progress UI.

🌐 Overview

🔥 The agent skill ecosystem is exploding: more than 200,000 skills are now publicly available.

But with so many options, how do you find the right skills for your task? And when one skill isn't enough, how do you compose and orchestrate multiple skills into a working pipeline?

AgentSkillOS is the operating system for agent skills, helping you discover, compose, and run skill pipelines end-to-end.

WEB UI · Visual workflow overview in the browser

CLI · Headless execution with terminal progress and logs

🌟 Highlights

  • 🔍 Skill Search & Discovery: Creatively discover task-relevant skills with a skill tree that organizes skills into a hierarchy based on their capabilities.
  • 🔗 Skill Orchestration: Compose and orchestrate multiple skills into a single workflow with a directed acyclic graph, automatically managing execution order, dependencies, and data flow across steps.
  • 🖥️ GUI (Human-in-the-Loop): A built-in GUI enables human intervention at every step, making workflows controllable, auditable, and easy to steer.
  • ⭐ High-Quality Skill Pool: A curated collection of high-quality skills, selected based on Claude's implementation, GitHub stars, and download volume.
  • 📊 Observability & Debugging: Trace each step with logs and metadata to debug faster and iterate on workflows with confidence.
  • 🧩 Extensible Skill Registry: Plug in new skills or bring your own via a flexible registry.
  • 📈 Benchmark: 30 multi-format creative tasks across 5 categories, evaluated with pairwise comparison and Bradley-Terry aggregation.

💡 Examples

👉 View detailed workflows on the Landing Page →

📊 Check out the comparison report: AgentSkillOS vs. without skills →

Case Study

Qualitative comparison between the vanilla baseline and AgentSkillOS Quality-First outputs.

Example 01 · Bug Diagnosis Report
Mobile bug localization, fix validation, and visual bug report generation with before/after evidence.
Example 02 · UI Design Research
Design-language research, report generation, and multi-direction concept mockups for knowledge software.
Example 03 · Paper Promotion
Transforms academic papers into social slides, scientific pages, and platform-specific promotion content.
Example 04 · Meme Video
Green-screen compositing, subtitle timing, and viral short-video production with multi-version outputs.

🏗️ Method

  • Skill tree construction: Organizes more than 200,000 skills into a capability tree, providing structured, coarse-to-fine access for efficient and creative skill discovery.
  • Skill retrieval: Automatically selects a task-relevant subset of usable skills given a user's request.
  • Skill orchestration: Composes the selected skills into a coordinated plan (e.g., a DAG-based workflow) to solve tasks beyond the reach of any single skill. A freestyle mode (i.e., Claude Code) is also supported.
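
The orchestration idea above can be illustrated with a small sketch. It is not the project's orchestrator: the step names are hypothetical, and the standard-library `graphlib` stands in for whatever scheduler AgentSkillOS actually uses. It only shows how a DAG of dependencies yields an execution order and which steps could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a step, each value is the set of steps it depends on.
workflow = {
    "research": set(),
    "draft_report": {"research"},
    "make_figures": {"research"},
    "export_pdf": {"draft_report", "make_figures"},
}

def execution_waves(dag):
    """Group steps into waves; steps in the same wave do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(workflow)
for index, wave in enumerate(waves, start=1):
    print(f"wave {index}: {wave}")
```

Steps that land in the same wave (here, the two steps that depend only on `research`) are candidates for parallel execution.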

AgentSkillOS Framework

🌲 Why Skill Tree?

Skill Retrieval Comparison

Left: Pure semantic retrieval prioritizes textual similarity, often missing skills that look unrelated in embedding space but are crucial for actually solving the task. The result is narrow, myopic skill usage.

Right: Our LLM + Skill Tree navigates the capability hierarchy to surface non-obvious but functionally relevant skills, enabling broader, more creative, and more effective skill composition.
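
A toy sketch of coarse-to-fine navigation, under stated assumptions: the `Node` class, the capability names, and the skill names are made up, and `keyword_chooser` is a stand-in for the LLM that decides which branches to expand. The real tree format and prompting are not shown in this README.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)  # sub-capabilities
    skills: list = field(default_factory=list)    # skills attached to this node

def navigate(node, choose):
    """Walk the tree top-down, expanding only the branches that `choose` selects."""
    found = list(node.skills)
    for child in choose(node.children):
        found.extend(navigate(child, choose))
    return found

# Toy tree with hypothetical capability and skill names.
tree = Node("root", children=[
    Node("documents", children=[
        Node("pdf", skills=["pdf-export"]),
        Node("slides", skills=["pptx-builder"]),
    ]),
    Node("media", children=[
        Node("video", skills=["subtitle-timing"]),
    ]),
])

def keyword_chooser(wanted):
    """Stand-in for the LLM: keep the children whose name is in `wanted`."""
    return lambda children: [c for c in children if c.name in wanted]

skills = navigate(tree, keyword_chooser({"documents", "slides", "media", "video"}))
print(skills)
```

Because selection happens per branch, a skill under `media` can be surfaced for a slide-making task even if its description shares little text with the request, which is the behavior the comparison above describes.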

200 Skills | 1,000 Skills | 10,000 Skills

📈 Benchmark

We propose a benchmark of 30 multi-format creative tasks spanning 5 categories, evaluated via pairwise comparison with Bradley-Terry aggregation.

Three key properties:

  • Multi-format creative tasks: Tasks require end-user artifacts in formats such as PDF, PPTX, DOCX, HTML, video, and generated images.
  • Pairwise evaluation: Outputs are compared in both orders to reduce position bias and capture reliable preference signals.
  • Bradley-Terry scores: Pairwise preferences are aggregated into continuous ranking scores for fine-grained system comparisons.

Benchmark Framework | Task Overview
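
To make the aggregation step concrete, here is a minimal sketch of Bradley-Terry fitting with the standard iterative (minorization-maximization) update. The win counts are invented for illustration, and the benchmark's actual fitting code and score scaling may differ.

```python
def bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from wins[i][j] = times system i beat system j."""
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pairing contributes (games played) / (sum of current strengths).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n) if j != i
            )
            updated.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(updated)
        if norm == 0:
            return p
        p = [value / norm for value in updated]  # keep strengths summing to 1
    return p

# Invented counts for three systems; each pair was judged 10 times,
# counting both presentation orders (A shown first, then B shown first).
systems = ["A", "B", "C"]
wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
scores = bradley_terry(wins)
for name, score in zip(systems, scores):
    print(f"{name}: {score:.3f}")
```

Judging every pair in both orders before tallying wins is what lets a position preference of the judge cancel out instead of leaking into the scores.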

🧪 Experiments

Evaluated across 200 / 1K / 200K skill ecosystems, AgentSkillOS demonstrates consistent superiority over baselines, with ablation confirming that both retrieval and orchestration are indispensable, and strategy selection producing structurally distinct execution graphs.

Key findings:

  • Substantial Gains over Baselines at Every Scale: All three AgentSkillOS variants achieve the highest Bradley-Terry scores across the 200 / 1K / 200K ecosystems. The w/ Full Pool baseline scores poorly because a growing fraction of skills becomes invisible; structured retrieval and orchestration overcome this scalability bottleneck.
  • Ablation (Both Retrieval and Orchestration Are Essential): Removing components reveals a clear degradation gradient. Without DAG orchestration, retrieval alone is insufficient; without retrieval, even oracle skills cannot close the gap. Quality-First shows only a modest deficit versus the oracle upper bound, and the gap narrows as the ecosystem grows.
  • Strategy Choice Shapes Execution Structure: Each orchestration strategy translates its design intent into a distinct DAG topology. Quality-First builds deep, multi-stage pipelines; Efficiency-First trades depth for width to maximize parallelism; Simplicity-First retains only essential steps.
Category Radar: Per-category Bradley-Terry performance across ecosystem scales.
Ablation: Separates retrieval and orchestration effects; confirms both are required.
DAG Structure Metrics: Different orchestration strategies induce distinct topology profiles.
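
The depth-versus-width contrast can be made concrete with a small sketch. The definitions here are assumptions chosen for illustration (depth as the number of levels on the longest dependency chain, width as the largest number of steps on one level); the paper's exact metrics may be defined differently, and both example graphs are hypothetical.

```python
from collections import Counter
from graphlib import TopologicalSorter

def dag_profile(dag):
    """Return (depth, width) for a DAG given as {step: set_of_dependencies}."""
    level = {}
    # static_order() yields every step after all of its dependencies.
    for step in TopologicalSorter(dag).static_order():
        deps = dag.get(step, ())
        level[step] = 1 + max((level[d] for d in deps), default=0)
    if not level:
        return 0, 0
    depth = max(level.values())
    width = max(Counter(level.values()).values())
    return depth, width

# Hypothetical "deep" pipeline: four stages in sequence.
deep = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c"}}
# Hypothetical "wide" pipeline: three parallel steps between a source and a sink.
wide = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"a"}, "e": {"b", "c", "d"}}

print("deep:", dag_profile(deep))
print("wide:", dag_profile(wide))
```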

🚀 How to Use

Installation & Configuration

Prerequisites

  • Python 3.10+
  • Claude Code (must be installed and available in PATH)
  • Use cc-switch to switch to other LLM providers

Install & Run

git clone https://github.com/ynulihao/AgentSkillOS.git
cd AgentSkillOS
pip install -e .
cp .env.example .env  # Edit with your API keys
python run.py --port 8765

Download Pre-built Trees

| Tree | Skills | Description |
| --- | --- | --- |
| 🌱 skill_seeds | ~50 | Curated skill set (default) |
| 📦 skill_200 | 200 | 200 skills |
| 🗃️ skill_1000 | ~1,000 | 1,000 skills |
| 🏗️ skill_10000 | ~10,000 | 10,000 active + layered dormant skills |

Configuration

# .env
LLM_MODEL=openai/anthropic/claude-opus-4.5
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=your-key

EMBEDDING_MODEL=openai/text-embedding-3-large
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_API_KEY=your-key
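
Before launching, it can help to check that the `.env` file is complete. The sketch below is a standalone helper, not part of AgentSkillOS; treating all six variables as required, and treating `your-key` as an unfilled placeholder, are assumptions based only on the example above.

```python
REQUIRED_KEYS = [
    "LLM_MODEL", "LLM_BASE_URL", "LLM_API_KEY",
    "EMBEDDING_MODEL", "EMBEDDING_BASE_URL", "EMBEDDING_API_KEY",
]

def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

def missing_keys(values):
    """Return required keys that are absent, empty, or still set to the placeholder."""
    return [k for k in REQUIRED_KEYS if values.get(k, "") in ("", "your-key")]

sample = """# .env
LLM_MODEL=openai/anthropic/claude-opus-4.5
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=your-key
"""
problems = missing_keys(parse_env(sample))
print(problems)
```

To check a real file, pass `open(".env", encoding="utf-8").read()` to `parse_env` instead of the sample string.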

Custom Skill Groups

  1. Create data/my_skills/skill-name/SKILL.md
  2. Register in src/config.py → SKILL_GROUPS
  3. Build: python run.py build -g my_skills -v
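
Step 1 can be scripted. The helper below is a hypothetical convenience, not part of the repository: it creates the directory layout from step 1 and writes a starter SKILL.md. The front matter fields (`name`, `description`) follow the common Agent Skills convention and are an assumption here; the README does not state which fields AgentSkillOS requires. Step 2 is left out because the structure of `SKILL_GROUPS` is not shown.

```python
import tempfile
from pathlib import Path

def scaffold_skill(root, group, skill_name, description):
    """Create <root>/data/<group>/<skill_name>/SKILL.md and return its path."""
    skill_dir = Path(root) / "data" / group / skill_name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    if not skill_file.exists():  # never overwrite an existing skill
        # Assumed layout: YAML front matter, then free-form instructions.
        skill_file.write_text(
            f"---\nname: {skill_name}\ndescription: {description}\n---\n\n"
            "Describe when and how the agent should use this skill.\n",
            encoding="utf-8",
        )
    return skill_file

# Demonstrated in a temporary directory; use the repository root in practice.
with tempfile.TemporaryDirectory() as tmp:
    path = scaffold_skill(tmp, "my_skills", "skill-name", "One-line summary of the skill.")
    print(path.relative_to(tmp).as_posix())
```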
Batch Execution (Headless CLI)

Run a Batch

Run multiple tasks in parallel without the Web UI:

python run.py cli --task config/batch.yaml

See config/eval/ for ready-made batch configs covering different skill managers (tree, vector), orchestrators (dag, free-style), and skill pool sizes.

Batch Config (YAML)

batch_id: my_batch

defaults:
  skill_mode: auto          # "auto" (discover) or "specified"
  skill_group: skill_200    # Which skill pool to use
  output_dir: ./runs
  continue_on_error: true

execution:
  parallel: 2               # Max concurrent tasks
  retry_failed: 0

tasks:
  - file: path/to/task1.json
  - file: path/to/task2.json
  - dir: path/to/tasks/     # Scan directory
    pattern: "*.json"
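
The `tasks` section mixes single files with directory scans. The sketch below shows one way such entries could be resolved into a flat list of task files. It is an illustration, not the project's loader: the default pattern, the sorted ordering, and the non-recursive scan are all assumptions.

```python
import tempfile
from pathlib import Path

def expand_tasks(task_entries, base_dir="."):
    """Resolve `file:` and `dir:`/`pattern:` entries into a flat list of paths."""
    base = Path(base_dir)
    paths = []
    for entry in task_entries:
        if "file" in entry:
            paths.append(base / entry["file"])
        elif "dir" in entry:
            pattern = entry.get("pattern", "*.json")  # assumed default
            paths.extend(sorted((base / entry["dir"]).glob(pattern)))
    return paths

# Demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    tasks_dir = Path(tmp) / "tasks"
    tasks_dir.mkdir()
    for name in ("b.json", "a.json", "notes.txt"):
        (tasks_dir / name).write_text("{}", encoding="utf-8")
    entries = [{"file": "task1.json"}, {"dir": "tasks", "pattern": "*.json"}]
    resolved = [p.relative_to(tmp).as_posix() for p in expand_tasks(entries, base_dir=tmp)]
    print(resolved)
```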

CLI Flags

| Flag | Description |
| --- | --- |
| --task PATH, -T | Path to batch YAML config (required) |
| --parallel N, -p | Override parallel task count |
| --resume PATH, -R | Resume an interrupted batch run |
| --output-dir PATH, -o | Override output directory |
| --dry-run | Preview tasks without execution |
| --verbose, -v | Show detailed logs |
| --manager PLUGIN, -m | Override skill manager (e.g., tree, vector) |
| --orchestrator PLUGIN | Override orchestrator (e.g., dag, free-style) |
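
When batches are launched from another script, the flags above can be assembled programmatically. The function below is a hypothetical wrapper that uses only the flags listed in the table; it builds the argument list and leaves execution to the caller.

```python
def build_cli_command(task, parallel=None, resume=None, output_dir=None,
                      dry_run=False, verbose=False, manager=None, orchestrator=None):
    """Assemble the argument list for `python run.py cli` from the documented flags."""
    cmd = ["python", "run.py", "cli", "--task", str(task)]
    if parallel is not None:
        cmd += ["--parallel", str(parallel)]
    if resume is not None:
        cmd += ["--resume", str(resume)]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if dry_run:
        cmd.append("--dry-run")
    if verbose:
        cmd.append("--verbose")
    if manager is not None:
        cmd += ["--manager", manager]
    if orchestrator is not None:
        cmd += ["--orchestrator", orchestrator]
    return cmd

cmd = build_cli_command("config/batch.yaml", parallel=4, manager="tree", dry_run=True)
print(" ".join(cmd))
# To launch it: subprocess.run(cmd, check=True) from the repository root.
```

Passing the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.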

Resume Interrupted Runs

python run.py cli -T config/batch.yaml --resume ./runs/my_batch_20260306_120000

Completed tasks are skipped; only remaining tasks are re-executed.

Output Structure

./runs/{batch_id}/
├── batch_result.json          # Batch summary (metrics, costs, eval scores)
└── {task_id}__{run_id}/       # Per-task directory
    ├── meta.json
    ├── result.json
    ├── evaluation.json
    └── artifacts/             # Task outputs (PDF, HTML, video, etc.)
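
A quick way to inspect a finished or interrupted batch is to walk this layout. The sketch below relies only on the directory and file names shown above; it does not read the JSON contents, whose schema is not documented here. Splitting the directory name on the last `__` is an assumption about how `task_id` and `run_id` are joined.

```python
import tempfile
from pathlib import Path

EXPECTED_FILES = ("meta.json", "result.json", "evaluation.json")

def list_task_runs(batch_dir):
    """Return (task_id, run_id, missing_files) for each per-task directory."""
    runs = []
    for child in sorted(Path(batch_dir).iterdir()):
        if not child.is_dir() or "__" not in child.name:
            continue  # skips batch_result.json and unrelated entries
        task_id, _, run_id = child.name.rpartition("__")
        missing = [f for f in EXPECTED_FILES if not (child / f).exists()]
        runs.append((task_id, run_id, missing))
    return runs

# Demonstration against a throwaway batch directory.
with tempfile.TemporaryDirectory() as tmp:
    done = Path(tmp) / "task_a__run1"
    done.mkdir()
    for name in EXPECTED_FILES:
        (done / name).write_text("{}", encoding="utf-8")
    (Path(tmp) / "task_b__run1").mkdir()  # a run with no outputs yet
    (Path(tmp) / "batch_result.json").write_text("{}", encoding="utf-8")
    runs = list_task_runs(tmp)
print(runs)
```

A run with a non-empty `missing_files` list is a reasonable candidate to look at first when a batch was interrupted.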

🔮 Future Work

  • Recipe Generation & Storage
  • Interactive Agent Execution
  • Plan Refinement
  • Auto Skill Import
  • Dependency Detection
  • History Management
  • Multi-CLI Support (Codex, Gemini CLI, Cursor)

Citation

If you find AgentSkillOS useful, consider citing our paper:

@article{li2026organizing,
  title={Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale},
  author={Li, Hao and Mu, Chunjiang and Chen, Jianhao and Ren, Siyue and Cui, Zhiyao and Zhang, Yiqun and Bai, Lei and Hu, Shuyue},
  journal={arXiv preprint arXiv:2603.02176},
  year={2026}
}
