# Skill Evaluation Platform

A web-based platform for evaluating AI agent skills with automated test case generation and scoring.

## Features

- **Multi-Provider Support**: Compatible with OpenAI, Anthropic, DeepSeek, SiliconFlow, Zhipu GLM, Moonshot, Ollama, and custom endpoints
- **Automated Testing**: Generates test cases from SKILL.md definitions
- **Multi-Dimension Scoring**: Evaluates trigger accuracy, output quality, instruction following, robustness, and efficiency
- **Visual Reports**: Radar charts and bar charts for result visualization
- **Export Options**: Export results as JSON or HTML reports
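As a sketch of how the five scoring dimensions could be combined into an overall score, assuming equal weights and a 0–100 scale per dimension (the interface and function names here are illustrative assumptions, not the platform's actual API):

```typescript
// Illustrative sketch only: equal-weight aggregation of the five
// scoring dimensions. Field names are assumptions, not the real API.
interface DimensionScores {
  triggerAccuracy: number;      // 0-100
  outputQuality: number;        // 0-100
  instructionFollowing: number; // 0-100
  robustness: number;           // 0-100
  efficiency: number;           // 0-100
}

// Returns the unweighted mean of all five dimensions.
function overallScore(s: DimensionScores): number {
  const values = [
    s.triggerAccuracy,
    s.outputQuality,
    s.instructionFollowing,
    s.robustness,
    s.efficiency,
  ];
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

console.log(overallScore({
  triggerAccuracy: 90,
  outputQuality: 80,
  instructionFollowing: 85,
  robustness: 75,
  efficiency: 70,
})); // 80
```

A real implementation might weight trigger accuracy more heavily than efficiency; the equal-weight mean is just the simplest defensible baseline.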

## Quick Start

```shell
# Install dependencies
npm install

# Build bundle
npm run build

# Start server
npm run start
```

Open http://localhost:3001 in your browser.

## Usage

1. **Configure API**: Select a provider and enter your API key
2. **Upload Skills**: Upload SKILL.md files or ZIP packages containing skill definitions
3. **Set Options**: Configure the number of test cases per skill
4. **Run Evaluation**: Click "开始评测" ("Start Evaluation") to begin
5. **View Results**: Review scores, charts, and detailed reports
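Test cases are generated from SKILL.md definitions; a minimal definition might look like the following (illustrative only — the exact sections your skills use may differ):

```markdown
# Summarize Article

## Description
Summarizes a given article into three bullet points.

## Trigger
Use this skill when the user asks for a summary of a text.

## Instructions
1. Read the provided article.
2. Produce exactly three bullet points.
3. Keep each bullet under 25 words.
```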

## Development

```shell
# Watch mode (future)
npm run dev
```

## Project Structure

```
├── ts/              # TypeScript source files
├── dist/            # Build output
├── index.html       # Main HTML file
├── server.js        # Static file server
├── package.json     # Project configuration
└── tsconfig.json    # TypeScript configuration
```
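Given the `ts/` → `dist/` layout above, the `tsconfig.json` plausibly resembles the following (an assumption for orientation, not the repository's actual file):

```json
{
  "compilerOptions": {
    "target": "ES2020",
    "outDir": "dist",
    "rootDir": "ts",
    "strict": true
  },
  "include": ["ts/**/*"]
}
```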

## License

MIT License — see LICENSE for details.
