Skill Evaluation Platform

A web-based platform for evaluating AI agent skills with automated test case generation and scoring.

Features

Multi-Provider Support: Compatible with OpenAI, Anthropic, DeepSeek, SiliconFlow, Zhipu GLM, Moonshot, Ollama, and custom endpoints
Automated Testing: Generates test cases from SKILL.md definitions
Multi-Dimension Scoring: Evaluates trigger accuracy, output quality, instruction following, robustness, and efficiency
Visual Reports: Radar charts and bar charts for result visualization
Export Options: Export results as JSON or HTML reports
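As an illustration of how the five scoring dimensions listed above could be combined into one number, here is a minimal sketch. It assumes each dimension is scored from 0 to 100 and that all dimensions carry equal weight. The names `DimensionScores`, `DEFAULT_WEIGHTS`, and `overallScore` are hypothetical and not taken from the platform's source, and the platform's actual weighting may differ.

```typescript
// Hypothetical dimension keys, named after the feature list above.
type Dimension =
  | "triggerAccuracy"
  | "outputQuality"
  | "instructionFollowing"
  | "robustness"
  | "efficiency";

type DimensionScores = Record<Dimension, number>;

// Assumption: equal weights. The real weighting is not documented here.
const DEFAULT_WEIGHTS: DimensionScores = {
  triggerAccuracy: 1,
  outputQuality: 1,
  instructionFollowing: 1,
  robustness: 1,
  efficiency: 1,
};

// Weighted average of the per-dimension scores.
function overallScore(
  scores: DimensionScores,
  weights: DimensionScores = DEFAULT_WEIGHTS
): number {
  const dims = Object.keys(scores) as Dimension[];
  let weighted = 0;
  let totalWeight = 0;
  for (const d of dims) {
    weighted += scores[d] * weights[d];
    totalWeight += weights[d];
  }
  if (totalWeight === 0) {
    throw new Error("weights must not sum to zero");
  }
  return weighted / totalWeight;
}

// Example scores for a single skill (made-up values for illustration).
const example: DimensionScores = {
  triggerAccuracy: 80,
  outputQuality: 90,
  instructionFollowing: 70,
  robustness: 60,
  efficiency: 100,
};
console.log(overallScore(example));
```

Passing a custom `weights` object lets one dimension count more than the others without changing the function.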

Quick Start

# Install dependencies
npm install

# Build bundle
npm run build

# Start server
npm run start

Open http://localhost:3001 in your browser.

Usage

Configure API: Select a provider and enter your API key
Upload Skills: Upload SKILL.md files or ZIP packages containing skill definitions
Set Options: Configure number of test cases per skill
Run Evaluation: Click "开始评测" (Start Evaluation) to begin the run
View Results: Review scores, charts, and detailed reports
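The settings collected in the first three steps above could be checked before a run starts. The sketch below is one possible shape for that check, not the platform's actual code. `EvaluationOptions`, `KNOWN_PROVIDERS`, and `validateOptions` are hypothetical names, the provider identifiers are assumed spellings of the providers listed under Features, and treating Ollama as not needing an API key is an assumption based on it usually running locally.

```typescript
// Hypothetical options object gathered from the configuration form.
interface EvaluationOptions {
  provider: string;
  apiKey: string;
  testCasesPerSkill: number;
}

// Assumed identifiers for the providers named in the feature list.
const KNOWN_PROVIDERS: string[] = [
  "openai",
  "anthropic",
  "deepseek",
  "siliconflow",
  "zhipu",
  "moonshot",
  "ollama",
  "custom",
];

// Returns a list of human-readable problems; an empty list means the options look usable.
function validateOptions(opts: EvaluationOptions): string[] {
  const errors: string[] = [];
  if (KNOWN_PROVIDERS.indexOf(opts.provider) === -1) {
    errors.push("unknown provider: " + opts.provider);
  }
  // Assumption: a local Ollama server does not require an API key.
  if (opts.provider !== "ollama" && opts.apiKey.trim() === "") {
    errors.push("API key is required");
  }
  const n = opts.testCasesPerSkill;
  if (Math.floor(n) !== n || n < 1) {
    errors.push("testCasesPerSkill must be a positive integer");
  }
  return errors;
}

console.log(
  validateOptions({ provider: "openai", apiKey: "", testCasesPerSkill: 0 })
);
```

Returning a list of messages rather than throwing on the first problem lets a form show every issue at once.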

Development

# Watch mode (planned for a future release)
npm run dev

Project Structure

├── ts/              # TypeScript source files
├── dist/            # Build output
├── index.html       # Main HTML file
├── server.js        # Static file server
├── package.json     # Project configuration
└── tsconfig.json    # TypeScript configuration

License

MIT License - see LICENSE for details.
