Zero-error skill creation framework — fuses agentskills.io spec with MAKER reliability principles to generate production-ready AI skills that follow directions perfectly.
SkillForge takes your high-level workflow descriptions and transforms them into battle-tested, agentskills.io-compatible Skills with built-in validation, red-flag detection, and consensus verification — so hallucinated, truncated, or off-script output is caught before it is accepted.
- Quick Start
- Installation
- Core Concepts
- CLI Reference
- Configuration Reference
- Creating Skills
- Plugin System
- Red-Flag Rules
- Testing
- SDK API Reference
- Project Structure
- Troubleshooting
- FAQ
# 1. Install
cd skill_sdk && npm install
# 2. Scaffold a skill
npx tsx bin/skillforge.ts init my-first-skill \
--description "Generates unit tests for Python functions" \
--language python
# 3. Edit the config
# Open my-first-skill/skillforge.config.json and define your workflow
# 4. Build the skill
npx tsx bin/skillforge.ts build \
--config my-first-skill/skillforge.config.json \
--output my-first-skill/dist --verbose
# 5. Test the skill
npx tsx bin/skillforge.ts test \
--config my-first-skill/skillforge.config.json

# Clone and install
git clone <repo-url> skill_sdk
cd skill_sdk
npm install
# Verify everything works
npm test

Requires Node.js ≥ 18 and npm.
| Script | Command | Description |
|---|---|---|
| `npm test` | `vitest run` | Run all 73 unit tests |
| `npm run test:watch` | `vitest` | Watch mode for tests |
| `npm run build` | `tsc` | Compile TypeScript to `dist/` |
| `npm run lint` | `tsc --noEmit` | Type-check without emitting |
| `npm start` | `tsx bin/skillforge.ts` | Run the CLI directly |
SkillForge is built on three pillars from the MAKER framework (Maximal Agentic Knowledge-Error Reduction):
flowchart LR
A["📝 Workflow Steps"] --> B["🧬 Decompose"]
B --> C["🚩 Validate"]
C --> D["🗳️ Consensus"]
D --> E["✅ Accepted Output"]
D -->|"No agreement"| F["🔄 Retry"]
F --> C
C -->|"Red-flag triggered"| G["⛔ Discard & Resample"]
G --> C
style A fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style B fill:#1e293b,stroke:#8b5cf6,color:#e2e8f0
style C fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
style D fill:#1e293b,stroke:#10b981,color:#e2e8f0
style E fill:#065f46,stroke:#10b981,color:#e2e8f0
style F fill:#1e293b,stroke:#6366f1,color:#e2e8f0
style G fill:#7f1d1d,stroke:#ef4444,color:#e2e8f0
Every workflow step you write gets automatically broken down into the smallest possible atomic sub-steps. Each sub-step has:
- A single, clear instruction
- An input/output contract
- Its own validation script
- Red-flag rules
flowchart TD
WS["Workflow Step\n'Build a React component with tests'"]
WS --> AS1["1.1 Read user request"]
WS --> AS2["1.2 Identify component name"]
WS --> AS3["1.3 List required props"]
WS --> AS4["1.4 Define TypeScript interface"]
WS --> AS5["1.5 Implement component"]
WS --> AS6["1.6 Add ARIA attributes"]
WS --> AS7["1.7 Write unit tests"]
WS --> AS8["... +3 more"]
AS1 -.-> V1["✅ Validate"]
AS2 -.-> V2["✅ Validate"]
AS3 -.-> V3["✅ Validate"]
AS4 -.-> V4["✅ Validate"]
style WS fill:#1e293b,stroke:#8b5cf6,color:#e2e8f0
style AS1 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style AS2 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style AS3 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style AS4 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style AS5 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style AS6 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style AS7 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style AS8 fill:#1e293b,stroke:#475569,color:#94a3b8
style V1 fill:#065f46,stroke:#10b981,color:#e2e8f0
style V2 fill:#065f46,stroke:#10b981,color:#e2e8f0
style V3 fill:#065f46,stroke:#10b981,color:#e2e8f0
style V4 fill:#065f46,stroke:#10b981,color:#e2e8f0
Why? Smaller steps = less room for the AI to go off course. A step like "Build a React component with tests" becomes 10+ atomic steps, each verifiable independently.
When multiple agents run the same step, their outputs are clustered by structural similarity. If at least K agents agree (default: 2 of 3), the consensus output is accepted. Otherwise, the step is retried.
flowchart TD
Step["Atomic Step 1.3"]
Step --> A1["Agent A"]
Step --> A2["Agent B"]
Step --> A3["Agent C"]
A1 --> O1["Output A"]
A2 --> O2["Output B"]
A3 --> O3["Output C"]
O1 --> Cluster["Similarity Clustering"]
O2 --> Cluster
O3 --> Cluster
Cluster --> Vote{"K ≥ 2 agree?"}
Vote -->|"✅ Yes"| Accept["Accept Consensus Output"]
Vote -->|"❌ No"| Retry["Retry Step"]
style Step fill:#1e293b,stroke:#8b5cf6,color:#e2e8f0
style A1 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style A2 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style A3 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style O1 fill:#1e293b,stroke:#6366f1,color:#e2e8f0
style O2 fill:#1e293b,stroke:#6366f1,color:#e2e8f0
style O3 fill:#1e293b,stroke:#6366f1,color:#e2e8f0
style Cluster fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
style Vote fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
style Accept fill:#065f46,stroke:#10b981,color:#e2e8f0
style Retry fill:#7f1d1d,stroke:#ef4444,color:#e2e8f0
Why? If three agents independently produce the same answer, it's almost certainly correct.
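A stripped-down sketch of the voting logic (illustrative only: it groups exact-match outputs, whereas SkillForge clusters by structural similarity):

```typescript
// Illustrative K-of-N consensus: group identical outputs and accept the
// largest group if it has at least K members. SkillForge clusters by
// structural similarity; exact matching keeps this sketch short.
function consensus(outputs: string[], k: number): string | null {
  const counts = new Map<string, number>();
  for (const o of outputs) counts.set(o, (counts.get(o) ?? 0) + 1);
  let best: string | null = null;
  let bestVotes = 0;
  for (const [output, votes] of counts) {
    if (votes > bestVotes) { best = output; bestVotes = votes; }
  }
  return bestVotes >= k ? best : null; // null means: retry the step
}

consensus(["A", "A", "B"], 2); // "A": 2 of 3 agents agree
consensus(["A", "B", "C"], 2); // null: no agreement, retry
```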
Every output is scanned for suspicious patterns before acceptance:
- Length violations — Output too long or too short
- Regex patterns — Apologies, placeholders, truncation markers
- Syntax errors — Invalid code in the target language
- AST analysis — Hallucinated imports, duplicate definitions
- Custom rules — Your own validation scripts
flowchart LR
Output["LLM Output"] --> RF["Red-Flag\nEngine"]
RF --> L["📏 Length"]
RF --> R["🔍 Regex"]
RF --> S["💻 Syntax"]
RF --> A["🌳 AST"]
RF --> C["⚙️ Custom"]
L & R & S & A & C --> Decision{"Any flags?"}
Decision -->|"Clean"| Pass["✅ Accept"]
Decision -->|"Warning"| Discard["🔄 Discard & Resample"]
Decision -->|"Error"| Halt["⛔ Halt Pipeline"]
style Output fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style RF fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
style L fill:#1e293b,stroke:#6366f1,color:#e2e8f0
style R fill:#1e293b,stroke:#6366f1,color:#e2e8f0
style S fill:#1e293b,stroke:#6366f1,color:#e2e8f0
style A fill:#1e293b,stroke:#6366f1,color:#e2e8f0
style C fill:#1e293b,stroke:#6366f1,color:#e2e8f0
style Decision fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
style Pass fill:#065f46,stroke:#10b981,color:#e2e8f0
style Discard fill:#78350f,stroke:#f59e0b,color:#e2e8f0
style Halt fill:#7f1d1d,stroke:#ef4444,color:#e2e8f0
Why? Catching confusion before it propagates is cheaper than debugging downstream.
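The scanning idea can be sketched in a few lines (a simplified illustration covering only the length and regex rule types; the `Rule` shape mirrors the config examples in this README, not the SDK's exact types):

```typescript
// Minimal red-flag scan over length and regex rules (illustrative only).
interface Rule {
  id: string;
  type: "length" | "regex";
  pattern?: string;    // for regex rules
  maxLength?: number;  // for length rules
  severity: "warning" | "error";
}

function scan(output: string, rules: Rule[]): string[] {
  const flagged: string[] = [];
  for (const r of rules) {
    if (r.type === "length" && r.maxLength !== undefined && output.length > r.maxLength) {
      flagged.push(r.id);
    }
    if (r.type === "regex" && r.pattern && new RegExp(r.pattern, "i").test(output)) {
      flagged.push(r.id);
    }
  }
  return flagged;
}

const flags = scan("// TODO: implement later", [
  { id: "placeholder", type: "regex", pattern: "TODO|FIXME", severity: "warning" },
  { id: "too_long", type: "length", maxLength: 10000, severity: "warning" },
]);
// Only the placeholder rule fires on this output
```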
All commands follow the pattern:
npx tsx bin/skillforge.ts <command> [options]

Creates a complete skill project directory with config, docs, and folder structure.
npx tsx bin/skillforge.ts init my-api-skill \
--description "Generates REST API endpoints with Express" \
--language typescript \
--license MIT \
--author "My Organization" \
--output ./skills

| Option | Default | Description |
|---|---|---|
| `-d, --description <text>` | `"A new SkillForge skill"` | Skill description |
| `-l, --language <lang>` | `typescript` | Primary language (`typescript`, `python`, `javascript`, `dart`) |
| `--license <license>` | - | License (e.g., MIT, Apache-2.0) |
| `--author <author>` | - | Author name or organization (stored in metadata) |
| `-o, --output <dir>` | `.` | Where to create the project directory |
Generated structure:
my-api-skill/
├── SKILL.md # Starter agentskills.io skill definition
├── skillforge.config.json # Your skill's configuration (edit this!)
├── README.md # Auto-generated project readme
├── scripts/ # Validation scripts (auto-filled on build)
├── references/ # Reference docs your skill can consult
├── examples/ # Usage examples
└── assets/ # Static assets (templates, images, data)
Language presets: The `--language` flag controls which red-flag rules are included. Each language has a full preset automatically configured via the Plugin System:

| Language | Plugin Used | Rules Included |
|---|---|---|
| `typescript` / `javascript` | `javascriptPlugin` | Bracket balance, unclosed strings/templates, hallucinated imports, duplicate declarations; 7 preset rules |
| `python` | `pythonPlugin` | Mixed indentation, missing colons, bracket balance; 6 preset rules |
| `json` | `jsonPlugin` | `JSON.parse` validation; 2 preset rules |
| `dart` / `flutter` | `dartPlugin` | Bracket balance, missing semicolons, empty `build()`, hallucinated imports, duplicate classes, missing `@override`, orphan `StatefulWidget`; 13 preset rules including deprecated widget detection |
Runs the complete SkillForge pipeline: Decompose → Validate → Assemble → Write.
flowchart LR
Config["skillforge.config.json"] --> D["🧬 Decompose"]
D --> V["🚩 Validate"]
V --> A["📦 Assemble"]
A --> W["💾 Write"]
W --> SK["SKILL.md"]
W --> SC["scripts/"]
W --> RF["references/"]
W --> EX["examples/"]
W --> RP["report.json"]
style Config fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style D fill:#1e293b,stroke:#8b5cf6,color:#e2e8f0
style V fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
style A fill:#1e293b,stroke:#10b981,color:#e2e8f0
style W fill:#1e293b,stroke:#6366f1,color:#e2e8f0
style SK fill:#065f46,stroke:#10b981,color:#e2e8f0
style SC fill:#065f46,stroke:#10b981,color:#e2e8f0
style RF fill:#065f46,stroke:#10b981,color:#e2e8f0
style EX fill:#065f46,stroke:#10b981,color:#e2e8f0
style RP fill:#065f46,stroke:#10b981,color:#e2e8f0
npx tsx bin/skillforge.ts build \
--config ./skillforge.config.json \
--output ./dist \
--verbose

| Option | Default | Description |
|---|---|---|
| `-c, --config <path>` | `./skillforge.config.json` | Config file path |
| `-o, --output <dir>` | `./dist` | Output directory |
| `-v, --verbose` | `false` | Show detailed pipeline output |
Output includes:
- `SKILL.md` — The complete agentskills.io-compatible skill
- `scripts/` — Validation scripts for each atomic step + master validator
- `references/` — Decomposition guide and reference docs
- `examples/` — Example usage file
- `decomposition-report.json` — Full decomposition data
Shows how your workflow steps will be broken down without generating output files. Useful for tuning your instructions before a full build.
npx tsx bin/skillforge.ts decompose --config ./skillforge.config.json

| Option | Default | Description |
|---|---|---|
| `-c, --config <path>` | `./skillforge.config.json` | Config file path |
Example output:
📋 Atomic Decomposition
═══════════════════════════════════════════
Step: analyze-requirements
→ 1.1 Read the user's component request carefully
→ 1.2 Identify the component name, purpose, and visual behavior
→ 1.3 List all required props and their types
...
Total: 4 parent steps → 22 atomic steps
Auto-generates and runs trigger tests and functional tests based on your config.
npx tsx bin/skillforge.ts test --config ./skillforge.config.json

| Option | Default | Description |
|---|---|---|
| `-c, --config <path>` | `./skillforge.config.json` | Config file path |
Example output:
🧪 Test Suite Results
═══════════════════════════════════════════
✅ [trigger] trigger_positive_0: Correctly triggered on: "create a React component"
✅ [trigger] trigger_negative_0: Correctly did NOT trigger on: "explain React concepts"
✅ [functional] functional_step-1: Skill structure validated
❌ [functional] functional_step-2: Step "step-2" requires input "analysis" not produced
Results: 3/4 passed (4ms)
Exits with code 1 if any test fails — CI-friendly.
Lists all red-flag rules for every atomic step — useful for auditing your skill's safety net.
npx tsx bin/skillforge.ts inspect --config ./skillforge.config.json

Example output:
🚩 Red-Flag Analysis
═══════════════════════════════════════════
Step: analyze-requirements_1.1
🔴 [syntax] Syntax Error Detection: Generated code contains syntax errors
🟡 [regex] Placeholder Content: Output contains TODO/FIXME placeholders
Total red-flag rules: 42
Steps covered: 22/22
Shows configuration for runtime consensus validation. Outputs example SDK code.
npx tsx bin/skillforge.ts validate \
--config ./skillforge.config.json \
--threshold 2 \
--agents 3

| Option | Default | Description |
|---|---|---|
| `-c, --config <path>` | `./skillforge.config.json` | Config file path |
| `-k, --threshold <n>` | `2` | K-threshold for consensus |
| `-a, --agents <n>` | `3` | Number of parallel agents |
The skillforge.config.json file is the heart of every skill:
The instruction field is what gets decomposed into atomic steps. Write it as a numbered list for the cleanest decomposition:
"instruction": "1. Read the user's request carefully. 2. Identify all required props and their types. 3. Define the TypeScript interface. 4. Add JSDoc comments. 5. Export the interface."Decomposition modes (auto-detected by priority):
| Priority | Format | Example |
|---|---|---|
| 1 | Numbered lists | 1. ... 2. ... 3. ... |
| 2 | Bullet points | - ... - ... |
| 3 | Sentence splitting | Splits at period boundaries |
| 4 | Length-based | Force-splits very long instructions |
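As a rough illustration of the highest-priority mode, numbered-list splitting might look like this (a sketch, not SkillForge's actual implementation):

```typescript
// Simplified sketch of numbered-list decomposition (illustrative only).
// Splits "1. ... 2. ..." instructions into one atomic instruction per item.
function splitNumbered(instruction: string): string[] {
  return instruction
    .split(/\s*\d+\.\s+/) // split at "1. ", "2. ", ...
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

const atomic = splitNumbered(
  "1. Read the user's request carefully. 2. Identify all required props. 3. Define the interface."
);
// Three atomic instructions, one per numbered item
```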
"steps": [
{
"id": "step-1",
"inputs": ["user_request"], // Always available
"outputs": ["component_spec"] // Produced by this step
},
{
"id": "step-2",
"inputs": ["component_spec"], // Consumed from step-1
"outputs": ["implementation"]
}
]

SkillForge validates this data flow at build-time — if step-2 requires an input that no previous step produces, you'll get a clear error.
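The build-time check amounts to a single pass that tracks which outputs are available so far (a sketch, assuming `user_request` is the only pre-seeded input, as the comments in the config snippet state):

```typescript
// Sketch of build-time data-flow validation (illustrative; SkillForge's
// actual error messages may differ).
interface Step { id: string; inputs: string[]; outputs: string[]; }

function validateDataFlow(steps: Step[]): string[] {
  const available = new Set(["user_request"]); // always available
  const errors: string[] = [];
  for (const step of steps) {
    for (const input of step.inputs) {
      if (!available.has(input)) {
        errors.push(`Step "${step.id}" requires input "${input}" not produced by any earlier step`);
      }
    }
    for (const out of step.outputs) available.add(out);
  }
  return errors;
}

const errors = validateDataFlow([
  { id: "step-1", inputs: ["user_request"], outputs: ["component_spec"] },
  { id: "step-2", inputs: ["analysis"], outputs: ["implementation"] },
]);
// One error: step-2's "analysis" is never produced
```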
Goal: Generate production-ready React components with TypeScript, tests, and accessibility.
{
"name": "react-component-generator",
"description": "Generates production-ready React functional components with TypeScript props interfaces, ARIA accessibility attributes, and comprehensive unit tests.",
"license": "MIT",
"metadata": {
"author": "Acme Corp",
"version": "1.0.0"
},
"triggers": [
"create a React component",
"build a React component",
"generate a React component with TypeScript"
],
"negativeTriggers": [
"explain React concepts",
"debug an existing React component"
],
"steps": [
{
"id": "analyze-requirements",
"title": "Analyze Component Requirements",
"instruction": "1. Read the user's component request carefully. 2. Identify the component name, purpose, and visual behavior. 3. List all required props and their types. 4. Identify any state management needs. 5. Note accessibility requirements.",
"inputs": ["user_request"],
"outputs": ["component_spec"],
"validationCriteria": "Component spec includes: name, props interface, state requirements, and accessibility notes."
},
{
"id": "implement-component",
"title": "Implement Component",
"instruction": "1. Create the functional component using React.FC with the props interface. 2. Destructure props with default values. 3. Implement hooks for state and side effects. 4. Add all necessary ARIA attributes.",
"inputs": ["component_spec"],
"outputs": ["component_code"],
"validationCriteria": "Component compiles without errors, uses proper hooks, has all ARIA attributes."
},
{
"id": "write-tests",
"title": "Write Unit Tests",
"instruction": "1. Import React Testing Library and the component. 2. Write a test for default rendering. 3. Write tests for each prop variation. 4. Write tests for user interactions. 5. Verify all ARIA attributes.",
"inputs": ["component_code"],
"outputs": ["test_code"],
"validationCriteria": "All branches covered. Tests verify rendering, interactions, and accessibility."
}
],
"language": "typescript",
"maker": {
"agentCount": 3,
"kThreshold": 2,
"maxOutputLength": 10000,
"maxRetries": 3,
"globalRedFlagRules": [
{
"id": "no_class_components",
"name": "No Class Components",
"description": "Must use functional components, not class components",
"type": "regex",
"pattern": "class\\s+\\w+\\s+extends\\s+(React\\.)?Component",
"severity": "warning"
}
]
}
}

Goal: Generate API documentation from source code.
{
"name": "api-docs-generator",
"description": "Generates comprehensive API documentation from source code with function signatures, parameter tables, and code examples.",
"triggers": ["generate API documentation", "document this API"],
"negativeTriggers": ["write a README", "create a tutorial"],
"steps": [
{
"id": "parse-source",
"title": "Parse Source Code",
"instruction": "1. Read all provided source files. 2. Extract every exported function, class, and type. 3. Capture: name, parameters with types, return type, docstrings.",
"inputs": ["user_request"],
"outputs": ["parsed_exports"],
"validationCriteria": "Every public export is captured with full type information."
},
{
"id": "generate-docs",
"title": "Generate Documentation",
"instruction": "1. Create a markdown file with title and overview. 2. For each function, generate: signature, description, parameter table, return type, and code example. 3. Add a table of contents.",
"inputs": ["parsed_exports"],
"outputs": ["documentation"],
"validationCriteria": "Every export has a complete documentation section."
}
],
"language": "typescript",
"maker": {
"agentCount": 3,
"kThreshold": 2,
"maxOutputLength": 20000,
"maxRetries": 3,
"globalRedFlagRules": [
{
"id": "no_placeholder_docs",
"name": "No Placeholder Documentation",
"description": "Must not contain TBD or placeholder text",
"type": "regex",
"pattern": "(TBD|TODO|PLACEHOLDER|\\[description here\\])",
"severity": "error"
}
]
}
}

Goal: Data transformation pipeline: ingest → clean → validate → transform → report.

- Multiple outputs per step — `ingest` produces both `raw_records` AND `schema_info`
- Higher retries (`maxRetries: 5`) — data processing is harder to get right
- Python language — validation scripts check Python syntax
{
"name": "data-pipeline-builder",
"description": "Creates data transformation pipelines: clean, validate, transform, and report.",
"triggers": ["build a data pipeline", "process this dataset"],
"negativeTriggers": ["visualize data", "create a chart"],
"steps": [
{
"id": "ingest",
"title": "Ingest Raw Data",
"instruction": "1. Read the input file (CSV or JSON). 2. Parse into structured records. 3. Log total record count and column names.",
"inputs": ["user_request"],
"outputs": ["raw_records", "schema_info"],
"validationCriteria": "All records parsed. Column names detected."
},
{
"id": "transform",
"title": "Transform Data",
"instruction": "1. Apply user-requested transformations. 2. Calculate summary statistics for numeric columns.",
"inputs": ["raw_records", "schema_info"],
"outputs": ["transformed_data"],
"validationCriteria": "All transformations applied. Statistics are correct."
},
{
"id": "report",
"title": "Generate Report",
"instruction": "1. Create summary with total records processed. 2. Include statistics. 3. Sample of transformed output (first 10 records).",
"inputs": ["transformed_data"],
"outputs": ["pipeline_report"],
"validationCriteria": "Report includes all statistics."
}
],
"language": "python",
"maker": {
"agentCount": 3,
"kThreshold": 2,
"maxOutputLength": 15000,
"maxRetries": 5,
"globalRedFlagRules": [
{
"id": "no_hardcoded_paths",
"name": "No Hardcoded File Paths",
"description": "Pipeline code must not contain hardcoded paths",
"type": "regex",
"pattern": "(/Users/|/home/|C:\\\\|D:\\\\)",
"severity": "error"
}
]
}
}

Goal: SQL migrations with strict safety rules.

- Higher consensus (`agentCount: 5`, `kThreshold: 3`) — critical operations need stronger agreement
- Error-severity rules — `DROP` without backup and `TRUNCATE` immediately halt the pipeline
{
"name": "safe-sql-migration",
"description": "Generates safe, reversible SQL migrations with rollback scripts.",
"triggers": ["create a database migration", "add a column to the database"],
"negativeTriggers": ["write a SQL query", "optimize a query"],
"steps": [
{
"id": "generate-up",
"title": "Generate Forward Migration",
"instruction": "1. Write SQL using CREATE/ALTER. 2. Wrap in transaction. 3. Use snake_case. 4. Add IF NOT EXISTS guards.",
"inputs": ["user_request"],
"outputs": ["up_migration"],
"validationCriteria": "SQL is valid. Uses transactions. Snake_case identifiers."
},
{
"id": "generate-down",
"title": "Generate Rollback",
"instruction": "1. Write SQL to reverse the forward migration. 2. Create backup tables for destructive ops. 3. Wrap in transaction.",
"inputs": ["up_migration"],
"outputs": ["down_migration"],
"validationCriteria": "Rollback exactly reverses the forward migration."
}
],
"maker": {
"agentCount": 5,
"kThreshold": 3,
"maxOutputLength": 8000,
"maxRetries": 3,
"globalRedFlagRules": [
{
"id": "no_drop_without_backup",
"name": "No DROP Without Backup",
"type": "regex",
"pattern": "DROP\\s+TABLE(?!.*CREATE\\s+TABLE.*backup)",
"severity": "error"
},
{
"id": "no_truncate",
"name": "No TRUNCATE",
"type": "regex",
"pattern": "TRUNCATE\\s+TABLE",
"severity": "error"
}
]
}
}

Goal: Security vulnerability review with structured audit report.

- Parallel steps — `check-injection` and `check-xss` both consume `attack_surface` independently
- Fan-in — `generate-report` consumes from two parent steps
flowchart TD
UR["user_request"] --> SCAN["scan-surface"]
SCAN --> AS["attack_surface"]
AS --> INJ["check-injection"]
AS --> XSS["check-xss"]
INJ --> IF["injection_findings"]
XSS --> XF["xss_findings"]
IF --> RPT["generate-report"]
XF --> RPT
RPT --> AR["audit_report"]
style UR fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style SCAN fill:#1e293b,stroke:#8b5cf6,color:#e2e8f0
style AS fill:#1e293b,stroke:#6366f1,color:#e2e8f0
style INJ fill:#1e293b,stroke:#ef4444,color:#e2e8f0
style XSS fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
style IF fill:#1e293b,stroke:#ef4444,color:#e2e8f0
style XF fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
style RPT fill:#1e293b,stroke:#10b981,color:#e2e8f0
style AR fill:#065f46,stroke:#10b981,color:#e2e8f0
{
"name": "security-audit",
"description": "Reviews code for security vulnerabilities and produces a structured audit report.",
"triggers": ["audit this code for security", "find vulnerabilities"],
"negativeTriggers": ["implement authentication", "write a security policy"],
"steps": [
{
"id": "scan-surface",
"title": "Identify Attack Surface",
"instruction": "1. Identify user input entry points. 2. Identify database queries. 3. Identify file system operations.",
"inputs": ["user_request"],
"outputs": ["attack_surface"],
"validationCriteria": "All entry points identified."
},
{
"id": "check-injection",
"title": "Check Injection Vulnerabilities",
"instruction": "1. Check for SQL injection. 2. Check for command injection. 3. Note file, line, and severity.",
"inputs": ["attack_surface"],
"outputs": ["injection_findings"],
"validationCriteria": "Every query and exec call checked."
},
{
"id": "check-xss",
"title": "Check XSS and CSRF",
"instruction": "1. Check for unescaped user input. 2. Check for CSRF token validation. 3. Check cookie flags.",
"inputs": ["attack_surface"],
"outputs": ["xss_findings"],
"validationCriteria": "All rendering paths checked."
},
{
"id": "generate-report",
"title": "Generate Audit Report",
"instruction": "1. Combine findings. 2. Sort by severity. 3. Add recommended fixes. 4. Add executive summary.",
"inputs": ["injection_findings", "xss_findings"],
"outputs": ["audit_report"],
"validationCriteria": "All findings included. Sorted by severity."
}
],
"maker": {
"agentCount": 3,
"kThreshold": 2,
"maxOutputLength": 15000,
"maxRetries": 3,
"globalRedFlagRules": [
{
"id": "no_blanket_approval",
"name": "No Blanket Safe Approval",
"description": "Must not claim 'no vulnerabilities found' without analysis",
"type": "regex",
"pattern": "(no (security )?issues found|code is secure|no vulnerabilities)",
"severity": "warning"
}
]
}
}

SkillForge uses a plugin architecture for language-specific validation. Each language is a self-contained plugin that provides syntax checking, AST analysis, and red-flag presets.
flowchart TD
RF["red-flag.ts\ndetectRedFlags"]
RF -->|"length, regex, custom"| Direct["Handle Directly"]
RF -->|"syntax / ast"| PR["PluginRegistry\nSingleton"]
PR --> JS["JavaScript /\nTypeScript Plugin"]
PR --> PY["Python\nPlugin"]
PR --> JSON["JSON\nPlugin"]
PR --> DART["Dart / Flutter\nPlugin"]
PR --> CUSTOM["Your Custom\nPlugin"]
JS --> R1["checkSyntax\ncheckAST\ngetRedFlagPreset"]
PY --> R2["checkSyntax\ncheckAST\ngetRedFlagPreset"]
JSON --> R3["checkSyntax\ncheckAST\ngetRedFlagPreset"]
DART --> R4["checkSyntax\ncheckAST\ngetRedFlagPreset"]
CUSTOM --> R5["checkSyntax\ncheckAST\ngetRedFlagPreset"]
style RF fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
style Direct fill:#1e293b,stroke:#6366f1,color:#e2e8f0
style PR fill:#1e293b,stroke:#8b5cf6,color:#e2e8f0
style JS fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
style PY fill:#1e293b,stroke:#10b981,color:#e2e8f0
style JSON fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
style DART fill:#1e293b,stroke:#06b6d4,color:#e2e8f0
style CUSTOM fill:#1e293b,stroke:#ec4899,color:#e2e8f0
style R1 fill:#0f172a,stroke:#3b82f6,color:#94a3b8
style R2 fill:#0f172a,stroke:#10b981,color:#94a3b8
style R3 fill:#0f172a,stroke:#f59e0b,color:#94a3b8
style R4 fill:#0f172a,stroke:#06b6d4,color:#94a3b8
style R5 fill:#0f172a,stroke:#ec4899,color:#94a3b8
SkillForge ships with 4 built-in plugins, automatically registered on import:
Languages: javascript, typescript
Syntax checks:
- Balanced braces `{}`, parentheses `()`, brackets `[]`
- Unclosed template literals
- Unclosed string literals (per-line)
AST checks:
- Hallucinated imports (suspiciously long package names like `my-super-awesome-package`)
- Duplicate function/variable declarations
Preset: 7 rules — length, syntax, AST, empty output, apology, placeholders, truncation
Languages: python
Syntax checks:
- Balanced parentheses and brackets
- Mixed tabs and spaces in indentation
- Missing colons after `def`, `class`, `if`, `for`, `while`, etc.
Preset: 6 rules — length, syntax, empty output, apology, placeholders (including pass #), truncation
Languages: json
Syntax checks:
- Full `JSON.parse()` validation — catches all JSON syntax errors
Preset: 2 rules — syntax, empty output
Languages: dart, flutter
Syntax checks:
- Balanced braces, parentheses, brackets, and generic angle brackets
- Unclosed string literals (handles raw strings `r'...'`)
- Missing semicolons after `var`, `final`, `const`, `return`, `throw`, `late`
- Empty `build()` method bodies (Flutter-specific)
- Broken cascade notation `..`
AST checks:
- Hallucinated `package:` imports
- Duplicate class declarations
- Duplicate top-level function declarations
- Missing `@override` on lifecycle methods (`build`, `initState`, `dispose`, etc.)
- `StatefulWidget` without matching `State<WidgetName>` class
Preset: 13 rules — length, syntax, AST, empty output, apology, placeholders, truncation, debug `print()`, deprecated widgets (`RaisedButton`, `FlatButton`, `OutlineButton`), excessive `dynamic` type, hardcoded colors
Create a plugin for any language in 3 steps:
import type { LanguagePlugin } from 'skillforge';
const swiftPlugin: LanguagePlugin = {
name: 'swift',
languages: ['swift', 'swiftui'],
checkSyntax(code: string): string[] {
const errors: string[] = [];
// Example: check for balanced braces
const opens = (code.match(/{/g) ?? []).length;
const closes = (code.match(/}/g) ?? []).length;
if (opens !== closes) {
errors.push(`Unbalanced braces: ${opens} opening vs ${closes} closing`);
}
// Example: check for force-unwrap abuse
const forceUnwraps = (code.match(/\w+!/g) ?? []).length;
if (forceUnwraps > 5) {
errors.push(`Excessive force-unwrapping: ${forceUnwraps} instances — use optional binding`);
}
return errors;
},
checkAST(code: string): string[] {
const issues: string[] = [];
// Example: detect duplicate struct declarations
const structPattern = /struct\s+(\w+)/g;
const structs = new Map<string, number>();
let match;
while ((match = structPattern.exec(code)) !== null) {
structs.set(match[1], (structs.get(match[1]) ?? 0) + 1);
}
for (const [name, count] of structs) {
if (count > 1) {
issues.push(`Duplicate struct: "${name}" declared ${count} times`);
}
}
return issues;
},
getRedFlagPreset() {
return [
{
id: 'swift_syntax',
name: 'Swift Syntax Error Detection',
description: 'Generated Swift code contains syntax errors',
type: 'syntax' as const,
language: 'swift',
severity: 'error' as const,
},
{
id: 'swift_ast',
name: 'Swift AST Issue Detection',
description: 'Generated Swift code has structural issues',
type: 'ast' as const,
language: 'swift',
severity: 'warning' as const,
},
{
id: 'swift_force_unwrap',
name: 'Force Unwrap Detection',
description: 'Code uses excessive force-unwrapping (!)',
type: 'regex' as const,
pattern: '\\w+!\\.',
severity: 'warning' as const,
},
];
},
};

import { registerPlugin } from 'skillforge';
registerPlugin(swiftPlugin);

That's it! Now `swift` and `swiftui` are supported everywhere in SkillForge.
# CLI uses it automatically
npx tsx bin/skillforge.ts init my-swift-skill --language swift
# Or programmatically
import { getRedFlagPreset, detectRedFlags } from 'skillforge';
const rules = getRedFlagPreset('swift');
const detections = detectRedFlags(swiftCode, rules);

interface LanguagePlugin {
name: string; // Unique plugin name
languages: string[]; // Language identifiers this plugin handles
checkSyntax(code: string): string[]; // Returns error descriptions
checkAST(code: string): string[]; // Returns issue descriptions
getRedFlagPreset(): RedFlagRule[]; // Returns preset rules
}

Register a custom plugin with the global registry. If a language identifier is already registered, the new plugin overrides it.
import { registerPlugin } from 'skillforge';
registerPlugin(myPlugin);

Direct access to the singleton registry for advanced use:
import { pluginRegistry } from 'skillforge';
pluginRegistry.getAllPlugins(); // LanguagePlugin[]
pluginRegistry.getSupportedLanguages(); // string[]
pluginRegistry.isSupported('swift'); // boolean
pluginRegistry.getPlugin('dart'); // LanguagePlugin | undefined
pluginRegistry.getPreset('dart'); // RedFlagRule[]
pluginRegistry.checkSyntax(code, 'dart'); // string[]
pluginRegistry.checkAST(code, 'dart'); // string[]

Red-flag rules are the safety net that prevents bad output from being accepted.
| Type | Field | Description | Example |
|---|---|---|---|
| `length` | `maxLength` | Output exceeds character limit | Code over 10,000 chars |
| `regex` | `pattern` | Output matches a forbidden pattern | Apology phrases, TODOs |
| `syntax` | `language` | Output has syntax errors (delegated to plugin) | Invalid TypeScript |
| `ast` | `language` | Output has structural issues (delegated to plugin) | Hallucinated imports |
| `custom` | `scriptPath` | Custom validation script | Your own checker |
| Severity | Behavior |
|---|---|
| `warning` | Output is discarded and resampled (agent tries again) |
| `error` | Pipeline halts immediately and reports the failure |
Get language-specific rule presets via the unified getRedFlagPreset() function:
import { getRedFlagPreset } from 'skillforge';
// Returns the full preset for any registered language
const tsRules = getRedFlagPreset('typescript'); // 7 rules
const dartRules = getRedFlagPreset('dart'); // 13 rules
const pyRules = getRedFlagPreset('python'); // 6 rules
const jsonRules = getRedFlagPreset('json'); // 2 rules
// Documentation preset (not language-specific)
import { getDocRedFlagPreset } from 'skillforge';
const docRules = getDocRedFlagPreset(); // 3 rules

{
"id": "my_custom_rule",
"name": "Human-Readable Name",
"description": "What this rule catches and why",
"type": "regex",
"pattern": "your_regex_pattern_here",
"severity": "warning"
}Tips:
- Use `\\b` for word boundaries to avoid false positives
- Use `\\s+` for flexible whitespace matching
- Test your regex against sample outputs before adding it
- Prefer `warning` for style issues, `error` for correctness
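For example, a word boundary keeps a rule aimed at the SQL keyword DROP from firing on unrelated identifiers:

```typescript
// Word boundaries prevent substring false positives in red-flag patterns.
const loose = new RegExp("DROP", "i");
const strict = new RegExp("\\bDROP\\b", "i");

loose.test("renderDropdown()");   // true: false positive on "Dropdown"
strict.test("renderDropdown()");  // false: boundary blocks it
strict.test("DROP TABLE users");  // true: real match
```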
SkillForge generates three types of tests automatically:
Verify the skill activates (or doesn't) for given prompts. Uses four weighted strategies:
| Strategy | Weight | How it works |
|---|---|---|
| Exact match | 1.0 | Direct string comparison |
| Keyword overlap | 0.7 | Token intersection |
| Fuzzy n-gram similarity | 0.5 | Character-level similarity |
| Description match | 0.3 | Semantic description comparison |
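The keyword-overlap strategy, for instance, can be approximated as token intersection scaled by its weight (an illustrative sketch, not the SDK's exact scoring):

```typescript
// Sketch of the keyword-overlap trigger strategy (weight 0.7): score a
// prompt against a trigger phrase by the fraction of trigger tokens it
// contains. Illustrative only; SkillForge combines four weighted strategies.
function keywordOverlap(prompt: string, trigger: string): number {
  const tokenize = (s: string) =>
    new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const promptTokens = tokenize(prompt);
  const triggerTokens = tokenize(trigger);
  let shared = 0;
  for (const word of triggerTokens) {
    if (promptTokens.has(word)) shared++;
  }
  if (triggerTokens.size === 0) return 0;
  return (shared / triggerTokens.size) * 0.7; // apply strategy weight
}

keywordOverlap("please create a React component for me", "create a React component");
// All four trigger tokens appear in the prompt, so the score is the full 0.7
```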
Verify structural integrity:
- All steps have validation criteria
- All steps have inputs and outputs
- Data flow is valid (no orphaned inputs)
- MAKER config is consistent (K ≤ agent count)
Score output quality on 9 heuristics (0-100%):
- Meaningful content, reasonable length, no placeholders, no apologies
- Structured code, error handling, documentation
- No debug logging, type annotations
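A simplified version of this scoring, with hypothetical heuristic names (the real suite has 9 checks):

```typescript
// Illustrative quality scoring: each heuristic contributes equally and the
// result is a 0-100% score. Heuristic names are examples, not the SDK's set.
const heuristics: Array<(out: string) => boolean> = [
  (o) => o.trim().length > 0,                  // meaningful content
  (o) => o.length < 20000,                     // reasonable length
  (o) => !/TODO|FIXME|PLACEHOLDER/i.test(o),   // no placeholders
  (o) => !/i apologize|i'm sorry/i.test(o),    // no apologies
  (o) => !/console\.log\(/.test(o),            // no debug logging
];

function qualityScore(output: string): number {
  const passed = heuristics.filter((check) => check(output)).length;
  return Math.round((passed / heuristics.length) * 100);
}

qualityScore("const x: number = 1;"); // passes all five checks: 100
```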
Import everything from the main entry point:
import { detectRedFlags, decompose, generateSkill, registerPlugin } from 'skillforge';

Scan output against an array of `RedFlagRule` objects. Returns `RedFlagDetection[]`.
import { detectRedFlags, getRedFlagPreset } from 'skillforge';
const rules = getRedFlagPreset('typescript');
const detections = detectRedFlags(llmOutput, rules);
for (const d of detections) {
console.log(`🚩 ${d.ruleName}: ${d.reason}`);
if (d.discarded) console.log(' → Output discarded');
}Break a SkillConfig into AtomicStep[]. Each atomic step is a single-responsibility micro-step.
```ts
import { decompose, getDecompositionReport } from 'skillforge';

const steps = decompose(config);
const report = getDecompositionReport(steps);
console.log(`${report.parentSteps} steps → ${report.totalAtomicSteps} atomic steps`);
```

Run K-threshold consensus validation on multiple LLM outputs for a step.
```ts
import { validateConsensus } from 'skillforge';

const result = validateConsensus(step, [outputA, outputB, outputC], {
  kThreshold: 2,
  similarityThreshold: 0.8,
});

if (result.consensusReached) {
  console.log(`Consensus: ${result.votes} agents agreed`);
  console.log(result.accepted);
}
```

Run the full pipeline: decompose → validate → assemble → write.
```ts
import { createDefaultConfig, generateSkill } from 'skillforge';

const config = createDefaultConfig('my-skill', 'Does amazing things');
const result = generateSkill(config, {
  outputDir: './dist',
  verbose: true,
  validate: true,
});
```

Generate the final SKILL.md and all output files from decomposed steps.

Shared utility — counts bracket balance respecting string literals. Used by plugins internally.
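The exact implementation lives in `core/helpers.ts`; a plausible sketch of the idea (illustrative, the real details may differ):

```typescript
// Sketch of a bracket-balance counter that ignores brackets inside string literals.
function countBalance(code: string, open = '{', close = '}'): number {
  let balance = 0;
  let inString: string | null = null; // current quote character, if any
  for (let i = 0; i < code.length; i++) {
    const ch = code[i];
    if (inString) {
      if (ch === '\\') i++; // skip the escaped character
      else if (ch === inString) inString = null;
    } else if (ch === '"' || ch === "'" || ch === '`') {
      inString = ch;
    } else if (ch === open) balance++;
    else if (ch === close) balance--;
  }
  return balance; // 0 means balanced
}
```

A positive result means unclosed brackets; a brace inside a string literal (a common false positive for naive counters) is correctly skipped.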
```ts
import {
  registerPlugin, // Register a custom LanguagePlugin
  pluginRegistry, // Direct registry access
  getRedFlagPreset, // Get rules for any language
  // Individual built-in plugins
  javascriptPlugin,
  pythonPlugin,
  jsonPlugin,
  dartPlugin,
} from 'skillforge';

import type { LanguagePlugin } from 'skillforge';
```

```ts
import {
  generateSkill, // Full pipeline
  createDefaultConfig, // Create a starter SkillConfig
  // Template rendering
  render, // Mustache-style template rendering
  renderConfigJson, // Render config as formatted JSON
  renderFrontmatter, // Render YAML frontmatter
  renderDecompositionTable, // Render decomposition as table
  // Script generation
  generateCodeValidator, // Generate JS/TS validation script
  generatePythonValidator, // Generate Python validation script
  generateTestSuiteScript, // Generate test runner script
  // Templates
  getCodeSkillTemplate, // Get the SKILL.md template
} from 'skillforge';
```

```ts
import {
  generateDefaultTests, // Auto-generate tests from config
  runTestSuite, // Run all tests
  formatTestReport, // Format results for console
  runTriggerTest, // Run a single trigger test
  runFunctionalTest, // Run a single functional test
  validateOutputAgainstExpected, // Compare output to expected
} from 'skillforge';

// Full test flow
const tests = generateDefaultTests(config);
const results = runTestSuite(config, tests);
console.log(formatTestReport(results));
```

All TypeScript interfaces are exported for type-safe usage:
```ts
import type {
  // Config
  SkillConfig,
  WorkflowStep,
  MAKERConfig,
  RedFlagRule,
  // Decomposition
  AtomicStep,
  IOContract,
  FieldSpec,
  // Validation
  ValidationResult,
  ValidationError,
  RedFlagDetection,
  ConsensusResult,
  CandidateOutput,
  // Generation
  GeneratedSkill,
  GeneratedFile,
  SkillFrontmatter,
  GenerateResult,
  GeneratorOptions,
  // Testing
  TestCase,
  TestResult,
  TestSuiteResult,
  TestType,
  // Plugin
  LanguagePlugin,
  // CLI
  CLIOptions,
} from 'skillforge';
```

```
skill_sdk/
├── bin/
│   └── skillforge.ts          # CLI entry point (6 commands)
├── src/
│   ├── index.ts               # Public SDK API (barrel export)
│   ├── types/
│   │   ├── index.ts           # All TypeScript interfaces
│   │   └── plugin.ts          # LanguagePlugin interface
│   ├── core/
│   │   ├── decomposer.ts      # Maximal Agentic Decomposition
│   │   ├── validator.ts       # K-threshold consensus voting
│   │   ├── red-flag.ts        # Red-flag orchestrator (delegates to plugins)
│   │   ├── assembler.ts       # SKILL.md generation
│   │   ├── plugin-registry.ts # Plugin registration & lookup
│   │   └── helpers.ts         # Shared utilities (countBalance)
│   ├── plugins/
│   │   ├── index.ts           # Auto-registers all built-in plugins
│   │   ├── javascript.ts      # JS/TS plugin
│   │   ├── python.ts          # Python plugin
│   │   ├── json.ts            # JSON plugin
│   │   └── dart.ts            # Dart/Flutter plugin
│   ├── generators/
│   │   ├── skill-generator.ts   # Full pipeline orchestrator
│   │   ├── script-generator.ts  # Validation script generators
│   │   └── template-engine.ts   # Mustache-like renderer
│   └── testing/
│       ├── test-runner.ts     # Test orchestrator
│       ├── trigger-test.ts    # Trigger matching tests
│       └── functional-test.ts # Structural validation tests
├── tests/
│   ├── decomposer.test.ts     # 12 tests
│   ├── validator.test.ts      # 10 tests
│   ├── red-flag.test.ts       # 24 tests
│   ├── plugins.test.ts        # 22 tests
│   └── integration.test.ts    # 5 tests
├── examples/
│   ├── react-component-skill/
│   ├── express-rest-api-generator/
│   └── flutter-accessibility-audit/
├── package.json
└── tsconfig.json
```
❌ Config not found: /path/to/skillforge.config.json

Ensure you're passing the correct `--config` path. By default, SkillForge looks for `./skillforge.config.json`.

Structural issues found: Step "step-2" has no inputs or outputs

Every step needs at least one entry in `inputs` and `outputs`. The first step typically takes `["user_request"]`.

Step "step-3" requires input "analysis" which is not produced by any previous step

Check that the `outputs` array of a previous step includes the exact string used in `inputs`. Names must match exactly.

K-threshold (4) exceeds agent count (3)

`kThreshold` must be ≤ `agentCount`; if K exceeds the number of agents, consensus is impossible.
- Make the pattern more specific (add `\b`, `^`, `$`)
- Change severity from `error` to `warning`
- Remove the rule from `globalRedFlagRules`
`getRedFlagPreset('swift')` returns `[]`

Custom plugins must be registered before use:

```ts
import { registerPlugin } from 'skillforge';

registerPlugin(swiftPlugin);
```

Q: Does SkillForge call an LLM during build?
No. SkillForge is a pure scaffolding and validation tool. It generates the skill definition, decomposition, validation scripts, and test suite — but the actual LLM calls happen at runtime.
Q: What's the difference between warning and error severity?
`warning` discards the output and retries. `error` halts the pipeline immediately — use this for safety-critical rules.
Q: Can steps run in parallel? Yes! If two steps consume the same input but don't depend on each other, they are implicitly parallel. SkillForge wires dependencies automatically.
Q: How many atomic steps should I expect? Roughly 5-12 per workflow step. A typical 4-step skill produces 20-40 atomic steps.
Q: Can I add a new programming language?
Yes! Create a `LanguagePlugin` and call `registerPlugin()`. See Writing a Custom Plugin.
Q: What languages are supported out of the box? JavaScript, TypeScript, Python, JSON, Dart, and Flutter. Each has a dedicated plugin with language-specific heuristics.
Q: What happens if no agents agree?
The step is retried up to `maxRetries` times. If it still fails, the pipeline reports the failure with all candidate outputs.
Q: Can I use custom validation scripts?
Yes. Use the `custom` red-flag rule type with a `scriptPath`. The script receives the output and should exit with `0` (pass) or `1` (fail).
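For illustration, a minimal script honoring that contract could look like this. It's a sketch under assumptions: `validate-output.ts` and the `passes` check are hypothetical, the output is assumed to arrive on stdin, and only the 0/1 exit convention comes from the rule spec.

```typescript
// validate-output.ts — hypothetical custom validation script (sketch).
import { readFileSync } from 'node:fs';

// The actual check is up to you; this example rejects empty output and TODOs.
function passes(output: string): boolean {
  return output.trim().length > 0 && !output.includes('TODO');
}

function main(): void {
  const output = readFileSync(0, 'utf8'); // fd 0 = stdin: the candidate output
  process.exitCode = passes(output) ? 0 : 1; // 0 = pass, 1 = fail
}

// Only run when invoked as a script (keeps `passes` importable and testable).
if (process.argv[1]?.endsWith('validate-output.ts')) main();
```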
Q: How do I override a built-in plugin?
Call `registerPlugin()` with a plugin that uses the same `languages` array. The new plugin replaces the built-in one in the registry.
MIT
```jsonc
{
  // REQUIRED: Kebab-case name, max 64 characters
  "name": "my-skill-name",

  // REQUIRED: What the skill does + when to use it (max 1024 chars)
  "description": "Generates production-ready React components...",

  // OPTIONAL: agentskills.io metadata
  "license": "MIT",
  "metadata": { "author": "SkillForge Team", "version": "1.0.0" },
  "compatibility": "Requires Node.js >= 18",

  // REQUIRED: Phrases that should activate this skill
  "triggers": [
    "create a React component",
    "build a UI component"
  ],

  // RECOMMENDED: Phrases that should NOT activate this skill
  "negativeTriggers": [
    "explain React concepts",
    "debug existing code"
  ],

  // REQUIRED: Your workflow steps (SkillForge decomposes these)
  "steps": [
    {
      "id": "step-1",                  // Unique ID (kebab-case)
      "title": "Analyze Requirements", // Human-readable title
      "instruction": "...",            // What the AI should do
      "inputs": ["user_request"],      // Data this step needs
      "outputs": ["requirements_spec"],// Data this step produces
      "validationCriteria": "..."      // How to verify the output
    }
  ],

  // OPTIONAL: Paths to reference docs the skill can consult
  "references": ["api-docs.md", "style-guide.md"],

  // OPTIONAL: Primary language for code output
  "language": "typescript",

  // REQUIRED: MAKER engine configuration
  "maker": {
    "agentCount": 3,          // Parallel agents per step (default: 3)
    "kThreshold": 2,          // Votes needed for consensus (default: 2)
    "maxOutputLength": 10000, // Max chars before red-flagging
    "maxRetries": 3,          // Retry limit after red-flag discard
    "globalRedFlagRules": []  // Red-flag rules applied to ALL steps
  }
}
```