SkillForge

Zero-error skill creation framework — fuses agentskills.io spec with MAKER reliability principles to generate production-ready AI skills that follow directions perfectly.

SkillForge takes your high-level workflow descriptions and transforms them into battle-tested, agentskills.io-compatible Skills with built-in validation, red-flag detection, and consensus verification — so the AI never hallucinates, truncates, or goes off-script.

Quick Start

# 1. Install
cd skill_sdk && npm install

# 2. Scaffold a skill
npx tsx bin/skillforge.ts init my-first-skill \
  --description "Generates unit tests for Python functions" \
  --language python

# 3. Edit the config
#    Open my-first-skill/skillforge.config.json and define your workflow

# 4. Build the skill
npx tsx bin/skillforge.ts build \
  --config my-first-skill/skillforge.config.json \
  --output my-first-skill/dist --verbose

# 5. Test the skill
npx tsx bin/skillforge.ts test \
  --config my-first-skill/skillforge.config.json

Installation

# Clone and install
git clone <repo-url> skill_sdk
cd skill_sdk
npm install

# Verify everything works
npm test

Requires Node.js ≥ 18 and npm.

Key npm Scripts

Script	Command	Description
`npm test`	`vitest run`	Run all 73 unit tests
`npm run test:watch`	`vitest`	Watch mode for tests
`npm run build`	`tsc`	Compile TypeScript to `dist/`
`npm run lint`	`tsc --noEmit`	Type-check without emitting
`npm start`	`tsx bin/skillforge.ts`	Run the CLI directly

Core Concepts

SkillForge is built on three pillars from the MAKER framework (Maximal Agentic Knowledge-Error Reduction):

flowchart LR
    A["📝 Workflow Steps"] --> B["🧬 Decompose"]
    B --> C["🚩 Validate"]
    C --> D["🗳️ Consensus"]
    D --> E["✅ Accepted Output"]
    D -->|"No agreement"| F["🔄 Retry"]
    F --> C
    C -->|"Red-flag triggered"| G["⛔ Discard & Resample"]
    G --> C

    style A fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style B fill:#1e293b,stroke:#8b5cf6,color:#e2e8f0
    style C fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
    style D fill:#1e293b,stroke:#10b981,color:#e2e8f0
    style E fill:#065f46,stroke:#10b981,color:#e2e8f0
    style F fill:#1e293b,stroke:#6366f1,color:#e2e8f0
    style G fill:#7f1d1d,stroke:#ef4444,color:#e2e8f0

🧬 Maximal Agentic Decomposition (MAD)

Every workflow step you write gets automatically broken down into the smallest possible atomic sub-steps. Each sub-step has:

A single, clear instruction
An input/output contract
Its own validation script
Red-flag rules

flowchart TD
    WS["Workflow Step\n'Build a React component with tests'"]
    WS --> AS1["1.1 Read user request"]
    WS --> AS2["1.2 Identify component name"]
    WS --> AS3["1.3 List required props"]
    WS --> AS4["1.4 Define TypeScript interface"]
    WS --> AS5["1.5 Implement component"]
    WS --> AS6["1.6 Add ARIA attributes"]
    WS --> AS7["1.7 Write unit tests"]
    WS --> AS8["... +3 more"]

    AS1 -.-> V1["✅ Validate"]
    AS2 -.-> V2["✅ Validate"]
    AS3 -.-> V3["✅ Validate"]
    AS4 -.-> V4["✅ Validate"]

    style WS fill:#1e293b,stroke:#8b5cf6,color:#e2e8f0
    style AS1 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style AS2 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style AS3 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style AS4 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style AS5 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style AS6 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style AS7 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style AS8 fill:#1e293b,stroke:#475569,color:#94a3b8
    style V1 fill:#065f46,stroke:#10b981,color:#e2e8f0
    style V2 fill:#065f46,stroke:#10b981,color:#e2e8f0
    style V3 fill:#065f46,stroke:#10b981,color:#e2e8f0
    style V4 fill:#065f46,stroke:#10b981,color:#e2e8f0

Why? Smaller steps = less room for the AI to go off course. A step like "Build a React component with tests" becomes 10+ atomic steps, each verifiable independently.

🗳️ K-Threshold Consensus Voting

When multiple agents run the same step, their outputs are clustered by structural similarity. If at least K agents agree (default: 2 of 3), the consensus output is accepted. Otherwise, the step is retried.

flowchart TD
    Step["Atomic Step 1.3"]
    Step --> A1["Agent A"]
    Step --> A2["Agent B"]
    Step --> A3["Agent C"]

    A1 --> O1["Output A"]
    A2 --> O2["Output B"]
    A3 --> O3["Output C"]

    O1 --> Cluster["Similarity Clustering"]
    O2 --> Cluster
    O3 --> Cluster

    Cluster --> Vote{"K ≥ 2 agree?"}
    Vote -->|"✅ Yes"| Accept["Accept Consensus Output"]
    Vote -->|"❌ No"| Retry["Retry Step"]

    style Step fill:#1e293b,stroke:#8b5cf6,color:#e2e8f0
    style A1 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style A2 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style A3 fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style O1 fill:#1e293b,stroke:#6366f1,color:#e2e8f0
    style O2 fill:#1e293b,stroke:#6366f1,color:#e2e8f0
    style O3 fill:#1e293b,stroke:#6366f1,color:#e2e8f0
    style Cluster fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
    style Vote fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
    style Accept fill:#065f46,stroke:#10b981,color:#e2e8f0
    style Retry fill:#7f1d1d,stroke:#ef4444,color:#e2e8f0

Why? If three agents independently produce the same answer, it's almost certainly correct.

🚩 Red-Flag Detection

Every output is scanned for suspicious patterns before acceptance:

Length violations — Output too long or too short
Regex patterns — Apologies, placeholders, truncation markers
Syntax errors — Invalid code in the target language
AST analysis — Hallucinated imports, duplicate definitions
Custom rules — Your own validation scripts

flowchart LR
    Output["LLM Output"] --> RF["Red-Flag\nEngine"]

    RF --> L["📏 Length"]
    RF --> R["🔍 Regex"]
    RF --> S["💻 Syntax"]
    RF --> A["🌳 AST"]
    RF --> C["⚙️ Custom"]

    L & R & S & A & C --> Decision{"Any flags?"}
    Decision -->|"Clean"| Pass["✅ Accept"]
    Decision -->|"Warning"| Discard["🔄 Discard & Resample"]
    Decision -->|"Error"| Halt["⛔ Halt Pipeline"]

    style Output fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style RF fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
    style L fill:#1e293b,stroke:#6366f1,color:#e2e8f0
    style R fill:#1e293b,stroke:#6366f1,color:#e2e8f0
    style S fill:#1e293b,stroke:#6366f1,color:#e2e8f0
    style A fill:#1e293b,stroke:#6366f1,color:#e2e8f0
    style C fill:#1e293b,stroke:#6366f1,color:#e2e8f0
    style Decision fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
    style Pass fill:#065f46,stroke:#10b981,color:#e2e8f0
    style Discard fill:#78350f,stroke:#f59e0b,color:#e2e8f0
    style Halt fill:#7f1d1d,stroke:#ef4444,color:#e2e8f0

Why? Catching confusion before it propagates is cheaper than debugging downstream.

CLI Reference

All commands follow the pattern:

npx tsx bin/skillforge.ts <command> [options]

`init` — Scaffold a new skill

Creates a complete skill project directory with config, docs, and folder structure.

npx tsx bin/skillforge.ts init my-api-skill \
  --description "Generates REST API endpoints with Express" \
  --language typescript \
  --license MIT \
  --author "My Organization" \
  --output ./skills

Option	Default	Description
`-d, --description <text>`	`"A new SkillForge skill"`	Skill description
`-l, --language <lang>`	`typescript`	Primary language (`typescript`, `python`, `javascript`, `dart`)
`--license <license>`	-	License (e.g., `MIT`, `Apache-2.0`)
`--author <author>`	-	Author name or organization (stored in metadata)
`-o, --output <dir>`	`.`	Where to create the project directory

Generated structure:

my-api-skill/
├── SKILL.md                    # Starter agentskills.io skill definition
├── skillforge.config.json      # Your skill's configuration (edit this!)
├── README.md                   # Auto-generated project readme
├── scripts/                    # Validation scripts (auto-filled on build)
├── references/                 # Reference docs your skill can consult
├── examples/                   # Usage examples
└── assets/                     # Static assets (templates, images, data)

Language presets: The --language flag controls which red-flag rules are included. Each language has a full preset automatically configured via the Plugin System:

Language	Plugin Used	Rules Included
`typescript` / `javascript`	`javascriptPlugin`	Bracket balance, unclosed strings/templates, hallucinated imports, duplicate declarations, 7 preset rules
`python`	`pythonPlugin`	Mixed indentation, missing colons, bracket balance, 6 preset rules
`json`	`jsonPlugin`	JSON.parse validation, 2 preset rules
`dart` / `flutter`	`dartPlugin`	Bracket balance, missing semicolons, empty `build()`, hallucinated imports, duplicate classes, missing `@override`, orphan StatefulWidget, 13 preset rules including deprecated widget detection

`build` — Generate the full skill

Runs the complete SkillForge pipeline: Decompose → Validate → Assemble → Write.

flowchart LR
    Config["skillforge.config.json"] --> D["🧬 Decompose"]
    D --> V["🚩 Validate"]
    V --> A["📦 Assemble"]
    A --> W["💾 Write"]
    W --> SK["SKILL.md"]
    W --> SC["scripts/"]
    W --> RF["references/"]
    W --> EX["examples/"]
    W --> RP["report.json"]

    style Config fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style D fill:#1e293b,stroke:#8b5cf6,color:#e2e8f0
    style V fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
    style A fill:#1e293b,stroke:#10b981,color:#e2e8f0
    style W fill:#1e293b,stroke:#6366f1,color:#e2e8f0
    style SK fill:#065f46,stroke:#10b981,color:#e2e8f0
    style SC fill:#065f46,stroke:#10b981,color:#e2e8f0
    style RF fill:#065f46,stroke:#10b981,color:#e2e8f0
    style EX fill:#065f46,stroke:#10b981,color:#e2e8f0
    style RP fill:#065f46,stroke:#10b981,color:#e2e8f0

npx tsx bin/skillforge.ts build \
  --config ./skillforge.config.json \
  --output ./dist \
  --verbose

Option	Default	Description
`-c, --config <path>`	`./skillforge.config.json`	Config file path
`-o, --output <dir>`	`./dist`	Output directory
`-v, --verbose`	`false`	Show detailed pipeline output

Output includes:

SKILL.md — The complete gentskills.io-compatible skill
scripts/ — Validation scripts for each atomic step + master validator
references/ — Decomposition guide and reference docs
examples/ — Example usage file
decomposition-report.json — Full decomposition data

`decompose` — Preview atomic breakdown

Shows how your workflow steps will be broken down without generating output files. Useful for tuning your instructions before a full build.

npx tsx bin/skillforge.ts decompose --config ./skillforge.config.json

Option	Default	Description
`-c, --config <path>`	`./skillforge.config.json`	Config file path

Example output:

📋 Atomic Decomposition
═══════════════════════════════════════════

  Step: analyze-requirements
    → 1.1  Read the user's component request carefully
    → 1.2  Identify the component name, purpose, and visual behavior
    → 1.3  List all required props and their types
    ...

  Total: 4 parent steps → 22 atomic steps

`test` — Run the test suite

Auto-generates and runs trigger tests and functional tests based on your config.

npx tsx bin/skillforge.ts test --config ./skillforge.config.json

Option	Default	Description
`-c, --config <path>`	`./skillforge.config.json`	Config file path

Example output:

🧪 Test Suite Results
═══════════════════════════════════════════
✅ [trigger]     trigger_positive_0: Correctly triggered on: "create a React component"
✅ [trigger]     trigger_negative_0: Correctly did NOT trigger on: "explain React concepts"
✅ [functional]  functional_step-1: Skill structure validated
❌ [functional]  functional_step-2: Step "step-2" requires input "analysis" not produced

Results: 3/4 passed (4ms)

Exits with code 1 if any test fails — CI-friendly.

`inspect` — Red-flag analysis

Lists all red-flag rules for every atomic step — useful for auditing your skill's safety net.

npx tsx bin/skillforge.ts inspect --config ./skillforge.config.json

Example output:

🚩 Red-Flag Analysis
═══════════════════════════════════════════

Step: analyze-requirements_1.1
  🔴 [syntax] Syntax Error Detection: Generated code contains syntax errors
  🟡 [regex]  Placeholder Content: Output contains TODO/FIXME placeholders

Total red-flag rules: 42
Steps covered: 22/22

`validate` — Consensus validation info

Shows configuration for runtime consensus validation. Outputs example SDK code.

npx tsx bin/skillforge.ts validate \
  --config ./skillforge.config.json \
  --threshold 2 \
  --agents 3

Option	Default	Description
`-c, --config <path>`	`./skillforge.config.json`	Config file path
`-k, --threshold <n>`	`2`	K-threshold for consensus
`-a, --agents <n>`	`3`	Number of parallel agents

Configuration Reference

The skillforge.config.json file is the heart of every skill:

{
  // REQUIRED: Kebab-case name, max 64 characters
  "name": "my-skill-name",

  // REQUIRED: What the skill does + when to use it (max 1024 chars)
  "description": "Generates production-ready React components...",

  // OPTIONAL: agentskills.io metadata
  "license": "MIT",
  "metadata": {
    "author": "SkillForge Team",
    "version": "1.0.0"
  },
  "compatibility": "Requires Node.js >= 18",

  // REQUIRED: Phrases that should activate this skill
  "triggers": [
    "create a React component",
    "build a UI component"
  ],

  // RECOMMENDED: Phrases that should NOT activate this skill
  "negativeTriggers": [
    "explain React concepts",
    "debug existing code"
  ],

  // REQUIRED: Your workflow steps (SkillForge decomposes these)
  "steps": [
    {
      "id": "step-1",                     // Unique ID (kebab-case)
      "title": "Analyze Requirements",    // Human-readable title
      "instruction": "...",               // What the AI should do
      "inputs": ["user_request"],         // Data this step needs
      "outputs": ["requirements_spec"],   // Data this step produces
      "validationCriteria": "..."         // How to verify the output
    }
  ],

  // OPTIONAL: Paths to reference docs the skill can consult
  "references": ["api-docs.md", "style-guide.md"],

  // OPTIONAL: Primary language for code output
  "language": "typescript",

  // REQUIRED: MAKER engine configuration
  "maker": {
    "agentCount": 3,              // Parallel agents per step (default: 3)
    "kThreshold": 2,              // Votes needed for consensus (default: 2)
    "maxOutputLength": 10000,     // Max chars before red-flagging
    "maxRetries": 3,              // Retry limit after red-flag discard
    "globalRedFlagRules": []      // Red-flag rules applied to ALL steps
  }
}

Writing Effective Instructions

The instruction field is what gets decomposed into atomic steps. Write it as a numbered list for the cleanest decomposition:

"instruction": "1. Read the user's request carefully. 2. Identify all required props and their types. 3. Define the TypeScript interface. 4. Add JSDoc comments. 5. Export the interface."

Decomposition modes (auto-detected by priority):

Priority	Format	Example
1	Numbered lists	`1. ... 2. ... 3. ...`
2	Bullet points	`- ... - ...`
3	Sentence splitting	Splits at period boundaries
4	Length-based	Force-splits very long instructions

Defining Data Flow

"steps": [
  {
    "id": "step-1",
    "inputs": ["user_request"],        // Always available
    "outputs": ["component_spec"]      // Produced by this step
  },
  {
    "id": "step-2",
    "inputs": ["component_spec"],      // Consumed from step-1
    "outputs": ["implementation"]
  }
]

SkillForge validates this data flow at build-time — if step-2 requires an input that no previous step produces, you'll get a clear error.

Creating Skills

Scenario 1: Code Generation Skill

Goal: Generate production-ready React components with TypeScript, tests, and accessibility.

{
  "name": "react-component-generator",
  "description": "Generates production-ready React functional components with TypeScript props interfaces, ARIA accessibility attributes, and comprehensive unit tests.",
  "license": "MIT",
  "metadata": {
    "author": "Acme Corp",
    "version": "1.0.0"
  },
  "triggers": [
    "create a React component",
    "build a React component",
    "generate a React component with TypeScript"
  ],
  "negativeTriggers": [
    "explain React concepts",
    "debug an existing React component"
  ],
  "steps": [
    {
      "id": "analyze-requirements",
      "title": "Analyze Component Requirements",
      "instruction": "1. Read the user's component request carefully. 2. Identify the component name, purpose, and visual behavior. 3. List all required props and their types. 4. Identify any state management needs. 5. Note accessibility requirements.",
      "inputs": ["user_request"],
      "outputs": ["component_spec"],
      "validationCriteria": "Component spec includes: name, props interface, state requirements, and accessibility notes."
    },
    {
      "id": "implement-component",
      "title": "Implement Component",
      "instruction": "1. Create the functional component using React.FC with the props interface. 2. Destructure props with default values. 3. Implement hooks for state and side effects. 4. Add all necessary ARIA attributes.",
      "inputs": ["component_spec"],
      "outputs": ["component_code"],
      "validationCriteria": "Component compiles without errors, uses proper hooks, has all ARIA attributes."
    },
    {
      "id": "write-tests",
      "title": "Write Unit Tests",
      "instruction": "1. Import React Testing Library and the component. 2. Write a test for default rendering. 3. Write tests for each prop variation. 4. Write tests for user interactions. 5. Verify all ARIA attributes.",
      "inputs": ["component_code"],
      "outputs": ["test_code"],
      "validationCriteria": "All branches covered. Tests verify rendering, interactions, and accessibility."
    }
  ],
  "language": "typescript",
  "maker": {
    "agentCount": 3,
    "kThreshold": 2,
    "maxOutputLength": 10000,
    "maxRetries": 3,
    "globalRedFlagRules": [
      {
        "id": "no_class_components",
        "name": "No Class Components",
        "description": "Must use functional components, not class components",
        "type": "regex",
        "pattern": "class\\s+\\w+\\s+extends\\s+(React\\.)?Component",
        "severity": "warning"
      }
    ]
  }
}

Scenario 2: Documentation Skill

Goal: Generate API documentation from source code.

{
  "name": "api-docs-generator",
  "description": "Generates comprehensive API documentation from source code with function signatures, parameter tables, and code examples.",
  "triggers": ["generate API documentation", "document this API"],
  "negativeTriggers": ["write a README", "create a tutorial"],
  "steps": [
    {
      "id": "parse-source",
      "title": "Parse Source Code",
      "instruction": "1. Read all provided source files. 2. Extract every exported function, class, and type. 3. Capture: name, parameters with types, return type, docstrings.",
      "inputs": ["user_request"],
      "outputs": ["parsed_exports"],
      "validationCriteria": "Every public export is captured with full type information."
    },
    {
      "id": "generate-docs",
      "title": "Generate Documentation",
      "instruction": "1. Create a markdown file with title and overview. 2. For each function, generate: signature, description, parameter table, return type, and code example. 3. Add a table of contents.",
      "inputs": ["parsed_exports"],
      "outputs": ["documentation"],
      "validationCriteria": "Every export has a complete documentation section."
    }
  ],
  "language": "typescript",
  "maker": {
    "agentCount": 3,
    "kThreshold": 2,
    "maxOutputLength": 20000,
    "maxRetries": 3,
    "globalRedFlagRules": [
      {
        "id": "no_placeholder_docs",
        "name": "No Placeholder Documentation",
        "description": "Must not contain TBD or placeholder text",
        "type": "regex",
        "pattern": "(TBD|TODO|PLACEHOLDER|\\[description here\\])",
        "severity": "error"
      }
    ]
  }
}

Scenario 3: Multi-Step Pipeline Skill

Goal: Data transformation pipeline: ingest → clean → validate → transform → report.

Multiple outputs per step — ingest produces both raw_records AND schema_info
Higher retries (maxRetries: 5) — data processing is harder to get right
Python language — validation scripts check Python syntax

{
  "name": "data-pipeline-builder",
  "description": "Creates data transformation pipelines: clean, validate, transform, and report.",
  "triggers": ["build a data pipeline", "process this dataset"],
  "negativeTriggers": ["visualize data", "create a chart"],
  "steps": [
    {
      "id": "ingest",
      "title": "Ingest Raw Data",
      "instruction": "1. Read the input file (CSV or JSON). 2. Parse into structured records. 3. Log total record count and column names.",
      "inputs": ["user_request"],
      "outputs": ["raw_records", "schema_info"],
      "validationCriteria": "All records parsed. Column names detected."
    },
    {
      "id": "transform",
      "title": "Transform Data",
      "instruction": "1. Apply user-requested transformations. 2. Calculate summary statistics for numeric columns.",
      "inputs": ["raw_records", "schema_info"],
      "outputs": ["transformed_data"],
      "validationCriteria": "All transformations applied. Statistics are correct."
    },
    {
      "id": "report",
      "title": "Generate Report",
      "instruction": "1. Create summary with total records processed. 2. Include statistics. 3. Sample of transformed output (first 10 records).",
      "inputs": ["transformed_data"],
      "outputs": ["pipeline_report"],
      "validationCriteria": "Report includes all statistics."
    }
  ],
  "language": "python",
  "maker": {
    "agentCount": 3,
    "kThreshold": 2,
    "maxOutputLength": 15000,
    "maxRetries": 5,
    "globalRedFlagRules": [
      {
        "id": "no_hardcoded_paths",
        "name": "No Hardcoded File Paths",
        "description": "Pipeline code must not contain hardcoded paths",
        "type": "regex",
        "pattern": "(/Users/|/home/|C:\\\\|D:\\\\)",
        "severity": "error"
      }
    ]
  }
}

Scenario 4: Strict Compliance Skill

Goal: SQL migrations with strict safety rules.

Higher consensus (agentCount: 5, kThreshold: 3) — critical operations need stronger agreement
Error-severity rules — DROP without backup and TRUNCATE immediately halt the pipeline

{
  "name": "safe-sql-migration",
  "description": "Generates safe, reversible SQL migrations with rollback scripts.",
  "triggers": ["create a database migration", "add a column to the database"],
  "negativeTriggers": ["write a SQL query", "optimize a query"],
  "steps": [
    {
      "id": "generate-up",
      "title": "Generate Forward Migration",
      "instruction": "1. Write SQL using CREATE/ALTER. 2. Wrap in transaction. 3. Use snake_case. 4. Add IF NOT EXISTS guards.",
      "inputs": ["user_request"],
      "outputs": ["up_migration"],
      "validationCriteria": "SQL is valid. Uses transactions. Snake_case identifiers."
    },
    {
      "id": "generate-down",
      "title": "Generate Rollback",
      "instruction": "1. Write SQL to reverse the forward migration. 2. Create backup tables for destructive ops. 3. Wrap in transaction.",
      "inputs": ["up_migration"],
      "outputs": ["down_migration"],
      "validationCriteria": "Rollback exactly reverses the forward migration."
    }
  ],
  "maker": {
    "agentCount": 5,
    "kThreshold": 3,
    "maxOutputLength": 8000,
    "maxRetries": 3,
    "globalRedFlagRules": [
      {
        "id": "no_drop_without_backup",
        "name": "No DROP Without Backup",
        "type": "regex",
        "pattern": "DROP\\s+TABLE(?!.*CREATE\\s+TABLE.*backup)",
        "severity": "error"
      },
      {
        "id": "no_truncate",
        "name": "No TRUNCATE",
        "type": "regex",
        "pattern": "TRUNCATE\\s+TABLE",
        "severity": "error"
      }
    ]
  }
}

Scenario 5: Review / Audit Skill

Goal: Security vulnerability review with structured audit report.

Parallel steps — check-injection and check-xss both consume attack_surface independently
Fan-in — generate-report consumes from two parent steps

flowchart TD
    UR["user_request"] --> SCAN["scan-surface"]
    SCAN --> AS["attack_surface"]
    AS --> INJ["check-injection"]
    AS --> XSS["check-xss"]
    INJ --> IF["injection_findings"]
    XSS --> XF["xss_findings"]
    IF --> RPT["generate-report"]
    XF --> RPT
    RPT --> AR["audit_report"]

    style UR fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style SCAN fill:#1e293b,stroke:#8b5cf6,color:#e2e8f0
    style AS fill:#1e293b,stroke:#6366f1,color:#e2e8f0
    style INJ fill:#1e293b,stroke:#ef4444,color:#e2e8f0
    style XSS fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
    style IF fill:#1e293b,stroke:#ef4444,color:#e2e8f0
    style XF fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
    style RPT fill:#1e293b,stroke:#10b981,color:#e2e8f0
    style AR fill:#065f46,stroke:#10b981,color:#e2e8f0

{
  "name": "security-audit",
  "description": "Reviews code for security vulnerabilities and produces a structured audit report.",
  "triggers": ["audit this code for security", "find vulnerabilities"],
  "negativeTriggers": ["implement authentication", "write a security policy"],
  "steps": [
    {
      "id": "scan-surface",
      "title": "Identify Attack Surface",
      "instruction": "1. Identify user input entry points. 2. Identify database queries. 3. Identify file system operations.",
      "inputs": ["user_request"],
      "outputs": ["attack_surface"],
      "validationCriteria": "All entry points identified."
    },
    {
      "id": "check-injection",
      "title": "Check Injection Vulnerabilities",
      "instruction": "1. Check for SQL injection. 2. Check for command injection. 3. Note file, line, and severity.",
      "inputs": ["attack_surface"],
      "outputs": ["injection_findings"],
      "validationCriteria": "Every query and exec call checked."
    },
    {
      "id": "check-xss",
      "title": "Check XSS and CSRF",
      "instruction": "1. Check for unescaped user input. 2. Check for CSRF token validation. 3. Check cookie flags.",
      "inputs": ["attack_surface"],
      "outputs": ["xss_findings"],
      "validationCriteria": "All rendering paths checked."
    },
    {
      "id": "generate-report",
      "title": "Generate Audit Report",
      "instruction": "1. Combine findings. 2. Sort by severity. 3. Add recommended fixes. 4. Add executive summary.",
      "inputs": ["injection_findings", "xss_findings"],
      "outputs": ["audit_report"],
      "validationCriteria": "All findings included. Sorted by severity."
    }
  ],
  "maker": {
    "agentCount": 3,
    "kThreshold": 2,
    "maxOutputLength": 15000,
    "maxRetries": 3,
    "globalRedFlagRules": [
      {
        "id": "no_blanket_approval",
        "name": "No Blanket Safe Approval",
        "description": "Must not claim 'no vulnerabilities found' without analysis",
        "type": "regex",
        "pattern": "(no (security )?issues found|code is secure|no vulnerabilities)",
        "severity": "warning"
      }
    ]
  }
}

Plugin System

SkillForge uses a plugin architecture for language-specific validation. Each language is a self-contained plugin that provides syntax checking, AST analysis, and red-flag presets.

Architecture Overview

flowchart TD
    RF["red-flag.ts\ndetectRedFlags"]
    RF -->|"length, regex, custom"| Direct["Handle Directly"]
    RF -->|"syntax / ast"| PR["PluginRegistry\nSingleton"]

    PR --> JS["JavaScript /\nTypeScript Plugin"]
    PR --> PY["Python\nPlugin"]
    PR --> JSON["JSON\nPlugin"]
    PR --> DART["Dart / Flutter\nPlugin"]
    PR --> CUSTOM["Your Custom\nPlugin"]

    JS --> R1["checkSyntax\ncheckAST\ngetRedFlagPreset"]
    PY --> R2["checkSyntax\ncheckAST\ngetRedFlagPreset"]
    JSON --> R3["checkSyntax\ncheckAST\ngetRedFlagPreset"]
    DART --> R4["checkSyntax\ncheckAST\ngetRedFlagPreset"]
    CUSTOM --> R5["checkSyntax\ncheckAST\ngetRedFlagPreset"]

    style RF fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
    style Direct fill:#1e293b,stroke:#6366f1,color:#e2e8f0
    style PR fill:#1e293b,stroke:#8b5cf6,color:#e2e8f0
    style JS fill:#1e293b,stroke:#3b82f6,color:#e2e8f0
    style PY fill:#1e293b,stroke:#10b981,color:#e2e8f0
    style JSON fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
    style DART fill:#1e293b,stroke:#06b6d4,color:#e2e8f0
    style CUSTOM fill:#1e293b,stroke:#ec4899,color:#e2e8f0
    style R1 fill:#0f172a,stroke:#3b82f6,color:#94a3b8
    style R2 fill:#0f172a,stroke:#10b981,color:#94a3b8
    style R3 fill:#0f172a,stroke:#f59e0b,color:#94a3b8
    style R4 fill:#0f172a,stroke:#06b6d4,color:#94a3b8
    style R5 fill:#0f172a,stroke:#ec4899,color:#94a3b8

Built-in Plugins

SkillForge ships with 4 built-in plugins, automatically registered on import:

JavaScript / TypeScript Plugin

Languages: javascript, typescript

Syntax checks:

Balanced braces {}, parentheses (), brackets []
Unclosed template literals
Unclosed string literals (per-line)

AST checks:

Hallucinated imports (suspiciously long package names like my-super-awesome-package)
Duplicate function/variable declarations

Preset: 7 rules — length, syntax, AST, empty output, apology, placeholders, truncation

Python Plugin

Languages: python

Syntax checks:

Balanced parentheses and brackets
Mixed tabs and spaces in indentation
Missing colons after def, class, if, for, while, etc.

Preset: 6 rules — length, syntax, empty output, apology, placeholders (including pass #), truncation

JSON Plugin

Languages: json

Syntax checks:

Full JSON.parse() validation — catches all JSON syntax errors

Preset: 2 rules — syntax, empty output

Dart / Flutter Plugin

Languages: dart, flutter

Syntax checks:

Balanced braces, parentheses, brackets, and generic angle brackets
Unclosed string literals (handles raw strings r'...')
Missing semicolons after var, final, const, return, throw, late
Empty build() method bodies (Flutter-specific)
Broken cascade notation ..

AST checks:

Hallucinated package: imports
Duplicate class declarations
Duplicate top-level function declarations
Missing @override on lifecycle methods (build, initState, dispose, etc.)
StatefulWidget without matching State<WidgetName> class

Preset: 13 rules — length, syntax, AST, empty output, apology, placeholders, truncation, debug print(), deprecated widgets (RaisedButton, FlatButton, OutlineButton), excessive dynamic type, hardcoded colors

Writing a Custom Plugin

Create a plugin for any language in 3 steps:

Step 1: Define the Plugin

import type { LanguagePlugin } from 'skillforge';

const swiftPlugin: LanguagePlugin = {
  name: 'swift',
  languages: ['swift', 'swiftui'],

  checkSyntax(code: string): string[] {
    const errors: string[] = [];

    // Example: check for balanced braces
    const opens = (code.match(/{/g) ?? []).length;
    const closes = (code.match(/}/g) ?? []).length;
    if (opens !== closes) {
      errors.push(`Unbalanced braces: ${opens} opening vs ${closes} closing`);
    }

    // Example: check for force-unwrap abuse
    const forceUnwraps = (code.match(/\w+!/g) ?? []).length;
    if (forceUnwraps > 5) {
      errors.push(`Excessive force-unwrapping: ${forceUnwraps} instances — use optional binding`);
    }

    return errors;
  },

  checkAST(code: string): string[] {
    const issues: string[] = [];

    // Example: detect duplicate struct declarations
    const structPattern = /struct\s+(\w+)/g;
    const structs = new Map<string, number>();
    let match;
    while ((match = structPattern.exec(code)) !== null) {
      structs.set(match[1], (structs.get(match[1]) ?? 0) + 1);
    }
    for (const [name, count] of structs) {
      if (count > 1) {
        issues.push(`Duplicate struct: "${name}" declared ${count} times`);
      }
    }

    return issues;
  },

  getRedFlagPreset() {
    return [
      {
        id: 'swift_syntax',
        name: 'Swift Syntax Error Detection',
        description: 'Generated Swift code contains syntax errors',
        type: 'syntax' as const,
        language: 'swift',
        severity: 'error' as const,
      },
      {
        id: 'swift_ast',
        name: 'Swift AST Issue Detection',
        description: 'Generated Swift code has structural issues',
        type: 'ast' as const,
        language: 'swift',
        severity: 'warning' as const,
      },
      {
        id: 'swift_force_unwrap',
        name: 'Force Unwrap Detection',
        description: 'Code uses excessive force-unwrapping (!)',
        type: 'regex' as const,
        pattern: '\\w+!\\.',
        severity: 'warning' as const,
      },
    ];
  },
};

Step 2: Register It

import { registerPlugin } from 'skillforge';

registerPlugin(swiftPlugin);

That's it! Now swift and swiftui are supported everywhere in SkillForge.

Step 3: Use It

# CLI uses it automatically
npx tsx bin/skillforge.ts init my-swift-skill --language swift

# Or programmatically
import { getRedFlagPreset, detectRedFlags } from 'skillforge';
const rules = getRedFlagPreset('swift');
const detections = detectRedFlags(swiftCode, rules);

Plugin API Reference

`LanguagePlugin` Interface

interface LanguagePlugin {
  name: string;              // Unique plugin name
  languages: string[];       // Language identifiers this plugin handles
  checkSyntax(code: string): string[];    // Returns error descriptions
  checkAST(code: string): string[];       // Returns issue descriptions
  getRedFlagPreset(): RedFlagRule[];      // Returns preset rules
}

`registerPlugin(plugin)`

Register a custom plugin with the global registry. If a language identifier is already registered, the new plugin overrides it.

import { registerPlugin } from 'skillforge';
registerPlugin(myPlugin);

`pluginRegistry`

Direct access to the singleton registry for advanced use:

import { pluginRegistry } from 'skillforge';

pluginRegistry.getAllPlugins();           // LanguagePlugin[]
pluginRegistry.getSupportedLanguages();   // string[]
pluginRegistry.isSupported('swift');      // boolean
pluginRegistry.getPlugin('dart');         // LanguagePlugin | undefined
pluginRegistry.getPreset('dart');         // RedFlagRule[]
pluginRegistry.checkSyntax(code, 'dart'); // string[]
pluginRegistry.checkAST(code, 'dart');    // string[]

Red-Flag Rules

Red-flag rules are the safety net that prevents bad output from being accepted.

Built-in Rule Types

Type	Field	Description	Example
`length`	`maxLength`	Output exceeds character limit	Code over 10,000 chars
`regex`	`pattern`	Output matches a forbidden pattern	Apology phrases, TODOs
`syntax`	`language`	Output has syntax errors (delegated to plugin)	Invalid TypeScript
`ast`	`language`	Output has structural issues (delegated to plugin)	Hallucinated imports
`custom`	`scriptPath`	Custom validation script	Your own checker

Rule Severity

Severity	Behavior
`warning`	Output is discarded and resampled (agent tries again)
`error`	Pipeline halts immediately and reports the failure

Presets

Get language-specific rule presets via the unified getRedFlagPreset() function:

import { getRedFlagPreset } from 'skillforge';

// Returns the full preset for any registered language
const tsRules = getRedFlagPreset('typescript');    // 7 rules
const dartRules = getRedFlagPreset('dart');         // 13 rules
const pyRules = getRedFlagPreset('python');         // 6 rules
const jsonRules = getRedFlagPreset('json');         // 2 rules

// Documentation preset (not language-specific)
import { getDocRedFlagPreset } from 'skillforge';
const docRules = getDocRedFlagPreset();             // 3 rules

Writing Custom Rules

{
  "id": "my_custom_rule",
  "name": "Human-Readable Name",
  "description": "What this rule catches and why",
  "type": "regex",
  "pattern": "your_regex_pattern_here",
  "severity": "warning"
}

Tips:

Use \\b for word boundaries to avoid false positives
Use \\s+ for flexible whitespace matching
Test your regex against sample outputs before adding
Prefer warning for style issues, error for correctness

Testing

SkillForge generates three types of tests automatically:

Trigger Tests

Verify the skill activates (or doesn't) for given prompts. Uses four weighted strategies:

Strategy	Weight	How it works
Exact match	1.0	Direct string comparison
Keyword overlap	0.7	Token intersection
Fuzzy n-gram similarity	0.5	Character-level similarity
Description match	0.3	Semantic description comparison

Functional Tests

Verify structural integrity:

All steps have validation criteria
All steps have inputs and outputs
Data flow is valid (no orphaned inputs)
MAKER config is consistent (K ≤ agent count)

Comparison Tests

Score output quality on 9 heuristics (0-100%):

Meaningful content, reasonable length, no placeholders, no apologies
Structured code, error handling, documentation
No debug logging, type annotations

SDK API Reference

Import everything from the main entry point:

import { detectRedFlags, decompose, generateSkill, registerPlugin } from 'skillforge';

Core Functions

`detectRedFlags(output, rules)`

Scan output against an array of RedFlagRule objects. Returns RedFlagDetection[].

import { detectRedFlags, getRedFlagPreset } from 'skillforge';

const rules = getRedFlagPreset('typescript');
const detections = detectRedFlags(llmOutput, rules);

for (const d of detections) {
  console.log(`🚩 ${d.ruleName}: ${d.reason}`);
  if (d.discarded) console.log('   → Output discarded');
}

`decompose(config)`

Break a SkillConfig into AtomicStep[]. Each atomic step is a single-responsibility micro-step.

import { decompose, getDecompositionReport } from 'skillforge';

const steps = decompose(config);
const report = getDecompositionReport(steps);
console.log(`${report.parentSteps} steps → ${report.totalAtomicSteps} atomic steps`);

`validateConsensus(step, outputs, options?)`

Run K-threshold consensus validation on multiple LLM outputs for a step.

import { validateConsensus } from 'skillforge';

const result = validateConsensus(step, [outputA, outputB, outputC], {
  kThreshold: 2,
  similarityThreshold: 0.8,
});

if (result.consensusReached) {
  console.log(`Consensus: ${result.votes} agents agreed`);
  console.log(result.accepted);
}

`generateSkill(config, options)`

Run the full pipeline: decompose → validate → assemble → write.

import { createDefaultConfig, generateSkill } from 'skillforge';

const config = createDefaultConfig('my-skill', 'Does amazing things');
const result = generateSkill(config, {
  outputDir: './dist',
  verbose: true,
  validate: true,
});

`assemble(config, atomicSteps, validationResults)`

Generate the final SKILL.md and all output files from decomposed steps.

`countBalance(code, open, close)`

Shared utility — counts bracket balance respecting string literals. Used by plugins internally.

Plugin Functions

import {
  registerPlugin,        // Register a custom LanguagePlugin
  pluginRegistry,        // Direct registry access
  getRedFlagPreset,      // Get rules for any language

  // Individual built-in plugins
  javascriptPlugin,
  pythonPlugin,
  jsonPlugin,
  dartPlugin,
} from 'skillforge';

import type { LanguagePlugin } from 'skillforge';

Generator Functions

import {
  generateSkill,           // Full pipeline
  createDefaultConfig,     // Create a starter SkillConfig

  // Template rendering
  render,                  // Mustache-style template rendering
  renderConfigJson,        // Render config as formatted JSON
  renderFrontmatter,       // Render YAML frontmatter
  renderDecompositionTable, // Render decomposition as table

  // Script generation
  generateCodeValidator,    // Generate JS/TS validation script
  generatePythonValidator,  // Generate Python validation script
  generateTestSuiteScript,  // Generate test runner script

  // Templates
  getCodeSkillTemplate,     // Get the SKILL.md template
} from 'skillforge';

Testing Functions

import {
  generateDefaultTests,    // Auto-generate tests from config
  runTestSuite,            // Run all tests
  formatTestReport,        // Format results for console

  runTriggerTest,          // Run a single trigger test
  runFunctionalTest,       // Run a single functional test
  validateOutputAgainstExpected, // Compare output to expected
} from 'skillforge';

// Full test flow
const tests = generateDefaultTests(config);
const results = runTestSuite(config, tests);
console.log(formatTestReport(results));

Types

All TypeScript interfaces are exported for type-safe usage:

import type {
  // Config
  SkillConfig,
  WorkflowStep,
  MAKERConfig,
  RedFlagRule,

  // Decomposition
  AtomicStep,
  IOContract,
  FieldSpec,

  // Validation
  ValidationResult,
  ValidationError,
  RedFlagDetection,
  ConsensusResult,
  CandidateOutput,

  // Generation
  GeneratedSkill,
  GeneratedFile,
  SkillFrontmatter,
  GenerateResult,
  GeneratorOptions,

  // Testing
  TestCase,
  TestResult,
  TestSuiteResult,
  TestType,

  // Plugin
  LanguagePlugin,

  // CLI
  CLIOptions,
} from 'skillforge';

Project Structure

skill_sdk/
├── bin/
│   └── skillforge.ts              # CLI entry point (6 commands)
├── src/
│   ├── index.ts                   # Public SDK API (barrel export)
│   ├── types/
│   │   ├── index.ts               # All TypeScript interfaces
│   │   └── plugin.ts              # LanguagePlugin interface
│   ├── core/
│   │   ├── decomposer.ts          # Maximal Agentic Decomposition
│   │   ├── validator.ts           # K-threshold consensus voting
│   │   ├── red-flag.ts            # Red-flag orchestrator (delegates to plugins)
│   │   ├── assembler.ts           # SKILL.md generation
│   │   ├── plugin-registry.ts     # Plugin registration & lookup
│   │   └── helpers.ts             # Shared utilities (countBalance)
│   ├── plugins/
│   │   ├── index.ts               # Auto-registers all built-in plugins
│   │   ├── javascript.ts          # JS/TS plugin
│   │   ├── python.ts              # Python plugin
│   │   ├── json.ts                # JSON plugin
│   │   └── dart.ts                # Dart/Flutter plugin
│   ├── generators/
│   │   ├── skill-generator.ts     # Full pipeline orchestrator
│   │   ├── script-generator.ts    # Validation script generators
│   │   └── template-engine.ts     # Mustache-like renderer
│   └── testing/
│       ├── test-runner.ts         # Test orchestrator
│       ├── trigger-test.ts        # Trigger matching tests
│       └── functional-test.ts     # Structural validation tests
├── tests/
│   ├── decomposer.test.ts         # 12 tests
│   ├── validator.test.ts          # 10 tests
│   ├── red-flag.test.ts           # 24 tests
│   ├── plugins.test.ts            # 22 tests
│   └── integration.test.ts        # 5 tests
├── examples/
│   ├── react-component-skill/
│   ├── express-rest-api-generator/
│   └── flutter-accessibility-audit/
├── package.json
└── tsconfig.json

Troubleshooting

"Config not found" error

❌ Config not found: /path/to/skillforge.config.json

Ensure you're passing the correct --config path. Default looks for ./skillforge.config.json.

Step has no inputs or outputs

Structural issues found: Step "step-2" has no inputs or outputs

Every step needs at least one entry in inputs and outputs. The first step typically takes ["user_request"].

Data flow validation failure

Step "step-3" requires input "analysis" which is not produced by any previous step

Check that outputs of a previous step includes the exact string used in inputs. Names must match exactly.

K-threshold exceeds agent count

K-threshold (4) exceeds agent count (3)

kThreshold must be ≤ agentCount. If K > agents, consensus is impossible.

Red-flag regex false positives

Make the pattern more specific (add \b, ^, $)
Change severity from error to warning
Remove the rule from globalRedFlagRules

Plugin not registered

getRedFlagPreset('swift') returns []

Custom plugins must be registered before use:

import { registerPlugin } from 'skillforge';
registerPlugin(swiftPlugin);

FAQ

Q: Does SkillForge call an LLM during build? No. SkillForge is a pure scaffolding and validation tool. It generates the skill definition, decomposition, validation scripts, and test suite — but the actual LLM calls happen at runtime.

Q: What's the difference between warning and error severity? warning discards the output and retries. error halts the pipeline immediately — use this for safety-critical rules.

Q: Can steps run in parallel? Yes! If two steps consume the same input but don't depend on each other, they are implicitly parallel. SkillForge wires dependencies automatically.

Q: How many atomic steps should I expect? Roughly 5-12 per workflow step. A typical 4-step skill produces 20-40 atomic steps.

Q: Can I add a new programming language? Yes! Create a LanguagePlugin and call registerPlugin(). See Writing a Custom Plugin.

Q: What languages are supported out of the box? JavaScript, TypeScript, Python, JSON, Dart, and Flutter. Each has a dedicated plugin with language-specific heuristics.

Q: What happens if no agents agree? The step is retried up to maxRetries times. If it still fails, the pipeline reports the failure with all candidate outputs.

Q: Can I use custom validation scripts? Yes. Use the custom red-flag rule type with a scriptPath. The script receives the output and should exit with 0 (pass) or 1 (fail).

Q: How do I override a built-in plugin? Call registerPlugin() with a plugin that uses the same languages array. The new plugin replaces the built-in one in the registry.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
bin		bin
examples		examples
skills		skills
src		src
tests		tests
.gitignore		.gitignore
PROMPT.md		PROMPT.md
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Folders and files

Latest commit

History

Repository files navigation

SkillForge

Table of Contents

Quick Start

Installation

Key npm Scripts

Core Concepts

🧬 Maximal Agentic Decomposition (MAD)

🗳️ K-Threshold Consensus Voting

🚩 Red-Flag Detection

CLI Reference

init — Scaffold a new skill

build — Generate the full skill

decompose — Preview atomic breakdown

test — Run the test suite

inspect — Red-flag analysis

validate — Consensus validation info

Configuration Reference

Writing Effective Instructions

Defining Data Flow

Creating Skills

Scenario 1: Code Generation Skill

Scenario 2: Documentation Skill

Scenario 3: Multi-Step Pipeline Skill

Scenario 4: Strict Compliance Skill

Scenario 5: Review / Audit Skill

Plugin System

Architecture Overview

Built-in Plugins

JavaScript / TypeScript Plugin

Python Plugin

JSON Plugin

Dart / Flutter Plugin

Writing a Custom Plugin

Step 1: Define the Plugin

Step 2: Register It

Step 3: Use It

Plugin API Reference

LanguagePlugin Interface

registerPlugin(plugin)

pluginRegistry

Red-Flag Rules

Built-in Rule Types

Rule Severity

Presets

Writing Custom Rules

Testing

Trigger Tests

Functional Tests

Comparison Tests

SDK API Reference

Core Functions

detectRedFlags(output, rules)

decompose(config)

validateConsensus(step, outputs, options?)

generateSkill(config, options)

assemble(config, atomicSteps, validationResults)

countBalance(code, open, close)

Plugin Functions

Generator Functions

Testing Functions

Types

Project Structure

Troubleshooting

"Config not found" error

Step has no inputs or outputs

Data flow validation failure

K-threshold exceeds agent count

Red-flag regex false positives

Plugin not registered

FAQ

License

About

Topics

Resources

Uh oh!

Stars

`init` — Scaffold a new skill

`build` — Generate the full skill

`decompose` — Preview atomic breakdown

`test` — Run the test suite

`inspect` — Red-flag analysis

`validate` — Consensus validation info

`LanguagePlugin` Interface

`registerPlugin(plugin)`

`pluginRegistry`

`detectRedFlags(output, rules)`

`decompose(config)`

`validateConsensus(step, outputs, options?)`

`generateSkill(config, options)`

`assemble(config, atomicSteps, validationResults)`

`countBalance(code, open, close)`

Packages