paulasilvatech/awesome-agentic-loop

title: Super Agentic Loop — Primitive Agent Template Kit
description: Production-ready templates for agentic AI workflows based on empirical research 2025–2026
author: Paula Silva — AI-Native Software Engineer
date: 2026-04-10
version: 1.0.0
status: active
tags: agents, primitives, sdd, tdd, hooks, skills, github-copilot, claude-code

Super Agentic Loop

Pick the right primitive first. Stacking the wrong layers creates complexity with no gain.

A curated collection of production-ready templates for agentic AI workflows. Every template in this kit is grounded in empirical research from 2025–2026 — not opinion. Design decisions are documented with their evidence sources.


What Is a Primitive Agent?

A primitive is the smallest, independent building block of an agentic system. Before building workflows, pipelines, or autonomous agents, you need to understand the six primitives — and when to use each.

| Primitive | File | Scope | Loaded when |
|---|---|---|---|
| Instructions | copilot-instructions.md / CLAUDE.md / AGENTS.md | Always-on context | Every interaction, automatically |
| Custom Agent | .github/agents/name.agent.md | Named specialist persona | User invokes @agent-name |
| Skill | .github/skills/name/SKILL.md | Reusable capability module | Agent loads on relevance |
| Prompt | .github/prompts/name.prompt.md | Pre-defined slash command | User invokes /command |
| MCP Server | External server | Live data / external tools | On demand via tool call |
| Hook | settings.json hooks config | Policy enforcement | On lifecycle event, automatically |

Understanding these primitives — and their costs, scopes, and interactions — is what separates a reliable agentic system from a brittle one.


Why Primitives Matter

Most teams fail at agentic AI not because the model is wrong, but because the architecture is wrong.

Common failure patterns:

  • Putting everything in one giant copilot-instructions.md (loaded on every request — token cost grows linearly)
  • Writing agents with vague descriptions (wrong agent activated on wrong tasks)
  • Using soft constraints like "should" instead of hard MUST NOT (agents ignore them)
  • Duplicating content between Instructions and Agents (>20% token overhead with no gain)
  • Generating agent files with LLMs instead of curating them manually (-3% performance)

Source: arXiv:2602.11988 (ETH Zurich), arXiv:2601.20404, arXiv:2602.12430

The Triple Debt Model — Technical, Cognitive, and Intent debt reinforce each other in a cycle only broken by correct primitive architecture

This kit gives you the patterns that work — with the evidence to back them up.


The Six Primitives Explained

1. Instructions — Always-On Context

File: .github/copilot-instructions.md / CLAUDE.md / AGENTS.md
Cost: High — loaded on every single request.
Purpose: Shape behavior that must apply universally, without exception.

Instructions are passive and always-on. They cannot restrict tools. They establish the baseline: what the project is, its architecture, build commands, and non-negotiable constraints.

Use for:

  • Project identity and architecture pattern
  • Build and test commands
  • Hard constraints that apply to every file type
  • Escalation triggers

Do NOT use for:

  • Rules that only apply to specific file types → use scoped .instructions.md with applyTo
  • Role-specific behavior → use Custom Agents
  • Reusable procedures → use Skills
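A path-scoped instructions file keeps file-type rules out of the always-on context. A minimal sketch, assuming VS Code's `applyTo` frontmatter convention (the glob and the rules themselves are illustrative):

```markdown
---
applyTo: "src/**/*.ts"
---

# TypeScript rules (loaded only for matching paths)
- MUST NOT use `any`; prefer explicit types or `unknown`.
- All exported functions MUST have JSDoc comments.
```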

Empirical impact:

| Finding | Data | Source |
|---|---|---|
| AGENTS.md reduces agent runtime | −28.64% | arXiv:2601.20404 |
| AGENTS.md reduces output tokens | −16.58% | arXiv:2601.20404 |
| LLM-generated files reduce performance | −3% | arXiv:2602.11988 |
| Redundant content overhead | >20% tokens with no gain | arXiv:2602.11988 |
| Most impactful section | Architecture | arXiv:2511.09268 |

Keep this file under 100 lines. Write it manually. Include only what the agent cannot infer from the code itself.


2. Custom Agent — Named Specialist Persona

File: .github/agents/[name].agent.md
Cost: Medium — loaded only when the user invokes @agent-name.
Purpose: Define a named role with restricted tools, a clear responsibility boundary, and explicit constraints.

A Custom Agent is a persona with enforcement. Unlike Instructions, it can restrict which tools are available to that role. A @security-reviewer agent can be limited to read-only tools — by design, not by hope.

Use for:

  • Roles with clear input/output boundaries (reviewer reads, never writes)
  • Operations where mistakes are expensive (security review, DB migrations, production config)
  • Multi-agent pipelines (Planner → Implementer → Reviewer)
  • Consistent behavior invoked by multiple team members

Do NOT use for:

  • Exploratory or one-off work → use generalist agent or plain chat
  • Global rules → use Instructions

Two types:

| Type | When to use |
|---|---|
| Generalist | Task crosses multiple concerns, exploratory work, small projects |
| Specialist | Clear boundaries, expensive mistakes, pipeline steps, enforced tool restrictions |
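A read-only specialist might look like this sketch (frontmatter keys follow the `.agent.md` convention described above, but exact field names vary by platform, so check your runtime's docs; the tool names are illustrative):

```markdown
---
name: security-reviewer
description: Read-only OWASP-focused review of diffs touching auth, crypto, or input handling. Does not modify files.
tools: ["read", "search", "grep"]   # no write/execute tools: enforced, not hoped for
---

You are a security reviewer. You MUST NOT modify files.
Report each finding as: severity, file:line, issue, suggested fix.
```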

3. Skill — Reusable Capability Module

File: .github/skills/[name]/SKILL.md
Cost: Variable — loaded dynamically based on topic relevance.
Purpose: Bundle domain knowledge, scripts, templates, and examples into a module any agent can load.

A Skill is not a persona — it is a capability. Where a Custom Agent asks "who am I?", a Skill asks "what do I know how to do?". Skills are loaded automatically when the agent determines they are relevant.

Use for:

  • Reusable domain procedures (incident triage, postmortem generation, CI debugging)
  • Bundled scripts + templates + examples that belong together
  • Knowledge that multiple agents should access

Do NOT use for:

  • Always-on context → use Instructions
  • One-time operations → use Prompts or plain chat

Empirical guardrails:

| Finding | Data | Source |
|---|---|---|
| 26.1% of community skills contain vulnerabilities | Prompt injection via SKILL.md | arXiv:2602.12430 |
| Human-expert vs LLM-generated skills | Near-perfect vs frequent failure | arXiv:2603.19583 |
| Performance ceiling | Degrades above 500 lines | Anthropic docs |

Keep SKILL.md under 500 lines. Use decision trees before examples. Define your trigger description precisely — overly broad triggers activate the wrong skill.
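A minimal SKILL.md sketch (the `name`/`description` frontmatter mirrors Anthropic's Agent Skills format; the skill content and bundled file paths are illustrative):

```markdown
---
name: incident-triage
description: Use when the user reports a production incident, outage, or failing health check. Not for routine CI failures.
---

# Incident Triage

Decision tree first, examples second:
1. Classify severity (SEV1–SEV3).
2. If SEV1, page on-call via scripts/page.sh (bundled).
3. Otherwise, open a tracking issue from templates/incident.md.
```

Note the precise, negatively bounded trigger in `description`: it says when NOT to activate, which keeps an overly broad match from loading the wrong skill.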


4. Prompt — Pre-defined Slash Command

File: .github/prompts/[name].prompt.md
Cost: Lowest — loaded only when the user explicitly invokes /command.
Purpose: Define a bounded, repeatable, user-triggered task.

Prompts are user-initiated, task-scoped, and idempotent. Running /gen-tests twice on the same file produces the same output. Each prompt has one objective and one output format.

Use for:

  • /gen-tests — generate a test suite for the current file
  • /security-review — OWASP check on the current file
  • /release-notes — generate changelog from commits
  • Any repeatable task with a predictable structure that users invoke on purpose

Do NOT use for:

  • Behavior that should apply without user action → use Instructions
  • Domain knowledge the agent should always apply → use Skills
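A sketch of a prompt file, assuming VS Code's `.prompt.md` convention (`mode`/`description` frontmatter and the `${file}` variable; adjust to your platform):

```markdown
---
mode: agent
description: Generate a test suite for the current file
---

Generate unit tests for ${file}:
- One describe block per public function
- Cover happy path, edge cases, and error handling
- Output only the test file, no commentary
```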

5. MCP Server — Live Data and External Tools

Scope: External server, connected via tool calls
Cost: On-demand only
Purpose: Give the agent access to live data or external systems — databases, APIs, file systems, search indexes.

MCP (Model Context Protocol) Servers extend what the agent can do, not what it knows. They are the integration layer: read from a database, call a REST API, interact with a CI system.

Use for:

  • Real-time data the agent cannot have in its context
  • External system operations (GitHub API, Jira, Slack, cloud providers)
  • Custom tool surfaces specific to your organization

Stack on top of: any other primitive that needs live data.
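Wiring a server in might look like this sketch (modeled on VS Code's `.vscode/mcp.json` schema; the server name and npm package are hypothetical):

```json
{
  "servers": {
    "internal-ci": {
      "command": "npx",
      "args": ["-y", "@acme/ci-mcp-server"]
    }
  }
}
```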


6. Hook — Policy Enforcement via Lifecycle Events

File: settings.json hooks config + shell scripts
Cost: Minimal — fires on lifecycle events
Purpose: Intercept agent actions before or after they execute to enforce policy, run quality gates, or block dangerous operations.

Hooks operate at the infrastructure layer of the agent loop. They are deterministic, not probabilistic — a PreToolUse hook either blocks or allows. No prompt-level instruction can override them.

Lifecycle events:

| Event | Type | Use case |
|---|---|---|
| PreToolUse | Blocking | Block destructive Bash commands, guard protected paths |
| PostToolUse | Async | Run lint, secret scan, format check after writes |
| SessionStart | Async | Log session, load project context |
| UserPromptSubmit | Blocking | Validate or modify the incoming message |
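Registering a blocking hook might look like this sketch (schema modeled on Claude Code's `settings.json` hooks config; verify field names and the script path against your runtime's docs):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/guard-bash.sh" }
        ]
      }
    ]
  }
}
```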

Empirical context:

| Finding | Data | Source |
|---|---|---|
| All evaluated LLMs produce hardcoded credentials | Systematic weakness | arXiv:2508.14727 |
| Tool poisoning ranks #1 in the OWASP LLM Top 10 | Highest attack vector | arXiv:2604.07551 |
| 5 out of 7 MCP clients lack input validation | Injection risk | arXiv:2603.22489 |

Use PreToolUse to BLOCK. Use PostToolUse to OBSERVE and autocorrect. Never eval hook stdin fields.
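A PreToolUse guard can be sketched as a small shell script. This assumes a common hook contract (a JSON event arrives on stdin and a non-zero exit blocks the tool call); the blocklist patterns are illustrative, and the exact payload shape and exit-code semantics must be checked against your runtime's hook docs:

```shell
#!/bin/sh
# Sketch of a PreToolUse guard for Bash tool calls.

# Pure string check: never eval anything taken from the payload.
check_command() {
  case "$1" in
    *"rm -rf"*|*"git push --force"*|*"DROP TABLE"*)
      echo "Blocked by PreToolUse policy: destructive command" >&2
      return 2 ;;    # signal "block"
  esac
  return 0           # allow
}

# Entry point: extract tool_input.command from the stdin JSON with jq,
# then decide. jq -r keeps the value as a plain string, never executed.
run_guard() {
  check_command "$(jq -r '.tool_input.command // empty')"
}
```

A PostToolUse counterpart would run lint or a secret scan after writes and autocorrect rather than block.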


How the Primitives Compose

Does this need to apply to EVERY interaction?
├── YES → Instructions (copilot-instructions.md / CLAUDE.md)
└── NO
    └── Is this a REPEATABLE TASK with predictable structure?
        ├── YES
        │   └── Does it require a SPECIFIC PERSONA + TOOL RESTRICTIONS?
        │       ├── YES → Custom Agent (.github/agents/name.agent.md)
        │       └── NO
        │           └── Does it bundle SCRIPTS + TEMPLATES + EXAMPLES?
        │               ├── YES → Skill (.github/skills/name/SKILL.md)
        │               └── NO  → Prompt (.github/prompts/name.prompt.md)
        └── NO (exploratory, one-off, broad task)
            └── Generalist Agent or plain chat

ADDITIVE LAYERS (stack on any of the above):
  └── Needs LIVE DATA from an external system? → Add MCP Server
  └── Needs DETERMINISTIC POLICY enforcement? → Add Hook

Routing Decision Map — right model, right mode, right primitive for every SDLC phase


Composition Patterns

Pattern 1 — Sequential Handoff (Plan → Implement → Review)

@implementation-planner → IMPL-NNN.md
@[project]-agent        → executes tasks from IMPL-NNN.md
@code-reviewer          → validates the resulting PR

Pattern 2 — Parallel Multi-Lens Review

@security-reviewer  [parallel]
@code-reviewer      [parallel] → synthesize findings
@test-specialist    [parallel]

Pattern 3 — Gather-then-Act

@security-reviewer reads the target area first
@[project]-agent implements with security findings in context

Pattern 4 — Supervisor

@[project]-agent implements
@code-reviewer validates output before PR is opened

What's in This Repository

SUPER-TEMPLATES/
├── agents/
│   ├── AGENT_DECISION_GUIDE.md           ← When to use each primitive (decision flowchart)
│   ├── GENERALIST_AGENT_template.md      ← Full-stack agent template
│   └── SPECIALIST_AGENT_template.md      ← Role-restricted agent templates (3 roles)
├── instructions/
│   ├── copilot-instructions.md           ← Workspace instructions template
│   ├── prompt-templates.md               ← Slash command prompt templates
│   └── scoped-instructions-templates.md  ← Path-scoped .instructions.md templates
└── sdd-tdd-agents/
    ├── CONSTITUTION_template.md          ← Layer 1: immutable project principles
    ├── SPECIFICATION_template.md         ← Layer 2: per-feature requirements (EARS format)
    ├── IMPLEMENTATION_PLAN_template.md   ← Layer 3: atomic agent tasks with [P] markers
    ├── TDD_SPEC_template.md              ← TOML spec + Arrange/Act/Assert test suite
    ├── SKILL_md_template.md              ← Skill module template
    ├── HOOKS_template.md                 ← Lifecycle hook config + ready-to-use scripts
    └── CLAUDE_md_template.md             ← CLAUDE.md / AGENTS.md context file template

SDD — Spec-Driven Development

The templates implement a three-layer Spec-Driven Development hierarchy that prevents cascading failures when agents misinterpret vague requirements.

CONSTITUTION.md          → immutable principles, always loaded, < 200 lines
    ↓
SPECIFICATION.md         → per-feature requirements in EARS format + Given/When/Then
    ↓
IMPLEMENTATION_PLAN.md   → atomic tasks, [P] parallel markers, explicit pre-gates
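For reference, EARS ("Easy Approach to Requirements Syntax") phrases each requirement around an explicit trigger, so the agent receives an unambiguous condition-response pair; the feature below is illustrative:

```
REQ-007 (EARS, event-driven):
When a user submits a login form with an unregistered email,
the auth service shall return a generic "invalid credentials" error.

Given an unregistered email
When the login form is submitted
Then the response is 401 with no hint that the account does not exist
```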

The Benchmark Gap — LLM benchmarks measure implementation (61%) but not spec or planning phases

| Finding | Data | Source |
|---|---|---|
| Human-refined specs reduce agent errors | up to −50% | arXiv:2602.00180 |
| Planner-Coder gap (vague tasks) | 7–83% failure rate | arXiv:2510.10460 |
| CONSTITUTION → SPEC → PLAN hierarchy | Prevents cascading failures | arXiv:2602.02584 |

SDD + Chat Mode Alignment — each SDD phase maps to a specific mode, model, and VS Code primitive


Who Should Use This

| Role | What to take |
|---|---|
| Developer starting a new project | CLAUDE_md_template.md + CONSTITUTION_template.md + HOOKS_template.md |
| Tech Lead setting up team conventions | Full SDD stack: Constitution + Spec + Plan |
| Platform Engineer building internal tooling | GENERALIST_AGENT_template.md + SPECIALIST_AGENT_template.md |
| Security Engineer enforcing policy | HOOKS_template.md + security-reviewer from SPECIALIST_AGENT_template.md |
| DevOps Engineer automating release workflows | SKILL_md_template.md + prompt-templates.md |

Quick Start

# 1. Copy the context file to your project root
cp SUPER-TEMPLATES/sdd-tdd-agents/CLAUDE_md_template.md ./CLAUDE.md

# 2. Set up the SDD stack
mkdir -p docs/sdd
cp SUPER-TEMPLATES/sdd-tdd-agents/CONSTITUTION_template.md ./CONSTITUTION.md
cp SUPER-TEMPLATES/sdd-tdd-agents/SPECIFICATION_template.md docs/sdd/SPEC-001-[feature].md
cp SUPER-TEMPLATES/sdd-tdd-agents/IMPLEMENTATION_PLAN_template.md docs/sdd/IMPL-001-[feature].md

# 3. Install lifecycle hooks
mkdir -p ~/.config/claude/hooks
# See HOOKS_template.md for ready-to-use scripts

# 4. Add a custom agent
cp SUPER-TEMPLATES/agents/SPECIALIST_AGENT_template.md .github/agents/[role]-agent.md

Recommended filling order

1. CONSTITUTION.md          30 min — team decision, immutable principles
2. CLAUDE.md / AGENTS.md    20 min — derived from your existing code
3. Hooks                    Once per org — copy and adapt scripts
4. SPECIFICATION.md         Per feature, before writing any code
5. IMPLEMENTATION_PLAN.md   Break the spec into atomic agent tasks
6. TDD_SPEC                 Before invoking the agent
7. Skills                   As repeated automations emerge

Token Cost Awareness

Every primitive has a cost profile. Use the right primitive for the right job to avoid wasting context budget.

| Primitive | When loaded | Token cost impact |
|---|---|---|
| copilot-instructions.md | Every request | High — minimize ruthlessly |
| Scoped .instructions.md | Matching path only | ~68% less than root file |
| Custom Agent | Per agent session | Medium — only when invoked |
| Skill | On relevance match | Variable — bounded by file size |
| Prompt | User-invoked only | Lowest — on demand |
| MCP Server | Tool call | Network latency — not token budget |
| Hook | Lifecycle event | None — shell execution |

VS Code + Copilot Primitives Token Cost Stack — use the cheapest primitive that meets the task needs

At 1,000 developers × 10 requests/day, a 2,000-token copilot-instructions.md consumes 20 million tokens per day — before a single line of code enters the context.
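The back-of-envelope math, as a sketch:

```shell
# Daily context overhead = developers * requests/day * tokens per request
devs=1000; requests=10; tokens=2000
echo "$(( devs * requests * tokens )) tokens/day"   # prints "20000000 tokens/day"
```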


Common Mistakes

| Mistake | Why it fails | Fix |
|---|---|---|
| Vague agent description | Wrong agent activated on wrong tasks | Write specific trigger conditions |
| No tool restrictions on specialist | Defeats the purpose of specialization | Restrict to minimum needed tools |
| SHOULD instead of MUST NOT | Agents treat soft constraints as optional | Use MUST NOT for hard rules |
| Instructions + Agent with duplicate content | >20% token overhead, no gain | Instructions = always-on only |
| One mega-agent for everything | Context bloat, unpredictable behavior | Split into specialists + generalist |
| LLM-generated agent files | −3% performance vs human-curated | Write and curate manually |
| SKILL.md over 500 lines | Reliability degrades above threshold | Split with progressive disclosure |

References

| Source | Topic |
|---|---|
| arXiv:2601.20404 | AGENTS.md impact on runtime and token consumption |
| arXiv:2602.11988 | ETH Zurich — LLM-generated vs human-curated context files |
| arXiv:2602.00180 | Human-refined specs reduce agent errors by 50% |
| arXiv:2510.10460 | Planner-Coder gap — vague task failure rates 7–83% |
| arXiv:2602.02584 | Constitutional SDD hierarchy prevents cascading failures |
| arXiv:2511.09268 | Architecture section is the highest-impact part of context files |
| arXiv:2511.21382 | 115-paper review — TDD with LLMs, iterative repair loop |
| arXiv:2602.12430 | Community Skills vulnerabilities — 26.1% injection rate |
| arXiv:2603.19583 | Human-expert skills vs LLM-generated skills |
| arXiv:2603.05344 | Lifecycle hook events and mutation behavior |
| arXiv:2508.14727 | All evaluated LLMs produce hardcoded credentials |
| arXiv:2604.07551 | OWASP LLM Top 10 — tool poisoning as primary vector |
| arXiv:2603.22489 | 5 out of 7 MCP clients lack input validation |
| Anthropic Agent Skills docs | SKILL.md best practices and 500-line limit |

Maintained by Paula Silva — AI-Native Software Engineer · Version 1.0.0 · April 2026
