---
title: Super Agentic Loop — Primitive Agent Template Kit
description: Production-ready templates for agentic AI workflows based on empirical research 2025-2026
author: Paula Silva — AI-Native Software Engineer
date: 2026-04-10
version: 1.0.0
status: active
tags:
---
Pick the right primitive first. Stacking the wrong layers creates complexity with no gain.
A curated collection of production-ready templates for agentic AI workflows. Every template in this kit is grounded in empirical research from 2025–2026 — not opinion. Design decisions are documented with their evidence sources.
A primitive is the smallest, independent building block of an agentic system. Before building workflows, pipelines, or autonomous agents, you need to understand the six primitives — and when to use each.
| Primitive | File | Scope | Loaded when |
|---|---|---|---|
| Instructions | `copilot-instructions.md` / `CLAUDE.md` / `AGENTS.md` | Always-on context | Every interaction, automatically |
| Custom Agent | `.github/agents/name.agent.md` | Named specialist persona | User invokes `@agent-name` |
| Skill | `.github/skills/name/SKILL.md` | Reusable capability module | Agent loads on relevance |
| Prompt | `.github/prompts/name.prompt.md` | Pre-defined slash command | User invokes `/command` |
| MCP Server | External server | Live data / external tools | On demand via tool call |
| Hook | `settings.json` hooks config | Policy enforcement | On lifecycle event, automatically |
Understanding these primitives — and their costs, scopes, and interactions — is what separates a reliable agentic system from a brittle one.
Most teams fail at agentic AI not because the model is wrong, but because the architecture is wrong.
Common failure patterns:
- Putting everything in one giant `copilot-instructions.md` (loaded on every request — token cost grows linearly)
- Writing agents with vague descriptions (wrong agent activated on wrong tasks)
- Using soft constraints like "should" instead of hard `MUST NOT` (agents ignore them)
- Duplicating content between Instructions and Agents (>20% token overhead with no gain)
- Generating agent files with LLMs instead of curating them manually (-3% performance)
Source: arXiv:2602.11988 (ETH Zurich), arXiv:2601.20404, arXiv:2602.12430
This kit gives you the patterns that work — with the evidence to back them up.
File: .github/copilot-instructions.md / CLAUDE.md / AGENTS.md
Cost: High — loaded on every single request.
Purpose: Shape behavior that must apply universally, without exception.
Instructions are passive and always-on. They cannot restrict tools. They establish the baseline: what the project is, its architecture, build commands, and non-negotiable constraints.
Use for:
- Project identity and architecture pattern
- Build and test commands
- Hard constraints that apply to every file type
- Escalation triggers
Do NOT use for:
- Rules that only apply to specific file types → use scoped `.instructions.md` with `applyTo`
- Role-specific behavior → use Custom Agents
- Reusable procedures → use Skills
Empirical impact:
| Finding | Data | Source |
|---|---|---|
| AGENTS.md reduces agent runtime | -28.64% | arXiv:2601.20404 |
| AGENTS.md reduces output tokens | -16.58% | arXiv:2601.20404 |
| LLM-generated files reduce performance | -3% | arXiv:2602.11988 |
| Redundant content overhead | >20% tokens with no gain | arXiv:2602.11988 |
| Most impactful section | Architecture | arXiv:2511.09268 |
Keep this file under 100 lines. Write it manually. Include only what the agent cannot infer from the code itself.
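As a minimal sketch of what passes that bar — the project name, commands, and constraints below are hypothetical placeholders, not a prescription:

```markdown
# acme-api workspace instructions

## Architecture
Hexagonal: `domain/` holds pure business logic and MUST NOT import from `adapters/`.

## Commands
- Build: `make build`
- Test: `make test`

## Hard constraints
- MUST NOT commit secrets, tokens, or `.env` files.
- MUST NOT modify files under `migrations/`; escalate to a human instead.

## Escalation
Stop and ask when a change touches auth, billing, or data deletion.
```

Note what is absent: coding style the linter already enforces, and anything the agent can infer by reading the code.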
File: .github/agents/[name].agent.md
Cost: Medium — loaded only when the user invokes @agent-name.
Purpose: Define a named role with restricted tools, a clear responsibility boundary, and explicit constraints.
A Custom Agent is a persona with enforcement. Unlike Instructions, it can restrict which tools are available to that role. A @security-reviewer agent can be limited to read-only tools — by design, not by hope.
Use for:
- Roles with clear input/output boundaries (reviewer reads, never writes)
- Operations where mistakes are expensive (security review, DB migrations, production config)
- Multi-agent pipelines (Planner → Implementer → Reviewer)
- Consistent behavior invoked by multiple team members
Do NOT use for:
- Exploratory or one-off work → use generalist agent or plain chat
- Global rules → use Instructions
Two types:
| Type | When to use |
|---|---|
| Generalist | Task crosses multiple concerns, exploratory work, small projects |
| Specialist | Clear boundaries, expensive mistakes, pipeline steps, enforced tool restrictions |
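A sketch of a specialist agent file. Frontmatter field names and tool identifiers vary by platform, so treat `name`, `description`, and the `tools` values here as placeholders to adapt:

```markdown
---
name: security-reviewer
description: Read-only security review of changed files. Invoke for OWASP-style
  checks on a PR. Do not invoke for implementation or refactoring work.
tools: ["read", "search"]   # placeholder tool IDs; the point is read-only access
---

You are a security reviewer.
- You MUST NOT create, edit, or delete files.
- Report findings as a severity-ranked list with file:line references.
- If you cannot assess a finding, escalate; never guess.
```

The `description` doubles as the activation trigger, which is why it names both when to invoke the agent and when not to.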
File: .github/skills/[name]/SKILL.md
Cost: Variable — loaded dynamically based on topic relevance.
Purpose: Bundle domain knowledge, scripts, templates, and examples into a module any agent can load.
A Skill is not a persona — it is a capability. Where a Custom Agent asks "who am I?", a Skill asks "what do I know how to do?". Skills are loaded automatically when the agent determines they are relevant.
Use for:
- Reusable domain procedures (incident triage, postmortem generation, CI debugging)
- Bundled scripts + templates + examples that belong together
- Knowledge that multiple agents should access
Do NOT use for:
- Always-on context → use Instructions
- One-time operations → use Prompts or plain chat
Empirical guardrails:
| Finding | Data | Source |
|---|---|---|
| 26.1% of community skills contain vulnerabilities | Prompt injection via SKILL.md | arXiv:2602.12430 |
| Human-expert vs LLM-generated skills | Near-perfect vs frequent failure | arXiv:2603.19583 |
| Performance ceiling | Degrades above 500 lines | Anthropic docs |
Keep SKILL.md under 500 lines. Use decision trees before examples. Define your trigger description precisely — overly broad triggers activate the wrong skill.
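A sketch of a skill module; the `name`/`description` frontmatter follows the SKILL.md convention, while the runbook paths and triage steps are hypothetical:

```markdown
---
name: incident-triage
description: Use when the user reports a production incident or an active
  outage. Not for routine bug reports or postmortems.
---

# Incident triage

## Decision tree (consult before examples)
1. User-facing traffic affected? → follow `runbooks/sev1.md`
2. Data integrity at risk? → freeze writes, then follow `runbooks/data.md`
3. Otherwise → standard triage below

## Standard triage
...
```

The narrow `description` is deliberate: it is what keeps this skill from activating on every bug report.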
File: .github/prompts/[name].prompt.md
Cost: Lowest — loaded only when the user explicitly invokes /command.
Purpose: Define a bounded, repeatable, user-triggered task.
Prompts are user-initiated, task-scoped, and idempotent. Running /gen-tests twice on the same file produces the same output. Each prompt has one objective and one output format.
Use for:
- `/gen-tests` — generate a test suite for the current file
- `/security-review` — OWASP check on the current file
- `/release-notes` — generate changelog from commits
- Any repeatable task with a predictable structure that users invoke on purpose
Do NOT use for:
- Behavior that should apply without user action → use Instructions
- Domain knowledge the agent should always apply → use Skills
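A sketch of a prompt file; the frontmatter shape is an assumption based on common `.prompt.md` conventions, so adjust to your platform:

```markdown
---
description: Generate a test suite for the current file
---

Generate unit tests for the file currently open in the editor.
- One test file, mirroring the source path under `tests/`.
- Cover every exported function: happy path plus one edge case each.
- Output ONLY the test file content, no commentary.
```

One objective, one output format — which is what makes repeated invocations idempotent.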
Scope: External server, connected via tool calls
Cost: On-demand only
Purpose: Give the agent access to live data or external systems — databases, APIs, file systems, search indexes.
MCP (Model Context Protocol) Servers extend what the agent can do, not what it knows. They are the integration layer: read from a database, call a REST API, interact with a CI system.
Use for:
- Real-time data the agent cannot have in its context
- External system operations (GitHub API, Jira, Slack, cloud providers)
- Custom tool surfaces specific to your organization
Stack on top of: any other primitive that needs live data.
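Registration is client-specific; as a sketch, an MCP server entry assuming a map of command-plus-args definitions (the top-level key name, e.g. `servers` vs `mcpServers`, and the env placeholder syntax vary by client):

```json
{
  "servers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "${input:github-token}" }
    }
  }
}
```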
File: settings.json hooks config + shell scripts
Cost: Minimal — fires on lifecycle events
Purpose: Intercept agent actions before or after they execute to enforce policy, run quality gates, or block dangerous operations.
Hooks operate at the infrastructure layer of the agent loop. They are deterministic, not probabilistic — a PreToolUse hook either blocks or allows. No prompt-level instruction can override them.
Lifecycle events:
| Event | Type | Use case |
|---|---|---|
| `PreToolUse` | Blocking | Block destructive Bash commands, guard protected paths |
| `PostToolUse` | Async | Run lint, secret scan, format check after writes |
| `SessionStart` | Async | Log session, load project context |
| `UserPromptSubmit` | Blocking | Validate or modify the incoming message |
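Events are wired to scripts in the hooks config. A sketch assuming a Claude Code-style `settings.json` shape with matcher and command entries — adapt the shape to your client:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "~/.config/claude/hooks/block_dangerous.sh" }
        ]
      }
    ]
  }
}
```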
Empirical context:
| Finding | Data | Source |
|---|---|---|
| All evaluated LLMs produce hardcoded credentials | Systematic weakness | arXiv:2508.14727 |
| Tool poisoning = OWASP LLM Top 1 | Highest attack vector | arXiv:2604.07551 |
| 5 out of 7 MCP clients lack input validation | Injection risk | arXiv:2603.22489 |
Use `PreToolUse` to BLOCK. Use `PostToolUse` to OBSERVE and autocorrect. Never `eval` hook stdin fields.
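A minimal `PreToolUse` guard as a sketch. It assumes the client pipes the tool call as JSON on stdin and treats exit code 2 as "block" — both are client-specific contracts to verify. For brevity it pattern-matches the raw payload instead of parsing JSON; a production hook should parse the field properly (e.g. with `jq`) rather than string-match:

```shell
#!/usr/bin/env sh
# PreToolUse guard sketch: block destructive Bash commands.
# Assumed contract: tool call arrives as JSON on stdin;
# exit 2 = block, exit 0 = allow. Verify against your client.

guard() {
  payload=$(cat)
  # Crude substring match on the raw payload (sketch only).
  case "$payload" in
    *'rm -rf'*|*'git push --force'*|*'DROP TABLE'*)
      echo "blocked destructive command" >&2
      return 2 ;;
  esac
  return 0
}

guard   # entry point: reads the tool call from stdin
```

Note that the guard inspects stdin fields without ever executing them — consistent with the "never `eval` hook stdin" rule above.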
```
Does this need to apply to EVERY interaction?
├── YES → Instructions (copilot-instructions.md / CLAUDE.md)
└── NO
    └── Is this a REPEATABLE TASK with predictable structure?
        ├── YES
        │   └── Does it require a SPECIFIC PERSONA + TOOL RESTRICTIONS?
        │       ├── YES → Custom Agent (.github/agents/name.agent.md)
        │       └── NO
        │           └── Does it bundle SCRIPTS + TEMPLATES + EXAMPLES?
        │               ├── YES → Skill (.github/skills/name/SKILL.md)
        │               └── NO → Prompt (.github/prompts/name.prompt.md)
        └── NO (exploratory, one-off, broad task)
            └── Generalist Agent or plain chat

ADDITIVE LAYERS (stack on any of the above):
├── Needs LIVE DATA from an external system? → Add MCP Server
└── Needs DETERMINISTIC POLICY enforcement? → Add Hook
```
```
@implementation-planner → IMPL-NNN.md
@[project]-agent → executes tasks from IMPL-NNN.md
@code-reviewer → validates the resulting PR

@security-reviewer [parallel]
@code-reviewer [parallel] → synthesize findings
@test-specialist [parallel]

@security-reviewer reads the target area first
@[project]-agent implements with security findings in context

@[project]-agent implements
@code-reviewer validates output before PR is opened
```
```
SUPER-TEMPLATES/
├── agents/
│   ├── AGENT_DECISION_GUIDE.md          ← When to use each primitive (decision flowchart)
│   ├── GENERALIST_AGENT_template.md     ← Full-stack agent template
│   └── SPECIALIST_AGENT_template.md     ← Role-restricted agent templates (3 roles)
├── instructions/
│   ├── copilot-instructions.md          ← Workspace instructions template
│   ├── prompt-templates.md              ← Slash command prompt templates
│   └── scoped-instructions-templates.md ← Path-scoped .instructions.md templates
└── sdd-tdd-agents/
    ├── CONSTITUTION_template.md         ← Layer 1: immutable project principles
    ├── SPECIFICATION_template.md        ← Layer 2: per-feature requirements (EARS format)
    ├── IMPLEMENTATION_PLAN_template.md  ← Layer 3: atomic agent tasks with [P] markers
    ├── TDD_SPEC_template.md             ← TOML spec + Arrange/Act/Assert test suite
    ├── SKILL_md_template.md             ← Skill module template
    ├── HOOKS_template.md                ← Lifecycle hook config + ready-to-use scripts
    └── CLAUDE_md_template.md            ← CLAUDE.md / AGENTS.md context file template
```
The templates implement a three-layer Spec-Driven Development hierarchy that prevents cascading failures when agents misinterpret vague requirements.
```
CONSTITUTION.md        → immutable principles, always loaded, < 200 lines
        ↓
SPECIFICATION.md       → per-feature requirements in EARS format + Given/When/Then
        ↓
IMPLEMENTATION_PLAN.md → atomic tasks, [P] parallel markers, explicit pre-gates
```
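For concreteness, a hypothetical requirement as it would appear in SPECIFICATION.md, pairing the EARS sentence with its Given/When/Then acceptance check:

```markdown
## REQ-007 (EARS)
When a request carries an expired access token, the auth service shall
respond 401 and emit an `auth.token_expired` event.

### Acceptance
Given an access token past its `exp` claim
When any protected endpoint is called with it
Then the response status is 401 and exactly one `auth.token_expired` event is emitted
```

The EARS "When X, the system shall Y" shape forces a trigger and a single observable response, which is precisely what keeps an agent from inventing scope.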
| Finding | Data | Source |
|---|---|---|
| Human-refined specs reduce agent errors | up to -50% | arXiv:2602.00180 |
| Planner-Coder gap (vague tasks) | 7–83% failure rate | arXiv:2510.10460 |
| CONSTITUTION → SPEC → PLAN hierarchy | prevents cascading failures | arXiv:2602.02584 |
| Role | What to take |
|---|---|
| Developer starting a new project | CLAUDE_md_template.md + CONSTITUTION_template.md + HOOKS_template.md |
| Tech Lead setting up team conventions | Full SDD stack: Constitution + Spec + Plan |
| Platform Engineer building internal tooling | GENERALIST_AGENT_template.md + SPECIALIST_AGENT_template.md |
| Security Engineer enforcing policy | HOOKS_template.md + security-reviewer from SPECIALIST_AGENT_template.md |
| DevOps Engineer automating release workflows | SKILL_md_template.md + prompt-templates.md |
```shell
# 1. Copy the context file to your project root
cp SUPER-TEMPLATES/sdd-tdd-agents/CLAUDE_md_template.md ./CLAUDE.md

# 2. Set up the SDD stack
mkdir -p docs/sdd
cp SUPER-TEMPLATES/sdd-tdd-agents/CONSTITUTION_template.md ./CONSTITUTION.md
cp SUPER-TEMPLATES/sdd-tdd-agents/SPECIFICATION_template.md docs/sdd/SPEC-001-[feature].md
cp SUPER-TEMPLATES/sdd-tdd-agents/IMPLEMENTATION_PLAN_template.md docs/sdd/IMPL-001-[feature].md

# 3. Install lifecycle hooks
mkdir -p ~/.config/claude/hooks
# See HOOKS_template.md for ready-to-use scripts

# 4. Add a custom agent
cp SUPER-TEMPLATES/agents/SPECIALIST_AGENT_template.md .github/agents/[role]-agent.md
```

Recommended adoption order:

1. CONSTITUTION.md (30 min): team decision, immutable principles
2. CLAUDE.md / AGENTS.md (20 min): derived from your existing code
3. Hooks (once per org): copy and adapt scripts
4. SPECIFICATION.md: per feature, before writing any code
5. IMPLEMENTATION_PLAN.md: break the spec into atomic agent tasks
6. TDD_SPEC: before invoking the agent
7. Skills: as repeated automations emerge
Every primitive has a cost profile. Use the right primitive for the right job to avoid wasting context budget.
| Primitive | When loaded | Token cost impact |
|---|---|---|
| `copilot-instructions.md` | Every request | High — minimize ruthlessly |
| Scoped `.instructions.md` | Matching path only | ~68% less than root file |
| Custom Agent | Per agent session | Medium — only when invoked |
| Skill | On relevance match | Variable — bounded by file size |
| Prompt | User-invoked only | Lowest — on demand |
| MCP Server | Tool call | Network latency — not token budget |
| Hook | Lifecycle event | None — shell execution |
At 1,000 developers × 10 requests/day, a 2,000-token `copilot-instructions.md` consumes 20 million tokens per day — before a single line of code enters the context.
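The arithmetic behind that figure, as a quick sanity check (the numbers are the illustrative ones above, not a benchmark):

```shell
# Daily token burn of an always-on instructions file.
developers=1000
requests_per_day=10
tokens_per_request=2000   # size of copilot-instructions.md in tokens

echo "$((developers * requests_per_day * tokens_per_request)) tokens/day"
```

Substitute your own headcount and file size; the cost is linear in both, which is why the root file is the first place to cut.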
| Mistake | Why it fails | Fix |
|---|---|---|
| Vague agent description | Wrong agent activated on wrong tasks | Write specific trigger conditions |
| No tool restrictions on specialist | Defeats the purpose of specialization | Restrict to minimum needed tools |
| `SHOULD` instead of `MUST NOT` | Agents treat soft constraints as optional | Use `MUST NOT` for hard rules |
| Instructions + Agent with duplicate content | >20% token overhead, no gain | Instructions = always-on only |
| One mega-agent for everything | Context bloat, unpredictable behavior | Split into specialists + generalist |
| LLM-generated agent files | -3% performance vs human-curated | Write and curate manually |
| `SKILL.md` over 500 lines | Reliability degrades above threshold | Split with progressive disclosure |
| Source | Topic |
|---|---|
| arXiv:2601.20404 | AGENTS.md impact on runtime and token consumption |
| arXiv:2602.11988 | ETH Zurich — LLM-generated vs human-curated context files |
| arXiv:2602.00180 | Human-refined specs reduce agent errors by 50% |
| arXiv:2510.10460 | Planner-Coder gap — vague task failure rates 7–83% |
| arXiv:2602.02584 | Constitutional SDD hierarchy prevents cascading failures |
| arXiv:2511.09268 | Architecture section is the highest-impact part of context files |
| arXiv:2511.21382 | 115-paper review — TDD with LLMs, iterative repair loop |
| arXiv:2602.12430 | Community Skills vulnerabilities — 26.1% injection rate |
| arXiv:2603.19583 | Human-expert skills vs LLM-generated skills |
| arXiv:2603.05344 | Lifecycle hook events and mutation behavior |
| arXiv:2508.14727 | All evaluated LLMs produce hardcoded credentials |
| arXiv:2604.07551 | OWASP LLM Top 10 — tool poisoning as primary vector |
| arXiv:2603.22489 | 5 out of 7 MCP clients lack input validation |
| Anthropic Agent Skills docs | SKILL.md best practices and 500-line limit |
Maintained by Paula Silva — AI-Native Software Engineer · Version 1.0.0 · April 2026




