A curated list of pioneering research papers, tools, and resources on the Agent Harness — the systematic execution layer that transforms raw model capability into sustained, long-horizon autonomy.
A Survey on AI Agent Harness
Agent = Model (Stochastic Intelligence) + Harness (Deterministic Infrastructure)
The survey proposes a Unified Architectural Taxonomy that organizes the Agent Harness as a four-layered stack:
- Layer 1: Execution & Orchestration — The temporal engine driving the autonomous execution loop, model routing, and multi-agent composition.
- Layer 2: Context & Trajectory Management — The epistemic layer governing state compaction, trajectory persistence, memory hierarchies, and observability.
- Layer 3: Interaction Surface & Execution Environment — The sensory and actuation organs connecting the agent to the world via tool calling, standardized protocols, and sandboxed execution.
- Layer 4: Constraints & Guardrails — The independent observer enforcing deterministic laws through access control, permission management, and defense against agent injection.
The figure below illustrates the asymmetric co-evolution between model capability and harness responsibility:
We aim to provide a comprehensive overview for researchers, developers, and infrastructure engineers interested in this rapidly advancing field.
- Agent Harness Foundations
- Model & Agent Routing
- Multi-Agent Composition & Orchestration
- Autonomous Loop, Resilience & Human-in-the-Loop
- Memory Systems
- Context Compression
- Trajectory Persistence & Observability
- Self-Evolving Architectures
- Agentic Skills
- Skills Security
- Standardized Protocols & Interaction Surface
- Tool Use & Code Execution
- Sandboxing & Execution Environments
- Governance Boundaries
- Agent Injection & Defense
Cross-layer conceptual works that define and motivate the Agent Harness as a first-class research object.
| Title | Author | Year | Description |
|---|---|---|---|
| Effective harnesses for long-running agents | Young et al. | 2025 | long-running agent harness management |
| Natural-Language Agent Harnesses | Pan et al. | 2026 | natural-language harness design |
| Harness Engineering for Language Agents: The Harness Layer as Control, Agency, and Runtime | He et al. | 2026 | harness as control, agency, and runtime layer |
| Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned | Bui et al. | 2026 | terminal coding agent scaffolding, context engineering, lessons learned |
| Harness engineering: leveraging Codex in an agent-first world | Lopopolo et al. | 2026 | Codex-based harness engineering |
| The importance of Agent Harness in 2026 | Schmid et al. | 2026 | agent harness importance analysis |
| What is an agent harness in the context of large-language models? | Parallel Web Systems et al. | 2025 | agent harness concept overview |
| Meta-Harness: End-to-End Optimization of Model Harnesses | Lee et al. | 2026 | end-to-end automated optimization of harness code |
Acting as the temporal engine of the harness, Layer 1 drives the autonomous execution loop, manages model routing, orchestrates multi-agent compositions, and enforces resilience mechanisms to maintain forward momentum under failures.
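The execution loop described above can be sketched minimally. Everything here is illustrative — `call_model` is a stub standing in for a real LLM API call, and the tool set is a toy — but the loop shape (model call, tool execution, observation fed back, bounded iterations) is the core pattern:

```python
# Minimal harness execution loop: repeatedly call the model, execute any
# requested tool, feed the observation back, and stop when the model
# signals completion or an iteration budget is exhausted.
# `call_model` is a stub; a real harness would call an LLM API here.

def call_model(history):
    # Stub policy: request a tool call first, then finish once a tool
    # observation is present in the history.
    if any(msg["role"] == "tool" for msg in history):
        return {"done": True, "answer": "4"}
    return {"done": False, "tool": "add", "args": {"a": 2, "b": 2}}

TOOLS = {"add": lambda a, b: a + b}

def run_agent(task, max_steps=8):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):  # resilience: the loop is always bounded
        action = call_model(history)
        if action["done"]:
            return action["answer"]
        result = TOOLS[action["tool"]](**action["args"])
        history.append({"role": "tool", "content": str(result)})
    raise RuntimeError("iteration budget exhausted")
```

The deterministic parts — the loop bound, tool dispatch, and history bookkeeping — belong to the harness; only `call_model` is stochastic.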
Dynamically determining which LLM or specialized agent should handle a given subtask, optimizing for cost, capability, and resource constraints.
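A minimal routing sketch, assuming a hypothetical model table with capability tiers and per-call costs (all names and numbers are illustrative, not real pricing):

```python
# Cost/capability-aware router: among models meeting the subtask's
# capability requirement and budget, pick the cheapest.
# Model names, tiers, and costs are illustrative.

MODELS = [
    {"name": "small",  "capability": 1, "cost_per_call": 0.001},
    {"name": "medium", "capability": 2, "cost_per_call": 0.01},
    {"name": "large",  "capability": 3, "cost_per_call": 0.10},
]

def route(required_capability, budget):
    candidates = [m for m in MODELS
                  if m["capability"] >= required_capability
                  and m["cost_per_call"] <= budget]
    if not candidates:
        raise ValueError("no model satisfies the constraints")
    return min(candidates, key=lambda m: m["cost_per_call"])["name"]
```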
Treating agents as composable, modular entities and orchestrating concurrent subagent spawning, delegation, and synchronized state handoffs.
| Title | Author | Year | Description |
|---|---|---|---|
| Claude Code Subagents | Anthropic | 2025 | custom AI subagent spawning |
| Compass: Enhancing agent long-horizon reasoning with evolving context | Wan et al. | 2025 | evolving context for long-horizon reasoning |
| Kimi K2.5: Visual Agentic Intelligence | Team et al. | 2026 | visual agentic intelligence |

| Swarm: An educational framework exploring ergonomic, lightweight multi-agent orchestration | OpenAI et al. | 2024 | lightweight multi-agent orchestration |
| CrewAI: Framework for orchestrating role-playing autonomous AI agents | Moura et al. | 2025 | role-playing agent orchestration |
| A Declarative Language for Building And Orchestrating LLM-Powered Agent Workflows | Daunis et al. | 2025 | declarative agent workflow language |
| Orchestral AI: A Framework for Agent Orchestration | Roman et al. | 2026 | general-purpose agent orchestration |
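The subagent pattern these works explore — spawning scoped workers concurrently and synchronizing their results back into the parent's state — can be sketched as follows. The subagent body is stubbed; in practice each worker would run its own model/tool loop:

```python
# Orchestrator spawns subagents concurrently and merges their results
# into a shared state dict. Subagent logic is stubbed out.
from concurrent.futures import ThreadPoolExecutor

def subagent(role, task):
    # Stub: a real subagent would run its own execution loop here.
    return {role: f"completed: {task}"}

def orchestrate(tasks):
    state = {}
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(subagent, role, task)
                   for role, task in tasks.items()]
        for f in futures:
            state.update(f.result())  # synchronized state handoff
    return state
```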
Ensuring the execution loop is resilient to non-termination and drift, and managing the spectrum from full human oversight to closed-loop autonomy.
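The oversight spectrum can be made concrete with a risk-gated approval hook: low-risk actions run autonomously, high-risk actions require a human decision. The threshold and risk scores are illustrative assumptions:

```python
# Human-in-the-loop gate: actions below a risk threshold run in the
# closed loop; riskier actions are routed through an approval callback.
# Threshold and risk values are illustrative.

def execute(action, risk, approve, threshold=0.5):
    if risk >= threshold:
        if not approve(action):          # human oversight path
            return "rejected"
    return f"executed: {action}"         # autonomous path
```

Passing `approve=lambda a: True` collapses the gate into full autonomy; `threshold=0.0` forces full human oversight — the same harness covers the whole spectrum.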
While the orchestration layer manages execution time, Layer 2 governs the agent's epistemic space — mitigating context window saturation and catastrophic forgetting while maintaining strict observability.
Structured, queryable knowledge layers ranging from production-ready platforms to research prototypes.
Strategies to prevent Context Rot — the progressive degradation of reasoning quality due to accumulated irrelevant tokens.
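A naive compaction sketch: when the history exceeds a token budget, the oldest turns are collapsed into a single summary message while recent turns are kept verbatim. The token counter is a crude word-count proxy and the summarizer is stubbed — a real harness would use a model call for both:

```python
# Naive context compaction: replace the oldest turns with one summary
# message once the history exceeds a token budget. The token counter is
# a word-count proxy and the summary is a stub.

def count_tokens(msgs):
    return sum(len(m["content"].split()) for m in msgs)

def compact(history, budget, keep_recent=2):
    if count_tokens(history) <= budget:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = {"role": "system",
               "content": "summary of %d earlier turns" % len(old)}
    return [summary] + recent
```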
Persisting the agent's execution history to external storage for recovery, replay, and continuous learning, while decoupling observability from the model's working memory.
| Title | Author | Year | Description |
|---|---|---|---|
| Reducing Cost of LLM Agents with Trajectory Reduction | Xiao et al. | 2025 | trajectory reduction for efficiency |
| Semantic Checkpointing for Stateless LLM Agents in Multi-Tenant Enterprise Systems | Roshan et al. | 2025 | semantic checkpointing for stateless agents |
| Large-scale Evaluation of Notebook Checkpointing with AI Agents | Fang et al. | 2025 | notebook checkpointing evaluation |
| AgentTrace: A Structured Logging Framework for Agent System Observability | AlSayyad et al. | 2026 | structured logging for observability |
| AgentSight: System-Level Observability for AI Agents Using eBPF | Zheng et al. | 2025 | eBPF-based system-level observability |
| Durable Execution in LangGraph | LangChain et al. | 2026 | fault-tolerant durable execution |
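Trajectory persistence as described above reduces, at its simplest, to an append-only step log in external storage that supports replay. A minimal sketch (file path and record shape are illustrative):

```python
# Append-only trajectory log: each step is persisted as a JSON line so a
# crashed run can be recovered or replayed from external storage,
# decoupled from the model's working memory.
import json

def log_step(path, step):
    with open(path, "a") as f:
        f.write(json.dumps(step) + "\n")  # durable, append-only

def replay(path):
    with open(path) as f:
        return [json.loads(line) for line in f]
```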
Agent systems that improve their own capabilities, prompts, or memory structures at test time or through continuous interaction.
Modular, reusable capabilities that agents acquire, compose, and execute to extend their action space.
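One common realization is a registry that binds each skill to metadata and an entry point, so skills extend the action space without touching the core loop. This sketch is illustrative, not any specific framework's API:

```python
# Skill registry: skills are declared with metadata and invoked by name,
# extending the agent's action space without changing the core loop.

SKILLS = {}

def skill(name, description):
    def register(fn):
        SKILLS[name] = {"description": description, "run": fn}
        return fn
    return register

@skill("summarize", "condense text to its first sentence")
def summarize(text):
    return text.split(".")[0] + "."

def invoke(name, *args):
    return SKILLS[name]["run"](*args)
```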
Security vulnerabilities and defenses related to agentic skill systems and skill-based prompt injection.
| Title | Author | Year | Description |
|---|---|---|---|
| Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections | Schmotz et al. | 2025 | skill-based prompt injection analysis |
| Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale | Liu et al. | 2026 | skill security vulnerabilities at scale |
| Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study | Liu et al. | 2026 | malicious skill detection study |
| When Skills Lie: Hidden-Comment Injection in LLM Agents | Wang et al. | 2026 | hidden-comment skill injection |
| Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections | Yang et al. | 2026 | persistent control via self-reinforcing injection |
Because language models are inherently disembodied, Layer 3 constitutes the sensory and actuation organs of the agentic system — standardizing interfaces for tool calling and code execution, and enforcing hardware-level isolation.
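Standardized tool interfaces typically mean each tool declares a parameter schema the harness validates before dispatch, so malformed model-emitted calls fail deterministically rather than reaching the environment. A minimal sketch with an illustrative stubbed tool:

```python
# Schema-validated tool dispatch: each tool declares its parameters and
# types; the harness rejects model-emitted calls that do not match
# before any actuation happens. The tool itself is a stub.

TOOLS = {
    "get_weather": {
        "params": {"city": str},
        "run": lambda city: f"sunny in {city}",  # stubbed actuation
    }
}

def dispatch(call):
    tool = TOOLS[call["name"]]
    for param, typ in tool["params"].items():
        if not isinstance(call["args"].get(param), typ):
            raise TypeError(f"bad or missing argument: {param}")
    return tool["run"](**call["args"])
```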
Defining and standardizing how agents interact with tools, APIs, and external environments.
Benchmarks and methods for evaluating and improving agent tool use capabilities.
| Title | Author | Year | Description |
|---|---|---|---|
| WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment | Dihan et al. | 2025 | action-aware web tree search |
| Terminal-bench: Benchmarking agents on hard, realistic tasks in command line interfaces | Merrill et al. | 2026 | CLI task benchmarking |
| Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use | Liu et al. | 2026 | budget-constrained tool planning |
| ToolSandbox: A stateful, conversational, interactive evaluation benchmark for LLM tool use capabilities | Lu et al. | 2025 | stateful tool-use evaluation |
Because LLM outputs are inherently probabilistic, Layer 4 acts as an independent observer and judge — imposing deterministic laws of physics and security boundaries on the system, operating entirely out-of-band.
Isolating agent execution to contain erratic behaviors and protect host infrastructure.
Enforcing access control, permission management, and policy compliance for agent actions.
| Title | Author | Year | Description |
|---|---|---|---|
| POLARIS: Typed Planning and Governed Execution for Agentic AI in Back-Office Automation | Moslemi et al. | 2026 | typed planning and governed execution |
| ContextCov: Deriving and Enforcing Executable Constraints from Agent Instruction Files | Sharma et al. | 2026 | executable constraint enforcement |
| Sandbox-runtime: A lightweight sandboxing tool for enforcing filesystem and network restrictions on arbitrary processes at the OS level, without requiring a container | Anthropic et al. | 2026 | OS-level filesystem/network sandboxing |
| Securing AI Agent Execution | Buhler et al. | 2025 | agent execution security analysis |
| BashArena: A Control Setting for Highly Privileged AI Agents | Kaufman et al. | 2025 | highly-privileged agent control setting |
| Breaking the Protocol: Security Analysis of the Model Context Protocol Specification and Prompt Injection Vulnerabilities in Tool-Integrated LLM Agents | Maloyan et al. | 2026 | MCP specification security analysis |
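The out-of-band, default-deny character of this layer can be illustrated with a policy table consulted before every filesystem action, independent of what the model "intends". The rules and paths are illustrative:

```python
# Out-of-band permission gate: a first-match policy table decides every
# filesystem action before it runs, with default-deny as the fallback.
# Rules and paths are illustrative.
import fnmatch

POLICY = [
    ("write", "/workspace/*", "allow"),
    ("write", "*",            "deny"),
    ("read",  "*",            "allow"),
]

def check(action, path):
    for act, pattern, verdict in POLICY:  # first match wins
        if act == action and fnmatch.fnmatch(path, pattern):
            return verdict
    return "deny"                          # default-deny fallback
```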
Defending against adversarial prompt injection attacks targeting agentic systems.
Contributions are welcome! To add a paper, open a pull request with the new entry added to the relevant section, following the format below:
| Title | Author | Year | Brief description |
Please ensure the paper is directly relevant to the Agent Harness infrastructure.
This repository is maintained in conjunction with the survey paper "A Survey on AI Agent Harness".
