
feat: add llms.txt discovery as default agent behavior #18158

Merged
steipete merged 1 commit into openclaw:main from yolo-maxi:feature/llms-txt-discovery
Feb 16, 2026


@yolo-maxi yolo-maxi commented Feb 16, 2026

Summary

Adds automatic llms.txt awareness so OpenClaw agents check for /llms.txt or /.well-known/llms.txt when exploring new domains.

Changes

  • System prompt: New llms.txt Discovery section (full mode only, when web_fetch is available) instructing agents to check for llms.txt files when visiting new domains
  • web_fetch tool: Updated description to mention llms.txt discovery

Why

llms.txt is an emerging standard (like robots.txt for AI) that helps site owners describe how AI agents should interact with their content and APIs. Making this a default agent behavior:

  • Helps the ecosystem adopt agent-native web experiences
  • Gives agents better context about how to use a site's resources
  • Zero cost when the file doesn't exist (agents just move on)
  • Respects site owner preferences for AI interaction

Details

The system prompt section is:

  • Only included in full prompt mode (not subagents)
  • Only included when web_fetch tool is available
  • Instructs agents not to warn when llms.txt is missing (most sites don't have one yet)
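
The gating described above can be sketched roughly as follows. This is a minimal illustration, not the merged code: the `PromptMode` type, the `LlmsTxtSectionParams` shape, and the prompt wording are assumptions made for the example. Only the function name `buildLlmsTxtSection` and the `availableTools.has("web_fetch")` check appear in this PR's review.

```typescript
// Hypothetical sketch; the types and prompt wording are assumptions,
// not the code from src/agents/system-prompt.ts.
type PromptMode = "full" | "minimal" | "none";

interface LlmsTxtSectionParams {
  promptMode: PromptMode;
  availableTools: Set<string>;
}

function buildLlmsTxtSection(params: LlmsTxtSectionParams): string[] {
  // Subagent (minimal) and none modes never receive the section.
  if (params.promptMode !== "full") {
    return [];
  }
  // Without web_fetch the section is omitted entirely.
  if (!params.availableTools.has("web_fetch")) {
    return [];
  }
  return [
    "## llms.txt Discovery",
    "When visiting a new domain, check for /llms.txt or /.well-known/llms.txt.",
    "If neither file exists, continue without warning the user.",
    "",
  ];
}
```

Returning an empty array lets the caller spread the result into the prompt unconditionally, so excluded modes add no lines.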

TypeScript compiles cleanly with no errors.

Greptile Summary

This PR adds automatic llms.txt awareness to OpenClaw agents. When the web_fetch tool is available and the prompt mode is full (not subagents), the system prompt now includes a new "llms.txt Discovery" section instructing agents to check for /llms.txt or /.well-known/llms.txt when visiting new domains. The web_fetch tool description is also updated to mention this behavior.

  • The buildLlmsTxtSection function follows the same pattern as other section builders (buildVoiceSection, buildMemorySection, etc.) with proper isMinimal and tool-availability guards
  • The section is correctly excluded for subagent/minimal/none prompt modes
  • One logic gap: the guard only checks for web_fetch availability, but the prompt text mentions "via web_fetch or browser" — agents with only browser available won't see this section
  • The existing minimal-mode test (system-prompt.e2e.test.ts) does not include web_fetch in toolNames and does not assert that the new section is excluded, so the new behavior has no direct test coverage

Confidence Score: 4/5

  • This PR is safe to merge — it only adds prompt text and a tool description update with no runtime logic changes.
  • The changes are low-risk (prompt text additions only, no runtime behavior changes). The section builder follows established patterns and has proper guards. One minor logic inconsistency exists between the guard condition and the prompt text regarding the browser tool, but this won't cause failures — it just means agents with only browser available won't get the llms.txt hint. No test coverage was added for the new section.
  • src/agents/system-prompt.ts — the buildLlmsTxtSection guard/text mismatch regarding browser tool should be reconciled

Last reviewed commit: 731963a


Add automatic llms.txt awareness so agents check for /llms.txt or
/.well-known/llms.txt when exploring new domains.

Changes:
- System prompt: new 'llms.txt Discovery' section (full mode only,
  when web_fetch is available) instructing agents to check for llms.txt
  files when visiting new domains
- web_fetch tool: updated description to mention llms.txt discovery

llms.txt is an emerging standard (like robots.txt for AI) that helps
site owners describe how AI agents should interact with their content.
Making this a default behavior helps the ecosystem adopt agent-native
web experiences.

Ref: https://llmstxt.org
@openclaw-barnacle bot added the "agents" (Agent runtime and tooling) and "size: XS" labels on Feb 16, 2026

@greptile-apps greptile-apps bot left a comment


2 files reviewed, 1 comment


Comment on lines +153 to +154
}
if (!params.availableTools.has("web_fetch")) {

Guard only checks web_fetch, but text mentions browser

The prompt text on line 159 says "via web_fetch or browser", but the gate at line 154 only checks for web_fetch. If an agent has browser available but not web_fetch, the section is silently omitted — even though the agent could still check for llms.txt via the browser tool.

Consider also gating on browser:

Suggested change
   }
-  if (!params.availableTools.has("web_fetch")) {
+  if (!params.availableTools.has("web_fetch") && !params.availableTools.has("browser")) {
     return [];
   }

Alternatively, if the intent is to only include this when web_fetch is present, the prompt text on line 159 should drop the "or browser" mention to avoid confusion.
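
One way to keep the guard and the prompt text in sync is to derive the check from a single list of tools that can fetch the file. The sketch below is only an illustration of that idea: `LLMS_TXT_CAPABLE_TOOLS` and `canDiscoverLlmsTxt` are hypothetical names that do not exist in the codebase.

```typescript
// Hypothetical helper; neither name exists in the PR.
// Lists every tool the prompt text claims can fetch llms.txt.
const LLMS_TXT_CAPABLE_TOOLS = ["web_fetch", "browser"] as const;

function canDiscoverLlmsTxt(availableTools: ReadonlySet<string>): boolean {
  // True when at least one capable tool is present, which is the
  // negation of the suggested `!has("web_fetch") && !has("browser")`.
  return LLMS_TXT_CAPABLE_TOOLS.some((tool) => availableTools.has(tool));
}
```

The section builder would then return early with `if (!canDiscoverLlmsTxt(params.availableTools)) return [];`, and a change to the supported tools would only touch the list.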


@steipete steipete merged commit e368c36 into openclaw:main Feb 16, 2026
26 checks passed
@sebslight sebslight self-assigned this Feb 17, 2026
@sebslight
Member

Reverted in e6683a6 (reverts merge commit e368c36).

This was an accidental merge, so we rolled it back.

@yolo-maxi
Author

Hey @sebslight — saw the revert. Understood if the merge was accidental, but the PR itself is intentional and all CI checks passed. Would love a proper review when you get a chance.

Happy to address any feedback or restructure the approach if needed.

@HenryLoenwind
Contributor

HenryLoenwind commented Feb 18, 2026

Not sure if this is now in or not; it's in the changelog. Anyway:

"Zero cost" is not right. Without a way for the agent to keep track of what "new domains" are, this is two additional tool calls producing tokens to pay for. And with such a way, it's one tool call to read the file (even more tokens) or grep for the domain (if the agent is smart).

It'd be better to keep track of domains and cache the llms.txt in the code, and then give agents the llms.txt the first time they access that domain in a session alongside the fetch result. After scanning the llms.txt for prompt injection. Very heavily. We want them to follow those instructions, but not that hard. Those are basically skills that are downloaded from random sites the websearch has found...oh...on second thought, I don't want my agent to do that!
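
The domain-tracking part of this suggestion could look roughly like the sketch below. It is not part of the PR: `LlmsTxtDomainTracker` and `candidatesFor` are hypothetical names, and the actual fetching, caching of file contents, and prompt-injection scanning are deliberately left out.

```typescript
// Hypothetical sketch of per-session domain tracking; not part of the PR.
class LlmsTxtDomainTracker {
  private readonly seenHosts = new Set<string>();

  /** Returns the llms.txt URLs to try the first time a host is seen, else []. */
  candidatesFor(pageUrl: string): string[] {
    const { host, origin } = new URL(pageUrl);
    if (this.seenHosts.has(host)) {
      // Host already handled in this session: no extra requests.
      return [];
    }
    this.seenHosts.add(host);
    return [`${origin}/llms.txt`, `${origin}/.well-known/llms.txt`];
  }
}
```

Because the bookkeeping lives in code, the agent spends no tokens deciding whether a domain is new. The two lookups would happen at most once per host per session.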

