feat: add llms.txt discovery as default agent behavior #18158
steipete merged 1 commit into openclaw:main
Add automatic llms.txt awareness so agents check for /llms.txt or /.well-known/llms.txt when exploring new domains.

Changes:
- System prompt: new 'llms.txt Discovery' section (full mode only, when web_fetch is available) instructing agents to check for llms.txt files when visiting new domains
- web_fetch tool: updated description to mention llms.txt discovery

llms.txt is an emerging standard (like robots.txt for AI) that helps site owners describe how AI agents should interact with their content. Making this a default behavior helps the ecosystem adopt agent-native web experiences.

Ref: https://llmstxt.org
src/agents/system-prompt.ts

      }
      if (!params.availableTools.has("web_fetch")) {
Guard only checks web_fetch, but text mentions browser
The prompt text on line 159 says "via web_fetch or browser", but the gate at line 154 only checks for web_fetch. If an agent has browser available but not web_fetch, the section is silently omitted — even though the agent could still check for llms.txt via the browser tool.
Consider also gating on browser:
      }
    - if (!params.availableTools.has("web_fetch")) {
    + if (!params.availableTools.has("web_fetch") && !params.availableTools.has("browser")) {
        return [];
      }
Alternatively, if the intent is to only include this when web_fetch is present, the prompt text on line 159 should drop the "or browser" mention to avoid confusion.
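One way to keep the guard and the prompt text from drifting apart is to derive both from the same list of tools. The following is a minimal sketch, not the PR's actual code: the `LlmsTxtSectionParams` shape, the `FETCH_CAPABLE_TOOLS` constant, and the prompt wording are all assumptions made for illustration, and the real `buildLlmsTxtSection` in `src/agents/system-prompt.ts` may look different.

```typescript
// Hypothetical parameter shape. Only `availableTools` and an `isMinimal`
// guard are mentioned in this thread; the real types may differ.
interface LlmsTxtSectionParams {
  isMinimal: boolean;
  availableTools: Set<string>;
}

// Assumed list of tools that can retrieve a URL. Both the guard and the
// prompt text below are derived from it, so they cannot disagree.
const FETCH_CAPABLE_TOOLS = ["web_fetch", "browser"] as const;

function buildLlmsTxtSection(params: LlmsTxtSectionParams): string[] {
  // Minimal prompt mode (e.g. subagents) never gets this section.
  if (params.isMinimal) {
    return [];
  }
  // Keep only the fetch-capable tools this agent actually has.
  const usable = FETCH_CAPABLE_TOOLS.filter((tool) => params.availableTools.has(tool));
  if (usable.length === 0) {
    return [];
  }
  // Illustrative wording only; it names exactly the tools that passed the guard.
  return [
    "## llms.txt Discovery",
    `When visiting a new domain, check for /llms.txt or /.well-known/llms.txt via ${usable.join(" or ")}.`,
    "",
  ];
}
```

With this shape, an agent that has only `browser` still receives the section, and the text mentions only `browser`.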
Hey @sebslight, saw the revert. Understood if the merge was accidental, but the PR itself is intentional and all CI checks passed. Would love a proper review when you get a chance. Happy to address any feedback or restructure the approach if needed.
Not sure if this is now in or not; it's in the changelog. Anyway: "Zero cost" is not right. Without a way for the agent to keep track of what "new domains" are, this is two additional tool calls producing tokens to pay for. And with such a way, it's one tool call to read the file (even more tokens) or grep for the domain (if the agent is smart).

It'd be better to keep track of domains and cache the llms.txt in the code, and then give agents the llms.txt the first time they access that domain in a session alongside the fetch result. After scanning the llms.txt for prompt injection. Very heavily. We want them to follow those instructions, but not that hard.

Those are basically skills that are downloaded from random sites the web search has found... oh... on second thought, I don't want my agent to do that!
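The code-side approach described in this comment could look roughly like the sketch below: the host tracks which origins have been seen in a session and returns the llms.txt body only on the first visit, after a safety check. Everything here is hypothetical: `createLlmsTxtHinter`, `TextFetcher`, and `SafetyCheck` are names invented for illustration, the fetcher is injected rather than tied to any real OpenClaw API, and the prompt-injection scan is left as a caller-supplied predicate because the comment does not specify one.

```typescript
// Hypothetical fetcher: resolves to the file body, or null if the file is absent.
type TextFetcher = (url: string) => Promise<string | null>;
// Hypothetical prompt-injection scan: returns true only if the text is acceptable.
type SafetyCheck = (text: string) => boolean;

// The two locations named in the PR description.
const LLMS_TXT_PATHS = ["/llms.txt", "/.well-known/llms.txt"];

function createLlmsTxtHinter(fetchText: TextFetcher, isSafe: SafetyCheck) {
  // Origins already handled in this session, so each costs at most one lookup pass.
  const seenOrigins = new Set<string>();

  return async function firstVisitHint(origin: string): Promise<string | null> {
    if (seenOrigins.has(origin)) {
      return null; // not a new domain for this session
    }
    // Record the origin before awaiting so concurrent calls do not repeat the lookup.
    seenOrigins.add(origin);
    for (const path of LLMS_TXT_PATHS) {
      const body = await fetchText(origin + path);
      if (body !== null) {
        // Only hand the file to the agent if it passes the safety check.
        return isSafe(body) ? body : null;
      }
    }
    return null; // no llms.txt at either location
  };
}
```

A caller would invoke `firstVisitHint` alongside the regular fetch and append any non-null result to the tool output, so the agent itself spends no extra tool calls on discovery. Whether untrusted llms.txt content should reach the agent at all is the open question the comment ends on.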
Summary
Adds automatic llms.txt awareness so OpenClaw agents check for `/llms.txt` or `/.well-known/llms.txt` when exploring new domains.
Changes
- System prompt: new `llms.txt Discovery` section (full mode only, when `web_fetch` is available) instructing agents to check for llms.txt files when visiting new domains
- `web_fetch` tool: updated description to mention llms.txt discovery
Why
llms.txt is an emerging standard (like robots.txt for AI) that helps site owners describe how AI agents should interact with their content and APIs. Making this a default agent behavior helps the ecosystem adopt agent-native web experiences.
Details
The system prompt section is included only when:
- the prompt mode is `full` (not subagents)
- the `web_fetch` tool is available
TypeScript compiles cleanly with no errors.
Greptile Summary
This PR adds automatic llms.txt awareness to OpenClaw agents. When the `web_fetch` tool is available and the prompt mode is `full` (not subagents), the system prompt now includes a new "llms.txt Discovery" section instructing agents to check for `/llms.txt` or `/.well-known/llms.txt` when visiting new domains. The `web_fetch` tool description is also updated to mention this behavior.
- The `buildLlmsTxtSection` function follows the same pattern as other section builders (`buildVoiceSection`, `buildMemorySection`, etc.) with proper `isMinimal` and tool-availability guards
- The guard checks only `web_fetch` availability, but the prompt text mentions "via web_fetch or browser", so agents with only `browser` available won't see this section
- The existing test (`system-prompt.e2e.test.ts`) does not include `web_fetch` in `toolNames` and does not assert that the new section is excluded, so the new behavior has no direct test coverage

Confidence Score: 4/5
- The guard and the prompt text are mismatched regarding the `browser` tool, but this won't cause failures; it just means agents with only `browser` available won't get the llms.txt hint. No test coverage was added for the new section.
- `src/agents/system-prompt.ts`: the `buildLlmsTxtSection` guard/text mismatch regarding the `browser` tool should be reconciled

Last reviewed commit: 731963a