astro-llm

Deterministic, build-time content extraction for Astro sites, designed for Large Language Model (LLM) usage.

Supports Astro 4, 5, and 6.

astro-llm generates a single, clean, static context file from your built HTML — suitable for:

Retrieval-Augmented Generation (RAG)
Chat grounding
Offline LLM training
Search indexing
Auditable documentation snapshots

No runtime JavaScript.
No servers.
No magic.

Core Principles

Build-time only – runs after astro build
Deterministic output – same input, same output
Config-first – behaviour controlled by llm.config.json
Safety by default – sensitive data stripped
LLM-friendly – readable, predictable structure

What This Plugin Does

After your site is built:

Scans generated .html files in /dist
Extracts readable content in DOM order
Applies safety rules (email / phone / scripts)
Applies include/exclude rules
Writes a single output file (e.g. llm.txt or llm.json)

What This Plugin Does NOT Do

❌ No runtime DOM mutation
❌ No network requests
❌ No environment variables read or required
❌ No telemetry or analytics
❌ No automatic crawling or discovery

Everything is explicit.

First Run Behaviour

On first run (dev or build), astro-llm will:

Create llm.config.json in the project root
Populate it with explicit defaults
Never overwrite it afterwards

If the file already exists, it is left untouched.
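
The create-once behaviour described above can be sketched as follows. This is an illustration only, not the plugin's actual source: `FileStore`, `ensureConfig`, `createMemoryStore` and the shortened `DEFAULT_CONFIG` are hypothetical names, and an in-memory store stands in for the project root so the sketch runs without touching disk.

```typescript
// Hypothetical file-store abstraction; a real implementation would wrap the file system.
interface FileStore {
  exists(path: string): boolean;
  write(path: string, contents: string): void;
}

// Shortened subset of the documented defaults, for illustration only.
const DEFAULT_CONFIG = {
  enabled: true,
  output: { format: "txt", filename: "llm.txt" },
};

// Writes the defaults only when the file is missing; an existing file is never touched.
function ensureConfig(store: FileStore, path: string = "llm.config.json"): "created" | "kept" {
  if (store.exists(path)) {
    return "kept";
  }
  store.write(path, JSON.stringify(DEFAULT_CONFIG, null, 2) + "\n");
  return "created";
}

// In-memory store used to demonstrate the behaviour.
function createMemoryStore(): FileStore & { files: Map<string, string> } {
  const files = new Map<string, string>();
  return {
    files,
    exists: (path: string) => files.has(path),
    write: (path: string, contents: string) => {
      files.set(path, contents);
    },
  };
}
```

Calling `ensureConfig` twice on the same store returns `"created"` the first time and `"kept"` the second, and any manual edits made in between survive.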

Configuration (`llm.config.json`)

This file is the single source of truth.

{
  "enabled": true,
  "output": {
    "format": "txt",
    "filename": "llm.txt"
  },
  "include": {
    "pages": true,
    "headings": true,
    "paragraphs": true,
    "lists": true,
    "tables": true,
    "codeBlocks": true,
    "meta": {
      "title": true,
      "description": true,
      "keywords": true
    }
  },
  "exclude": {
    "paths": [],
    "selectors": []
  },
  "safety": {
    "stripEmails": true,
    "stripPhoneNumbers": true,
    "stripForms": true,
    "stripScripts": true
  },
  "purpose": {
    "llmTraining": true,
    "ragIndexing": true,
    "chatGrounding": true
  }
}
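
For consumers who want to read this file from their own tooling, the shape above can be mirrored as a TypeScript type. The sketch below is an assumption based only on the example config: the names `LlmConfig`, `OutputFormat`, `isOutputFormat` and `parseOutputSettings` are hypothetical, and the plugin may validate its config differently or not at all.

```typescript
// The two formats mentioned in this README.
type OutputFormat = "txt" | "json";

// Mirrors the example llm.config.json shown above.
interface LlmConfig {
  enabled: boolean;
  output: { format: OutputFormat; filename: string };
  include: {
    pages: boolean;
    headings: boolean;
    paragraphs: boolean;
    lists: boolean;
    tables: boolean;
    codeBlocks: boolean;
    meta: { title: boolean; description: boolean; keywords: boolean };
  };
  exclude: { paths: string[]; selectors: string[] };
  safety: {
    stripEmails: boolean;
    stripPhoneNumbers: boolean;
    stripForms: boolean;
    stripScripts: boolean;
  };
  purpose: { llmTraining: boolean; ragIndexing: boolean; chatGrounding: boolean };
}

function isOutputFormat(value: unknown): value is OutputFormat {
  return value === "txt" || value === "json";
}

// Reads and checks only the "output" section; other sections are left unchecked here.
function parseOutputSettings(rawJson: string): LlmConfig["output"] {
  const parsed: unknown = JSON.parse(rawJson);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("config must be a JSON object");
  }
  const output = (parsed as { output?: { format?: unknown; filename?: unknown } }).output;
  const format = output ? output.format : undefined;
  const filename = output ? output.filename : undefined;
  if (!isOutputFormat(format)) {
    throw new Error('output.format must be "txt" or "json"');
  }
  if (typeof filename !== "string" || filename.length === 0) {
    throw new Error("output.filename must be a non-empty string");
  }
  return { format, filename };
}
```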

Output Format

TXT (default)

---
PATH: /index.html
---
Page title
Section heading
Paragraph content here
[email removed]

JSON

{
  "documents": [
    "---\nPATH: /index.html\n---\nPage title Section heading Paragraph content here"
  ]
}
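
The two layouts above can be reproduced with a small formatter. This is a sketch inferred from the examples, not the plugin's code: `ExtractedDocument`, `renderDocument`, `renderTxt` and `renderJson` are hypothetical names, and two details are assumptions, namely that TXT documents are separated by a blank line and that the JSON variant joins a page's lines with single spaces.

```typescript
// Hypothetical shape for one extracted page.
interface ExtractedDocument {
  path: string;    // e.g. "/index.html"
  lines: string[]; // readable content in DOM order
}

// Header block followed by the page content, joined with the given separator.
function renderDocument(doc: ExtractedDocument, separator: string): string {
  return `---\nPATH: ${doc.path}\n---\n${doc.lines.join(separator)}`;
}

// TXT: one line per content item, blank line between documents (assumed).
function renderTxt(docs: ExtractedDocument[]): string {
  return docs.map((doc) => renderDocument(doc, "\n")).join("\n\n");
}

// JSON: one string per document, content joined with spaces as in the example.
function renderJson(docs: ExtractedDocument[]): string {
  const documents = docs.map((doc) => renderDocument(doc, " "));
  return JSON.stringify({ documents }, null, 2);
}
```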

Safety Rules

When enabled, the plugin removes:

Email addresses → [email removed]
Phone numbers → [phone removed]
<script>, <style>, <form> blocks
Inline JavaScript content

Already-encoded HTML entities are preserved.
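
To give a feel for what these rules do, here is a rough regex-based sketch. The patterns are assumptions chosen for illustration (`EMAIL`, `PHONE`, `BLOCKS`, `stripBlocks` and `redactText` are hypothetical names); the plugin's real matching rules are not documented here, and regex-based HTML stripping is fragile compared with a proper parser.

```typescript
// Illustrative patterns only; real-world email and phone formats vary widely.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE = /\+?\d[\d\s().-]{6,}\d/g;

// Matches a whole <script>, <style> or <form> element, including its content.
const BLOCKS = /<(script|style|form)\b[^>]*>[\s\S]*?<\/\1\s*>/gi;

// Removes script, style and form elements from raw HTML.
function stripBlocks(html: string): string {
  return html.replace(BLOCKS, "");
}

// Replaces emails and phone numbers in extracted text with the documented placeholders.
function redactText(text: string): string {
  return text.replace(EMAIL, "[email removed]").replace(PHONE, "[phone removed]");
}
```

In this sketch, blocks are stripped from the HTML first and redaction runs on the extracted text afterwards, so an address inside a removed form never reaches the redaction step.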

Exclusions

Path exclusions

"exclude": {
  "paths": ["/admin", "/api"]
}
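
How paths are matched is not spelled out above. The sketch below assumes segment-aligned prefix matching, so that "/admin" excludes "/admin/users/index.html" but not "/administrator/index.html". `isPathExcluded` is a hypothetical helper, not part of the plugin's API.

```typescript
// Assumption: an entry excludes the exact path and everything beneath it.
function isPathExcluded(pagePath: string, excludedPaths: string[]): boolean {
  return excludedPaths.some((entry) => {
    // Treat "/admin/" and "/admin" the same way.
    const prefix = entry.endsWith("/") ? entry.slice(0, -1) : entry;
    return pagePath === prefix || pagePath.startsWith(prefix + "/");
  });
}
```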

Selector exclusions

"exclude": {
  "selectors": [".llm-ignore", "#internal"]
}
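
The effect of selector exclusions can be sketched on a simplified element tree. Everything here is hypothetical: `ContentNode` is a stand-in for a parsed DOM, and `matchesSimpleSelector` understands only `.class`, `#id` and bare tag names, whereas the plugin presumably accepts full CSS selectors. The point is that an excluded element is dropped together with its descendants while the rest is collected in DOM order.

```typescript
// Simplified stand-in for a parsed HTML element.
interface ContentNode {
  tag: string;
  id?: string;
  classes?: string[];
  text?: string;
  children?: ContentNode[];
}

// Supports only ".class", "#id" and tag selectors (an assumption for this sketch).
function matchesSimpleSelector(node: ContentNode, selector: string): boolean {
  if (selector.startsWith(".")) {
    return (node.classes || []).includes(selector.slice(1));
  }
  if (selector.startsWith("#")) {
    return node.id === selector.slice(1);
  }
  return node.tag === selector;
}

// Depth-first walk in DOM order; an excluded node hides its whole subtree.
function collectText(node: ContentNode, selectors: string[]): string[] {
  if (selectors.some((selector) => matchesSimpleSelector(node, selector))) {
    return [];
  }
  const collected: string[] = node.text ? [node.text] : [];
  for (const child of node.children || []) {
    collected.push(...collectText(child, selectors));
  }
  return collected;
}
```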

Determinism Guarantee

Given:

Same HTML output
Same config
Same plugin version

You will always get identical output.
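
One way to check this guarantee in your own pipeline is to fingerprint the output of two builds and compare. The sketch below is not part of the plugin: `SnapshotPage`, `buildSnapshot` and `fingerprint` are hypothetical names, and the hash is an FNV-1a-style checksum over UTF-16 code units, chosen only because it needs no imports. It also shows why a fixed sort order matters: pages are ordered with plain string comparison rather than a locale-aware one, so the result does not depend on scan order or machine locale.

```typescript
// Hypothetical shape for one page of extracted text.
interface SnapshotPage {
  path: string;
  text: string;
}

// Orders pages by path using plain code-unit comparison, then joins them.
function buildSnapshot(pages: SnapshotPage[]): string {
  return [...pages]
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0))
    .map((page) => `PATH: ${page.path}\n${page.text}`)
    .join("\n");
}

// FNV-1a-style 32-bit checksum over UTF-16 code units, as 8 hex digits.
function fingerprint(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}
```

Comparing `fingerprint(buildSnapshot(pages))` across two builds of the same site should yield the same value; a difference points at changed HTML, config, or plugin version.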

Recommended Use Cases

RAG pipelines
Static knowledge bases
LLM prompt grounding
Offline semantic indexing
Compliance-safe extraction

License

MIT © Velohost
