Local-first Apple documentation extraction, cleanup, and search. Built for fast developer lookup and agent workflows that need deterministic, citeable Markdown instead of scraping the web.
This repo is a working toolkit and a work in progress. Not all planned features are fully implemented yet. It assumes you already have the Apple API Reference docset (Dash docset) on disk and a brotli CLI available.
- Apple docs are large and dynamic; agents need stable, local references.
- DocC exports are noisy; we need predictable front matter and trimmed tables of contents.
- Local search should be instant, without re-reading docsets for every query.
- `tools/docset_query.py`: exports DocC content from the Apple docset to Markdown.
- `tools/docset_sanitize.py`: rebuilds front matter and trims the TOC for cleaner context.
- `tools/docindex.py`: builds a local JSON index for fast search by heading/key sections.
- `tools/docmeta.py`: peeks at front matter/TOC quickly for debugging.
- `scripts/sync_docs.sh`: syncs a canonical docs cache into `docs/apple` (the repo cache is gitignored).
# Export a framework/topic tree to Markdown
python tools/docset_query.py export \
--root /documentation/vision \
--output docs/apple/vision.md
# Sanitize the export (trim TOC, rebuild front matter)
python tools/docset_sanitize.py --input docs/apple/vision.md --in-place --toc-depth 2
# Build or refresh the search index
python tools/docindex.py rebuild
# Search headings/key sections
python tools/docindex.py search "CVPixelBuffer"

This is the flow we use in other repos that need grounded Apple citations:
- Search locally first. Agents call `docindex.py search` against `docs/apple`.
- Fetch only what’s missing. If the topic isn’t there, use `docset_query.py fetch` or `export`.
- Sanitize for stable context. Run `docset_sanitize.py` to keep front matter and TOC consistent.
- Rebuild the index. `docindex.py rebuild` keeps agent search fast and deterministic.
- Keep a canonical cache. Sync with `scripts/sync_docs.sh` so `docs/apple` stays a lightweight, shareable cache without committing the full docset.
This approach lets agents answer questions with local, vetted Markdown and avoids hitting remote docs during runs.
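
The refresh part of this flow (export, sanitize, rebuild) can be scripted in a few lines. The following is only a sketch: it reuses the command lines shown earlier, while `refresh_commands` and `refresh` are hypothetical helper names that are not part of this repo.

```python
import subprocess
import sys


def refresh_commands(root, output, toc_depth=2):
    """Return the export -> sanitize -> rebuild commands for one topic tree."""
    py = sys.executable
    return [
        [py, "tools/docset_query.py", "export", "--root", root, "--output", output],
        [py, "tools/docset_sanitize.py", "--input", output, "--in-place",
         "--toc-depth", str(toc_depth)],
        [py, "tools/docindex.py", "rebuild"],
    ]


def refresh(root, output):
    """Run the three steps in order from the repo root."""
    for cmd in refresh_commands(root, output):
        # check=True stops the chain at the first failing step.
        subprocess.run(cmd, check=True)


# Example (hypothetical usage, run from the repo root):
# refresh("/documentation/vision", "docs/apple/vision.md")
```

Building the argument lists separately from running them makes the sequence easy to log or dry-run before an agent executes it.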
- Reads the Dash Apple API Reference docset directly (SQLite + brotli chunks).
- Commands:
  - `export`: walk a documentation tree and emit a single Markdown file.
  - `fetch`: render a single symbol/topic (optionally to stdout).
  - `init`: prebuild manifests for faster traversal.
- Defaults:
  - Docset path: `~/Library/Application Support/Dash/DocSets/Apple_API_Reference/Apple_API_Reference.docset`
  - Language: `swift`
  - Cache: `~/.cache/apple-docs`
  - `export` depth: 7, `fetch` depth: 1
- Overrides:
  - `--docset` or `DOCSET_ROOT` for alternate docsets
  - `--language` for alternate language variants
  - `DOCSET_CACHE_DIR` for cache location
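
To illustrate how the defaults and overrides above might combine, here is a minimal sketch. The precedence (flag first, then environment variable, then default) is an assumption, not something confirmed by the tool, and `resolve_docset` / `resolve_cache` are hypothetical names.

```python
import os
from pathlib import Path

# Defaults as listed above.
DEFAULT_DOCSET = ("~/Library/Application Support/Dash/DocSets/"
                  "Apple_API_Reference/Apple_API_Reference.docset")
DEFAULT_CACHE = "~/.cache/apple-docs"


def resolve_docset(cli_value=None, env=None):
    """Pick the docset path: --docset value, else DOCSET_ROOT, else the default.

    The ordering is assumed for illustration only.
    """
    env = os.environ if env is None else env
    raw = cli_value or env.get("DOCSET_ROOT") or DEFAULT_DOCSET
    return Path(raw).expanduser()


def resolve_cache(env=None):
    """Pick the cache directory: DOCSET_CACHE_DIR, else the default."""
    env = os.environ if env is None else env
    return Path(env.get("DOCSET_CACHE_DIR") or DEFAULT_CACHE).expanduser()
```

Passing `env` explicitly keeps the lookup testable without touching the real environment.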
- Rebuilds front matter with a stable summary and key sections.
- Trims TOC depth and drops noisy stopwords (e.g. “discussion”, “parameters”).
- Keeps output deterministic so agent prompts stay consistent.
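
The trimming rule can be pictured with a small sketch. It assumes the TOC is available as `(depth, title)` pairs and uses only the two stopwords named above; the real sanitizer's data model and stopword list may differ, and `trim_toc` is a hypothetical name.

```python
# Stopwords named in the text above; the actual list is likely longer.
STOPWORDS = {"discussion", "parameters"}


def trim_toc(entries, max_depth=2, stopwords=STOPWORDS):
    """Keep TOC entries at or above max_depth whose title is not a stopword.

    entries: list of (depth, title) tuples, with depth starting at 1.
    Input order is preserved, so the same input always yields the same output.
    """
    kept = []
    for depth, title in entries:
        if depth > max_depth:
            continue  # too deep for the requested TOC depth
        if title.strip().lower() in stopwords:
            continue  # boilerplate section name
        kept.append((depth, title))
    return kept


toc = [
    (1, "VNRequest"),
    (2, "Overview"),
    (2, "Discussion"),
    (3, "Parameters"),
    (3, "init(completionHandler:)"),
]
print(trim_toc(toc))  # [(1, 'VNRequest'), (2, 'Overview')]
```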
- Builds `Build/DocIndex/index.json` from Markdown in `docs/apple`.
- Indexes front matter, headings, and key sections.
- Search matches headings/key sections and returns anchored paths.
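
As a rough picture of heading indexing and anchored search, consider the sketch below. It is not the actual `docindex.py` logic: the entry schema, the anchor format (a GitHub-style slug approximation), and the function names are all assumptions, and it skips front matter and key sections entirely.

```python
import json
import re
from pathlib import Path

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def slugify(title):
    """Approximate a GitHub-style anchor: lowercase, drop punctuation, hyphenate."""
    slug = re.sub(r"[^\w\- ]", "", title.lower())
    return slug.strip().replace(" ", "-")


def build_index(docs_dir):
    """Collect one entry per Markdown heading under docs_dir.

    Note: lines starting with '#' inside fenced code blocks are also matched;
    a fuller version would skip fences.
    """
    entries = []
    for md in sorted(Path(docs_dir).rglob("*.md")):  # sorted for stable output
        for line in md.read_text(encoding="utf-8").splitlines():
            match = HEADING.match(line)
            if match:
                title = match.group(2)
                entries.append({"path": md.as_posix(),
                                "heading": title,
                                "anchor": slugify(title)})
    return entries


def write_index(entries, out_path):
    """Write the index as JSON with sorted keys so reruns produce identical files."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(entries, indent=2, sort_keys=True),
                        encoding="utf-8")


def search(entries, query):
    """Return 'path#anchor' for every heading containing the query (case-insensitive)."""
    q = query.lower()
    return [f'{e["path"]}#{e["anchor"]}' for e in entries
            if q in e["heading"].lower()]
```

Because search reads only the prebuilt entries, a query never has to reopen the docset or rescan the Markdown files.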
- `docs/apple` is a cache-only directory and is gitignored.
- Use the sync script to mirror a canonical docs folder into the repo cache:
  - Pull: `DOCS_SOURCE=~/docs/apple scripts/sync_docs.sh pull --allow-delete`
  - Push: `DOCS_SOURCE=~/docs/apple scripts/sync_docs.sh push`
- This toolchain assumes a local Apple docset; it does not download docsets.
- Docsets come from Dash and Kapeli’s feeds:
- Dash app + docsets: https://kapeli.com/dash
- Docset feeds (download without the app): https://github.com/Kapeli/feeds
- The scripts are intentionally CLI-first so they can be scripted by agents.
- See `AGENTS_RULES.md` for the workflow guardrails we use internally.
Implemented now:
- Docset export (`tools/docset_query.py`): `export`, `fetch`, and `init`.
- Sanitizer (`tools/docset_sanitize.py`): front matter rebuild + TOC trimming.
- Index + search (`tools/docindex.py`): JSON index + heading/key-section search.
- Metadata peek (`tools/docmeta.py`): front matter/TOC inspection.
- Cache sync (`scripts/sync_docs.sh`): pull/push to a canonical docs folder.
Planned (not implemented):
- Automated docset download/updates from Kapeli feeds or other vendor sources.
- A “docset” here means the Dash-compatible docset format on disk.
- Dash is the most common way to install and update docsets, but the feed repo also lets you download without the app.
- This toolkit only consumes a docset you already have and reads it locally; it does not fetch or manage docsets.
- Add a small helper that reads Kapeli’s feed metadata and downloads vendor docsets automatically.
- Cache and unpack docsets into a consistent local location so agents can bootstrap a repo quickly.
