This file explains the intent, architecture, expectations, and guard‑rails for automated / AI contributions to the folder.api v2 codebase.
folder.api parses generic HTTP directory listings (Apache, NGINX, IIS, others) into structured metadata. It:
- Traverses recursively (depth-limited) starting from a directory URL.
- Normalizes folder & file entries (names, hidden flags, dates, sizes).
- Optionally enriches MIME & size via parallel HEAD requests.
- Falls back to a sandboxed iframe loader when direct fetch fails (auto mode) in browser contexts.
- Produces a hierarchical root tree plus flattened arrays and stats.
Primary consumer environments:
- Browser (ESM) – requires CORS / same-origin unless iframe fallback is used.
- Node.js (>= 18, pure ESM) – no CommonJS build.
Non‑goals: Server signature sniffing, auth handling, streaming partial traversal, speculative prefetch.
folderApiRequest() (public entrypoint)
normalizeOptions -> options.ts
traverse() (core/recursion.ts)
fetchDirectoryHtml() (core/fetchDirectory.ts)
iframeDirectoryHtml() (core/iframeDirectory.ts)
parseDirectoryHtml() (core/parseDirectory.ts)
heuristics: choose main anchor cluster, extract tokens, classify, parse date/size
enrichMime() (optional) (core/mime.ts)
assemble + stats (types.ts structures)
Supporting utilities: url normalization, decoding, date/size parsing, hidden detection, semaphore for HEAD concurrency, error tagging.
See src/types.ts for canonical definitions.
- FolderNode / FolderEntry / FileEntry
- FolderApiOptions:
- maxDepth (>=0; default 0)
- mode: fetch | iframe | auto (default auto)
- includeMime (boolean default false)
- headConcurrency (default 4; clamp >=1)
- timeoutMs (per directory, default 15000, clamp >=100)
- sameOriginOnly (default true) – currently enforced upstream by user; traversal itself assumes already vetted URL.
- signal (AbortSignal)
- Result stats: fetches, iframes, heads, durationMs (internal), maxDepth.
Errors are recorded as strings with a category prefix (e.g. date:, size:, mime:, decode:, loop:, limit:). Do not silently discard parse issues—append via pushError.
Maintain these unless a deliberate versioned change is requested:
- Pure ESM distribution (no CJS).
package.jsontype: modulemust remain. - All directory URLs normalized to end with a slash and de-duped path slashes.
- No network requests besides: GET (directory HTML) + HEAD (optional MIME). No POST/PUT/etc.
- HEAD concurrency honors
headConcurrencyvia the semaphore. - Recursion safety: hard cap at 50,000 entries (files + folders) -> emits
limit:error and stops expanding further. - Hidden detection: leading dot excluding
.and... - Dates: Converted/stored as ISO 8601 UTC strings (no time zone guessing beyond provided tokens).
- Sizes: Prefer explicit unit tokens (K,M,G) > raw integers when ambiguous with date/time.
- Auto mode fallback: Only attempt iframe after a failed fetch (error or non-200) – never both in parallel.
- No global mutable singletons besides the internal semaphore instances created per request.
- Fetch one directory at a time (depth-first) to avoid unbounded fan‑out; acceptable because typical listings are modest.
- HEAD enrichment is parallel; keep it bounded.
- Avoid regex catastrophes—current parsers operate on trimmed tokens and short lines.
- Do not introduce large dependencies; current footprint is TS + stdlib.
Tests live in tests/ using Vitest (jsdom for DOM dependent code). Suites:
- Unit: date parsing, size parsing, hidden detection, options normalization, classification, ambiguous date handling, URL utilities.
- Integration: basic traversal, recursion depth, iframe fallback (mocked), iframe-only mode, MIME enrichment, timeout, loop prevention, decode errors. Run with:
npm test
Add tests for any new behavior; maintain >90% coverage (implicit target). Prefer deterministic, synthetic HTML snippets rather than hitting live servers.
When proposing changes:
- Update or add unit tests first for new logic.
- Keep patches minimal—avoid unrelated formatting churn.
- Preserve existing error category prefixes; add new categories only if strongly justified.
- Extend
NormalizedOptionsthroughnormalizeOptionswith validation & clamping if adding options. - Always rebuild (
npm run build) after TS changes so the harness can import updated dist.
| Area | Pitfall | Guidance |
|---|---|---|
| URL handling | Missing trailing slash leads to double requests via server redirect | Always run through normalizeDirectoryUrl / ensureHttp. |
| iframe mode | Trying to use in Node environment | Detect document existence and throw meaningful error (already implemented). |
| MIME enrichment | Serial HEADs cause slowness | Keep semaphore; do not regress concurrency. |
| Parsing | Grabbing all anchors (noise) | Let parseDirectoryHtml clustering heuristics stand unless improved with tests. |
| Dates | Misinterpreting year/time numbers as sizes | Only accept size tokens with explicit unit or clear size pattern. |
| Loops | Visiting same directory via different encodings | Use keyForVisited (protocol + host + pathname) consistently. |
Developer convenience UI supporting:
- Option tweaking (depth, mode, mime, concurrency, timeout).
- Live tree rendering with tooltips for full metadata.
- Stats panel (internal duration + wall clock). Make sure any structural API change (e.g., renamed fields) updates tooltip logic and stats presentation.
- No eval / dynamic script injection; only parses HTML directory listings.
- Iframe sandbox restricted to
allow-same-origin(no scripts; relies on server output). Do not relax sandbox attributes without explicit approval. - Respect
AbortSignalpromptly (fetch + HEAD + iframe timeout).
- Adaptive concurrency (tune based on response latency).
- Pluggable metadata enrichers (hashing, media dimension probes) with opt-in flags.
- Streaming / incremental callback during traversal.
- Snapshot regression tests for parse output (need normalization of timestamps first).
When the user asks for changes:
- Clarify intent only if truly ambiguous.
- List explicit requirements (checklist) before editing.
- Read relevant files (avoid guessing paths).
- Modify code + add / update tests in same PR scope.
- Run tests; never leave failing state unless user explicitly accepts WIP.
- Summarize deltas & map to checklist.
- Suggest small, safe adjacent improvements (optional), clearly separated.
Avoid:
- Broad refactors without a request.
- Introducing dependencies for trivial utilities.
- Changing public types without noting breaking impact.
Current version: 2.0.0-alpha.1 (ESM only). Breaking changes require bumping minor/major appropriately and updating README / this file.
Retrieve a directory (depth 0):
const res = await folderApiRequest('https://example.com/data/');
console.log(res.files.length);Include MIME with higher concurrency:
await folderApiRequest('https://example.com/data/', { includeMime: true, headConcurrency: 12 });Iframe-only (browser-only environments with blocked fetch):
await folderApiRequest('https://cross-origin.example.com/public/', { mode: 'iframe' });Abort after 5s:
const ac = new AbortController();
setTimeout(()=>ac.abort(), 5000);
await folderApiRequest(url, { signal: ac.signal });- Entry: Folder or file record.
- Root / Self / Parent / Child roles: Classification of folder entries relative to starting directory.
- Safety cap: 50k entry limit for runaway traversal.
Original author: Paul Ellis. MIT Licensed.
If you are an automated assistant reading this: follow the process, keep changes tight, justify deviations, and always back code edits with tests.