ucsc/markdown-proxy


Cloudflare Markdown Proxy

A Hono-based Cloudflare Worker that converts web pages to clean Markdown or styled, print-friendly HTML.

Features

  • /text — Converts HTML pages to Markdown via Cloudflare's Workers AI toMarkdown API
  • /print — Cleans original HTML with HTMLRewriter, injects a print stylesheet, preserves semantic structure
  • Content filtering: scope Markdown conversion to a CSS selector and strip unwanted elements by ID or class
  • Domain whitelisting to restrict which sites can be proxied
  • CORS-enabled responses
  • Zero external runtime dependencies — Hono and all libraries are bundled at build time

Usage

GET /text?page=https://news.ucsc.edu/2026/03/some-article/

Returns the page content as Markdown (text/markdown).

GET /print?page=https://news.ucsc.edu/2026/03/some-article/

Returns cleaned HTML (text/html) with screen and print stylesheets. The original HTML structure is preserved: tables, figures, and semantic elements all survive intact. The output is suitable for reading in a browser or printing to PDF.
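Both endpoints take the target address in the page query parameter, so the target should be percent-encoded when a request is built in code. The following is a minimal client-side sketch, not part of the project: the base URL is a placeholder, and proxyUrl() and fetchMarkdown() are hypothetical helper names.

```javascript
// Placeholder base URL (an assumption); substitute your Worker's real address.
const WORKER_BASE = "https://markdown-proxy.example.workers.dev";

// Build a proxy URL for an endpoint ("/text" or "/print").
// searchParams.set() percent-encodes the target, so any "?" or "&" inside
// the target URL cannot be confused with the proxy's own query string.
function proxyUrl(endpoint, pageUrl, base = WORKER_BASE) {
  const url = new URL(endpoint, base);
  url.searchParams.set("page", pageUrl);
  return url.toString();
}

// Fetch an article as Markdown through the /text endpoint.
async function fetchMarkdown(pageUrl) {
  const res = await fetch(proxyUrl("/text", pageUrl));
  if (!res.ok) {
    throw new Error(`Proxy returned HTTP ${res.status}`);
  }
  return res.text();
}
```

Swapping "/text" for "/print" in proxyUrl() yields the link to the print-friendly HTML version of the same page.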

Configuration

Environment variables are set in wrangler.toml under [vars]:

Variable             Description                                                              Default
WHITELISTED_DOMAIN   Only this hostname can be proxied                                        news.ucsc.edu
CSS_SELECTOR         Scope AI conversion to a content area, /text only (e.g. .entry-content)  .entry-content
REMOVE_IDS           Comma-separated element IDs to strip                                     ""
REMOVE_CLASSES       Comma-separated class names to strip                                     (see wrangler.toml)

The AI binding is configured under [ai] in wrangler.toml.
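To illustrate how these variables might be consumed, here is a rough sketch of parsing a comma-separated value and checking a requested page against WHITELISTED_DOMAIN. It is an assumption about the behavior, not the project's code: the real parseList() in src/lib/utils.js may differ, and isAllowedPage() is a hypothetical name.

```javascript
// Split a comma-separated variable such as REMOVE_IDS or REMOVE_CLASSES
// into trimmed, non-empty entries. Illustrative only; the project's own
// parseList() may behave differently.
function parseList(value) {
  if (!value) return [];
  return value
    .split(",")
    .map((item) => item.trim())
    .filter(Boolean);
}

// Return true only when pageUrl parses as an http(s) URL whose hostname
// matches the whitelisted domain exactly. Hypothetical helper.
function isAllowedPage(pageUrl, whitelistedDomain) {
  let url;
  try {
    url = new URL(pageUrl);
  } catch {
    return false; // not a valid absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return false;
  }
  // URL.hostname is already lowercased by the parser.
  return url.hostname === whitelistedDomain.toLowerCase();
}
```

Comparing the parsed hostname exactly, rather than searching for the domain as a substring, means a look-alike host such as news.ucsc.edu.example.com would not pass the check.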

Development

npm install        # Install dependencies
npm run dev        # Start local dev server
npm run test       # Run tests (46 tests across unit and integration suites)
npm run deploy     # Deploy to Cloudflare

Project Structure

src/
  index.js                  # Hono app — mounts routes, CORS
  routes/
    home.js                 # GET / — front page
    print.js                # GET /print — cleaned HTML with print stylesheet
    text.js                 # GET /text — Markdown via AI.toMarkdown
  middleware/
    validate-page.js        # Validates ?page param, fetches upstream HTML
  lib/
    tidy-html.js            # HTMLRewriter-based HTML cleaner
    preprocess-html.js      # HTML prep for Markdown conversion
    utils.js                # parseList(), stripFrontMatter()
  pretty.css                # Screen and print stylesheet
tests/
  unit/                     # Pure function tests
  integration/              # Route tests (workerd runtime)
  vitest.workspace.js       # Vitest workspace config
wrangler.toml               # Worker config, env vars, AI binding
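The tree above lists a stripFrontMatter() helper in utils.js without describing it. As a rough illustration only, and assuming it removes a leading YAML-style front matter block from Markdown output, such a function could look like the sketch below; the actual implementation may differ.

```javascript
// Remove a leading front matter block delimited by "---" lines, if present.
// Assumed behavior; not taken from the project's source.
function stripFrontMatter(markdown) {
  // Match "---", any content (lazily), then a closing "---" on its own line.
  const match = /^---\r?\n[\s\S]*?\r?\n---\r?\n?/.exec(markdown);
  if (!match) return markdown;
  // Drop the block and any blank lines that followed it.
  return markdown.slice(match[0].length).replace(/^\s+/, "");
}
```

Text that does not start with a front matter block is returned unchanged, so the helper would be safe to apply to every response.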

License

MIT

About

Cloudflare Worker to fetch UCSC news articles to display as plain text or print
