punctilio (n.): precise observance of formalities.
Pretty good at making your text pretty. The most feature-complete and reliable English typography package. punctilio transforms plain ASCII into typographically correct Unicode, even across HTML element boundaries. Try it live at turntrout.com/punctilio.
Smart quotes · Em/en dashes · Ellipses · Math symbols · Legal symbols · Arrows · Primes · Fractions · Superscripts · Ligatures · Non-breaking spaces · HTML-aware · Markdown support · Bri’ish, German, and French localisation support
import { transform } from "punctilio";
transform(`"It's a beautiful thing, the destruction of words..." -- 1984`);
// → “It’s a beautiful thing, the destruction of words…”—1984

punctilio accepts three input formats: text, Markdown, and HTML.
npm install punctilio

As far as I can tell, punctilio is the most reliable and feature-complete English typography package. I built punctilio for my website. I wrote[1] and sharpened the core regexes sporadically over several months, exhaustively testing edge cases. Eventually, I decided to spin off the functionality into its own package.
I tested punctilio against smartypants 0.2.2, tipograph 0.7.4, smartquotes 2.3.2, typograf 7.6.0, and retext-smartypants 6.2.0.[2] These other packages have spotty feature coverage and inconsistent impact on text. For example, smartypants mishandles quotes after em dashes (though quite hard to see in GitHub’s font) and lacks multiplication sign support.
| Input | smartypants | punctilio |
|---|---|---|
| 5x5 | 5x5 (✗) | 5×5 (✓) |
My benchmark.mjs measures how well libraries handle a wide range of scenarios. The benchmark normalizes stylistic differences (e.g. non-breaking vs regular space, British vs American dash spacing) for fair comparison.
| Package | Passed (of 157) |
|---|---|
| punctilio | 155 (99%) |
| tipograph | 91 (58%) |
| typograf | 74 (47%) |
| smartquotes | 72 (46%) |
| smartypants | 68 (43%) |
| retext-smartypants | 65 (41%) |
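
The normalization step mentioned above can be pictured with a small sketch. This is not the code in benchmark.mjs: `normalizeForComparison` is a hypothetical helper, and it only covers the two stylistic differences named in the text (non-breaking vs regular space, British vs American dash spacing).

```typescript
// Hypothetical helper, not taken from benchmark.mjs: collapse stylistic
// variants so two outputs compare equal when they differ only in style.
function normalizeForComparison(text: string): string {
  return text
    .replace(/\u00A0/g, " ") // non-breaking space → regular space
    .replace(/ \u2013 /g, "\u2014"); // spaced en dash → unspaced em dash
}

const american = "word\u2014word"; // unspaced em dash
const british = "word\u00A0\u2013 word"; // nbsp + en dash + space

console.log(
  normalizeForComparison(american) === normalizeForComparison(british)
); // → true
```

With this kind of normalization, a library is not penalized for choosing a different but equally valid house style.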
| Feature | Example | punctilio | smartypants | tipograph | smartquotes | typograf |
|---|---|---|---|---|---|---|
| Smart quotes | "hello" → “hello” | ✓ | ✓ | ✓ | ✓ | ✓ |
| Leading apostrophe | 'Twas → ’Twas | ✓ | ✗ | ✗ | ◐ | ✗ |
| Em dash | -- → — | ✓ | ✓ | ✗ | ✗ | ✓ |
| En dash (ranges) | 1-5 → 1–5 | ✓ | ✗ | ✓ | ✗ | ✗ |
| Minus sign | -5 → −5 | ✓ | ✗ | ✓ | ✗ | ✗ |
| Ellipsis | ... → … | ✓ | ✓ | ✓ | ✗ | ✓ |
| Multiplication | 5x5 → 5×5 | ✓ | ✗ | ✗ | ✗ | ◐ |
| Math symbols | != → ≠ | ✓ | ✗ | ◐ | ✗ | ◐ |
| Legal symbols | (c) 2004 → © 2004 | ✓ | ✗ | ◐ | ✗ | ✓ |
| Arrows | -> → → | ✓ | ✗ | ◐ | ✗ | ◐ |
| Prime marks | 5'10" → 5′10″ | ✓ | ✗ | ✓ | ✓ | ✗ |
| Degrees | 20 C → 20 °C | ✓ | ✗ | ✗ | ✗ | ✓ |
| Fractions | 1/2 → ½ | ✓ | ✗ | ✗ | ✗ | ✓ |
| Superscripts | 2nd → 2ⁿᵈ | ✓ | ✗ | ✗ | ✗ | ✗ |
| English localization | American / British | ✓ | ✗ | ✗ | ✗ | ✗ |
| Ligatures | ?? → ⁇ | ✓ | ✗ | ✓ | ✗ | ✗ |
| Non-English quotes | „Hallo“ | ✓ | ✗ | ✓ | ✗ | ◐ |
| Non-breaking spaces | Chapter 1 | ✓ | ✗ | ✗ | ✗ | ✓ |
| Pattern | Behavior | Notes |
|---|---|---|
| `'99 but 5' clearance` | `5'` not converted to `5′` | Leading apostrophe is indistinguishable from an opening quote without semantic understanding |
Setting aside the benchmark, punctilio’s test suite includes 1,550+ tests at 100% branch coverage, including edge cases derived from competitor libraries (smartquotes, retext-smartypants, typograf) and the Standard Ebooks typography manual. I also verify that all transformations are stable when applied multiple times. All transforms run in linear time, with scaling tests that guard against quadratic regex backtracking.
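
As a rough illustration of the two properties above (stability under repeated application, and linear scaling), here is a minimal sketch. It does not reproduce punctilio’s test suite: `toyEllipsis`, `assertIdempotent`, `repeatText`, and `timeMs` are hypothetical names, and the toy transform only handles ellipses.

```typescript
// Toy stand-in for a real transform: convert "..." to an ellipsis character.
function toyEllipsis(text: string): string {
  return text.replace(/\.{3}/g, "\u2026");
}

// Throws if applying `fn` twice gives a different result than applying it once.
function assertIdempotent(fn: (s: string) => string, input: string): void {
  const once = fn(input);
  const twice = fn(once);
  if (once !== twice) {
    throw new Error("Not idempotent for " + JSON.stringify(input));
  }
}

// Build a long input by repeating a unit string.
function repeatText(unit: string, times: number): string {
  return new Array(times + 1).join(unit);
}

// Rough scaling probe: wall-clock time of one call, in milliseconds.
function timeMs(fn: (s: string) => string, input: string): number {
  const start = Date.now();
  fn(input);
  return Date.now() - start;
}

assertIdempotent(toyEllipsis, "Wait... what....");

// Doubling the input should roughly double the time for linear-time code;
// quadratic backtracking would roughly quadruple it.
const small = timeMs(toyEllipsis, repeatText("a... ", 100000));
const large = timeMs(toyEllipsis, repeatText("a... ", 200000));
console.log({ small, large });
```

Timings are noisy, so a real scaling test would presumably compare ratios with a generous tolerance rather than exact values.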
Perhaps the most innovative feature of the library is that it properly handles DOMs! For Markdown, use the built-in remarkPunctilio or transformMarkdown plugins instead of converting to HTML and back.
Other typography libraries take one of two approaches, both with drawbacks.
- String-based libraries (like `smartypants`) transform plain text but are unaware of HTML structure. If you concatenate text from `<em>Wait</em>...`, transform it into `Wait…`, and then try to convert back, you've lost track of where the `</em>` belongs.
- AST-based libraries (like `rehype-retext`) process each text node individually, preserving structure but losing cross-node information. A quote that opens inside `<em>"Wait</em>` and closes outside it `..."` spans two text nodes. Processed independently, the library can't tell whether the final `"` is opening or closing, because it never sees both at once.
punctilio introduces separation boundaries to get the best of both worlds:
- Flatten the parent container's contents to a string, delimiting element boundaries with a two-character private-use Unicode sentinel (`U+E000 U+E001`) to avoid unintended matches.
- Every regex allows (and preserves) these characters, treating them as boundaries of a “permeable membrane” through which contextual information flows. For example, `.U+E000..` still becomes `…U+E000`.
- Rehydrate the HTML AST. For all k, set element k’s text content to the segment starting at separator occurrence k.
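
The three steps above can be sketched with a toy model in which the “AST” is just an array of text-node strings. This is a simplified illustration, not punctilio’s implementation: `SEP`, `toyTransform`, and `transformNodes` are hypothetical names, the quote and ellipsis rules are deliberately naive, and unlike the library’s documented output this toy leaves each sentinel exactly where it found it.

```typescript
// Sentinel for this sketch, following the U+E000 U+E001 pair described above.
const SEP = "\uE000\uE001";
const S = "((?:" + SEP + ")*)"; // capture group: zero or more sentinels

// Toy transform (NOT punctilio's real rules): every pattern lets sentinels
// sit between characters and re-emits them so node boundaries survive.
function toyTransform(flat: string): string {
  return flat
    // three dots, possibly split across nodes → ellipsis, sentinels kept after it
    .replace(new RegExp("\\." + S + "\\." + S + "\\.", "g"), "\u2026$1$2")
    // a quote preceded by a non-space character (looking through sentinels) closes
    .replace(new RegExp("([^\\s\"\uE000\uE001])" + S + "\"", "g"), "$1$2\u201D")
    // any remaining straight quote opens
    .replace(/"/g, "\u201C");
}

// Flatten the text nodes, transform once, then split back into the same nodes.
function transformNodes(nodes: string[]): string[] {
  const out = toyTransform(nodes.join(SEP)).split(SEP);
  if (out.length !== nodes.length) {
    throw new Error("separator count changed during transform");
  }
  return out;
}

// Text nodes of <p><em>"Wait</em>"</p>
console.log(JSON.stringify(transformNodes(['"Wait', '"']))); // → ["“Wait","”"]

// Processed independently, the lone '"' has no context and opens instead:
console.log(JSON.stringify(['"Wait', '"'].map((n) => toyTransform(n)))); // → ["“Wait","“"]
```

The length check in `transformNodes` is the safety net for rehydration: if any rule swallowed or duplicated a sentinel, the segments would no longer line up with the original nodes.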
import { transform, DEFAULT_SEPARATOR } from "punctilio";
transform(`"Wait${DEFAULT_SEPARATOR}"`);
// → `“Wait”${DEFAULT_SEPARATOR}`
// The separator doesn’t block the information that this should be an end-quote!

For rehype / unified pipelines, use the built-in plugin, which handles the separator logic automatically:
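import { unified } from "unified";
import rehypeParse from "rehype-parse";
import rehypeStringify from "rehype-stringify";
import rehypePunctilio from "punctilio/rehype";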
unified()
.use(rehypeParse, { fragment: true }) // parse as a fragment so no <html>/<body> wrapper is added
.use(rehypePunctilio)
.use(rehypeStringify)
.process('<p><em>"Wait</em>..." -- she said</p>');
// → <p><em>“Wait</em>…”—she said</p>
// The opening quote inside <em> and the closing quote outside it
// are both resolved correctly across the element boundary.

For Markdown ASTs via remark, use remarkPunctilio, which applies the same separator technique to preserve inline element boundaries, or use transformMarkdown for a simpler Markdown-to-Markdown pipeline.
For manual DOM walking or custom transforms, use transformElement from punctilio/rehype.
The rehype plugin accepts additional options. Elements matching any skipTags tag name or carrying any skipClasses class are left untransformed (values shown are the defaults for skipTags):
rehypePunctilio({
skipTags: ["code", "pre", "script", "style", "kbd", "var", "samp"],
skipClasses: ["no-formatting"],
});

punctilio doesn’t enable all transformations by default. Fractions and degrees tend to match too aggressively (perfectly applying the degree transformation requires semantic understanding). Superscript letters and punctuation ligatures have spotty font support. Furthermore, `ligatures: true` can change the meaning of text by collapsing question and exclamation marks.
transform(text, {
punctuationStyle: "american", // default; also "british" | "german" | "french" | "none"
dashStyle: "american", // default; also "british" | "none"
symbols: true, // ellipsis, math, legal, arrows
includeArrows: true, // arrow transforms (-> → →); only applies when symbols is true
collapseSpaces: true, // normalize whitespace
fractions: false, // 1/2 → ½
degrees: false, // 20 C → 20 °C
superscript: false, // 1st → 1ˢᵗ
ligatures: false, // ??? → ⁇, ?! → ⁈, !? → ⁉, !!! → !
nbsp: true, // non-breaking spaces (after honorifics, between numbers and units, etc.)
checkIdempotency: true, // verify transform(transform(x)) === transform(x)
});

- Fully general prime mark conversion (e.g. `5'10"` → `5′10″`) requires semantic understanding to distinguish primes from closing quotes (e.g. `"Term 1"` should produce closing quotes). `punctilio` counts quotes to heuristically guess whether the matched number sits at the end of a quote (if not, it gets a prime mark). Other libraries like `tipograph` 0.7.4 use simpler patterns that make more mistakes.
- The `american` style follows the Chicago Manual of Style:
  - Periods and commas go inside quotation marks (“Hello,” she said.)
  - Unspaced em-dashes between words (word—word)
- The `british` style follows Oxford style:
  - Periods and commas go outside quotation marks (“Hello”, she said.)
  - Spaced en-dashes between words (word – word)
- The `german` style uses low-9 quotes: „double“ (U+201E/U+201C) and ‚single‘ (U+201A/U+2018).
  - Punctuation outside quotes
- The `french` style uses guillemets with non-breaking space padding: « Bonjour ».
  - Single quotes remain as curly quotes
  - Punctuation outside quotes
- Setting either style to `none` skips the entire transform category: `punctuationStyle: 'none'` preserves straight quotes, apostrophes, and prime marks; `dashStyle: 'none'` preserves all hyphens, number ranges, date ranges, and minus signs.
- `punctilio` is idempotent by design: `transform(transform(text))` always equals `transform(text)`. This is verified automatically by default (`checkIdempotency: true`). Set `checkIdempotency: false` to disable the check.
- Use `classifyApostrophes(text)` to distinguish apostrophes from closing single quotes. It returns text with apostrophes as U+02BC (MODIFIER LETTER APOSTROPHE) and closing quotes as U+2019 (RIGHT SINGLE QUOTATION MARK). Per the Unicode Standard, `transform()` and `niceQuotes()` use U+2019 for both in their output.
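
To make the quote-counting idea from the prime mark note concrete, here is a minimal sketch. It is an assumption about the general shape of such a heuristic, not punctilio’s actual rule set: `resolveQuoteAfterDigit` is a hypothetical function that only looks at straight double quotes following a digit.

```typescript
// Hypothetical sketch of the quote-counting idea; punctilio's real rules are
// more involved. Decides what a straight double quote after a digit means.
function resolveQuoteAfterDigit(text: string): string {
  let openQuotes = 0; // straight double quotes seen so far that were kept as quotes
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ch !== '"') {
      out += ch;
      continue;
    }
    const afterDigit = i > 0 && /[0-9]/.test(text.charAt(i - 1));
    if (afterDigit && openQuotes % 2 === 0) {
      out += "\u2033"; // no quotation is open, so read it as a double prime
    } else {
      out += ch; // leave real quotes for a later smart-quote pass
      openQuotes++;
    }
  }
  return out;
}

console.log(resolveQuoteAfterDigit('He is 5\'10" tall')); // → He is 5'10″ tall
console.log(resolveQuoteAfterDigit('See "Term 1" below')); // → See "Term 1" below
```

The sketch also shows why the problem needs semantic understanding in general: a measurement inside an open quotation would be read as a closing quote by this simple count.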
Footnotes

1. While Claude is the number one contributor to this repository, that’s because Claude helped me port my existing code and added some features. The core regular expressions (e.g. dashes, quotes, multiplication signs) are human-written and were quite delicate. Those numerous commits don’t show in this repo’s history.
2. The Python libraries I found were closely related to the JavaScript packages. I tested them and found similar scores, so I don’t include separate Python results.