---
title: Subscripts For Citations
description: "A typographic proposal: replace cumbersome inline citation formats like ‘Foo et al. (2010)’ with subscripted dates/sources like ‘Foo~et al 2010~’. Intuitive, easily implemented, consistent, compact, and can be used for evidentials in general."
thumbnail: /doc/design/typography/2020-02-06-gwern-gwernnet-subscripts-examples.png
thumbnail-text: "Screenshot of 3 proposed typographic conventions for simpler, easier to read, inline citations."
thumbnail-css: "outline invert-not"
created: 2020-01-08
modified: 2024-11-01
status: finished
confidence: certain
importance: 2
css-extension: dropcaps-kanzlei
...
> I propose reviving an old [General Semantics](https://en.wikipedia.org/wiki/General_semantics) notation: borrow from scientific writing and use subscripts like ‘Gwern~2020~’ for denoting sources (like citation, timing, or medium).
>
> Using subscript indices is flexible, compact, universally technically supported, and intuitive. This convention can go beyond formal academic citation and be extended further to 'evidentials' in general, indicating the source & date of statements.
>
> While (currently) unusual, subscripting might be a useful trick for clearer writing, compared to omitting such information or using standard cumbersome circumlocutions.
I [don't believe the Sapir-Whorf hypothesis](/language "'On the Existence of Powerful Natural Languages', Gwern 2016") so beloved of 20^th^ century thinkers & SF, or that we can make ourselves much more rational by One Weird Linguistic Trick. There is no far transfer, and the benefits of improved vocabulary/notation are inherently domain-specific.
You think the same thoughts in English as you do in Chinese.
But, [like good typography](/design "'Design Of This Website', Gwern 2010"), good linguistic conventions may be worth, all told, even as much as say 5% of whatever one values---and that's not nothing.
[In 'rectifying names', be realistic: aim low.]{.marginnote} (It's definitely worthwhile to do things like spellcheck your writings, after all, even though no amount of spellcheck can rescue a bad idea.)
# Good Writing Conventions
[Checklist approach.]{.marginnote} I already use a few unusual conventions, like attempting to use the [Kesselman Estimative words](/doc/statistics/bayes/2008-kesselman.pdf "'Verbal probability expressions in National Intelligence Estimates: a comprehensive analysis of trends from the fifties through post 9/11', Kesselman 2008") to be more systematic about the strength of my claims or always linking fulltext in citations (and improving using [link annotations](/static/build/LinkMetadata.hs) which do not just link fulltext but present the abstract/excerpts/summary as well), and I employ a few more domain-specific tricks like avoiding use of the word 'significance' in statistics contexts, [automatically inflation-adjusting](/static/build/Inflation.hs "'InflationAdjuster', Gwern 2019"){#inflation-adjusting} currencies (to avoid the [trivial inconvenience](https://www.lesswrong.com/posts/reitXJgJXFzKpdKyd/beware-trivial-inconveniences "'Beware Trivial Inconveniences', Alexander 2009") of doing it by hand & so not doing it at all), or using research-specific [checklists](/about#writing-checklist).
Without straying into [conlang](!W "Constructed language") territory or attempting to do everything in formal logic or serious eccentricity, what else could be done?
# Subscripts For Metadata
One idea for more precise English writing which I think could be usefully revived is broader use of *subscripts*.
We could encode many kinds of useful metadata: citations, dates, sources, and evidentials are ones we could use in scholarly contexts.
[Distinguishing things named the same.]{.marginnote} The subscripting idea is derived from [General Semantics](!W "General semantics") (GS)^[I am considerably less impressed by other GS linguistic suggestions like [E-Prime](!W "E-Prime"), and the related proposal [E-positive](https://www.lesswrong.com/posts/W8CxEFCnYNdrHkuoB/abs-e-or-speak-only-in-the-positive?commentId=Ga4DbTCdmYh5jRM4B "‘Abs-E (or, speak only in the positive) § text2epositive.py experiment’, Gwern 2024") hasn't worked out for me either; but subscripting seems like it may be worth rescuing.], which itself borrows it from standard science or math typography, like physics/statistics/mathematics/chemistry/programming: a [superscript/subscript](!W "Subscript and superscript") is an index distinguishing multiple versions of something, such as quantity, location, or time, eg. _x~t~_ vs _x~t+1~_.
They're typically not seen outside STEM contexts, aside from a few obscure uses like [ruby](!W "Ruby character")/[_furigana_](!W "Furigana") [glosses](!W "Interlinear gloss").
## Citations
However, there are many places we could use subscripting to be clearer & more compact about which version we are referring to, using them as [evidentials](!W "Evidentiality").
And because it's clearer & more compact, we can afford to use it in more places without it wasting space/effort/patience.
Citations are a good use case. Why write "Friedenbach (2012)" if we can write "Friedenbach~2012~"?
The latter is shorter, easier to read, less ambiguous (especially if we use it in parentheticals, see Friedenbach (2012)), and doesn't come in a dozen different slightly-varying house styles.
Why not use it for other things, like software versions or [probabilities](https://www.lesswrong.com/posts/Tmz6ucxDFsdod2QLd/subscripts-for-probabilities) too?
## Evidentials
But why restrict subscripting to formal publications or written documents?
Apply it to any quote, statement, or opinion where indexing variables like time might be relevant.
Refusing to allow easy references to anything not a book is but codex chauvinism.
[One convention, arbitrary metadata.]{.marginnote} It is a unified notation: regardless of whether something was thought, spoken, or written by me in 2020, it gets the same notation---"Gwern~2020~". The evidential can be expanded as necessary: if it's a paper or essay, the '2020' can be a hyperlink, or if it's a 'personal communication', then there can be a bibliography entry stating as much, or if it's the author about their own beliefs/actions/statements in 2020, further information neither necessary nor usually possible (and it avoids awkward custom phraseology like "As I thought back in 2020 or so...").
In contrast, normal citation style cumbersomely uses a different format for each, or provides no guidance: how do you gracefully cite a paper written one year but whose author changed their mind 5 years later based on new results and who told you so 10 years after that?
("Dr. Bach originally maintained A (Bach _et al._ (2000)[^hyperlinks]) but gradually modified his position until 2005 when he recalls writing in his diary he had lost all confidence in A (personal communication, according to Frieden 2015)...")
[^hyperlinks]: Probably the worst notation is the form which takes a belt-and-suspenders approach and puts in as much punctuation as possible: "Foo _et al._ (2020)", adding 3 visible characters, 5 additional characters & 1 additional typeface.
Amazingly, people manage to make this even worse when it's a hyperlink: some people eschew underlines for hyperlinks and instead color it, which is fine, but then they decide to *not* color the *parentheses*! So it looks like "**Foo _et al._** (**2020**)", flip-flopping between color. This manages to simultaneously: (1) achieve nothing; (2) look extremely confusing should a normal parenthetical comment ever contain a citation; (3) confuse the reader by implying that the year is a second, separate, hyperlink; and (4) create a crazy zebra-like triple flip-flop of coloring (possibly a quadruple flip-flop depending on how punctuation is handled---how would one color a citation like "Foo _et al._ (2020)'s work"?), becoming epilepsy-inducing at screen scale given how many citations a paper might stuff into a single page.
## Multiple Authors
![Meme on how 'et al' hides co-authors' names (via Rachel Kowert).](/doc/design/typography/subscript/2021-05-20-rachelkowert-standuptrexmeme-etal.jpg){.float-right}
['et al' = '...']{.marginnote} Single or double authorship is straightforward: just 'Friedenbach~2012~' or 'Frieden & Bach~2012~'.
But how should multi-author citations, currently denoted by 'et al' (or 'et al.' or even '_et al._'), be handled?
This is important because the frequency of multi-author papers has risen dramatically, and they are now the norm in many fields; notations should be optimized for the most common case.
But the existing 'et al' notation is ridiculous: not only does it take up 6 letters of natural language where a symbol would do, it's also ambiguous & hard to machine-parse^[Mostly because of all the variations, as well as inconsistency in usage and reuse of abbreviated forms later in a paper (itself a concession that the notation is bad). The term 'et al' itself is relatively unambiguous, but then the bulkiness has further downstream effects: for example, it is customary in some styles to drop the (bracketed?) years to shorthand refer to 'Foo et al.', which means that it is now harder to scan sentences which are broken up by periods (and in at least one citation style, the spaces are replaced *by* periods), parsing is context-sensitive because one must maintain state to figure out what the original full citation was, and papers being papers this will often be ambiguous/erroneous because there will be multiple such citations (either because the authors didn't realize while editing that it had become ambiguous or didn't care because "the context is clear").], and it's not English or even *Latin*.[^Latin-et-al]
Writing 'Foo et al~2010~' or 'Foo~et\ al\ 2010~' doesn't look nice, and it makes the subscripting far less compact.
My current suggestion is to do the expected thing: when you elide something, how do you write that?
Why, with an [ellipsis](!W) '...', of course!
So one would write 'Foo~...2010~' or possibly 'Foo...~2010~'.
(I think the former is probably better, since there is less risk of confusion over what is being elided.)
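To show how mechanical the conversion is, here is a short Python sketch (the regex & function are illustrative, not a finished tool) that rewrites conventional inline citations into the proposed ellipsis-subscript Pandoc Markdown:

```python
import re

# Hypothetical converter: 'Foo et al. (2010)' -> 'Foo~...2010~',
# 'Friedenbach (2012)' -> 'Friedenbach~2012~', using Pandoc's '~...~'
# subscript syntax. Handles the common 'et al'/'et al.'/'_et al._' variants.
CITATION = re.compile(
    r"(?P<author>[A-Z][A-Za-z-]+)"          # surname
    r"(?P<etal>,? +_?et al\.?_?)?"          # optional 'et al' in its variants
    r" +\((?P<year>[12][0-9]{3}[a-z]?)\)")  # parenthesized year, eg. '(2010b)'

def subscriptify(text: str) -> str:
    def repl(m: re.Match) -> str:
        elision = "..." if m.group("etal") else ""  # elide co-authors as '...'
        return f"{m.group('author')}~{elision}{m.group('year')}~"
    return CITATION.sub(repl, text)

print(subscriptify("As Foo et al. (2010) showed, contra Friedenbach (2012)."))
# -> As Foo~...2010~ showed, contra Friedenbach~2012~.
```

A real implementation would need to cope with the messier house styles (semicolon-separated citation lists, 'Frieden & Bach (2012)', page numbers), but the core rewrite is a single regex.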
[^Latin-et-al]: It's not really Latin either, because it's a non-Latin abbreviation for the actual Latin phrase, [_et alii_](https://en.wiktionary.org/wiki/et_alii "'Wiktionary: et alii', Wiktionary 2018") (to save you one character and also avoid the need to correctly *conjugate* the Latin---the absurdity is fractal, is what I'm saying). One wants to ask people who use 'et. al.' what exactly they think the 'et.' is an abbreviation *of*...
The Greeks & Romans didn't use [full stops](!W) for [abbreviation](!W), as that is a relatively recent English convention superseding a wide variety of [scribal abbreviations](!W); but as pseudo-Latin, that means that many will italicize it, as foreign words/phrases usually are---and now that is even more work, even more visual clutter, and introduces ambiguity with other uses of italics like titles. A terrible notation, and what could be more pretentious?
### Unicode Ellipsis
[Funky alternatives.]{.marginnote} The horizontal ellipsis isn't the only kind: there are several others in [Unicode](https://en.wikipedia.org/wiki/Unicode), including midline '⋯' and vertical '⋮' and even the "down right diagonal ellipsis" '⋱', so one could do 'Foo~⋯2010~' or 'Foo~⋮2010~' or 'Foo~⋱2010~'.
(I'm not sure about support for these particular Unicode entities, but they show up without issue in my Firefox, [Emacs](https://en.wikipedia.org/wiki/Emacs), and urxvt, so they shouldn't be *too* rare.)
The vertical ellipsis is nice but unfortunately it's hard to see the first/top dot because it almost overlaps with the final letter, making it look like a weird colon.
The midline ellipsis is middling, and doesn't really have any virtue.[^et-al]
But I particularly like the last one, down-right-diagonal ellipsis, because it works visually so well---it leads the eye down and to the right and is clear about the omission being an entire phrase, so to speak.
[^et-al]: At least for *subscripts*. If we were only replacing the 'et al' notation, MIDLINE HORIZONTAL ELLIPSIS (⋯) is great: it's intuitive, looks nice without any CSS styling, and takes a third the pixels while also looking visually much simpler.

## Generalized Evidentials
Evidentials using authors or years are short enough that they can be laid out as simple subscripts. There is no new typographic issue with that.
But as [discussed above](#evidentials), there is no need to limit it to formal publications; knowledge can be derived from many sources, and even in the most formal academic writing, there are the occasional pseudo-citations like "Foo 2010 (personal communication)".
A complete evidential---like "Foo told me so on the second day of our Black Forest camping trip in 2010"---would be awkward to read if naively subscripted.
In a noninteractive format, such evidentials probably must be relegated to footnotes/endnotes/[sidenotes](/sidenote "'Sidenotes In Web Design', Gwern 2020"); in an interactive format like HTML, we can do better.
For HTML, CSS supports setting maximum widths & truncating with ellipsis overflows, while expanding width on hover, so one can do something *roughly* like this:
~~~{.CSS}
/* Show only the first ~10 characters, truncated with an ellipsis: */
.evidential { display: inline-block; white-space: nowrap; max-width: 10ch; overflow: hidden; text-overflow: ellipsis; }
/* On hover/focus, expand to the full text, shrunk & lowered like a subscript: */
.evidential:hover, .evidential:focus { max-width: min-content; position: relative; bottom: -0.5em; font-size: 80%; }
~~~

Then the first 10 characters will be displayed, truncated by '...', and if the reader hovers over it with their mouse, it expands to reveal the arbitrarily-long evidential.
The CSS seems tricky to get right, so it might be easier to resort to JavaScript-based popups like my existing link annotations/definitions using [`popups.js`](/static/js/popups.js "'popups.js', Achmiz 2019").
## Citation Evidentials
[Citation contexts.]{.marginnote} One of the more complained-about weaknesses of citations as untyped directional links is that they are often interpreted as endorsements, even though references are used in innumerable ways.
Parsing out the meaning of a citation is one of the biggest challenges of any [citation analysis](!W), especially for [scientometrics](!W): is a citation in a new paper there because the old work is bad and being criticized & debunked, or because it is the key prior art for the new paper, or is it a justification for a claim in the new paper (and if so, is this claim key or minor? how certain is the citer that it is correct?^[We might even like a notation to indicate "author has actually read the citation fulltext and is [not simply mindlessly copying it & propagating any errors in its use](/leprechaun#citogenesis-how-often-do-researchers-not-read-the-papers-they-cite "‘Leprechaun Hunting & Citogenesis § Citogenesis: How Often Do Researchers Not Read The Papers They Cite?’, Gwern 2014")", but realistically, few would be willing to do this accurately.]), or is it merely cited as a placeholder for a well-known fact or as a general background reference?
Subscripted evidentials allow a relatively lightweight way of encoding such properties: simply append them in the subscript for those who care.[^disclosure-citations]
[^disclosure-citations]: In digital media, one approach to handling longer appended metadata would be the use of 'collapse' or ['disclosure'](!W "Disclosure widget") features, which expand on demand. (Analogous to [code folding](!W), and with basic HTML5 support in the [`<details>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/details) element.) Block-level collapses are used extensively on Gwern.net, but inline collapsing is also supported: eg. this is an inline collapsed region which you can click on to expand and see the entire sentence, and then click on again to re-collapse.
A real-world example of evidential icons is the Russian investigative website [Proekt's](!W "Proekt") use of a *person-silhouette* icon to mark the presence of an inline natural language description of the human source for a claim vs a *document* icon to denote a claim sourced from written material (usually leaked, given extensive hacking/leaking in the contemporary Russian context), rather than use footnotes or conventional citations (which would not convey such a distinction between [HUMINT](!W) and [OSINT](!W), one might say); [examples](https://www.proekt.media/en/portrait-en/evgeny-prigozhin/ "'Chef and Chief: Portrait of Yevgeny Prigozhin, the personal sadist of the Russian president', Andrei Zakharov, Katya Arenina, Ekaterina Reznikova, Mikhail Rubin, Roman Badanin, 2023-07-12").
[Chess commentary-style.]{.marginnote} For a concrete example, one could draw inspiration from [chess annotation symbols](!W), which are compact, intuitive, and writers/readers may already be a little familiar with them.
In chess annotations, '!' is good while '?' is bad, and they can be intensified by duplication ('!!', '??') or weakened by combination ([interrobang](!W)-like ['!?'](https://en.wikipedia.org/wiki/Chess_annotation_symbols#!?_(Interesting_move)) and ['?!'](https://en.wikipedia.org/wiki/Chess_annotation_symbols#?!_(Dubious_move_/_Inaccuracy))).
Then one could use them easily:
> We build on Foo~...2020!!~, borrowing a trick from Bar~2021!~, on the task of optimally petting domesticated cats (Bradshaw~1994~), which fails to replicate Baz~...2019?~ on the value of backwards-stroking (as well as Quux~...2018??~, recently shown to be fraudulent).
This clearly divvies up importance between the key work, less important but relevant works, tangential references, and highly critical citations---if a paper is a narrative, then evidentials let one see the _dramatis personae_ at a glance to know who are the allies, who are the enemies, and who are bystanders.
Or one could prefer to instead highlight importance: perhaps 'key papers' get an [asterisk](!W) to indicate they are the most important ones (using one of the [many other stars](!W "Star (glyph)") to avoid confusion with footnotes).
Or agreement/disagreement could be indicated, such as by ['✓'](!W "Check mark#Unicode") vs ['❎'](https://en.wikipedia.org/wiki/X_mark#Unicode) (NEGATIVE SQUARED CROSS MARK).
[Semi-automatable.]{.marginnote} There are many possible things one could reasonably encode, but the main problem is that there is no simple automatic way to do so (after all, this would not be necessary if there was) and authors typically do not want to do this---such explicit epistemology is exhausting!
The latest scientometric efforts like [Semantic Scholar](https://arxiv.org/abs/2301.10140 "‘The Semantic Scholar Open Data Platform’, Kinney et al 2023") use large language models like GPT-4 to attempt to infer the semantics of a citation; these might make it feasible to do semi-automated edits of finished documents, where the document can be parsed, the NN can be fed all of the citations' fulltexts, read the document text, and can suggest the correct semantic markup for each citation, which the human author validates.
[*Too* automatable?]{.marginnote} But this then raises another issue: if such systems can already do it at sufficiently high accuracy that the human author only has to spend a minute double-checking, then is it worthwhile to encode them at all?
It may be better for the author to not bother, and let the machines figure it out as necessary; then the readers can just rely on a tool, like a website or web browser or PDF reader plugin, to preprocess any paper they are reading to add the relevant semantic notations. (See for example [Elicit](https://elicit.com/).)
## Technical Support
[Subscripts: already in a theater near you.]{.marginnote} Because it's already used so much in technical writing, subscripting is reasonably familiar to anyone who took highschool chemistry & can be quickly figured out from context for those who've forgotten, and it's well-supported by fonts and markup languages and [word processors](https://en.wikipedia.org/wiki/Word_processor_program): it's written `x~t~` in [Pandoc](!W) [Markdown](!W) & some Markdown extensions like [`markdown-it`](https://github.com/markdown-it/markdown-it-sub) (but not Reddit), `x<sub>t</sub>` in HTML, `x<subscript>t</subscript>` in [DocBook](!W), `x_t` in [TeX](!W "TeX")/[LaTeX](!W "LaTeX"), ``x\ :sub:`t` `` in [reStructuredText](!W); and it has keybindings `C-=` in Microsoft Office, `C-S-b` in [LibreOffice](!W), `C-,` in Google Docs etc.
So subscripting can be used almost everywhere immediately.
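(And because the `x~t~` syntax maps directly onto tags like HTML's `<sub>`, translating between formats is trivial; a one-line Python regex sketch, assuming no literal tildes elsewhere in the text:)

```python
import re

# Illustrative converter from Pandoc Markdown subscript syntax to HTML:
# 'Friedenbach~2012~' -> 'Friedenbach<sub>2012</sub>'.
def markdown_sub_to_html(text: str) -> str:
    # '~...~' with no internal whitespace or tildes becomes a <sub> element
    return re.sub(r"~([^~\s]+)~", r"<sub>\1</sub>", text)

print(markdown_sub_to_html("Friedenbach~2012~ and x~t~"))
# -> Friedenbach<sub>2012</sub> and x<sub>t</sub>
```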
Note that we *do* need to use existing ways of subscripting, and we can't fake it using the [Unicode super/subscript characters](https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts#Superscripts_and_subscripts_block): they look weird, break a lot of tools (eg. you can't reliably search for them in all browsers), and omit most of the alphabet so you couldn't even write 'Foo 2020b' (it skips from 'a' to 'e', although including LATIN SUBSCRIPT SMALL LETTER SCHWA 'ₔ' for some reason).
## Example Use
Example: here are 3 versions of a text; one stripped of citations and evidentials, one with them written out in long form, and one with subscripts:
> 1. I went to Istanbul for a trip, and saw all the friendly street cats there, just as I'd read about in Abdul Bey; he quotes the local Hakim Abdul saying that the cats even look different from cats elsewhere (but after further thought, I'm not sure I agree with that there). I and my wife had a wonderful trip, although while she clearly enjoyed the trip to the city, she claimed the traffic was terribly oppressive and *ruined* the trip. (Oh really?)
> 2. In 2010, I went to Istanbul for a trip, and saw all the friendly street cats there, just as I'd read about in Abdul Bey's 2000 _Street Cats of Istanbul_; he quotes the local Hakim Abdul in 1970 saying that the cats even look different from cats elsewhere (but after further thought as I write this now in 2020, I'm not sure I agree with Bey (2000)). I and my wife had a wonderful trip, although while she clearly enjoyed the trip to the city, on Facebook she claimed the traffic was terribly oppressive and *ruined* the trip. (Oh really?)
> 3. I~2010~ went to Istanbul for a trip, and saw all the friendly street cats there, just as I'd read about in Abdul Bey~2000~ (_Street Cats of Istanbul_); he quotes the local Hakim Abdul~1970~ saying that the cats even look different from cats elsewhere (but after further thought, I'm not sure I~2020~ agree with Bey~2000~). I and my wife had a wonderful trip, although while she clearly enjoyed the trip to the city, she claimed~FB~ the traffic was terribly oppressive and *ruined* the trip. (Oh really?)
In the first version, suppressing the metadata leads to a confusing passage. What did Bey write? We don't learn when Abdul expressed his opinion---which is important because Istanbul, as a large fast-growing metropolis, may have changed greatly over the 40 years from quote to visit. When did the speaker become skeptical of the claim Istanbul cats both act & look different? What might explain the wife's inconsistency, and which version should we put more weight on?
The second version answers all these questions, but at the cost of prolixity, jamming in comma phrases to specify date or source.
Few people would want to either write or read such a passage, and it has a distinctly fussy pseudo-academic air.
Unsurprisingly, few people will bother with this---any more than they will bother providing [inflation-adjusted](https://www.bls.gov/data/inflation_calculator.htm) dollar amounts of something from a decade ago (even though that's misleading by a good 15% or so, and compounding), or they'd want to check a paywalled paper, or redo calculations in Roman numerals.
The third version may look a little alien because of the subscripts, but it provides all the information of the second version plus a little more (by making explicit the implicit '2020'), in considerably less space (as we can delete the circumlocutions in favor of a single consistent subscript), and reads more pleasantly (the metadata is literally out of the way until we decide we need it).
## Possible Alternative Notation
I considered 3 alternatives:
#. **Superscripts**: already overloaded as footnotes & powers.
#. **Bang notation**: another possible notation for disambiguating is the "X!Y" notation (apparently derived from [UUCP bang notation](!W "UUCP#Bang path")), which is associated with online fandoms & fanfiction, and gives notation like "2020!gwern".
This notation puts the metadata first, which is confusing yodaspeak (what does the '2020' refer to? It dangles until you read on); it makes it inline & full-sized, and then tacks on an additional character just to take up even more space; it's confusing and unusual to anyone who isn't familiar with it from online fanfiction already, and to those who *are* familiar, it is low-status and has bad connotations.
#. **Ruby annotations**: as mentioned above, there is [standardized HTML support](https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-ruby-element "'HTML Living Standard: Text-level semantics: 4.5.10: The ruby element', WhatWG 2020") (but with [spotty browser support](https://caniuse.com/?search=ruby) & no support at all in most other formats) for 'ruby' annotations which are similar to superscripts and intended for interlinear glosses.
Unfortunately, in a horizontal language like English (as opposed to Chinese/Japanese), they require extremely high [line-heights](!W "Leading") to be at all legible.
#. **New symbols**: the lack of any font, editor, or word processor support kills a new-symbol proposal; it can be rejected out of hand.
## Disadvantages
[Deal-breaker: low status?]{.marginnote} The major downside, of course, is that subscripting is novel and weird.
It at least is not associated with anything bad (such as fanfics), and is associated with science & technology, but I'm sure it will deter readers anyway.
Does it do enough good to be worth using despite the considerable hit to [weirdness points](https://www.lesswrong.com/posts/5GnwjxbL3SQ7gjRn6/open-thread-july-16-22-2013?commentId=gFRKkcyXDTiKkug56)? That I don't know.
## Date Ranges
> The next day the little prince came back.
>
> "It would have been better to come back at the same hour," said the fox. "If, for example, you come at 4 o'clock in the afternoon, then at 3 o'clock I shall begin to be happy. I shall feel happier and happier as the hour advances. At 4 o'clock, I shall already be worrying and jumping about. I shall show you how happy I am! But if you come at just any time, I shall never know at what hour my heart is to be ready to greet you... One must observe the proper rituals..."
>
> "What is a ritual?" asked the little prince.
>
> "Those also are actions too often neglected," said the fox. "They are what make one day different from other days, one hour from other hours. There is a ritual, for example, among my hunters. Every Thursday they dance with the village girls. So Thursday is a wonderful day for me! I can take a walk as far as the vineyards. But if the hunters danced at just any time, every day would be like every other day, and I should never have any vacation at all."
>
> So the little prince tamed the fox.
>
> [_The Little Prince_](!W), [Antoine de Saint-Exupéry](!W)
> Listen: Billy Pilgrim has come unstuck in time.
>
> [_Slaughterhouse-Five_](!W), [Kurt Vonnegut](!W)
[Date slippage.]{.marginnote} With inflation, what was once an absolute unit slowly becomes a relative unit, with a hidden multiplier, leading to ever greater mental slippage: it is misleading by default, and one must effortfully correct, which one will often not do.
A similar issue can happen with dates and durations, in both directions: a year like "1972" is absolute, but it may be relevant for its distance from the present moment, which keeps changing.
[Trapped in eternal present.]{.marginnote} People often note that they feel like they are simply in a "Big Now" where time [infinitely scrolls](https://en.wikipedia.org/wiki/Infinite_scrolling), and don't feel like time or history is actually passing, that it is the present year; their mental default is somehow still frozen in 2000, and they have to remind themselves that it is in fact the (much later) present year.
Somehow, to many, the year 2001 is still a sci-fi future from a Kubrick movie, and not an increasingly-distant past a generation or more ago.
One might write in a 2022 blog post "what happened 50 years ago \[in 1972\]?", which is entirely comprehensible... as long as one didn't write it in early January 2022 when people are still adjusting to the new calendar year and wonder what happened in 1971, and someone reading it a decade later might be severely confused about the concern over the year 1982.
This goes in the other direction: '1972' is 50 years ago (as I write this), but does it *feel* an entire half-century in the past?
In 2022, when the movie sequel [_Top Gun: Maverick_](!W "Top Gun: Maverick") premiered starring, of course, [Tom Cruise](!W), more than one person was startled to actually do the arithmetic for the 1986 release date of [_Top Gun_](!W) and realize it was ~36 years ago.
Or when a [live-action _Scooby-Doo_ reboot](!W "Velma (TV series)") sans [Scooby-Doo](!W "Scooby-Doo (character)") was announced, I was bemused to note that the franchise was now older than almost everyone debating the omission, as [it started](!W "Scooby-Doo, Where Are You!") in 1969 or >53 years ago---which means that _Top Gun_ & _Scooby-Doo_ are closer in time to each other than you are to _Top Gun_. (Does it *feel* that way? If yes, then ["Welcome to the Internet"!](https://www.youtube.com/watch?v=k1BneeJTDcU "‘Welcome to the Internet [Inside]’, Burnham 2021"))
Such ["time ghost"](https://xkcd.com/1393/) comparisons can doubtless [be generated indefinitely](/idea#timeghost) as our intuitive grasp of timelines weakens.
[Context collapse.]{.marginnote} Why are past dates getting squeezed like this?
Many eras seem quite distinct: the [1920s](https://scholars-stage.org/passages-i-highlighted-in-my-copy-of-only-yesterday-an-informal-history-of-the-1920s/ "‘Passages I Highlighted in My Copy of Only Yesterday: An Informal History of the 1920s’, Greer 2019") are different from the 1930s, which are totally separate from the 1940s in my mind, and likewise the 1950s and 1960s slot into place, more or less; but come the 1970s, and everything gradually blends together.
I have a good feel for *technology & science* over that time, and can with trouble reconstruct geopolitics, but in other areas like general culture, I feel you could fool me with sequences decades out of order.
Did [Paul McCartney](!W) have his greatest album hits in the 1960s, or the 1990s? I'd believe you if you told me either.
Were [the Monkees](!W) the 1950s, or the 1970s, and were they a band, TV show, or both, and did they end in the 1970s or 2020s during a reunion tour?
It's hard to maintain a chronology of movies or music albums when one watches them indiscriminately, pulling from a century ago as easily as tracks released today, and how can we take the passage of time too seriously if Paul McCartney is *still*, in 2022, [performing in concerts](https://en.wikipedia.org/wiki/Paul_McCartney#2020%E2%80%93present) with guests such as [Bruce Springsteen](!W), while elsewhere [Bob Dylan](!W) continues releasing albums while touring & writing books?
[Recycling without updating.]{.marginnote} Aside from that sort of "context collapse" in time, [fashion cycles](/note/fashion) recycle the same things in minor variations, like environmentalism or political-correctness, which produces awkward repetitions: a feminist comparison to the misogyny of college education "50 years ago" is rhetorically effective in the 1990s, where it harks back to the 1940s (pre-[second-wave feminism](!W)) and an America with few female graduate students or tenured professors, but it is drastically less so when uttered in 2022 and referring to 1972, well into the tidal wave of female entry into higher ed which saw them reach a majority of undergrads just a few years later (a milestone they would never relinquish, and has since escalated to >60% of undergrads).
[To err is human.]{.marginnote} So what can be done about this?
If someone is seriously committed to making a case for 1972 being tantamount to the medieval dark ages, then fine; but perhaps they just hadn't quite thought about the implications of quoting a spicy passage from something written in the 1990s---after all, the 1990s, that was not *that* long ago, was it? It still seems so present & real to me...
(The trend in the 2020s of rebooting every 1990s media franchise, to appeal to aging Millennials, scarcely helps.)
One can try to be mindful and calculate deltas as a routine matter of reading, similar to checking whether a number is meaningful by shifting it an order of magnitude up & down, and simply remember the decade-level differences: 2020s to 1990s is 3 decades, to 1980s is 4 decades, etc.
But this is something so mechanical & straightforward, and dates so pervasive, that this is a poor solution: we will still subtly slip into a vague timeless "Now".
[Denote years-since?]{.marginnote} It seems like something that a better notation could help with.
Scientists, for example, do not trust to absolute dates and expect readers to juggle piles of dates millennia apart, but explicitly use durations like ["100kya"](https://en.wikipedia.org/wiki/Year#Abbreviations_for_%22years_ago%22) or "1mya".
This is not too exotic, and I think most readers would understand what "10ya" means, particularly in context.
So, this makes a natural fit with subscript citations: as we are *already* putting years in subscripts, why not put in date durations like "10ya"?
> Tom Cruise has returned for _Top Gun Maverick_ (2022), the sequel to _Top Gun_ (1986~36ya~).
[Regexp-able.]{.marginnote} This is compact, intuitive, and can be automated---particularly if one uses decimal separators for numbers but not years, so most dates can be matched as 4-digit whole numbers starting 1--2 and surrounded by non-digits (something like `[^0-9][12][0-9][0-9][0-9][^0-9]`).
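As a sketch of that automation: a minimal Python pass (hypothetical; the function name and skip-rules are my assumptions, and it uses lookarounds rather than consuming the surrounding non-digits) which appends a '~*N*ya~' subscript to each bare 4-digit year:

```python
import re
from datetime import date

# Subscript-date annotator (hypothetical sketch): append a '~Nya~'
# ("N years ago") subscript to each bare 4-digit year in Markdown text.
# The lookarounds exclude digits & '~', so numbers embedded in longer
# figures and already-subscripted years are left untouched.
YEAR_RE = re.compile(r"(?<![0-9~])([12][0-9]{3})(?![0-9~])")

def subscript_years(text: str, current_year: int = date.today().year) -> str:
    def annotate(m: re.Match) -> str:
        delta = current_year - int(m.group(1))
        if delta <= 0:                 # skip current & future years
            return m.group(1)
        return f"{m.group(1)}~{delta}ya~"
    return YEAR_RE.sub(annotate, text)
```

So `subscript_years("the sequel to _Top Gun_ (1986).", current_year=2022)` returns `"the sequel to _Top Gun_ (1986~36ya~)."`; a second pass is a no-op, since the trailing `~` blocks re-matching.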
[Hide duplicates & in cites.]{.marginnote} For particularly date-heavy writings where the same dates come up repeatedly, this might result in too much subscripting: unlike citations, which must be displayed at each claim to serve their role, date-range subscripting is a convenience, a reminder of date distances, so after the first instance of a date, the rest can be suppressed.
(This is also true of inflation-adjusting currency, but amounts tend to be fairly unique so it is less of a problem.)
It would not apply to subscripted citations themselves, as combinations like 'Foo~2020~~2ya~' or 'Foo~2020~ (2ya)' are probably unworkable, and in any case, with citations, the year itself tends not to be too important given the realities of academic publishing, where papers can take decades to be formally published, or the work can take more decades to be applied or built on: if the year is important, it can be discussed normally. ("And then in 2020~2ya~, Foo~2020~ revolutionized the field by reticulating the splines with unprecedented efficiency...")
And for date ranges, particularly EN DASH-separated date ranges of years, one could subscript the EN DASH itself with the duration of the beginning/ending, in addition to the duration from the ending to the present.
Since the duration of the date range itself will usually be small, the first subscript is not so necessary and can be suppressed.
Which might look a little like this:
> Negotiations for the ¥ revaluation lasted 1987–~7~1994~30ya~.
# External Links
- Discussion: [LW](https://www.lesswrong.com/posts/NkjPp86uuyunxDoB8/subscripting-typographic-convention-for-citations-dates "'Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal', Gwern 2020")
# Appendix
## Inflation {.collapse}
> My general approach of [automatically inflation-adjusting](/static/build/Inflation.hs "‘InflationAdjuster’, Gwern 2019"){#inflation-adjuster-2} dollar/₿ amounts into current dollars (ie. real amounts), rather than defaulting to presenting ever-more-misleading historical nominal amounts, could be applied to many other financial assets.
>
> As far as almost all financial software is concerned, 'a dollar is a dollar', and as long as credits & debits sum to zero, everything is fine; inflation is, somewhat bizarrely, almost completely ignored, and left to readers or analysts to haphazardly handle it as they may (or may not, as is usually the case).
>
> Here are 4 ideas of 'inflation adjustments' to make commonly-cited economic numbers more meaningful to our decision-making:
>
> 1. rewrite individual stock returns to be always against a standard stock index, rather than against themselves in nominal terms
> 2. annotate stock prices on a specific date with their #1 statistic from that date to 'today'
> 3. rewrite currencies to year-currencies, with 'real' currency amounts simply being reported in the most recent year-currency
> 4. rewrite year-currencies into the NPV of the next-best alternative investment, which represents the compounding opportunity cost
### Benchmarking
#### Index
Stock prices would benefit from being reported in meaningful terms like net real return compared to an index like the [S&P 500](!W) (reinvesting), as opposed to being reported in purely nominal terms: how often do we really care about the absolute return or %, compared to the return over alternatives like the default baseline of simple stock indexing?
Typically, the question is not 'what is the return from investing in stock _X_ 10 years ago?' but 'what is its return compared to simply leaving my money in my standard stock index?'.
If _X_ returns 70% but the S&P 500 returned 200%, then including any numbers like '70%' is actively misleading to the reader: it should actually be '−130%' or something, to incorporate the enormous [opportunity cost](!W) of making such a lousy investment like _X_.
(See [the performance-pay Nobel](https://marginalrevolution.com/marginalrevolution/2016/10/performance-pay-nobel.html) for an example of why CEO pay should be adjusted by how much their companies *beat their competitors* instead of simply the stock price increasing over time.)
The original nominal return would fit nicely in a subscript appended to the more important benchmark-adjusted return:
> I made −80%~20%~ by investing in _X_ 5 years ago! Subscribe to my newsletter for more investing tips.
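The adjustment itself is trivial arithmetic; a hypothetical helper (the function name and the percentage-point convention are my assumptions) that formats the benchmark-adjusted return with the nominal return subscripted:

```python
# Benchmark-adjusted return formatter (hypothetical sketch): report the
# return relative to an index in percentage points, with the original
# nominal return preserved in a '~...~' subscript.
def benchmark_adjusted(nominal_pct: float, index_pct: float) -> str:
    adjusted = nominal_pct - index_pct   # percentage points vs the benchmark
    return f"{adjusted:+.0f}%~{nominal_pct:+.0f}%~"
```

For the 70% vs 200% example above, `benchmark_adjusted(70, 200)` gives `-130%~+70%~`.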
A logical extension of index benchmarking is to include some form of risk-adjustment (a [Sharpe ratio](!W)?), to avoid cherrypicking or making lucky investments look the best, perhaps by taking into account a stock's [beta](!W "Beta (finance)").
Unfortunately, estimating true beta is difficult, and beta is constantly changing, and so there are many ways to calculate betas; it is not clear to me which one is best, nor that presenting a beta-adjusted benchmark would be intuitive or aid understanding.
So risk-adjustment might be an adjustment too far.
#### Event
Another example of the silliness of not thinking about the *use* of numbers: ever notice those stock tickers in articles on financial websites like the WSJ, where every mention of a company is followed by today's stock return ("Amazon (AMZN: 2.5%) announced Prime Day would be in July")?
They're largely clutter: what does a stock being up 2.5% on the day I happen to read an article tell me, exactly?
But what *could* we make them mean?
In news articles, we have two categories of questions in mind:
1. how it started (how did the efficient markets *react*)?
When people look at stock price movements to interpret whether news is better or worse than expected, they are implicitly appealing to the EMH: "the market understands what this means, and the stock going up or down tells us what it thinks".
So all 'tickercruft' is a half-assed implicit [event study](!W) (which is only an event if you happen to read it within a few hours of publication—if even that). "GoodRx fell −25% when Amazon announced its online pharmacy. Wow, that's serious!"
To improve the event study, we make this rigorous: the ticker is meaningful only if it captures the *event*.
Each use must be time-bracketed: what exact time did the news break & how did the stock move in the next ~hour (or possibly day)?
Then that movement is cached and displayed henceforth.
It may not be perfect but it's a lot better than displaying stock movements from arbitrarily far in the future when the reader happens to be reading it.
2. how it's going (since then)?
When we read news, to generalize event studies, we are interested in the long-term outcome.
("It's a bold strategy, Cotton. Let's see how it works out for them.")
So, similar to considering the net return for investment purposes, we can show the (net real index adjusted) return since publication.
The net return is a [high-variance but unbiased estimator](/backstop#black-box-vs-white-box-optimization) of the import of every news article, and useful to know as foreshadowing: imagine reading an old article with the sentence "VISA (V: +0.01% today) welcomes its new president for 2020.", or "VISA welcomes its exciting new CEO John Johnson, who will take over starting in 2020 (V: −30%~80%~)"?
The latter is useful context.
V being up 0.01% the day you happen to read the article many years later, however, is not useful context for... anything.
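A minimal sketch of the time-bracketed version of #1 (the function name and data shapes are hypothetical; a real implementation would pull from a price API and cache the result at publication time):

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Event-study ticker (hypothetical sketch): given a timestamped price
# series, compute the move in the window just after the news broke, so the
# cached figure captures the *event* rather than the reader's visit-day.
def event_return(prices: list[tuple[datetime, float]],
                 event_time: datetime,
                 window: timedelta = timedelta(hours=1)) -> float:
    """% change from the last price at/before the event to the last
    price within `window` afterwards."""
    times = [t for t, _ in prices]
    i = bisect_left(times, event_time)           # first index >= event_time
    before = prices[i - 1][1] if i > 0 else prices[0][1]
    j = bisect_left(times, event_time + window)
    after = prices[j - 1][1] if j > 0 else before
    return 100 * (after - before) / before
```

With a price of 100 just before the announcement and 75 within the hour, this yields the "−25%" GoodRx-style figure, which can then be stored and displayed henceforth.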
Document tooling-wise, this is easy to support in Markdown.
They can be marked up the same way as my inflation-adjustment variables (eg. `[$10]($2020)`{.Markdown}), like `[V](!N "2019-04-27")`{.Markdown}.
For easier writing, since stock tickers tend to be unique (there are not many places other than stock names that strings like "V" or "AMZN" would appear, as [4-letter acronyms are mostly unused](/tla "‘CQK Is The First Unused TLA’, Gwern 2023")), the writer's text editor can run a few thousand regexp query-search-and-replaces (there are only so many stocks) to transform `V` → `[V](!N "2019-04-27")`{.Markdown} to inject the current day automatically.^[This could also be done by the CMS automatically on sight, assuming first-seen = writing-date, although with Pandoc this has the disadvantage that round-tripping does not preserve the original Markdown formatting, and the Pandoc conventions can look pretty strange compared to 'normal' Markdown—at least my Markdown, anyway.]
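A hypothetical version of that editor transform (the ticker list, function name, and naive word-boundary regex are all assumptions; a production pass would also need to skip tickers that are already annotated or inside links):

```python
import re
from datetime import date

# Ticker annotator (hypothetical sketch): rewrite bare ticker symbols into
# the '!N' link syntax with the writing date injected. TICKERS stands in
# for the full list of a few thousand real symbols.
TICKERS = {"V", "AMZN", "GDRX"}

def annotate_tickers(text: str, today: date) -> str:
    # longest-first alternation so overlapping symbols match greedily
    alternation = "|".join(sorted(TICKERS, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternation + r")\b")
    return pattern.sub(lambda m: f'[{m.group(1)}](!N "{today.isoformat()}")',
                       text)
```

For example, `annotate_tickers("AMZN announced Prime Day.", date(2019, 4, 27))` yields `[AMZN](!N "2019-04-27") announced Prime Day.`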
### Year-Currencies
It would also be useful to make temporal adjustments in regular accounting software, to treat each year of a currency as a foreign currency, with inflation defining the temporal exchange rate.
One would have **year-currencies** like '2019 dollars' vs '2020 dollars' and so on, and compute with respect to a particular target-year like '2024 dollars'.
This allows for much more sensible inter-temporal comparisons of expenditures or income, which do not simply reflect (usually unrelated) inflation rate changes and weird expenditure-currency-time-weighted nominal sums, and which would make clearer things like when real income is declining due to lack of raises.
Accounting software like [ledger](!W "Ledger (software)")/[hledger](https://hledger.org/) already mostly support such things through their foreign-currency features, but they don't necessarily make them easy to write.
Some syntactic sugar would help: `$` is always a shortcut for USD, regardless of whether that is a dollar in 1792 or 2024, but it could be redefined to refer to the USD of the year of the transaction, making it equivalent to a year-currency like `2024$`.
Then no additional effort by the user is necessary, because all transactions already have to have dates.
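A minimal sketch of the temporal exchange rate behind such year-currencies (the function name is an assumption, and the price-index levels below are illustrative stand-ins, not real data):

```python
# Year-currency conversion (hypothetical sketch): treat each year's dollar
# as its own currency, with a price index defining the 'temporal exchange
# rate' into a target year. Index levels here are illustrative stand-ins.
CPI = {1990: 130.7, 2000: 172.2, 2019: 255.7, 2024: 313.7}

def to_year_currency(amount: float, year: int, target_year: int = 2024) -> float:
    """Convert `amount` of `year`-dollars into `target_year`-dollars."""
    return amount * CPI[target_year] / CPI[year]
```

The ledger software would apply this automatically at report time, since every transaction already carries its date.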
### NPV-Investment Units
We could go further: one should not hold much currency, because in addition to inflation nibbling away at its value, it is a bad investment.
If you hold a dollar permanently, you are foregoing a better alternative investment (like a stock index, eg. [VTSAX](https://investor.vanguard.com/investment-products/mutual-funds/profile/vtsax)), so you suffer from both inflation *and* severe opportunity cost.
Often, the opportunity cost is a lot larger, too.
If we don't try to express everything as atemporal dollars, or even a year-currency, what units *should* we use?
Accounting exists to reflect the reality of our decisions & financial health: this ultimate goal goes beyond a crude list of account credits/debits or a list of assets, and inevitably requires building a simple 'model', with judgments about risks and [depreciation](!W) and opportunity cost (which is why '[cash flow](!W) is a fact, all else is opinion').
This is also why accounting systems are never feature-complete, and always keep sprouting new ad hoc features & flags & reports, eventually embodying [Greenspun's tenth rule](!W) with their own, often ad hoc and extremely low-quality, built-in programming language.
Accounting is not a simple problem of just tracking some movements of a few kinds of assets like 'dollars' or 'cash' in and out of some 'accounts'.
Accounting is as complex as the world, because we always want more informative models of our financial position, and have more domain-specific knowledge to encode, and people will always differ about what they think the causal models are or what the future will be like, and this will cause conflicts over 'accounting'.
Accounting, [like statistics](/research-criticism "‘How Should We Critique Research?’, Gwern 2019"), is ultimately for *decision-making*: a failure to make this explicit will lead to many confusions & illusions in understanding the purpose of accounting or things like 'GAAP'.
Accounting systems should never delude themselves about this intrinsic complexity by pretending it doesn't exist and trying to foist it onto the users, but accept that they have to handle it and provide real solutions, like being built with a properly supported language like a Lisp or Python, instead of fighting a rearguard action by adding features ad hoc as users ram them through to meet their specific needs.
Simply listing everything in dollar amounts fails to do this.
When we compare individual stocks to the S&P 500, why stop there? Why not denominate *everything* in S&P 500 shares?
This would better reflect our actual choice ("spend on _X_" vs "save in VTSAX").
Like the year-currencies, this is fine as far as it goes, but it suffers from its own version of the 'nominal' vs 'real' illusion, by omitting a kind of intrinsic 'deflation' from the opportunity cost of compounding: if we simply do it the default way by defining a daily VTSAX exchange rate and inputting everything in year-currencies, which get converted to final VTSAX numbers for reports, we do take into account the increase in value of a VTSAX share over time, but we do not take into account something fairly important—the *income* from the index.
This would systematically bias reports by when a share was purchased: older shares are more valuable, assuming we buy-and-hold, and especially if we reinvest the income.
(VTSAX pays out relatively little as [dividends](!W), so the problem isn't so big, but for other alternatives, like bonds, this omission might be wildly misleading.)
We presumably do track income from all investments properly, and so indirectly the opportunity cost is reflected in the balance sheets; the issue is not that it leads to wrong final numbers, but that it obscures the financial reality.
That of course is true of everything in accounting (sooner or later it all shows up in the assets or cash flow—fail to track depreciation properly and eventually there will be a large, unbudgeted expense for replacement/repair; fail to buy insurance or self-insure, and an expensive risk will materialize, etc.), and the point of accounting is to *show* such things beforehand, and let us see the costs & benefits *now*.
So the daily VTSAX share price (["mark-to-market"](!W)) should instead be treated as "held-to-maturity" (ie. we plan to do something like "hold it until the year 202X, after which we will sell it during retirement"), and the daily mark reflects the predicted NPV of holding VTSAX for _Y_ years while reinvesting & compounding.
The further back in time the VTSAX share, the more the 'exchange rate' reflects the future income & compounding (because we require several 2024 VTSAX shares to equal the value of 1 VTSAX share we bought-and-held in 2000, because that share *turned into* several 2024 VTSAX shares all on its own).
Then we get correct real opportunity-costs of expenses over time: our decision to spend \$10 on Beanie Babies in 2000 correctly reflects the real long-term cost to us of not putting that money into an index, and we can compute out the actual (or predicted) cost at any given date, and better see whether that asset performed well or the expense was justified. (*Did* investing in those Beanie Babies beat the alternative, or did our asset total gradually wither each year VTSAX outperformed and in retrospect the decision got worse and we refused to realize the loss? *Did* that \$10 of ice cream as a teen make us happier than \$100 in retirement we can't spend on anything we care about?)
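The Beanie-Baby arithmetic is just compounding; a hypothetical helper (the function name and the 7% average annual total return are my assumptions, standing in for a historical-average extrapolation, with taxes omitted):

```python
# Compounded opportunity cost (hypothetical sketch): the real long-term
# cost of spending `amount` in `spend_year` instead of indexing it,
# evaluated at `eval_year`. The default 7% annual total return is an
# assumed stand-in for an average of historical index returns.
def opportunity_cost(amount: float, spend_year: int, eval_year: int,
                     annual_return: float = 0.07) -> float:
    return amount * (1 + annual_return) ** (eval_year - spend_year)
```

Under these assumptions, the \$10 of Beanie Babies bought in 2000 marks at `opportunity_cost(10, 2000, 2024)` ≈ \$50.7 by 2024.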
This requires a bit more implementation effort, but still is not too hard: the target date can be specified by the user upfront once, and a simple, fair extrapolation of future income/returns used (eg. a simple average of all historical ones). If the user has so much that they can't save tax-free, a fudge factor can be added for taxes (updated after each actual tax filing if the user desires the greater accuracy). As the daily VTSAX share prices are updated however they are normally, the NPV estimate is updated by the ledger software without any further work from the user. And `$` doesn't need to be further modified, it simply remains a year-currency like the previous idea.
So for the user, this should be scarcely any more difficult than normal accounting software usage: one inputs transactions every day in a normal dollar amount, the stock price is updated by a script or API, and the main user-visible change is defaulting to `VTSAX_202X` units instead of `2024$` units, unless the user asks for `2024$` reports etc.
This provides a final, true, real accounting of all assets & expenses in a way which makes intertemporal comparison decision-relevant by default, while preserving all metadata necessary for standard financial reports or accounting.