Support underscores in numeric literals by tfausak · Pull Request #6180 · unisonweb/unison

tfausak · 2026-03-05T02:16:34Z

Overview

- What does this change accomplish and why?
  - Adds support for underscores as visual separators in numeric literals, a common readability feature in modern languages (Rust, Python, Java, Kotlin, Swift). Closes #2228 (Underscores in numeric literals).
- How does it change the user experience?
  - Users can now write 1_000_000 instead of 1000000, 0xFF_FF instead of 0xFFFF, etc.
- What was the old behavior/API and what is the new behavior/API?
  - Before: 1_000 produced a confusing parse error.
  - After: 1_000 is accepted and evaluates to 1000.

Before and after examples:

| Input | Before | After |
| --- | --- | --- |
| `1_000_000` | parse error | `1000000` |
| `0xFF_FF` | parse error | `65535` |
| `0b1010_0101` | parse error | `165` |
| `1_000.5e1_0` | parse error | `1000.5e10` |
| `1_` | `1` (silent) | parse error |
| `1__2` | `1` (silent) | parse error |
| `1_x` | `1`, `_x` (two tokens) | parse error |


Implementation approach and notes

This is a lexer-only change: underscores are stripped at lex time, so no downstream changes to the parser, typechecker, pretty-printer, or runtime are needed.

Two helpers are added to the numeric parser in Unison.Syntax.Lexer.Unison:

- `digitsWithUnderscores`: parses digit groups separated by single underscores, committing after each `_` so that malformed literals (trailing `_`, consecutive `__`, `_` before a non-digit) produce errors instead of silently accepting partial input.
- `digitsToInteger`: converts a digit string to an `Integer` for a given base, replacing megaparsec's `LP.decimal`/`LP.hexadecimal`/`LP.octal`/`LP.binary`.
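
To make the commit-after-underscore behaviour concrete, here is a minimal sketch using only `base`, not the PR's megaparsec code. The names `digitGroups`, `toIntegerInBase`, `decimalDigits`, and `hexDigits` are hypothetical stand-ins chosen for illustration; the real helpers live in the lexer's parser monad and report errors differently.

```haskell
import Data.Char (digitToInt, isDigit, isHexDigit)
import Data.List (foldl')

-- | Hypothetical stand-in for the underscore-aware digit parser.
-- Consumes one or more digits, then any number of "_digits" groups.
-- Once an underscore is consumed there is no backtracking: a missing
-- digit after it is an error. Returns the digits with underscores
-- removed, together with the unconsumed input.
digitGroups :: (Char -> Bool) -> String -> Either String (String, String)
digitGroups isDig input =
  case span isDig input of
    ("", _) -> Left "expected a digit"
    (ds, rest) -> go ds rest
  where
    go acc ('_' : rest) =
      case span isDig rest of
        ("", _) -> Left "expected a digit after '_'"
        (ds, rest') -> go (acc ++ ds) rest'
    go acc rest = Right (acc, rest)

-- | Hypothetical stand-in for the digit-string conversion: interprets
-- already-validated digits in the given base.
toIntegerInBase :: Integer -> String -> Integer
toIntegerInBase base = foldl' step 0
  where
    step acc c = acc * base + toInteger (digitToInt c)

-- Convenience instances for decimal and hexadecimal digit sets.
decimalDigits, hexDigits :: String -> Either String (String, String)
decimalDigits = digitGroups isDigit
hexDigits = digitGroups isHexDigit
```

Under these assumptions, `decimalDigits "1_000_000"` yields `Right ("1000000", "")`, while `decimalDigits "1_"`, `decimalDigits "1__2"`, and `decimalDigits "1_x"` all yield a `Left`, mirroring the error cases in the table above.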

The three prefixed-base parsers (octal, hex, binary) are factored into a shared baseWithPrefix helper. Decimal is intentionally excluded from this because it has no prefix, which changes the backtracking semantics.
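
One way to picture why a prefix matters for backtracking is the sketch below. It is an illustration under assumptions, not the PR's `baseWithPrefix`: the name `prefixedLiteral`, the result type, and the `0o` octal prefix are all hypothetical choices here. `Nothing` models "the prefix is absent, so another alternative may be tried", while `Just (Left _)` models "the prefix matched, so a malformed body is a hard error".

```haskell
import Data.Char (digitToInt, isHexDigit, isOctDigit)
import Data.List (foldl', stripPrefix)

-- | Digits separated by single underscores, with underscores dropped.
-- (Hypothetical helper, defined here so the block is self-contained.)
digitGroups :: (Char -> Bool) -> String -> Either String (String, String)
digitGroups isDig input =
  case span isDig input of
    ("", _) -> Left "expected a digit"
    (ds, rest) -> go ds rest
  where
    go acc ('_' : rest) =
      case span isDig rest of
        ("", _) -> Left "expected a digit after '_'"
        (ds, rest') -> go (acc ++ ds) rest'
    go acc rest = Right (acc, rest)

isBinDigit :: Char -> Bool
isBinDigit c = c == '0' || c == '1'

-- | One definition shared by all prefixed bases. The outer Maybe says
-- whether the prefix was present; the inner Either says whether the
-- digits after it were well formed.
prefixedLiteral ::
  String -> Integer -> (Char -> Bool) -> String ->
  Maybe (Either String (Integer, String))
prefixedLiteral prefix base isDig input = do
  body <- stripPrefix prefix input
  pure $ do
    (ds, rest) <- digitGroups isDig body
    pure (foldl' (\acc c -> acc * base + toInteger (digitToInt c)) 0 ds, rest)

hex, octal, binary :: String -> Maybe (Either String (Integer, String))
hex = prefixedLiteral "0x" 16 isHexDigit
octal = prefixedLiteral "0o" 8 isOctDigit
binary = prefixedLiteral "0b" 2 isBinDigit
```

In this model `hex "0xFF_FF"` gives `Just (Right (65535, ""))`, `hex "0x_FF"` gives a committed `Just (Left _)`, and `hex "123"` gives `Nothing`. A decimal literal has no prefix to serve as that commit point, which is one plausible reading of why it is handled separately.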

Interesting/controversial decisions

- `1_x` is now an error, not two tokens (`1`, `_x`). Once the lexer sees `<digit>_`, it commits to the underscore-in-number interpretation. This seems like the right call: `1_x` looks like a malformed numeric literal, and users can write `1 _x` if they mean two tokens.
- Underscores are not allowed after base prefixes (`0x_FF` is rejected) or adjacent to the decimal point, the exponent marker `e`, or an exponent sign. This matches Java's rules. Rust and Python are more permissive here.
- Underscores are never emitted on output. The pretty-printer always renders `1000000`, not `1_000_000`. This is consistent with how Unison handles string literals (single-line vs multi-line formatting is regenerated by the pretty-printer, not preserved from source). A configurable pretty-printer could be a future enhancement.
- Bytes literals (`0xs...`) don't support underscores. This is a separate piece of work since the bytes parser has a different structure.
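
The placement rule for decimal and floating-point literals can be summarised as "an underscore must sit directly between two digits". The following sketch checks that property and strips the separators; `underscoresWellPlaced` and `stripUnderscores` are hypothetical names, and the actual lexer enforces the rule while parsing rather than as a separate pass. Hex literals would need a different digit predicate and are not covered here.

```haskell
import Data.Char (isDigit)

-- | True when every underscore in a decimal or float literal has a
-- digit immediately before and immediately after it. The input is
-- padded with a space on each side so the first and last characters
-- also have neighbours to inspect.
underscoresWellPlaced :: String -> Bool
underscoresWellPlaced s = and (zipWith3 ok (' ' : s) s (drop 1 s ++ " "))
  where
    ok prev c next = c /= '_' || (isDigit prev && isDigit next)

-- | Drop the separators, as the lexer does before producing a token.
stripUnderscores :: String -> String
stripUnderscores = filter (/= '_')
```

With these definitions `underscoresWellPlaced "1_000.5e1_0"` is `True` and `stripUnderscores "1_000.5e1_0"` is `"1000.5e10"`, whereas `1_.2`, `1._2`, `1e_2`, `1_e2`, and `1e+_2` all fail the check.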

Test coverage

Lexer unit tests: 38 new test cases covering all numeric forms with underscores (decimal, float, scientific, hex, octal, binary) plus error cases (trailing _, consecutive __, _ adjacent to prefix/period/exponent, digit_nondigit sequences, leading zeros).
Transcript integration test: numeric-underscore-literals.md exercises valid literals end-to-end (parse → evaluate → display) and verifies error messages for invalid literals.
Test coverage is adequate for this change.

Loose ends

Bytes literals (0xs...) could also support underscores — separate issue.
The pretty-printer could optionally insert underscores in large numeric literals for readability — separate feature.

Final checklist

PR title is descriptive of the change.
Transcripts included demonstrating the changed behavior.
No .cabal file changes.

Allow underscores as visual separators in all numeric literal forms (decimal, float, scientific, hex, octal, binary). Underscores are stripped at lex time so no downstream changes are needed. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Remove P.try from digitsWithUnderscores so that once an underscore is consumed, the parser commits to requiring digits after it. This rejects malformed literals like 1_, 1__2, 0xFF_, etc. instead of silently accepting them. Co-Authored-By: Claude Opus 4.6 <[email protected]>


Factor octal/hex/binary into shared otherBase helper. Extract isBinDigit predicate. Use mconcat and toInteger for clarity. Add tests confirming that 1_x and 1_e3 are rejected (previously these silently parsed as two tokens). Co-Authored-By: Claude Opus 4.6 <[email protected]>


aryairani · 2026-03-05T05:00:58Z

I was wanting this recently too for some reason.

aryairani · 2026-03-05T19:43:08Z

@tfausak Looks great. Could you do a couple of things:

- Add yourself to CONTRIBUTORS.markdown, acknowledging Unison's MIT license.
- Run scripts/check.sh and, assuming it succeeds, check in the "proof" files it generates.

Extend the underscore separator feature from numeric literals (unisonweb#6180) to bytes literals. E.g. `0xs01_ef` now parses as `0xs01ef`. Uses the existing `digitsWithUnderscores` helper and switches from `isAlphaNum` to `isHexDigit` for stricter validation at lex time. Co-Authored-By: Claude Opus 4.6 <[email protected]>

ChrisPenner · 2026-03-16T20:38:41Z

Lovely, thanks @tfausak !

tfausak and others added 8 commits March 5, 2026 01:16

Add transcript test for underscore numeric literals

b5471b1

Integration test covering valid literals (decimal, float, scientific, hex, octal, binary with underscores) and error cases (trailing and consecutive underscores). Co-Authored-By: Claude Opus 4.6 <[email protected]>

Rename otherBase to baseWithPrefix to avoid case-only difference

21beed9

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Add test that underscore after base prefix is rejected

ab5c0cb

Ensures 0x_FF is an error, not a valid hex literal. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Add tests for underscores adjacent to period and exponent marker

168ff0b

Ensures 1_.2, 1._2, 1e_2, and 1_e2 are all rejected. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Add tests for exponent sign underscores and leading zeros

b6104f1

Ensures 1e+_2 and 1e-_2 are rejected (underscore after exponent sign). Adds tests for leading zeros: 007 -> 7 and 0_1 -> 1. Co-Authored-By: Claude Opus 4.6 <[email protected]>

aryairani approved these changes Mar 5, 2026


tfausak added 2 commits March 5, 2026 20:30

Add myself to the list of contributors

904a2d0

Update test proof

e5c5b2e

tfausak requested a review from a team as a code owner March 5, 2026 20:43

aryairani approved these changes Mar 6, 2026


aryairani self-requested a review March 6, 2026 02:16

aryairani approved these changes Mar 6, 2026


aryairani added this pull request to the merge queue Mar 6, 2026

Merged via the queue into unisonweb:trunk with commit 0d51c30 Mar 6, 2026
5 checks passed

tfausak mentioned this pull request Mar 6, 2026

Support underscores in bytes literals #6181

Open

4 tasks

aryairani merged 10 commits into unisonweb:trunk from tfausak:gh-2228-numeric-underscores
