Support underscores in numeric literals #6180
Merged
aryairani merged 10 commits into unisonweb:trunk on Mar 6, 2026
Allow underscores as visual separators in all numeric literal forms (decimal, float, scientific, hex, octal, binary). Underscores are stripped at lex time so no downstream changes are needed. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Remove P.try from digitsWithUnderscores so that once an underscore is consumed, the parser commits to requiring digits after it. This rejects malformed literals like 1_, 1__2, 0xFF_, etc. instead of silently accepting them. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Integration test covering valid literals (decimal, float, scientific, hex, octal, binary with underscores) and error cases (trailing and consecutive underscores). Co-Authored-By: Claude Opus 4.6 <[email protected]>
Factor octal/hex/binary into shared otherBase helper. Extract isBinDigit predicate. Use mconcat and toInteger for clarity. Add tests confirming that 1_x and 1_e3 are rejected (previously these silently parsed as two tokens). Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Ensures 0x_FF is an error, not a valid hex literal. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Ensures 1_.2, 1._2, 1e_2, and 1_e2 are all rejected. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Ensures 1e+_2 and 1e-_2 are rejected (underscore after exponent sign). Adds tests for leading zeros: 007 -> 7 and 0_1 -> 1. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Contributor
I was wanting this recently too for some reason.
aryairani approved these changes Mar 5, 2026
Contributor
@tfausak Looks great. Could you do a couple of things:
aryairani approved these changes Mar 6, 2026
aryairani approved these changes Mar 6, 2026
tfausak added a commit to tfausak/unison that referenced this pull request Mar 6, 2026
Extend the underscore separator feature from numeric literals (unisonweb#6180) to bytes literals. E.g. `0xs01_ef` now parses as `0xs01ef`. Uses the existing `digitsWithUnderscores` helper and switches from `isAlphaNum` to `isHexDigit` for stricter validation at lex time. Co-Authored-By: Claude Opus 4.6 <[email protected]>
4 tasks
Member
Lovely, thanks @tfausak!
Overview
What does this change accomplish and why?
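- Allows underscores as visual separators in numeric literals: `1_000_000` instead of `1000000`, `0xFF_FF` instead of `0xFFFF`, etc.
- Before: `1_000` produced a confusing parse error.
- After: `1_000` is accepted and evaluates to `1000`.

Before and after examples:

Valid literals, now accepted:

| Input | Parses as |
| --- | --- |
| `1_000_000` | `1000000` |
| `0xFF_FF` | `65535` |
| `0b1010_0101` | `165` |
| `1_000.5e1_0` | `1000.5e10` |

Malformed literals, now rejected:

| Input | Previous behavior |
| --- | --- |
| `1_` | `1` (silent) |
| `1__2` | `1` (silent) |
| `1_x` | `1`, `_x` (two tokens) |

Closes Underscores in numeric literals #2228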
Implementation approach and notes
This is a parser-only change — underscores are stripped at lex time, so no downstream changes to the parser, typechecker, pretty-printer, or runtime are needed.
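To make "stripped at lex time" concrete, here is a minimal sketch in plain Haskell (base only). It is not the PR's megaparsec code: the names `digitGroups`, `fromDigits`, and `lexNatural` are hypothetical, and it covers only integer forms, not floats or exponents. It assumes the rules described in this PR: groups of digits separated by single underscores, and no backtracking once an underscore has been consumed.

```haskell
import Data.Char (digitToInt, isDigit, isHexDigit, isOctDigit)
import Data.List (foldl')

-- | Hypothetical helper: read one or more digit groups separated by single
-- underscores. Returns the digits with underscores removed, plus the
-- unconsumed input.
digitGroups :: (Char -> Bool) -> String -> Either String (String, String)
digitGroups isDig input =
  case span isDig input of
    ([], _) -> Left "expected a digit"
    (ds, '_' : rest) ->
      -- The underscore has been consumed, so we are committed: a failure
      -- here is reported rather than falling back to just `ds`.
      case digitGroups isDig rest of
        Left _ -> Left "expected a digit after '_'"
        Right (more, rest') -> Right (ds ++ more, rest')
    (ds, rest) -> Right (ds, rest)

-- | Hypothetical helper: fold already-validated digits into an Integer.
fromDigits :: Integer -> String -> Integer
fromDigits base = foldl' step 0
  where
    step acc c = acc * base + toInteger (digitToInt c)

-- | Hypothetical entry point: choose the base from the prefix, read the
-- digit groups, and require that the whole input was consumed.
lexNatural :: String -> Either String Integer
lexNatural input = do
  let (base, isDig, body) = case input of
        '0' : 'x' : rest -> (16, isHexDigit, rest)
        '0' : 'o' : rest -> (8, isOctDigit, rest)
        '0' : 'b' : rest -> (2, (`elem` "01"), rest)
        _ -> (10, isDigit, input)
  (digits, leftover) <- digitGroups isDig body
  if null leftover
    then Right (fromDigits base digits)
    else Left ("unexpected " ++ show leftover)
```

In GHCi, `lexNatural "0xFF_FF"` gives `Right 65535` and `lexNatural "0_1"` gives `Right 1`, while `lexNatural "1_"`, `lexNatural "1__2"`, `lexNatural "1_x"`, and `lexNatural "0x_FF"` all return a `Left`. Because the underscores are gone by the time the value is built, nothing after this stage needs to know they were ever there.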
Two helpers are added to the `numeric` parser in `Unison.Syntax.Lexer.Unison`:

- `digitsWithUnderscores`: parses digit groups separated by single underscores, committing after each `_` so that malformed literals (trailing `_`, consecutive `__`, `_` before a non-digit) produce errors instead of silently accepting partial input.
- `digitsToInteger`: converts a digit string to `Integer` for a given base, replacing megaparsec's `LP.decimal` / `LP.hexadecimal` / `LP.octal` / `LP.binary`.

The three prefixed-base parsers (octal, hex, binary) are factored into a shared `baseWithPrefix` helper. Decimal is intentionally excluded from this because it has no prefix, which changes the backtracking semantics.

Interesting/controversial decisions
- `1_x` is now an error, not two tokens (`1`, `_x`). Once the lexer sees `<digit>_`, it commits to the underscore-in-number interpretation. This seems like the right call: `1_x` looks like a malformed numeric literal, and users can write `1 _x` if they mean two tokens.
- Underscores are not allowed directly after a base prefix (`0x_FF` is rejected) or adjacent to `.` / `e` / exponent signs. This matches Java's rules. Rust and Python are more permissive here.
- The pretty-printer prints `1000000`, not `1_000_000`. This is consistent with how Unison handles string literals (single-line vs multi-line formatting is regenerated by the pretty-printer, not preserved from source). A configurable pretty-printer could be a future enhancement.
- Bytes literals (`0xs...`) don't support underscores. This is a separate piece of work since the bytes parser has a different structure.

Test coverage
- Tests cover edge cases (trailing `_`, consecutive `__`, `_` adjacent to prefix/period/exponent, `digit_nondigit` sequences, leading zeros).
- The integration test `numeric-underscore-literals.md` exercises valid literals end-to-end (parse → evaluate → display) and verifies error messages for invalid literals.

Loose ends
- Bytes literals (`0xs...`) could also support underscores, but that is a separate issue.

Final checklist
- `.cabal` file changes.