Fix unicode Regex miscounting emoji length #2942

Merged
UziTech merged 1 commit into markedjs:master from calculuschild:FixUnicodeCharCount
Aug 15, 2023

@calculuschild calculuschild commented Aug 14, 2023

Description

Many emojis are 2+ UTF-16 code units long. The regex `u` flag, which allows searching for punctuation, also makes the regex count each emoji as a single char, while string slicing still counts code units. That mismatch throws off the character count when slicing strings. Spreading the strings into an array restores the correct character count. There is probably some slowdown from the extra overhead, but nothing that I can detect.

Eventually the `v` regex flag (available in Node 20) can replace the `u` flag to get an accurate char count natively.
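
For readers unfamiliar with the mismatch described above, here is a minimal sketch of it in TypeScript. It is not the code from this PR: the sample string and the `sliceByCodePoints` helper are hypothetical names chosen for illustration, and the sketch assumes a compile target of ES2015 or later so that strings can be spread.

```typescript
// Illustration only; the names below are hypothetical and not taken from marked.
const text = "a😀b"; // 😀 is U+1F600, stored as two UTF-16 code units

// String.prototype.length and slice() count UTF-16 code units.
const codeUnits = text.length;

// Spreading a string iterates by code point, so the emoji is one element.
const codePoints = [...text];

// With the `u` flag, `.` consumes a whole code point, so the second
// capture group holds the full emoji rather than half of a surrogate pair.
const uMatch = /^(.)(.)/u.exec(text);
const secondChar = uMatch ? uMatch[2] : "";

// Hypothetical helper: slice by code-point index instead of code-unit index.
function sliceByCodePoints(str: string, start: number, end?: number): string {
  return [...str].slice(start, end).join("");
}

// Slicing with a code-point count on the raw string cuts the emoji in half.
const naive = text.slice(0, 2);
// Slicing the spread array keeps the emoji intact.
const safe = sliceByCodePoints(text, 0, 2);

console.log(codeUnits, codePoints.length, secondChar, safe);
```

Here `naive` ends in a lone high surrogate, while `safe` contains the complete emoji. The cost of the spread is one extra array allocation per string, which matches the "some overhead" caveat in the description.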

Contributor

  • Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
  • no tests required for this PR.
  • If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.

Many emojis are 2+ UTF-16 code units long. The regex `u` flag, which allows searching for punctuation, also counts emojis as single chars. Spreading the strings into an array restores the correct character count.

Comment thread src/Tokenizer.ts
@UziTech UziTech linked an issue Aug 14, 2023 that may be closed by this pull request
@UziTech UziTech merged commit f3af23e into markedjs:master Aug 15, 2023
github-actions bot pushed a commit that referenced this pull request Aug 15, 2023
## [7.0.3](v7.0.2...v7.0.3) (2023-08-15)

### Bug Fixes

* Fix unicode Regex miscounting emoji length ([#2942](#2942)) ([f3af23e](f3af23e))
Logiclayer1111 pushed a commit to Logiclayer1111/marked that referenced this pull request Apr 20, 2026
## [7.0.3](markedjs/marked@v7.0.2...v7.0.3) (2023-08-15)

### Bug Fixes

* Fix unicode Regex miscounting emoji length ([#2942](markedjs/marked#2942)) ([9dbb3f2](markedjs/marked@9dbb3f2))

Development

Successfully merging this pull request may close these issues.

Improper emoji rendering with v5.1.0

4 participants