Extract shared rustpython-unicode crate and route core Unicode semantics through it #7561
Agent-Logs-Url: https://github.com/RustPython/RustPython/sessions/e4115dbc-b1a5-4a77-90a7-38cf9ac1cdf2
Co-authored-by: youknowone <[email protected]>
youknowone left a comment:
@copilot check CI failures and comments, and also the missing refactoring of rustpython_common::encodings and rustpython_codegen::string_parser
crates/stdlib/src/unicodedata.rs (outdated)
    Nfkd,
}

impl From<NormalizeForm> for rustpython_unicode::NormalizeForm {
The local NormalizeForm is expected to be replaced by rustpython_unicode::NormalizeForm.
Replaced the remaining local wrapper path in unicodedata and now parse directly into rustpython_unicode::NormalizeForm in 2934897.
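Here is a minimal, self-contained sketch of what parsing the form name directly into a single enum can look like, with no intermediate local type. The `NormalizeForm` enum and its `FromStr` impl are stand-ins written for illustration, and the real `rustpython_unicode::NormalizeForm` may be shaped differently. The variant naming follows the `Nfkd` variant visible in the diff above. Only the four accepted names (`NFC`, `NFKC`, `NFD`, `NFKD`) and the error text come from Python's `unicodedata.normalize`.

```rust
use std::str::FromStr;

/// Stand-in for a shared normalization-form enum (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NormalizeForm {
    Nfc,
    Nfkc,
    Nfd,
    Nfkd,
}

impl FromStr for NormalizeForm {
    type Err = String;

    /// Accepts exactly the (case-sensitive) names `unicodedata.normalize` accepts.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "NFC" => Ok(Self::Nfc),
            "NFKC" => Ok(Self::Nfkc),
            "NFD" => Ok(Self::Nfd),
            "NFKD" => Ok(Self::Nfkd),
            // CPython raises ValueError("invalid normalization form") here.
            _ => Err("invalid normalization form".to_owned()),
        }
    }
}
```

A Python-facing wrapper would then only need to turn the `Err` case into a `ValueError`.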
[features]
default = ["std", "casefold"]
std = []
casefold = ["std", "dep:caseless"]
Do we need this feature? When will the feature be disabled?
Kept the std feature because rustpython-unicode is consumed by no_std crates, while casefold depends on caseless for std-enabled consumers. The feature wiring was clarified in 2934897.
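The sketch below shows one way an optional, std-only dependency can be gated behind a Cargo feature such as `casefold`. The function name `casefold` and the lowercase fallback are hypothetical. The feature list only shows that `casefold` enables `std` and pulls in `caseless`. When the file is compiled without the feature (for example with a plain `rustc` invocation), only the fallback is built, so the snippet does not need the `caseless` crate to run.

```rust
/// Full Unicode case folding when the `casefold` feature is enabled.
/// `caseless::default_case_fold_str` comes from the `caseless` crate.
#[cfg(feature = "casefold")]
pub fn casefold(s: &str) -> String {
    caseless::default_case_fold_str(s)
}

/// Hypothetical fallback when `casefold` is disabled: plain lowercasing.
/// This is NOT equivalent to case folding: "ß" stays "ß" here, while
/// full case folding (as in Python's `str.casefold()`) yields "ss".
#[cfg(not(feature = "casefold"))]
pub fn casefold(s: &str) -> String {
    s.to_lowercase()
}
```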
crates/stdlib/src/unicodedata.rs (outdated)
What I expect is that the entire Ucd type is placed in the new unicode crate, without the Python runtime.
Moved the non-runtime Ucd implementation into rustpython-unicode and left crates/stdlib/src/unicodedata.rs as a thin Python wrapper in 2934897.
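This rough sketch illustrates the requested layering. A pure-Rust database sits in the shared crate. The stdlib module only translates arguments and errors. `Ucd`, `decimal`, `PyValueError`, and `py_decimal` are hypothetical names. The lookup handles ASCII digits only, via `char::to_digit`, whereas the real `unicodedata.decimal` covers every character in category Nd. The `"not a decimal"` message matches CPython's `ValueError`.

```rust
/// Core layer: no Python runtime types, only plain Rust values.
/// (Hypothetical stand-in for a Ucd-style database in the shared crate.)
pub struct Ucd;

impl Ucd {
    /// Decimal digit value of `c`, if any.
    /// Simplification: `to_digit(10)` only recognizes ASCII '0'..='9'.
    pub fn decimal(&self, c: char) -> Option<u32> {
        c.to_digit(10)
    }
}

/// Stand-in for a Python `ValueError` raised by the runtime wrapper.
#[derive(Debug, PartialEq)]
pub struct PyValueError(pub &'static str);

/// Wrapper layer: mirrors `unicodedata.decimal(chr[, default])`, returning
/// `default` when given and "raising" ValueError otherwise.
pub fn py_decimal(ucd: &Ucd, c: char, default: Option<i64>) -> Result<i64, PyValueError> {
    match ucd.decimal(c) {
        Some(v) => Ok(i64::from(v)),
        None => default.ok_or(PyValueError("not a decimal")),
    }
}
```

Under this split, the shared crate never needs to know about Python objects. The wrapper is the only place that decides how a missing value surfaces to Python code.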
Agent-Logs-Url: https://github.com/RustPython/RustPython/sessions/1d30ae08-d8f0-431c-9299-8aea5c21f7d4
Co-authored-by: youknowone <[email protected]>
Checked the PR CI/check runs: the earlier branch run showed a stale lint failure, while the newer run was otherwise succeeding or still in progress. I addressed the review items and the missing …
youknowone left a comment:
@copilot check CI failures and reviews
crates/literal/src/char.rs (outdated)
@@ -10,17 +8,5 @@ use icu_properties::props::{EnumeratedProperty, GeneralCategory};
/// * Zp Separator, Paragraph ('\u2029', PARAGRAPH SEPARATOR)
/// * Zs (Separator, Space) other than ASCII space('\x20').
pub fn is_printable(c: char) -> bool {
Do we still need this function? Why not call is_repr_printable directly?
Removed the redundant wrapper and now call rustpython_unicode::classify::is_repr_printable directly from escape.rs in 4efa5da.
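CPython treats characters in the "Other" categories (Cc, Cf, Cs, Co, Cn) and the "Separator" categories (Zl, Zp, Zs) as non-printable for `repr`, with one exception: the ASCII space. The doc comment above lists part of that rule. This sketch expresses the rule as a match on the general category. The `Category` enum, the tiny hard-coded `category_of` lookup, and `repr_printable` are hypothetical stand-ins for a real Unicode data source. They are not the API of `rustpython_unicode::classify`.

```rust
/// General categories relevant to repr printability (hypothetical and
/// simplified: every other category collapses into `Other`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Cc, Cf, Cs, Co, Cn, Zl, Zp, Zs,
    Other,
}

/// Toy lookup covering a handful of characters; a real implementation
/// consults the Unicode Character Database. It never yields `Cs` (a Rust
/// `char` cannot be a surrogate) or `Cn` (unassigned code points).
pub fn category_of(c: char) -> Category {
    match c {
        _ if c.is_control() => Category::Cc,
        '\u{200B}' | '\u{FEFF}' => Category::Cf,
        '\u{E000}'..='\u{F8FF}' => Category::Co,
        '\u{2028}' => Category::Zl,
        '\u{2029}' => Category::Zp,
        ' ' | '\u{00A0}' | '\u{3000}' => Category::Zs,
        _ => Category::Other,
    }
}

/// "Other" (C*) and "Separator" (Z*) are non-printable, except U+0020.
pub fn repr_printable(c: char) -> bool {
    match category_of(c) {
        Category::Zs => c == ' ',
        Category::Cc | Category::Cf | Category::Cs | Category::Co
        | Category::Cn | Category::Zl | Category::Zp => false,
        Category::Other => true,
    }
}
```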
crates/sre_engine/src/string.rs (outdated)
}

#[inline]
pub(crate) fn is_word(ch: u32) -> bool {
Importing rustpython_unicode::regex as unicode_regex would remove the need for all these helpers.
Switched the SRE engine over to a direct rustpython_unicode::regex import and removed the pass-through helpers in 4efa5da.
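The suggestion replaces one-line pass-through helpers with a single module alias. In this self-contained sketch, a local `shared_unicode::regex` module stands in for `rustpython_unicode::regex`. `at_word_boundary` is a hypothetical caller. The `is_word` body approximates SRE's Unicode word class ("alphanumeric or `_`") with Rust's `char::is_alphanumeric`. That function is close to, but not identical to, CPython's `str.isalnum`.

```rust
mod shared_unicode {
    pub mod regex {
        /// Approximation of SRE's Unicode word class: `_` or alphanumeric.
        /// Rust's `is_alphanumeric` (Alphabetic || Numeric) differs from
        /// CPython's `str.isalnum` for some characters.
        pub fn is_word(ch: u32) -> bool {
            ch == u32::from('_') || char::from_u32(ch).map_or(false, char::is_alphanumeric)
        }
    }
}

// One alias import replaces a `pub(crate) fn is_word(ch) { ... }` wrapper
// (and its siblings) that only forwarded to the shared crate.
use shared_unicode::regex as unicode_regex;

/// Hypothetical caller: a `\b` position is where word-ness changes.
pub fn at_word_boundary(prev: Option<u32>, next: Option<u32>) -> bool {
    let before = prev.map_or(false, unicode_regex::is_word);
    let after = next.map_or(false, unicode_regex::is_word);
    before != after
}
```

Call sites then read `unicode_regex::is_word(ch)` directly, so the engine no longer keeps its own copies of the predicates.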
Agent-Logs-Url: https://github.com/RustPython/RustPython/sessions/b894057a-9bed-4f35-8400-a5731c63602d
Co-authored-by: youknowone <[email protected]>
Checked the current CI and reviews. I addressed the redundant …

Warning: firewall rules blocked me from connecting to one or more addresses.
This change introduces a shared `rustpython-unicode` crate to own CPython-aligned Unicode semantics and data access. Previously, that logic was spread across `str`, regex, literal escaping, and `unicodedata`. The goal is to make Unicode behavior reusable across RustPython components while preserving higher-level string behavior in the existing runtime crates.

New shared Unicode crate

- `crates/unicode`, with low-level, reusable APIs, including `unicodedata`-style database lookups; the crate is also usable by `no_std` consumers.

Refactor core runtime call sites

- `rustpython-vm` now uses `rustpython-unicode` for:
  - `str.isdecimal()`
  - `str.isspace()`
  - `str.isprintable()`
  - `str.isidentifier()`
- `rustpython-literal` now uses the shared printable classification for repr/escape behavior.
- `rustpython-sre_engine` now routes its Unicode character-class and case helpers through the shared crate.

Thin `unicodedata` integration

- `crates/stdlib/src/unicodedata.rs` now delegates reusable Unicode database and normalization operations to `rustpython-unicode`.

Focused coverage

- Tests in `rustpython-unicode` for classification, identifiers, regex predicates, case mapping, normalization, and Unicode database access.
- Coverage for `str`, `re`, and `unicodedata`.

Example of the new shared API shape: