add support of grapheme clusters#520
Conversation
now next/prev_grapheme_offset works correctly rather than falling back to codepoints. fixes xi-editor#298
cmyr
left a comment
There was a problem hiding this comment.
Cool, thanks for this! A few little nits but otherwise looks good.
rust/rope/src/rope.rs
Outdated
| } | ||
|
|
||
| fn is_boundary(s: &String, offset: usize) -> bool { | ||
| if let Ok(r) = GraphemeCursor::new(offset, s.len(), true).is_boundary(s, 0) { |
There was a problem hiding this comment.
you can call .ok() on any Result<T> to convert it to an Option<T>, discarding the error.
rust/rope/src/rope.rs
Outdated
| cursor.is_boundary::<GraphemeMetric>() | ||
| } | ||
|
|
||
| // graphemes should probably be developed as a cursor-based interface |
There was a problem hiding this comment.
We can get rid of this comment now. :)
rust/rope/src/rope.rs
Outdated
| } | ||
|
|
||
| #[derive(Clone, Copy)] | ||
| pub struct GraphemeMetric(()); |
There was a problem hiding this comment.
You can declare a unit struct without any braces: pub struct GraphemeMetric;.
|
I'm thrilled to see progress on this, and hate to throw cold water on the idea, but unfortunately trying to treat grapheme clusters as a metric doesn't quite work. The underlying reason is that grapheme cluster counting is fundamentally not a monoid homomorphism, as awesome as it would be if that were the case. Here's how to write a test case that will fail: Create a string of RIS code points, let's say Because the size of graphemes is not bounded, and because boundaries can in the worst case depend on unbounded amounts of context, the boundary detection must be able to handle bringing in an arbitrary number of leaves as pre-context. Bottom line, this needs to be implemented directly in {prev,next}_grapheme_offset with some attention to leaf boundaries. It is, sadly, not trivial to test - this is one reason I'd like to see some infrastructure for property testing developed. Thanks for the attempt, and I'm happy to provide guidance to see it through. |
|
I am taking a stab at this. Did I understand it correctly that what is needed is:
|
|
Yup, actually I was working on that. Some prototype is working on my local machine although it needs some refinements (sorry for my slow progress). However, I've found a few issues on |
|
unicode-rs/unicode-segmentation#37 here is the PR. |
Currently this points to my local github repository since some fixes are necessary.
|
So I pushed my version of the code. Please take a look, if that's okay. I'm fine to hold this until the problem I've seen in unicode-segmentation are fixed. |
|
This PR actually relies on unicode-rs/unicode-segmentation#40 too. I can create workarounds if that turned out unwanted. |
|
What is the status for this PR? All relevant PRs seem to be merged. |
|
Good question. I think unicode-segmentation needs to be published to crates.io. @Manishearth? |
|
done!
…-Manish Goregaokar
On Sun, Apr 29, 2018 at 8:33 AM, Raph Levien ***@***.***> wrote:
Good question. I think unicode-segmentation needs to be published to
crates.io. @Manishearth <https://github.com/Manishearth>?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#520 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABivSD2dQ5KPhx_VeUrRGBpLK2WnKMY5ks5ttd1mgaJpZM4SEcDT>
.
|
|
Thanks @Manishearth! Would you like to resolve the conflicts, @jmuk? If not, we can take care of it. Incidentally, this will help a lot in dealing with CRLF line endings, right now the cursor can get between the two which causes all kinds of unexpected behavior. |
|
@raphlinus done! |
raphlinus
left a comment
There was a problem hiding this comment.
Looks great, thanks, and thanks for your patience.
rust/rope/src/rope.rs
Outdated
| @@ -363,13 +366,13 @@ impl Rope { | |||
|
|
|||
| // graphemes should probably be developed as a cursor-based interface | |||
There was a problem hiding this comment.
This comment can be removed, as you've actually done it :)
rust/rope/src/rope.rs
Outdated
| } | ||
| // GraphemeIncomplete::NextChunk error will not happen, since this cursor thinks the end of | ||
| // the current leaf as the end of the string. Can a cursor know the total length of | ||
| // the tree? |
There was a problem hiding this comment.
To answer this question, this should be available at cursor.root.len(). In my first reading, I'm not sure whether this would allow you to simplify the following logic.
There was a problem hiding this comment.
Okay -- yet cursor.root isn't visible here in rope.rs. I added total_len() to tree.rs, and it helps to simplify the code; at least I think so. PTAL.
rust/Cargo.lock
Outdated
| [[package]] | ||
| name = "unicode-segmentation" | ||
| version = "1.2.0" | ||
| source = "git+https://github.com/jmuk/unicode-segmentation.git?branch=fix_for_xi#1d7d062b38e6e5299ed0e5750f6890230c6cd0fa" |
There was a problem hiding this comment.
I think I'd like to wait for unicode-segmentation to release the fixed version, as (among other things), this would prevent releasing xi-rope as a crate. I know it's been a long time since we've done that, but we're discussing doing a release before long.
rust/rope/src/rope.rs
Outdated
| let mut c = GraphemeCursor::new(pos, l.len() + leaf_offset, true); | ||
| let mut next_boundary = c.next_boundary(&l, leaf_offset); | ||
| while let Err(GraphemeIncomplete::PreContext(_)) = next_boundary { | ||
| let (pl, poffset) = self.prev_leaf()?; |
There was a problem hiding this comment.
We should update the minimum rust version in README, as ? for Option is not supported in 1.20. A quick test shows that 1.22 should still work.
There was a problem hiding this comment.
It seems the README had been updated already to require Rust 1.22; as 3c53388
| assert_eq!(None, a.next_grapheme_offset(17)); | ||
| } | ||
|
|
||
| #[test] |
cmyr
left a comment
There was a problem hiding this comment.
Played around with this, feels good.
|
Many thanks! |
|
Yay! |
now next/prev_grapheme_offset works correctly rather than
falling back to codepoints.
fixes #298