Add char range overload to DotDot operator#5026
Add char range overload to DotDot operator#5026TravisEz13 merged 7 commits intoPowerShell:masterfrom IISResetMe:overload-DotDot-CharRange-1
Conversation
lzybkr
left a comment
There was a problem hiding this comment.
The code changes look fine - but we need tests.
There was a problem hiding this comment.
What do you think about returning char[] instead?
There was a problem hiding this comment.
@IISResetMe Could you please address the comment?
There was a problem hiding this comment.
I like the idea, but for now I think we should keep operator output consistent (ie. return object[] with individual items of the expected type).
There are are number of builtin operators that return object[] in this fashion (@(), -split, any comparison operator in array/filter mode etc.), changing the behavior should be a concerted effort across all of them IMO
There was a problem hiding this comment.
I think this special case is correctly handled by the code below and it happens rarely, so maybe we should remove it. (And I see you copied the code from the int version, so you could remove it there too.)
There was a problem hiding this comment.
Mainly did it this way to keep it similar to the existing code in Compiler.cs. I'm inclined to make a single Range method that takes the operands as Expression's, de-duplicate the loop that generates the actual sequence and then cast the entire thing to char[], int[] or long[] based on the lhs operand before returning. Will update the code later today
There was a problem hiding this comment.
I've removed the redundant branch in both overloads now. 0406cb3
There was a problem hiding this comment.
Will this work:
'A'..'Z' | % { ... }
There was a problem hiding this comment.
Yes, this will produce the sequence A (0x41) through Z (0x5A)
There was a problem hiding this comment.
Apparently it does not work on the Powershell Core 6.0.0 GA
> 'A'..'Z' | % {$_}
Cannot convert value "A" to type "System.Int32". Error: "Input string was not in a correct format."
At line:1 char:1
+ 'A'..'Z' | % {$_}
+ ~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvalidCastFromStringToIntegerThere was a problem hiding this comment.
@kborowinski there is an open issue for that #5519
There was a problem hiding this comment.
I wonder - is this ASCII Arithmetic work correctly for unicode?
There was a problem hiding this comment.
It won't work with surrogate pairs because those can't be represented in a char, but I don't think that is an interesting scenario.
There was a problem hiding this comment.
Since we added `u{} should we detect surrogate pairs and write an error?
There was a problem hiding this comment.
I guess we should error on invalid characters (half of the pair). We could also support surrogate pair ranges if we didn't convert the string to char immediately, instead analyzing the string more thoroughly. But I'm not sure I'd block this PR on either of those.
There was a problem hiding this comment.
I guess it will work with traditional languages in common scenarios but before I saw issues from China and Japan mans - if the new operator does not matter to them I agree with you.
There was a problem hiding this comment.
@IISResetMe Could you please address the comment in code or add relevant comments in the PR description for future documentation?
There was a problem hiding this comment.
@iSazonov I don't feel confident enough in my unicode-fu to attempt to implement the appropriate handling in code for this at the moment, I've updated the PR description to reflect this deficiency instead.
|
Oops, rebased powershell:master into the branch, looks like all parent commits are in the PR now, how do I clean it up? 😞 |
|
@IISResetMe did you only have the 2 commits? if so you could do this (assuming git fetch
git checkout -b 20171015fix PowerShell/master
git cherry-pick 6e6dfffc2d0150b504585597ad4df997a174867a
git cherry-pick 6d6f20d28157ee49071f018051ac7b35be57bbb0
git checkout -b 20171015backup overload-DotDot-CharRange-1
git checkout overload-DotDot-CharRange-1
git reset --hard 20171015fix
git push --forceThis will create a new |
This commit attempts to allow the DotDot operator (..) to generate character ranges. Since PowerShell has no shorthand syntax for char literals, this change tests whether the left-hand-side is of type String, and subsequently attempts to convert both operands to a char. NOTE: This breaks a number of currently supported range operations, such as implicit numerical conversion of strings: '1'..'100'
This commit adds basic tests for the current integer range (..) operator
IISResetMe
left a comment
There was a problem hiding this comment.
These were the most basic tests I could think of for the existing Range operator. All examples I could find in the about_* help files pass these.
|
Thanks @markekraus, that seems to have done the trick. |
Add tests for the [char] overload feature. RangeOperator.Tests.ps1 encoding changed to UTF16LE (Windows "Unicode") to allows inline unicode chars in test case
|
Can you make sure |
Changed RangeOperator.Tests.ps1 encoding to UTF-8 (no BOM), substituted UTF-16 char literals with [char] casts
|
@TravisEz13 Done, https://github.com/IISResetMe/PowerShell/blob/34bc59046698d0de41bb5403a6105c7070b38a46/test/powershell/Language/Operators/RangeOperator.Tests.ps1 is UTF-8 (with updated tests) now |
| It "Range operator generates arrays of integers" { | ||
| $Range = 5..8 | ||
| $Range.count | Should Be 4 | ||
| $Range[0] | Should BeOfType [int] |
There was a problem hiding this comment.
Please add tests for values - $Range[0] | Should Be 5
| It "Range operator accepts negative integer values" { | ||
| { | ||
| -8..-5 | ||
| } |Should Not Throw |
There was a problem hiding this comment.
Please remove the should - next line do the same and Pester can catch a throw there correctly.
|
|
||
| $Range = -8..-5 | ||
| $Range.count | Should Be 4 | ||
| $Range[0] |Should Be -8 |
There was a problem hiding this comment.
Formatting - please add space after |
Please add type check as in previous test.
| It "Range operator support single-item sequences" { | ||
| { | ||
| 0..0 | ||
| } |Should Not Throw |
| $Range[0] | Should Be 0 | ||
| } | ||
|
|
||
| It "Range operator works in ascending and descending order" { |
There was a problem hiding this comment.
Previous tests already test "ascending". I think we could test here only descending order for positive and negative numbers. But I believe we should replace some tests with one - "-2 .. 2" and "2 .. -2" - so we check transition through zero too.
|
|
||
| Context "Range operator operand types" { | ||
|
|
||
| It "Range operator works on [int]" { |
There was a problem hiding this comment.
Please remove the test - we do the checks above already.
| } | Should Not Throw | ||
| $Range = ([long]1)..([long]10) | ||
| $Range.count | Should Be 10 | ||
| $Range.Where({$_ -is [int]}).count | Should Be 10 |
There was a problem hiding this comment.
Is it really [int]? I think we should use values > int.MaxValue.
And please check values too.
There was a problem hiding this comment.
Current implementation doesn't support values > int.MaxValue, intended to add this in a separate branch/PR.
There was a problem hiding this comment.
Thanks for clarify. Please remove the confuse test.
There was a problem hiding this comment.
@IISResetMe Please address the comment about the tests for [long] and [bigint].
We should remove its - they don't make sense.
There was a problem hiding this comment.
You're right, they don't make too much sense at the moment. I've removed them now
| } | Should Not Throw | ||
| $Range = ([bigint]1)..([bigint]10) | ||
| $Range.count | Should Be 10 | ||
| $Range.Where({$_ -is [int]}).count | Should Be 10 |
There was a problem hiding this comment.
Is it really [int]? I think we should use values > long.MaxValue.
And please check values too.
There was a problem hiding this comment.
The existing numeric range operator only supports int regardless of input type
| { | ||
| 1.2d..2.9d | ||
| } | Should Not Throw | ||
| $Range = 1.1d..9.9d |
There was a problem hiding this comment.
Why so much? I think enough $Range = 1.1d..3.9d
| $Range[9] | Should Be 10 | ||
| } | ||
| } | ||
| } No newline at end of file |
There was a problem hiding this comment.
Please add new line at EOF.
Fixed formatting, removed unnecessary checks
|
@lzybkr Can you update your review? |
|
@iSazonov Can you indicate if you think this is ready or not (feel free to say if someone else signs off)? |
|
@TravisEz13 I updated three comment above - after they have been resolved we'll ready to merge. |
Both Range() overloads have unnecessary branches for the special case of lhs == rhs, removed for simplicity.
Removed tests for .. using [long] and [bigint] input operands. I believe these will make more sense once we add support for ranges with values outside Int32, but for now might be confusing
|
@iSazonov I believe I've addressed all outstanding comments :-) |
|
@IISResetMe Many thanks! We are near to merge. Please address @lzybkr comment above. |
|
@iSazonov Done |
|
@lzybkr Could you please take another look if you have a bit free time? |
This PR attempts to allow the DotDot operator (..) to generate
character ranges, in turn allowing shorthand syntax like:
Since PowerShell has no shorthand syntax for char literals, we
test whether the left-hand-side is of type String, and subsequently
attempts to convert both operands to a char (in accordance with
PowerShell operator overload behavior in general).
NOTE:
This breaks a number of currently supported range operations, such
as implicit numerical conversion of strings:
'1'..'100'
This implementation is fairly naive, making no attempts to assert whether the input characters (or any codepoints in between) fall outside UCS-2 (ie. surrogates). Attempts to expand ranges bound by surrogate pairs are expected to fail and not covered by this implementation (I will leave this an exercise for someone more experienced with unicode hackery than myself)
I was unable to locate existing tests for the integer range function, please advise on where to add those.