Conversation
| [Benchmark] public void IP() => _ip.IsMatch("012.345.678.910"); | ||
| [Benchmark] public void Uri() => _uri.IsMatch("http://example.org"); | ||
| [Benchmark] public void Email_IsMatch() => _email.IsMatch("[email protected]"); | ||
| [Benchmark] public void Email_IsNotMatch() => _email.IsMatch("[email protected]#"); |
There was a problem hiding this comment.
How big is the difference in the reported time for IsMatch and IsNotMatch? I wonder if the execution path is different enough to make it worth to "duplicate" the benchmarks (https://github.com/dotnet/performance/blob/master/docs/microbenchmark-design-guidelines.md#Code-Paths)
Another disadvantage of adding the "IsNotMatch" test case is changing the IDs of the existing benchmarks (https://github.com/dotnet/performance/blob/master/docs/microbenchmark-design-guidelines.md#benchmarks-are-immutable)
I wonder if instead of this we should have a single benchmark where the mismatched character is the first char. Just to measure the case where the method should return quickly.
[Benchmark] public void IsNotMatch_EmailFirstChar() => _email.IsMatch("#[email protected]");There was a problem hiding this comment.
Another disadvantage of adding the "IsNotMatch" test case is changing the IDs of the existing benchmarks
I don't think that's important. We should ensure we have the benchmarks we want to have. If we're not measuring the right things, we should stop doing so and measure the right things. I really don't care about the history of benchmarks that aren't meaningful, and we can run the benchmarks at any given point in time against all previous builds/releases we care about. The only problem with changing the IDs is that we might have more difficulty tracking down when a regression occurred, and that is not an issue here. Further, these benchmarks are only a month or two old, anyway.
How big is the difference in the reported time for IsMatch and IsNotMatch?
Big. See dotnet/runtime#1348 for examples.
I wonder if the execution path is different enough to make it worth to "duplicate" the benchmarks
It is. The match case is the happy path: every check as part of the regex succeeds. The not-match case highlights the quality of the regex engine, since that's where backtracking potentially kicks in.
I wonder if instead of this we should have a single benchmark where the mismatched character is the first char.
That'd be a fine additional case, but it's not the most important case.
|
Thanks. |
cc: @adamsitnik, @billwert