Comments on: Remove Duplicates From a C# Array https://code-maze.com/csharp-array-remove-duplicates/ Learn. Code. Succeed. Fri, 26 May 2023 21:30:01 +0000 hourly 1 https://wordpress.org/?v=6.7.5 By: Jacob https://code-maze.com/csharp-array-remove-duplicates/#comment-8264 Fri, 26 May 2023 21:30:01 +0000 https://drafts.code-maze.com/?p=69862#comment-8264 I’m quite surprised that HashSet variant takes this much memory… That is weird.

However this benchmark rather is silly, because
From the performance complexity point of view…
The IterationAndSwapping has O(n^2) complexity = N * N search
And the ConversionToHashSet has O(n) complexity = N * O(1) search

And from the memory point of view…
The algorithms are not created equal!
Few only reads the input array,
but others with less memory footprint are mutating the input array.
I’d say that is cheating 🙂

From the benchmarking point of view…
I think using string as elements here is not nice. Comparing 2 strings can be non-trivial operation, so using ints or some other primite type would be much more straightforward way for this benchmark.

If minimal memory footprint is required I guess you could optimize the HashSet variant further by implementing the hashing yourself. It is weird that HashSet takes so much memory…

I created a PR for making this benchmark a much more precise than those 170 string items from before.
And the results speaks definitely for the hashing… 10000 ints 140us vs 19000us…

https://github.com/CodeMazeBlog/CodeMazeGuides/pull/1128
https://pastebin.com/QvxX1AHS
|           Method |   N | PercentDuplicates |      Mean |
|    ConversionToHashSet | 10000 |         0 |   131.517 us |
|    ConversionToHashSet | 10000 |        25 |   141.748 us |
|    ConversionToHashSet | 10000 |        50 |   151.430 us |
|    ConversionToHashSet | 10000 |        100 |   153.863 us |
|    ConversionToHashSet | 10000 |        75 |   163.419 us |
|    IterationAndSwapping | 10000 |        100 | 16,306.401 us |
|    IterationAndSwapping | 10000 |        75 | 17,706.000 us |
|    IterationAndSwapping | 10000 |        50 | 19,340.314 us |
|    IterationAndSwapping | 10000 |        25 | 20,492.625 us |
|    IterationAndSwapping | 10000 |         0 | 21,640.652 us |

]]>
By: Saurabh https://code-maze.com/csharp-array-remove-duplicates/#comment-7463 Sun, 05 Feb 2023 09:53:06 +0000 https://drafts.code-maze.com/?p=69862#comment-7463 In reply to James Curran.

James i have gone through above sol none of them working .for example there is nothing
arrayWithDuplicateValues[0..size]; in c# like this also there is nothing called TryAdd.Not sure if i am wrong somewhere

]]>
By: Marinko Spasojević https://code-maze.com/csharp-array-remove-duplicates/#comment-5781 Mon, 06 Jun 2022 08:44:30 +0000 https://drafts.code-maze.com/?p=69862#comment-5781 In reply to James M Curran.

🙂 Okay. Thank you for the reply. Of course, if you change your mind, we are here.

]]>
By: James M Curran https://code-maze.com/csharp-array-remove-duplicates/#comment-5778 Sun, 05 Jun 2022 23:19:47 +0000 https://drafts.code-maze.com/?p=69862#comment-5778 In reply to Marinko Spasojević.

I was thinking about it (I first saw the request for authors when I was trying to figure out how to log onto this site), before I posted my comment).
But I’m not that good at generating original content (As you can tell from my blog which has six new articles in the last four years). I’m better at looking at other work and finding ways to improve it.

]]>
By: Marinko Spasojević https://code-maze.com/csharp-array-remove-duplicates/#comment-5768 Thu, 02 Jun 2022 18:47:26 +0000 https://drafts.code-maze.com/?p=69862#comment-5768 In reply to James Curran.

Thank you very much, James, for your suggestions. I’ve verified the PR, and it looks great. We will accept it and also modify this article a bit. Also, if you want, we always have open positions for authors (a paid position, remote), so if you are interested to share knowledge with our readers, everyone would benefit from that 🙂

]]>
By: James Curran https://code-maze.com/csharp-array-remove-duplicates/#comment-5767 Thu, 02 Jun 2022 18:10:30 +0000 https://drafts.code-maze.com/?p=69862#comment-5767 In reply to James Curran.

Forgot to include this

|           Method |    Mean |   Error |  StdDev | Rank | Gen 0 | Gen 1 | Allocated |
|--------------------------- |------------:|----------:|----------:|-----:|-------:|-------:|----------:|
|    IterationAndSwapping |  539.9 ns |  9.92 ns |  9.28 ns |  1 | 0.0057 |   - |   40 B |
| IterationWithDictionaryOpt | 2,264.3 ns | 15.23 ns | 12.72 ns |  2 | 0.0420 |   - |   280 B |
|  IterationWithDictionary | 2,400.5 ns | 24.11 ns | 20.14 ns |  3 | 0.0648 |   - |   424 B |
|    ConversionToHashSet | 2,679.4 ns | 52.43 ns | 81.63 ns |  4 | 0.8087 |   - |  5,080 B |
|       HashingMethod | 2,682.5 ns | 46.32 ns | 41.06 ns |  4 | 0.8087 |   - |  5,080 B |
|     DistinctLINQMethod | 2,757.4 ns | 54.44 ns | 106.18 ns |  4 | 0.8163 |   - |  5,144 B |
| GroupByAndSelectLINQMethod | 5,623.3 ns | 100.15 ns | 102.85 ns |  5 | 0.7858 | 0.0076 |  4,976 B |
|      RecursiveMethod | 6,356.3 ns | 112.21 ns | 149.80 ns |  6 | 1.2817 |   - |  8,088 B |
|    IterationAndShifting | 19,631.9 ns | 316.15 ns | 280.26 ns |  7 |   - |   - |   40 B |

(unfortunately, I can’t find an option to make WordPress use a monospaced font.)

]]>
By: James Curran https://code-maze.com/csharp-array-remove-duplicates/#comment-5766 Thu, 02 Jun 2022 18:06:49 +0000 https://drafts.code-maze.com/?p=69862#comment-5766 I was rather surprised to see IterationWithDictionary take the top spot. I would think it’s essentially ConversionToHashSet with some extra overhead. But, anyway, it could be improved. Replacing the return line with just return dic.Keys.ToArray(); reduces it’s time by about 10%, and memory by about 33%

BUT, the biggest effect I was able to achieve was a simple change to IterationAndShifting which moved from a distant last place to way out in front — it literally cut the time by over 97%!!

Instead of shifting everything down, just replace the duplicate item with the last element of the array.

        if (arrayWithDuplicateValues[i] == arrayWithDuplicateValues[j])
        {
          size--;
          arrayWithDuplicateValues[j] = arrayWithDuplicateValues[size];
          j--;
        }

My changes are at : jamescurran/CodeMazeGuides: The main repository for all the Code Maze guides (github.com)

I’ve sent a pull request.

]]>