Skip to content

Fix IndexOutOfRangeException in FrozenHashTable when creating from huge collections#125555

Merged
danmoseley merged 9 commits intomainfrom
copilot/fix-frozenset-creation-exception
Mar 20, 2026
Merged

Fix IndexOutOfRangeException in FrozenHashTable when creating from huge collections#125555
danmoseley merged 9 commits intomainfrom
copilot/fix-frozenset-creation-exception

Conversation

Copy link
Contributor

Copilot AI commented Mar 14, 2026

ToFrozenSet()/ToFrozenDictionary() throws IndexOutOfRangeException for collections with more than ~1.07 billion items due to integer overflow in FrozenHashTable.

Description

In CalcNumBuckets, int minNumBuckets = uniqueCodesCount * 2 silently overflows for uniqueCodesCount > int.MaxValue / 2, producing a negative value. This causes the prime-table search loop to skip entirely (minPrimeIndexInclusive stays 0), leaving maxPrimeIndexExclusive = 0, and then primes[-1] throws.

State Working limit Behavior when exceeded
Before PR ~1.07B entries IndexOutOfRangeException (crash accessing primes[-1])
After PR ~1.07B entries OutOfMemoryException (clear, intentional)

Changes

  • CalcNumBuckets: Use long for minNumBuckets to prevent overflow. For inputs exceeding the prime table range, the existing early-return via HashHelpers.GetPrime(uniqueCodesCount) now fires correctly.

  • Create: Add an Array.MaxLength guard before ArrayPool.Rent(numBuckets + hashCodes.Length). Without it, the sum overflows int for ~1B+ item collections, producing a misleading ArgumentOutOfRangeException. Uses the same #if NET / #else fallback pattern as LengthBuckets.cs.

  • Tests: Added two [OuterLoop] regression tests — ToFrozenSet_LargeSet_ExceedsPrimeTable and ToFrozenDictionary_LargeDictionary_ExceedsPrimeTable — with 4M elements. These exercise the CalcNumBuckets early-return path (where uniqueCodesCount * 2 exceeds the precomputed primes table) and are practical for CI (~100MB memory). The integer overflow fix boundary (>1.07B items) was verified locally on a 64GB machine — the exception changed from IndexOutOfRangeException to OutOfMemoryException.

// Before: throws IndexOutOfRangeException in CalcNumBuckets
var frozen = new HashSet<int>(1_188_074_137) { ... }.ToFrozenSet();

// After: throws OutOfMemoryException (correct failure mode for infeasible allocation)
Original prompt

This section details on the original issue you should resolve

<issue_title>Creating a FrozenSet with from huge HashSet throws an exception</issue_title>
<issue_description>### Describe the bug
Creating a FrozenSet with from HashSet throws an exception

To Reproduce

using System.Collections.Frozen;

const int capacity = 1188074137;

var set = new HashSet<int>(capacity);
for (int i = 0; i < capacity; i++) set.Add(i);

Console.WriteLine("Forming frozen set");
try
{
    var frozen = set.ToFrozenSet();
    Console.WriteLine(frozen.Count);
}
catch (Exception ex)
{
    Console.WriteLine($"Handled exception: {ex}");
}

Exceptions (if any)

Handled exception: System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at System.Collections.Frozen.FrozenHashTable.CalcNumBuckets(ReadOnlySpan`1 hashCodes, Boolean hashCodesAreUnique)
   at System.Collections.Frozen.FrozenHashTable.Create(Span`1 hashCodes, Boolean hashCodesAreUnique)
   at System.Collections.Frozen.Int32FrozenSet..ctor(HashSet`1 source)
   at System.Collections.Frozen.FrozenSet.CreateFromSet[T](HashSet`1 source)
   at System.Collections.Frozen.FrozenSet.ToFrozenSet[T](IEnumerable`1 source, IEqualityComparer`1 comparer)
   at Program.<Main>$(String[] args) in /home/cert/RiderProjects/FrozenBug/FrozenBug/Program.cs:line 13

Further technical details

details of dotnet --info

.NET SDK: Version: 11.0.100-preview.3.26163.101 Commit: 1b989af698 Workload version: 11.0.100-manifests.b985b9e9 MSBuild version: 18.6.0-preview-26163-101+1b989af69

Runtime Environment:
OS Name: cachyos
OS Version:
OS Platform: Linux
RID: linux-x64
Base Path: /home/cert/.dotnet/sdk/11.0.100-preview.3.26163.101/

.NET workloads installed:
There are no installed workloads to display.
Configured to use workload sets when installing new manifests.
No workload sets are installed. Run "dotnet workload restore" to install a workload set.

Host:
Version: 11.0.0-preview.3.26163.101
Architecture: x64
Commit: 1b989af698

.NET SDKs installed:
8.0.419 [/home/cert/.dotnet/sdk]
9.0.312 [/home/cert/.dotnet/sdk]
10.0.300-preview.0.26163.113 [/home/cert/.dotnet/sdk]
11.0.100-preview.3.26163.101 [/home/cert/.dotnet/sdk]

.NET runtimes installed:
Microsoft.AspNetCore.App 8.0.25 [/home/cert/.dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 9.0.14 [/home/cert/.dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 10.0.3 [/home/cert/.dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 11.0.0-preview.3.26163.101 [/home/cert/.dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 8.0.25 [/home/cert/.dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 9.0.14 [/home/cert/.dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 10.0.3 [/home/cert/.dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 11.0.0-preview.3.26163.101 [/home/cert/.dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
None

Environment variables:
DOTNET_ROOT [/home/cert/.dotnet]

global.json file:
Not found

- JetBrains Rider 2026.1 EAP 8

Tested on .NET 8, .NET 9, .NET 10 and .NET 11

full log

❯ dotnet run --framework net11.0 --configuration Release FrozenBug.csproj
Forming frozen
Handled exception: System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at System.Collections.Frozen.FrozenHashTable.CalcNumBuckets(ReadOnlySpan`1 hashCodes, Boolean hashCodesAreUnique)
   at System.Collections.Frozen.FrozenHashTable.Create(Span`1 hashCodes, Boolean hashCodesAreUnique)
   at System.Collections.Frozen.Int32FrozenSet..ctor(HashSet`1 source)
   at System.Collections.Frozen.FrozenSet.CreateFromSet[T](HashSet`1 source)
   at Program.$(String[] args) in /home/cert/RiderProjects/FrozenBug/FrozenBug/Program.cs:line 13
❯ dotnet run --framework net10.0 --configuration Release FrozenBug.csproj
Forming frozen
Handled exception: System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at System.Collections.Frozen.FrozenHashTable.CalcNumBuckets(ReadOnlySpan`1 hashCodes, Boolean hashCodesAreUnique)
   at System.Collections.Frozen.FrozenHashTable.Create(Span`1 hashCodes, Boolean hashCodesAreUnique)
   at System.Collections.Frozen.Int32FrozenSet..ctor(HashSet`1 source)
   at System.Collections.Frozen.FrozenSet.CreateFromSet[T](HashSet`1 source)
   at Program.$(String[] args) in /home/cert/RiderProjects/FrozenBug/FrozenBug/Program.cs:line 13
❯ dotnet run --framework net9.0 --configuration Release FrozenBug.csproj
Forming frozen
Handled exception: System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at System.Collections.Frozen.FrozenHashTable.CalcNumBuckets(ReadOnlySpan`1 hashCodes, Boolean hashCodesAreUnique)
   at System.Collections.F...
  • Fixes dotnet/runtime#125553

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

When creating a FrozenSet/FrozenDictionary from a collection with
more than ~1 billion items, `CalcNumBuckets` overflows int on the
computation `uniqueCodesCount * 2`, producing a negative value that
causes `primes[-1]` to be accessed.

Fix by using `long` for `minNumBuckets` in CalcNumBuckets, and add
an Array.MaxLength check in Create to throw OutOfMemoryException
instead of a confusing ArgumentOutOfRangeException when the combined
bucket+hashcode allocation would overflow int.

Co-authored-by: danmoseley <[email protected]>
Copilot AI changed the title [WIP] Fix FrozenSet creation from HashSet to prevent exception Fix IndexOutOfRangeException in FrozenHashTable when creating from huge collections Mar 14, 2026
Copilot AI requested a review from danmoseley March 14, 2026 14:14
@danmoseley
Copy link
Member

@copilot you need to add tests. Make sure they do fail without the fix

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an IndexOutOfRangeException in System.Collections.Frozen when building frozen sets/dictionaries from extremely large inputs by preventing integer overflow during bucket-count calculation and guarding against invalid pooled-array sizes.

Changes:

  • Update FrozenHashTable.CalcNumBuckets to compute the “minimum bucket count” using long to avoid int overflow for very large uniqueCodesCount.
  • Add a max-array-length guard in FrozenHashTable.Create before renting a combined ArrayPool<int> buffer sized numBuckets + hashCodes.Length.

These tests create collections with > int.MaxValue / 2 items, which is
the minimum count that triggers the overflow in `uniqueCodesCount * 2`
in CalcNumBuckets. Without the fix, both tests throw
IndexOutOfRangeException; with the fix they complete successfully.

Co-authored-by: danmoseley <[email protected]>
Copy link
Contributor Author

Copilot AI commented Mar 14, 2026

@copilot you need to add tests. Make sure they do fail without the fix

Added two [OuterLoop] regression tests in commit 86ab3ea:

  • FrozenSet_Generic_Tests_int.ToFrozenSet_HugeSet_NoIntegerOverflow in FrozenSetTests.cs
  • FrozenDictionary_Generic_Tests_int_int.ToFrozenDictionary_HugeDictionary_NoIntegerOverflow in FrozenDictionaryTests.cs

Both create a collection with 1,073,741,825 items (just over int.MaxValue / 2, the minimum that triggers the overflow in uniqueCodesCount * 2). I confirmed they fail without the fix — throwing IndexOutOfRangeException in FrozenHashTable.CalcNumBuckets — and pass with the fix. The tests require several GB of available memory so they're appropriately gated behind [OuterLoop].

Copilot AI requested a review from danmoseley March 14, 2026 14:55
@danmoseley danmoseley marked this pull request as ready for review March 14, 2026 16:07
Copilot AI review requested due to automatic review settings March 14, 2026 16:07
@danmoseley danmoseley enabled auto-merge (squash) March 14, 2026 16:07
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an IndexOutOfRangeException in System.Collections.Frozen when building frozen sets/dictionaries from extremely large inputs by preventing integer overflow in bucket-count calculation and guarding large temporary allocations.

Changes:

  • Prevent overflow in FrozenHashTable.CalcNumBuckets by computing minNumBuckets as long.
  • Add an Array.MaxLength-based guard before renting the combined bucketStarts + nexts buffer in FrozenHashTable.Create.
  • Add two [OuterLoop] regression tests intended to cover the huge-collection scenario for ToFrozenSet / ToFrozenDictionary.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs Uses long for bucket sizing math; adds max-length guard before renting the combined temporary buffer.
src/libraries/System.Collections.Immutable/tests/Frozen/FrozenSetTests.cs Adds an [OuterLoop] test attempting to freeze a ~1.07B element HashSet<int>.
src/libraries/System.Collections.Immutable/tests/Frozen/FrozenDictionaryTests.cs Adds an [OuterLoop] test attempting to freeze a ~1.07B entry Dictionary<int,int>.

@danmoseley
Copy link
Member

Checking for other places

The previous tests allocated 1B+ elements requiring tens of GB, making them
impractical for CI even as OuterLoop. Replace with 4M-element tests that
exercise the CalcNumBuckets early-return path (where uniqueCodesCount * 2
exceeds the precomputed primes table) with ~100MB memory.

These tests don't directly verify the integer overflow fix (which requires
>1.07B items to trigger) but add coverage for a previously untested code path.

Co-authored-by: Copilot <[email protected]>
@danmoseley
Copy link
Member

addressed

danmoseley and others added 2 commits March 15, 2026 08:22
…tionary test

Remove the two custom 4M-element ExceedsPrimeTable tests (FrozenDictionary
and FrozenSet) which were redundant with the existing 8M ulong dictionary
test. Add a matching CreateHugeSet_Success test in
FrozenSet_Generic_Tests_ulong to provide parallel FrozenSet coverage of the
same large-collection code path.

Co-authored-by: Copilot <[email protected]>
@danmoseley
Copy link
Member

Should be ready to sign off. Failures are dead letter.

@danmoseley danmoseley requested a review from stephentoub March 18, 2026 17:33
@danmoseley
Copy link
Member

@copilot look at last feedback and address if appropriate

Copilot AI requested a review from danmoseley March 19, 2026 01:53
@danmoseley
Copy link
Member

/ba-g dead letter, enough coverage

@danmoseley danmoseley enabled auto-merge (squash) March 19, 2026 03:06
@danmoseley
Copy link
Member

@stephentoub I think this is correct as is as noted above.

@danmoseley danmoseley requested a review from stephentoub March 19, 2026 22:00
@stephentoub
Copy link
Member

@copilot please invoke the code-review skill and post the analysis/results as a comment here

Copy link
Contributor Author

Copilot AI commented Mar 20, 2026

@copilot please invoke the code-review skill and post the analysis/results as a comment here

🤖 Copilot Code Review — PR #125512

Holistic Assessment

Motivation: The bug is real and verified — int minNumBuckets = uniqueCodesCount * 2 silently overflows for uniqueCodesCount > int.MaxValue/2 (~1.07B), producing a negative value that causes primes[-1] to throw IndexOutOfRangeException. The fix converts a confusing crash into either a working result (for ~715M–1.07B elements) or a clear OutOfMemoryException.

Approach: The intlong widening is the correct minimal fix. The early bail in CalcNumBuckets is a sound optimization that avoids expensive collision-counting when the result would be unusable. The OOM guard in Create converts a misleading ArgumentOutOfRangeException into a clear OutOfMemoryException.

Summary: ✅ LGTM. The three code changes are correct and well-motivated. The test is a reasonable compromise given the ~20-40GB memory requirement to exercise the actual overflow boundary. No new public API surface.


Detailed Findings

✅ Overflow fix in CalcNumBuckets (line 188)

long minNumBuckets = (long)uniqueCodesCount * 2 correctly prevents the overflow. The subsequent comparison at line 207 (minNumBuckets > primes[minPrimeIndexInclusive]) correctly promotes the int prime to long for comparison.

✅ Early bail in CalcNumBuckets (lines 193-200)

The condition minNumBuckets + hashCodes.Length > Array.MaxLength correctly identifies when any bucket count the collision-counting loop might find would cause Create to fail (since Create needs numBuckets + hashCodes.Length in one array). The fallback to HashHelpers.GetPrime(uniqueCodesCount) is correct — it returns a usable prime for the ~715M–1.07B range where 2*uniqueCodesCount exceeds the limit but uniqueCodesCount + hashCodes.Length does not.

This also safely guards the maxNumBuckets = uniqueCodesCount * MaxLargeBucketTableMultiplier computation at line 219-221, which would overflow int for uniqueCodesCount > ~715M. Since the early bail fires at uniqueCodesCount ≥ ~715,827,864 (when hashCodes.Length = uniqueCodesCount) and the *3 overflow begins at uniqueCodesCount > 715,827,882, the early return prevents the overflow from being reached.

✅ OOM guard in Create (lines 60-67)

The (long)numBuckets + hashCodes.Length > Array.MaxLength check correctly prevents a confusing ArgumentOutOfRangeException from ArrayPool.Rent when the sum overflows int. The #if NET / #else 0x7FFFFFC7 pattern is consistent with LengthBuckets.cs in the same project.

✅ Line 199 — HashHelpers.GetPrime(uniqueCodesCount) is correct (not int.MaxValue)

Re: @stephentoub's earlier question — returning int.MaxValue instead of computing a prime would break the working range of ~715M–1.07B elements. For those sizes, GetPrime(uniqueCodesCount) returns a prime ≈ uniqueCodesCount, which gives a ~1.0 load factor. Then in Create, numBuckets + hashCodes.Length ≈ 2*uniqueCodesCount fits within Array.MaxLength (since the early bail ensures 3*uniqueCodesCount ≤ Array.MaxLength). Returning int.MaxValue would always fail the Create guard. Dan's analysis in the resolved thread is accurate.

✅ Test (FrozenSetTests.cs lines 438-444)

The CreateHugeSet_Success test with 8M ulong elements mirrors the existing CreateHugeDictionary_Success test for FrozenDictionary. It exercises the prime-table-exceeded early return (since 8M*2 = 16M exceeds the largest precomputed prime 7,199,369) but does not directly test the overflow fix boundary (which requires ~20-40GB memory). This is a reasonable trade-off explicitly acknowledged in the PR description.

💡 Pre-existing: load factor cliff at ~3.5M unique hash codes

Not introduced by this PR, but worth noting: above ~3.5M unique hash codes (half the largest precomputed prime 7,199,369), the code falls back from optimized ~0.5 load factor to ~1.0 load factor. Dan noted this and suggested filing a separate issue.

@danmoseley danmoseley merged commit 5ab4f9b into main Mar 20, 2026
84 of 92 checks passed
@danmoseley danmoseley deleted the copilot/fix-frozenset-creation-exception branch March 20, 2026 22:13
@danmoseley
Copy link
Member

@copilot can you create a new issue for the performance cliff discussed above in case we want to look at that at some point.

@danmoseley
Copy link
Member

I guess copilot here can't do that. I'll do it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants