Fix IndexOutOfRangeException in FrozenHashTable when creating from huge collections#125555
Conversation
When creating a FrozenSet/FrozenDictionary from a collection with more than ~1 billion items, `CalcNumBuckets` overflows int on the computation `uniqueCodesCount * 2`, producing a negative value that causes `primes[-1]` to be accessed. Fix by using `long` for `minNumBuckets` in CalcNumBuckets, and add an Array.MaxLength check in Create to throw OutOfMemoryException instead of a confusing ArgumentOutOfRangeException when the combined bucket+hashcode allocation would overflow int. Co-authored-by: danmoseley <[email protected]>
|
@copilot you need to add tests. Make sure they do fail without the fix |
There was a problem hiding this comment.
Pull request overview
Fixes an IndexOutOfRangeException in System.Collections.Frozen when building frozen sets/dictionaries from extremely large inputs by preventing integer overflow during bucket-count calculation and guarding against invalid pooled-array sizes.
Changes:
- Update
FrozenHashTable.CalcNumBucketsto compute the “minimum bucket count” usinglongto avoidintoverflow for very largeuniqueCodesCount. - Add a max-array-length guard in
FrozenHashTable.Createbefore renting a combinedArrayPool<int>buffer sizednumBuckets + hashCodes.Length.
These tests create collections with > int.MaxValue / 2 items, which is the minimum count that triggers the overflow in `uniqueCodesCount * 2` in CalcNumBuckets. Without the fix, both tests throw IndexOutOfRangeException; with the fix they complete successfully. Co-authored-by: danmoseley <[email protected]>
Added two
Both create a collection with 1,073,741,825 items (just over |
There was a problem hiding this comment.
Pull request overview
Fixes an IndexOutOfRangeException in System.Collections.Frozen when building frozen sets/dictionaries from extremely large inputs by preventing integer overflow in bucket-count calculation and guarding large temporary allocations.
Changes:
- Prevent overflow in
FrozenHashTable.CalcNumBucketsby computingminNumBucketsaslong. - Add an
Array.MaxLength-based guard before renting the combinedbucketStarts + nextsbuffer inFrozenHashTable.Create. - Add two
[OuterLoop]regression tests intended to cover the huge-collection scenario forToFrozenSet/ToFrozenDictionary.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs | Uses long for bucket sizing math; adds max-length guard before renting the combined temporary buffer. |
| src/libraries/System.Collections.Immutable/tests/Frozen/FrozenSetTests.cs | Adds an [OuterLoop] test attempting to freeze a ~1.07B element HashSet<int>. |
| src/libraries/System.Collections.Immutable/tests/Frozen/FrozenDictionaryTests.cs | Adds an [OuterLoop] test attempting to freeze a ~1.07B entry Dictionary<int,int>. |
src/libraries/System.Collections.Immutable/tests/Frozen/FrozenSetTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Collections.Immutable/tests/Frozen/FrozenDictionaryTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs
Show resolved
Hide resolved
|
Checking for other places |
The previous tests allocated 1B+ elements requiring tens of GB, making them impractical for CI even as OuterLoop. Replace with 4M-element tests that exercise the CalcNumBuckets early-return path (where uniqueCodesCount * 2 exceeds the precomputed primes table) with ~100MB memory. These tests don't directly verify the integer overflow fix (which requires >1.07B items to trigger) but add coverage for a previously untested code path. Co-authored-by: Copilot <[email protected]>
|
addressed |
src/libraries/System.Collections.Immutable/tests/Frozen/FrozenDictionaryTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Collections.Immutable/tests/Frozen/FrozenDictionaryTests.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Copilot <[email protected]>
…tionary test Remove the two custom 4M-element ExceedsPrimeTable tests (FrozenDictionary and FrozenSet) which were redundant with the existing 8M ulong dictionary test. Add a matching CreateHugeSet_Success test in FrozenSet_Generic_Tests_ulong to provide parallel FrozenSet coverage of the same large-collection code path. Co-authored-by: Copilot <[email protected]>
src/libraries/System.Collections.Immutable/tests/Frozen/FrozenSetTests.cs
Show resolved
Hide resolved
src/libraries/System.Collections.Immutable/tests/Frozen/FrozenDictionaryTests.cs
Show resolved
Hide resolved
|
Should be ready to sign off. Failures are dead letter. |
src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs
Show resolved
Hide resolved
|
@copilot look at last feedback and address if appropriate |
|
/ba-g dead letter, enough coverage |
|
@stephentoub I think this is correct as is as noted above. |
|
@copilot please invoke the code-review skill and post the analysis/results as a comment here |
🤖 Copilot Code Review — PR #125512Holistic AssessmentMotivation: The bug is real and verified — Approach: The Summary: ✅ LGTM. The three code changes are correct and well-motivated. The test is a reasonable compromise given the ~20-40GB memory requirement to exercise the actual overflow boundary. No new public API surface. Detailed Findings✅ Overflow fix in
|
|
@copilot can you create a new issue for the performance cliff discussed above in case we want to look at that at some point. |
|
I guess copilot here can't do that. I'll do it |
ToFrozenSet()/ToFrozenDictionary()throwsIndexOutOfRangeExceptionfor collections with more than ~1.07 billion items due to integer overflow inFrozenHashTable.Description
In
CalcNumBuckets,int minNumBuckets = uniqueCodesCount * 2silently overflows foruniqueCodesCount > int.MaxValue / 2, producing a negative value. This causes the prime-table search loop to skip entirely (minPrimeIndexInclusivestays 0), leavingmaxPrimeIndexExclusive = 0, and thenprimes[-1]throws.IndexOutOfRangeException(crash accessingprimes[-1])OutOfMemoryException(clear, intentional)Changes
CalcNumBuckets: UselongforminNumBucketsto prevent overflow. For inputs exceeding the prime table range, the existing early-return viaHashHelpers.GetPrime(uniqueCodesCount)now fires correctly.Create: Add anArray.MaxLengthguard beforeArrayPool.Rent(numBuckets + hashCodes.Length). Without it, the sum overflowsintfor ~1B+ item collections, producing a misleadingArgumentOutOfRangeException. Uses the same#if NET / #elsefallback pattern asLengthBuckets.cs.Tests: Added two
[OuterLoop]regression tests —ToFrozenSet_LargeSet_ExceedsPrimeTableandToFrozenDictionary_LargeDictionary_ExceedsPrimeTable— with 4M elements. These exercise theCalcNumBucketsearly-return path (whereuniqueCodesCount * 2exceeds the precomputed primes table) and are practical for CI (~100MB memory). The integer overflow fix boundary (>1.07B items) was verified locally on a 64GB machine — the exception changed fromIndexOutOfRangeExceptiontoOutOfMemoryException.Original prompt
This section details on the original issue you should resolve
<issue_title>Creating a FrozenSet with from huge HashSet throws an exception</issue_title>
<issue_description>### Describe the bug
Creating a FrozenSet with from HashSet throws an exception
To Reproduce
Exceptions (if any)
Further technical details
details of dotnet --info
.NET SDK: Version: 11.0.100-preview.3.26163.101 Commit: 1b989af698 Workload version: 11.0.100-manifests.b985b9e9 MSBuild version: 18.6.0-preview-26163-101+1b989af69
Runtime Environment:
OS Name: cachyos
OS Version:
OS Platform: Linux
RID: linux-x64
Base Path: /home/cert/.dotnet/sdk/11.0.100-preview.3.26163.101/
.NET workloads installed:
There are no installed workloads to display.
Configured to use workload sets when installing new manifests.
No workload sets are installed. Run "dotnet workload restore" to install a workload set.
Host:
Version: 11.0.0-preview.3.26163.101
Architecture: x64
Commit: 1b989af698
.NET SDKs installed:
8.0.419 [/home/cert/.dotnet/sdk]
9.0.312 [/home/cert/.dotnet/sdk]
10.0.300-preview.0.26163.113 [/home/cert/.dotnet/sdk]
11.0.100-preview.3.26163.101 [/home/cert/.dotnet/sdk]
.NET runtimes installed:
Microsoft.AspNetCore.App 8.0.25 [/home/cert/.dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 9.0.14 [/home/cert/.dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 10.0.3 [/home/cert/.dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 11.0.0-preview.3.26163.101 [/home/cert/.dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 8.0.25 [/home/cert/.dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 9.0.14 [/home/cert/.dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 10.0.3 [/home/cert/.dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 11.0.0-preview.3.26163.101 [/home/cert/.dotnet/shared/Microsoft.NETCore.App]
Other architectures found:
None
Environment variables:
DOTNET_ROOT [/home/cert/.dotnet]
global.json file:
Not found
Tested on .NET 8, .NET 9, .NET 10 and .NET 11
full log
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.