Fix infinite loop in GetPreExistingChunksIdsAsync when records exceed MaxTopCount#7311

Merged
adamsitnik merged 3 commits into main from copilot/fix-getpreexistingchunksidsasync-bugs
Feb 16, 2026

Copilot AI commented Feb 16, 2026

GetPreExistingChunksIdsAsync would infinitely loop when a document had more pre-existing chunks than MaxTopCount (1,000), repeatedly fetching and adding the same records without pagination.

Changes

  • Add Skip parameter: Pass options: new() { Skip = keys.Count } to GetAsync to properly paginate through results
  • Test coverage: Add IncrementalIngestion_WithManyRecords_DeletesAllPreExistingChunks that creates 2500 chunks to verify pagination across multiple batches

    // Before: fetched same records repeatedly
    await foreach (var record in _vectorStoreCollection!.GetAsync(
        filter: record => (string)record[DocumentIdName]! == document.Identifier,
        top: MaxTopCount,
        cancellationToken: cancellationToken))

    // After: properly skips already-fetched records
    await foreach (var record in _vectorStoreCollection!.GetAsync(
        filter: record => (string)record[DocumentIdName]! == document.Identifier,
        top: MaxTopCount,
        options: new() { Skip = keys.Count },
        cancellationToken: cancellationToken))
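To make the pagination pattern easier to reason about in isolation, here is a minimal sketch that replaces the vector store with an in-memory list. `PaginationSketch`, `GetPage`, and `FetchAllKeys` are hypothetical names invented for this illustration; they are not part of the library. The sketch also assumes the store returns records in a stable order between calls, which offset-based paging generally relies on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PaginationSketch
{
    // Hypothetical stand-in for a store query: skips `skip` items,
    // then returns at most `top` items.
    public static IEnumerable<int> GetPage(IReadOnlyList<int> store, int top, int skip = 0)
        => store.Skip(skip).Take(top);

    // Collects every key by asking for one page at a time and skipping
    // the keys that were already collected (the equivalent of Skip = keys.Count).
    public static List<int> FetchAllKeys(IReadOnlyList<int> store, int maxTopCount)
    {
        var keys = new List<int>();
        int insertedCount;
        do
        {
            // Per-page counter: a page shorter than maxTopCount means we reached the end.
            insertedCount = 0;
            foreach (int key in GetPage(store, top: maxTopCount, skip: keys.Count))
            {
                keys.Add(key);
                insertedCount++;
            }
        }
        while (insertedCount == maxTopCount);

        return keys;
    }
}
```

Note that when the number of records is an exact multiple of the page size, the loop issues one extra query that returns an empty page before it stops.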
Original prompt

This section details the original issue you should resolve

<issue_title>[MEDI] Bugs in GetPreExistingChunksIdsAsync</issue_title>
<issue_description>The way GetPreExistingChunksIdsAsync is currently implemented:

    int insertedCount;
    do
    {
        insertedCount = 0;
        await foreach (var record in _vectorStoreCollection!.GetAsync(
            filter: record => (string)record[DocumentIdName]! == document.Identifier,
            top: MaxTopCount,
            cancellationToken: cancellationToken).ConfigureAwait(false))
        {
            keys.Add(record[KeyName]!);
            insertedCount++;
        }
    }
    while (insertedCount == MaxTopCount);

We have two bugs:

  • in case there are more matching records than MaxTopCount, we are going to keep fetching and adding the same records over and over. We need to provide the Skip parameter:
await foreach (var record in _vectorStoreCollection!.GetAsync(
    filter: record => (string)record[DocumentIdName]! == document.Identifier,
    top: MaxTopCount,
    options: new() { Skip = insertedCount }, // THE FIX
    cancellationToken: cancellationToken).ConfigureAwait(false))
  • the loop counter should not be reset at every iteration, otherwise we might end up getting an endless loop

We need a test that reproduces the problem. If preparing MaxTopCount-many records is too complex, we can use #if !RELEASE and set the value to a low number in order to make testing easier in Debug builds (our CI always runs them):

        const int MaxTopCount =
#if RELEASE
            1_000;
#else
            10; // Use smaller batch size in debug to be able to test the looping logic without needing to insert a lot of records.
#endif

</issue_description>
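The first bug described in the issue can be reproduced without a vector store. The sketch below is an illustration only: `BuggyLoopSketch` and `RunWithoutSkip` are hypothetical names, the store is a plain in-memory list, and a `maxRounds` guard is added so the demonstration terminates (the loop from the issue has no such guard).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BuggyLoopSketch
{
    // Runs the loop from the issue WITHOUT a Skip offset, capped at maxRounds
    // so the demonstration ends. Reports how many rounds ran, how many keys
    // were collected in total, and how many of those keys were distinct.
    public static (int Rounds, int TotalKeys, int DistinctKeys) RunWithoutSkip(
        int recordCount, int maxTopCount, int maxRounds)
    {
        List<int> store = Enumerable.Range(0, recordCount).ToList();
        var keys = new List<int>();
        int rounds = 0;
        int insertedCount;
        do
        {
            insertedCount = 0;
            // No Skip: every round sees the same first maxTopCount records.
            foreach (int key in store.Take(maxTopCount))
            {
                keys.Add(key);
                insertedCount++;
            }

            rounds++;
        }
        while (insertedCount == maxTopCount && rounds < maxRounds);

        return (rounds, keys.Count, keys.Distinct().Count());
    }
}
```

With 25 records, a page size of 10, and a cap of 5 rounds, every round returns a full page, so only the cap stops the loop: 50 keys are collected but only 10 of them are distinct. With fewer records than the page size, the first page is short and the loop exits after one round.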

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title from "[WIP] Fix bugs in GetPreExistingChunksIdsAsync implementation" to "Fix infinite loop in GetPreExistingChunksIdsAsync when records exceed MaxTopCount" Feb 16, 2026
Copilot AI requested a review from adamsitnik February 16, 2026 11:30
@adamsitnik adamsitnik requested a review from roji February 16, 2026 12:31

@adamsitnik adamsitnik left a comment

LGTM


@roji roji left a comment

LGTM too but see testing nit


@adamsitnik adamsitnik left a comment

LGTM

@adamsitnik adamsitnik marked this pull request as ready for review February 16, 2026 15:08
Copilot AI review requested due to automatic review settings February 16, 2026 15:08

Copilot AI left a comment

Pull request overview

This PR fixes an infinite loop bug in GetPreExistingChunksIdsAsync that occurred when a document had more than 1,000 (MaxTopCount) pre-existing chunks. The method was repeatedly fetching the same records without proper pagination.

Changes:

  • Add Skip = keys.Count parameter to GetAsync call to properly paginate through results
  • Add comprehensive test with 2,500 chunks to verify pagination across multiple batches works correctly

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs | Adds options: new() { Skip = keys.Count } parameter to GetAsync call to properly skip already-fetched records during pagination |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Writers/VectorStoreWriterTests.cs | Adds test IncrementalIngestion_WithManyRecords_DeletesAllPreExistingChunks that creates 2,500 chunks to verify pagination works correctly when records exceed MaxTopCount of 1,000 |

@adamsitnik adamsitnik closed this Feb 16, 2026
@adamsitnik adamsitnik reopened this Feb 16, 2026
@adamsitnik adamsitnik enabled auto-merge (squash) February 16, 2026 15:30
@adamsitnik adamsitnik merged commit a7a967c into main Feb 16, 2026
12 checks passed
@adamsitnik adamsitnik deleted the copilot/fix-getpreexistingchunksidsasync-bugs branch February 16, 2026 16:18

Development

Successfully merging this pull request may close these issues.

[MEDI] Bugs in GetPreExistingChunksIdsAsync

4 participants