[dbsp] Promote batches with negative weights more aggressively.#5951

Merged
ryzhyk merged 3 commits into main from zero-weight-cleanup
Mar 30, 2026

Conversation

@ryzhyk
Contributor

@ryzhyk ryzhyk commented Mar 30, 2026

Merge batches with many negative weights more aggressively. Negative updates
are likely to cancel out during merging. Every time this happens, two records
are eliminated from the spine, speeding up lookups and reducing the spine's
storage footprint. Furthermore, this avoids performance anomalies where an
operator spends a lot of time skipping over 0-weight records.

We implement this heuristic by accounting each record with a negative weight as
N records when calculating the batch's effective length. This should push such
batches to the next level more eagerly, until they get merged with batches that
contain matching positive-weight records.

The value of N is controlled by a new dev tweak `negative_weight_multiplier`.
It is set to 0 by default, which means that retractions are not given any
additional weight.
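The effective-length accounting described above can be sketched as follows. The field and tweak names follow the PR description; the surrounding struct and function are illustrative, not DBSP's actual API.

```rust
// Hypothetical sketch of the effective-length heuristic: each record with a
// negative weight counts as (1 + multiplier) records, so retraction-heavy
// batches look larger and get promoted to the next spine level sooner.

struct BatchStats {
    len: usize,                   // physical number of records in the batch
    negative_weight_count: usize, // records whose weight is negative
}

/// Effective length used when deciding which spine level a batch belongs to.
fn effective_len(stats: &BatchStats, negative_weight_multiplier: usize) -> usize {
    stats.len + stats.negative_weight_count * negative_weight_multiplier
}

fn main() {
    let stats = BatchStats { len: 100, negative_weight_count: 30 };
    // multiplier = 0 (the default): no change in behavior.
    assert_eq!(effective_len(&stats, 0), 100);
    // multiplier = 2: the batch is treated as 160 records.
    assert_eq!(effective_len(&stats, 2), 160);
}
```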

This table shows how the value of `negative_weight_multiplier` affects the `window_min`
benchmark, which represents the pathological case of 0-weight records slowing down
lookups.

| multiplier | time (s)     |
|-----------:|-------------:|
|          0 | 56.990349017 |
|          1 | 34.751515002 |
|          2 | 16.627565868 |
|          4 | 18.612214789 |
|          6 | 23.141154003 |
|          8 | 30.764389275 |

@ryzhyk ryzhyk requested a review from blp March 30, 2026 07:02
@ryzhyk ryzhyk added DBSP core Related to the core DBSP library performance labels Mar 30, 2026
@ryzhyk ryzhyk marked this pull request as ready for review March 30, 2026 07:29
Comment on lines +867 to +872
    fn update_stats(&mut self, weight: &R) {
        if TypeId::of::<R>() == TypeId::of::<DynZWeight>()
            && unsafe { *weight.downcast::<ZWeight>() } < 0
        {
            self.stats.negative_weight_count += 1;
        }
Member

This is kind of a sad special case. Maybe we should generalize Weight enough to allow testing for positive/negative as well as zero.
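One possible shape for the generalization suggested here, as a sketch: extend the weight abstraction with a sign test so callers need not downcast to `ZWeight`. The `HasSign` trait and the counting helper below are hypothetical, not part of DBSP.

```rust
// Hypothetical trait that lets any weight type report its sign directly,
// replacing the TypeId/downcast special case in update_stats.
trait HasSign {
    fn is_negative(&self) -> bool;
}

impl HasSign for i64 {
    fn is_negative(&self) -> bool {
        *self < 0
    }
}

// Generic over any weight type that knows its sign.
fn count_negative<W: HasSign>(weights: &[W]) -> usize {
    weights.iter().filter(|w| w.is_negative()).count()
}

fn main() {
    assert_eq!(count_negative(&[1i64, -2, 0, -3]), 2);
}
```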

Comment on lines +197 to +198
    8..=128,
    8..=128,
Member

I don't think we can do this because ListMerger has a hard limit of 64 batches at a time.

Contributor Author

This wasn't meant to be in this PR. But I actually wanted to discuss this with you. We no longer limit merge size to 64 (since we don't rely on BitSets in the merger), so I was thinking of increasing the limit for at least level0-level1 batches.

Member

Oh that's fine then.

You'll need to delete the ListMerger assertion then:

    pub fn new(factories: &B::Factories, cursors: Vec<C>) -> Self {
        // [IndexSet] supports a maximum of 64 batches.
        assert!(cursors.len() <= 64);

Contributor Author

Yep, already discovered these asserts. But I still won't mix this change into this PR.

ryzhyk added 2 commits March 30, 2026 10:52
Benchmark that simulates a pathological scenario where a window operator is
followed by an aggregate that computes min over the timestamp column. As the
window operator evicts tuples on the left side of the window, 0-weight tuples
accumulate in the trace, slowing down the skip_zero_weight_vals_forward
operation.  Effectively the O(1) min computation becomes O(n), where n is the
number of zero-weight tuples in the trace.

Signed-off-by: Leonid Ryzhyk <[email protected]>
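The pathology this benchmark simulates can be illustrated with a small sketch: a trace that retains logically deleted (zero-weight) tuples makes "find the first live value" linear in the number of tombstones. The function and data below are hypothetical, capturing the spirit of `skip_zero_weight_vals_forward` rather than DBSP's actual implementation.

```rust
/// Trace entries as (value, weight) pairs sorted by value; weight 0 marks a
/// tuple that has been evicted by the window operator but not yet merged away.
fn min_live(trace: &[(u64, i64)]) -> Option<u64> {
    // Skipping leading zero-weight entries costs O(k), where k is the number
    // of accumulated tombstones -- this is what turns the O(1) min into O(n).
    trace.iter().find(|(_, w)| *w != 0).map(|(v, _)| *v)
}

fn main() {
    // As the window slides, evicted tuples pile up at the front with weight 0.
    let trace = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1)];
    assert_eq!(min_live(&trace), Some(4));
}
```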
@ryzhyk ryzhyk force-pushed the zero-weight-cleanup branch from 13c892e to d6af78b Compare March 30, 2026 17:53
@ryzhyk ryzhyk enabled auto-merge March 30, 2026 17:59
@ryzhyk ryzhyk added this pull request to the merge queue Mar 30, 2026
@ryzhyk ryzhyk removed this pull request from the merge queue due to a manual request Mar 30, 2026
Make sure that a file written by a version that doesn't support batch metadata
is deserialized with default metadata values (specifically, a zero-weight
count of 0).

Signed-off-by: Leonid Ryzhyk <[email protected]>
@ryzhyk ryzhyk enabled auto-merge March 30, 2026 18:35
@ryzhyk ryzhyk added this pull request to the merge queue Mar 30, 2026
@mythical-fred mythical-fred left a comment

LGTM — default means no behavior change until explicitly tuned. The benchmark data in the PR description makes a compelling case for the approach.

Merged via the queue into main with commit 1984c3a Mar 30, 2026
1 check passed
@ryzhyk ryzhyk deleted the zero-weight-cleanup branch March 30, 2026 22:52