Skip to content

[dbsp] Make DevTweaks public so it will get documented.#5941

Merged
blp merged 2 commits intomainfrom
devtweaks
Apr 1, 2026
Merged

[dbsp] Make DevTweaks public so it will get documented.#5941
blp merged 2 commits intomainfrom
devtweaks

Conversation

@blp
Copy link
Copy Markdown
Member

@blp blp commented Mar 27, 2026

Describe Manual Test Plan

I ran the unit tests.

@blp blp requested review from anandbraman and gz March 27, 2026 22:24
@blp blp self-assigned this Mar 27, 2026
@blp blp added documentation Improvements or additions to documentation DBSP core Related to the core DBSP library API Distributed system APIs User-facing For PRs that lead to Feldera-user visible changes user-reported Reported by a user or customer labels Mar 27, 2026
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Clean extraction. The move to Option<T> fields with explicit accessors is the right approach for a public, documented struct.

use serde::{Deserialize, Serialize};
use utoipa::ToSchema;

/// Optional settings for tweaking Feldera internals.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to say something about how these are not guaranteed to be a stable API?

multihost: if runtime_config.hosts > 1
|| runtime_config.dev_tweaks.contains_key("multihost")
{
multihost: if runtime_config.hosts > 1 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this one is gone?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an experiment that I used for testing while developing the multihost feature. I forgot about it. It's not needed anymore.

/// storage, such as object storage, but it could use excessive amounts of
/// memory if the number of keys fetched is very large.
#[serde(skip_serializing_if = "Option::is_none")]
pub fetch_join: Option<bool>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name does not suggest the same thing that the documentation describes, but it's probably too late to change that.

/// storage, such as object storage, but it could use excessive amounts of
/// memory if the number of keys fetched is very large.
#[serde(skip_serializing_if = "Option::is_none")]
pub fetch_distinct: Option<bool>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as before

/// each step.
///
/// The default value is 10,000 records.
// TODO: splitter_chunk_size_bytes, per-operator chunk size.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this TODO.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know this either. It is a copy of a @ryzhyk comment, so he might chime in.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO suggests that we support configuring chunk size in bytes vs records, and that we may want to make it configurable per operator. It's not part of the doc comment. I don't think this is something we need to worry about now.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to:

    // TODO: It would be better if the value were denominated in bytes rather
    // than records, and if it were configurable per-operator.

#[serde(skip_serializing_if = "Option::is_none")]
pub adaptive_joins: Option<bool>,

/// The minimum relative improvement threshold for the join balancer.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "adaptive" and "balanced" are the same thing, but the names of the fields do not make it clear. Perhaps the comments should.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need @ryzhyk's feedback on that too.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add this sentence to the doc comment:

"The join balancer is a component that dynamically chooses an optimal partitioning policy for adaptive join operators."

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

self.adaptive_joins.unwrap_or(false)
}

pub fn balancer_min_relative_improvement_threshold(&self) -> f64 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why sometimes empty lines and sometimes no?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mistake. I will fix it.

Copy link
Copy Markdown
Contributor

@anandbraman anandbraman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for documenting!

@blp blp force-pushed the devtweaks branch 2 times, most recently from db740b8 to 62f881d Compare March 31, 2026 16:24
@blp blp enabled auto-merge March 31, 2026 16:24
Signed-off-by: feldera-bot <[email protected]>
@blp blp added this pull request to the merge queue Mar 31, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 31, 2026
@blp blp added this pull request to the merge queue Mar 31, 2026
Merged via the queue into main with commit 68ee6dd Apr 1, 2026
1 check passed
@blp blp deleted the devtweaks branch April 1, 2026 00:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

API Distributed system APIs DBSP core Related to the core DBSP library documentation Improvements or additions to documentation User-facing For PRs that lead to Feldera-user visible changes user-reported Reported by a user or customer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants