Skip to content

feat: session-first DataFusion integration + session resolution policies#4145

Merged
roeap merged 9 commits intodelta-io:mainfrom
ethan-tyler:feat/datafusion-session-resolution-4081-4139
Jan 30, 2026
Merged

feat: session-first DataFusion integration + session resolution policies#4145
roeap merged 9 commits intodelta-io:mainfrom
ethan-tyler:feat/datafusion-session-resolution-4081-4139

Conversation

@ethan-tyler
Copy link
Copy Markdown
Collaborator

Description

Make delta-rs' DataFusion integration consistently honor caller provided sessions and introduce a session first API for registering Delta object stores.

Changes:

Session resolution:

  • Add resolve_session_state(...) & SessionFallbackPolicy (InternalDefaults / DeriveFromTrait / RequireSessionState)
  • Builders expose with_session_fallback_policy(...) to control strictness
  • Migrate operations to use resolver (optimize/merge/update/write/delete)

Session first registration:

  • Add DeltaSessionExt trait:
    • ensure_object_store_registered(...)
    • ensure_log_store_registered(...)
  • Deprecate DeltaTable::update_datafusion_session(...) (shim kept for compatibility)

Predicate parsing:

  • Non-SessionState sessions can preserve UDFs when configured via DeriveFromTrait

Compatibility:

  • Default is backward compatible: SessionFallbackPolicy::InternalDefaults warns but doesn't break
  • Strict mode available: with_session_fallback_policy(RequireSessionState) errors instead of falling back
  • DeltaTable::update_datafusion_session remains but is deprecated

Tests:

  • Regression tests for fallback policy wiring across builders (RequireSessionState path)
  • Existing deltalake-core DataFusion test suite passes with --features datafusion

Related Issue(s)

Addresses:

Documentation

Follow ups:

  • Flip default to RequireSessionState (breaking change)
  • Remove deprecated DeltaTable::update_datafusion_session after deprecation window

@github-actions github-actions Bot added the binding/rust Issues for the Rust crate label Jan 29, 2026
@ethan-tyler ethan-tyler marked this pull request as ready for review January 29, 2026 17:26
@codecov
Copy link
Copy Markdown

codecov Bot commented Jan 29, 2026

Codecov Report

❌ Patch coverage is 85.78089% with 61 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.35%. Comparing base (c095787) to head (7eb0319).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
crates/core/src/test_utils/datafusion.rs 54.87% 37 Missing ⚠️
crates/core/src/delta_datafusion/session.rs 94.68% 10 Missing and 1 partial ⚠️
crates/core/src/delta_datafusion/expr.rs 91.42% 2 Missing and 1 partial ⚠️
crates/core/src/operations/write/mod.rs 86.36% 0 Missing and 3 partials ⚠️
crates/core/src/operations/delete.rs 90.00% 0 Missing and 2 partials ⚠️
crates/core/src/operations/update.rs 90.00% 0 Missing and 2 partials ⚠️
crates/core/src/operations/constraints.rs 66.66% 0 Missing and 1 partial ⚠️
crates/core/src/operations/load_cdf.rs 0.00% 0 Missing and 1 partial ⚠️
crates/core/src/operations/merge/mod.rs 93.75% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4145      +/-   ##
==========================================
+ Coverage   76.24%   76.35%   +0.11%     
==========================================
  Files         164      165       +1     
  Lines       45807    46151     +344     
  Branches    45807    46151     +344     
==========================================
+ Hits        34924    35239     +315     
- Misses       9210     9234      +24     
- Partials     1673     1678       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@hntd187
Copy link
Copy Markdown
Collaborator

hntd187 commented Jan 29, 2026

I like the idea behind this, very cool. @roeap weren't we going to eventually do something with the datafusion session such that we might not need to do this much resolution on it?

Copy link
Copy Markdown
Collaborator

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just flushing some initial comments.

Really looking forward to getting our sessions straight, and this is a big step!

While writing #4138 I started wondering about something more fundamental. I.e. where should we depend on Session and where can we use SessionState and SessionContext?

IF we expose an API that can consume DataFrame, then we will also always have SessionState in our dependencies.

Folks who implement their own session would probably(?) not go through the top level api, but rather build plans using the lower level stuff like TableProvider.

If we end up in a place where our builders can produce logical plans they are then free to execute that using their session and we should not.

To be clear, we should still push put SessionState from the operations!

Just some thoughts that came up :).

Comment thread crates/core/src/delta_datafusion/session.rs Outdated
operation_id: Option<Uuid>,
) -> DeltaResult<()> {
let url = log_store.root_url().as_object_store_url();
if self.runtime_env().object_store(&url).is_err() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may be able to drop the ?Sized constraint by ...

Suggested change
if self.runtime_env().object_store(&url).is_err() {
if T::runtime_env(self).object_store(&url).is_err() {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extension needs to work on &dyn Session (e.g. DeltaTable::update_datafusion_session), impl must support unsized receivers. UFCS is fine but doesn't remove the ?Sized requirement unless we add impl DeltaSessionExt for dyn DataFusionSession or change the shim to take concrete SessionState/SessionContext.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just keep it like this :)

Comment thread crates/core/src/delta_datafusion/session.rs
Comment thread crates/core/src/delta_datafusion/session.rs Outdated
Copy link
Copy Markdown
Collaborator

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming along really nicely!

Just a few minor comments to maybe look at.

Comment thread crates/core/src/delta_datafusion/expr.rs Outdated
let object_store_url = log_store.object_store_url();
if self.runtime_env().object_store(&object_store_url).is_err() {
self.runtime_env()
.register_object_store(object_store_url.as_ref(), log_store.object_store(None));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you mention, the object_store method returns a pre-fixed object store that points to a path inside the root store and we create this cryptic URL. However this will never work properly if we have delta logs with fully qualified URLs (e.g. shallow clones). So I personally consider this a deprecated behaviour.

Not sure how to best do this, but I believe we should either directly deprecate this method (mark it as migration helper) or keep register_store around and deprecate that?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only non-migrated operations should rely on this behavior. The new table provider does not make use of these internal / special.

Not sure if this already landed, but in my more experimental code, we are building the log store from the session (actually TaskContext) and pass it in our code. deep in the log handling, a scoped store is still convenient. so i.e.

session + table__url = log_store

Comment thread crates/core/src/delta_datafusion/session.rs Outdated
/// internal defaults. To make this strict (error instead), set
/// `with_session_fallback_policy(SessionFallbackPolicy::RequireSessionState)`.
///
/// Example: `Arc::new(create_session().state())`.
Copy link
Copy Markdown
Collaborator

@roeap roeap Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seeing this made me wonder if create_session should just return the DF session? Do we even invoke any method from our session wrapper ever? I feel like we almost always immediately convert this to a DF session?

@roeap roeap merged commit 021cc3b into delta-io:main Jan 30, 2026
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

binding/rust Issues for the Rust crate

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants