
fix: ODFV output projection in offline retrieval (#6099) #6140

Open
jyejare wants to merge 1 commit into feast-dev:master from jyejare:fix/odfv-output-projection-6099


@jyejare (Collaborator) commented Mar 23, 2026

Summary

Fixes #6099 - Ensures offline retrieval honors ODFV feature projection, matching online retrieval behavior.

Problem

When requesting a subset of features from an OnDemandFeatureView:

  • Online retrieval ✅ Returns only requested features
  • Offline retrieval ❌ Returns ALL ODFV output features (before this fix)

This caused schema mismatches between training and serving pipelines.

Solution

Modified RetrievalJob.to_arrow() in offline_store.py to:

  1. Parse requested features from metadata.features
  2. Build a mapping of ODFV name → requested feature names
  3. Filter ODFV transformation output to only include requested columns
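The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in offline_store.py: build_odfv_projection and filter_odfv_output are hypothetical helpers that work on lists of column names rather than a pyarrow table, and the "<view>__<feature>" column naming assumed for full_feature_names=True follows Feast's usual convention.

```python
from typing import Dict, Iterable, List, Set


def build_odfv_projection(feature_refs: Iterable[str]) -> Dict[str, Set[str]]:
    """Steps 1-2: map each view name to the feature names requested from it."""
    requested: Dict[str, Set[str]] = {}
    for ref in feature_refs:
        if ":" not in ref:
            continue  # not a "view:feature" reference; skip it
        view_name, feature_name = ref.split(":", 1)
        requested.setdefault(view_name, set()).add(feature_name)
    return requested


def filter_odfv_output(
    odfv_name: str,
    output_columns: List[str],
    requested: Dict[str, Set[str]],
    full_feature_names: bool = False,
) -> List[str]:
    """Step 3: keep only the ODFV output columns that were requested."""
    wanted = requested.get(odfv_name)
    if not wanted:
        # No projection info for this ODFV: keep every column (pre-fix behavior).
        return list(output_columns)
    prefix = f"{odfv_name}__"  # assumed prefix when full_feature_names=True
    kept = []
    for col in output_columns:
        name = col[len(prefix):] if full_feature_names and col.startswith(prefix) else col
        if name in wanted:
            kept.append(col)
    return kept


refs = ["my_odfv:feature_a"]
print(filter_odfv_output("my_odfv", ["feature_a", "feature_b", "feature_c"], build_odfv_projection(refs)))
# ['feature_a']
```

Keeping the mapping as a set per view makes each column check a constant-time lookup, and the pass-through branch mirrors the "falls back to old behavior" promise when nothing was requested by name.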

Example

Before this fix:

features = ["my_odfv:feature_a"]
offline_result = store.get_historical_features(features=features, ...)
# Columns: driver_id, event_timestamp, feature_a, feature_b, feature_c ❌

After this fix:

features = ["my_odfv:feature_a"]
offline_result = store.get_historical_features(features=features, ...)
# Columns: driver_id, event_timestamp, feature_a ✅

Changes

Modified: sdk/python/feast/infra/offline_stores/offline_store.py

  • Updated RetrievalJob.to_arrow() method (lines 140-184)
  • Added filtering logic for ODFV output projection
  • Maintains backward compatibility

Added: Test in sdk/python/tests/integration/offline_store/test_universal_historical_retrieval.py

  • test_odfv_projection() - Comprehensive test verifying:
    • Single feature request returns only that feature
    • Multiple feature request returns only requested features
    • Unrequested features are NOT included
    • Offline and online retrieval have consistent behavior
  • Parametrized for both full_feature_names=True and False
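A minimal sketch of the schema check such a test could perform, assuming it compares column names. expected_columns and projection_matches are hypothetical helpers, not part of the PR; in the real test the actual names would come from the offline DataFrame and the online response, and the "<view>__<feature>" naming for full_feature_names=True is assumed from Feast's convention.

```python
from typing import List


def expected_columns(
    entity_columns: List[str], feature_refs: List[str], full_feature_names: bool
) -> List[str]:
    """Columns a correctly projected result should contain (hypothetical helper)."""
    cols = list(entity_columns)
    for ref in feature_refs:
        view_name, feature_name = ref.split(":", 1)
        # Assumed Feast naming: "<view>__<feature>" when full_feature_names=True.
        cols.append(f"{view_name}__{feature_name}" if full_feature_names else feature_name)
    return cols


def projection_matches(actual: List[str], expected: List[str]) -> bool:
    """True when nothing is missing and no unrequested column sneaks in."""
    return set(actual) == set(expected)


# Parametrized over both naming modes, like the test described above.
for full_feature_names in (False, True):
    print(full_feature_names, expected_columns(["driver_id", "event_timestamp"], ["my_odfv:feature_a"], full_feature_names))
```

Comparing sets rather than lists keeps the check independent of column order, which offline and online results need not share.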

Testing

The new test test_odfv_projection verifies:

  1. ✅ Requesting 1 out of 3 ODFV features → returns only that 1 feature
  2. ✅ Requesting 2 out of 3 ODFV features → returns only those 2 features
  3. ✅ Unrequested features are NOT included in the result
  4. ✅ Offline and online retrieval return consistent schemas

Backward Compatibility

  • ✅ Falls back to old behavior if metadata is unavailable
  • ✅ No breaking changes to existing functionality
  • ✅ Only affects ODFV feature projection
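The fallback can be sketched with the same guard that appears in the diff (if metadata and metadata.features:). MetadataStub is a hypothetical stand-in for the retrieval metadata that exposes only a features list, and project_or_passthrough is an illustrative helper that assumes full_feature_names=False for brevity.

```python
from typing import List, Optional, Sequence


class MetadataStub:
    """Hypothetical stand-in for retrieval metadata; only `.features` is used."""

    def __init__(self, features: Optional[Sequence[str]] = None):
        self.features = features


def project_or_passthrough(
    odfv_name: str, output_columns: List[str], metadata: Optional[MetadataStub]
) -> List[str]:
    # Same guard as the diff: no metadata, or no feature list, means we cannot
    # tell what was requested, so keep the old behavior and return everything.
    if not (metadata and metadata.features):
        return list(output_columns)
    wanted = set()
    for ref in metadata.features:
        if ":" in ref:
            view_name, feature_name = ref.split(":", 1)
            if view_name == odfv_name:
                wanted.add(feature_name)
    if not wanted:
        return list(output_columns)  # no by-name request for this ODFV: keep everything
    return [col for col in output_columns if col in wanted]
```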

Impact

This fix ensures:

  • ✅ Consistent behavior between online and offline retrieval
  • ✅ No schema mismatches in ML pipelines
  • ✅ Leaner results: unrequested ODFV output columns are dropped instead of returned (the transformation still runs before the filter)
  • ✅ Matches user expectations - returns exactly what was requested


@jyejare requested review from a team as code owners on March 23, 2026 08:28
@jyejare requested review from dmartinol, ejscribner and shuchu and removed the request for a team on March 23, 2026 08:28
@jyejare changed the title from "Fix ODFV output projection in offline retrieval (#6099)" to "fix: ODFV output projection in offline retrieval (#6099)" on Mar 23, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

@jyejare marked this pull request as draft on March 23, 2026 09:14

if metadata and metadata.features:
    for feature_ref in metadata.features:
        if ":" in feature_ref:


This is going to be brittle after my feature view versioning PR lands, since feature references will then support the @vN syntax.

@jyejare (Collaborator, Author):

@franciscojavierarceo What would it look like once implemented, so that I can make this future-proof? Or would you prefer to handle it in your PR?

@jyejare (Collaborator, Author):

Ah, I see! It may look like driver_stats@v2:trips_today, but I don't see this syntax breaking how view_name and feature_name are identified, because the separator would still be the same (:).
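A quick check of the point under discussion, assuming versioned references take the driver_stats@v2:trips_today form mentioned in this thread. Splitting on the first ":" still separates view and feature, but the view part keeps its "@v2" suffix, so a mapping keyed by the plain ODFV name would miss it unless the suffix is stripped. split_feature_ref and strip_version are hypothetical helpers, and strip_version assumes "@" only ever introduces a version.

```python
from typing import Tuple


def split_feature_ref(ref: str) -> Tuple[str, str]:
    """Split "view:feature" on the first colon only."""
    view_part, feature_name = ref.split(":", 1)
    return view_part, feature_name


def strip_version(view_part: str) -> str:
    """Drop an assumed "@vN" suffix, e.g. "driver_stats@v2" -> "driver_stats"."""
    view_name, _sep, _version = view_part.partition("@")
    return view_name


view_part, feature_name = split_feature_ref("driver_stats@v2:trips_today")
print(view_part, feature_name)  # driver_stats@v2 trips_today
print(strip_version(view_part))  # driver_stats
```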


@jyejare force-pushed the fix/odfv-output-projection-6099 branch from 6dc5107 to a6bbfda on March 23, 2026 15:10
@jyejare marked this pull request as ready for review on March 23, 2026 15:10
Changes:
- Modified RetrievalJob.to_arrow() to filter ODFV outputs based on requested
  features from metadata, matching online retrieval behavior
- Added test_odfv_projection to verify the fix and prevent regression

Before this fix:
- Online: features=['odfv:feature_a'] -> returns feature_a only ✓
- Offline: features=['odfv:feature_a'] -> returns feature_a, feature_b, feature_c ✗

After this fix:
- Both online and offline return only the requested features ✓

This ensures schema consistency between training (offline) and serving (online)
pipelines, preventing downstream issues in ML workflows.

Fixes feast-dev#6099

Signed-off-by: Jitendra Yejare <[email protected]>
@jyejare force-pushed the fix/odfv-output-projection-6099 branch from a6bbfda to 266a37d on March 26, 2026 15:42


Development

Successfully merging this pull request may close these issues.

get_historical_features returns all ODFV output columns even when a single ODFV feature is requested

2 participants