fix: ODFV output projection in offline retrieval (#6099) #6140

Open
jyejare wants to merge 3 commits into feast-dev:master from jyejare:fix/odfv-output-projection-6099

@jyejare (Collaborator) commented Mar 23, 2026

Summary

Fixes #6099 - Ensures offline retrieval honors ODFV feature projection, matching online retrieval behavior.

Problem

When requesting a subset of features from an OnDemandFeatureView:

  • Online retrieval ✅ Returns only requested features
  • Offline retrieval ❌ Returns ALL ODFV output features (before this fix)

This caused schema mismatches between training and serving pipelines.

Solution

Modified RetrievalJob.to_arrow() in offline_store.py to:

  1. Parse requested features from metadata.features
  2. Build a mapping of ODFV name → requested feature names
  3. Filter ODFV transformation output to only include requested columns
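The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names `build_odfv_projection` and `project_columns` are hypothetical, feature references are assumed to have the plain `view:feature` form, and output columns are assumed to carry short feature names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_odfv_projection(
    feature_refs: Iterable[str], odfv_names: Set[str]
) -> Dict[str, Set[str]]:
    """Steps 1 and 2: map each ODFV name to the feature names requested from it.

    Hypothetical helper; references that do not point at a known ODFV are skipped.
    """
    requested: Dict[str, Set[str]] = defaultdict(set)
    for ref in feature_refs:
        view_name, sep, feature_name = ref.partition(":")
        if sep and view_name in odfv_names:
            requested[view_name].add(feature_name)
    return dict(requested)


def project_columns(
    output_columns: List[str], odfv_name: str, requested: Dict[str, Set[str]]
) -> List[str]:
    """Step 3: keep only the requested ODFV output columns, preserving order.

    If the ODFV has no entry in the mapping, all columns are kept
    (the pre-fix behavior).
    """
    if odfv_name not in requested:
        return list(output_columns)
    wanted = requested[odfv_name]
    return [column for column in output_columns if column in wanted]


requested = build_odfv_projection(["my_odfv:feature_a"], {"my_odfv"})
kept = project_columns(["feature_a", "feature_b", "feature_c"], "my_odfv", requested)
```

With the example request from this description, `kept` contains only `feature_a`, while an ODFV that is absent from the mapping keeps all of its output columns.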

Example

Before this fix:

features = ["my_odfv:feature_a"]
offline_result = store.get_historical_features(features=features, ...)
# Columns: driver_id, event_timestamp, feature_a, feature_b, feature_c ❌

After this fix:

features = ["my_odfv:feature_a"]
offline_result = store.get_historical_features(features=features, ...)
# Columns: driver_id, event_timestamp, feature_a ✅

Changes

Modified: sdk/python/feast/infra/offline_stores/offline_store.py

  • Updated RetrievalJob.to_arrow() method (lines 140-184)
  • Added filtering logic for ODFV output projection
  • Maintains backward compatibility

Added: Test in sdk/python/tests/integration/offline_store/test_universal_historical_retrieval.py

  • test_odfv_projection() - Comprehensive test verifying:
    • Single feature request returns only that feature
    • Multiple feature request returns only requested features
    • Unrequested features are NOT included
    • Offline and online retrieval have consistent behavior
  • Parametrized for both full_feature_names=True and False

Testing

The new test test_odfv_projection verifies:

  1. ✅ Requesting 1 out of 3 ODFV features → returns only that 1 feature
  2. ✅ Requesting 2 out of 3 ODFV features → returns only those 2 features
  3. ✅ Unrequested features are NOT included in the result
  4. ✅ Offline and online retrieval return consistent schemas
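The checks listed above could be expressed with small helpers like the ones below. This is a sketch rather than the test added in the PR: `expected_feature_columns` and `unexpected_columns` are hypothetical names, and the `view__feature` naming for `full_feature_names=True` is assumed to follow Feast's double-underscore convention.

```python
from typing import Iterable, List, Set


def expected_feature_columns(
    feature_refs: Iterable[str], full_feature_names: bool
) -> List[str]:
    """Column names a projected result should contain for the given references."""
    columns = []
    for ref in feature_refs:
        view_name, _, feature_name = ref.partition(":")
        if full_feature_names:
            # Assumption: full names are "<view>__<feature>".
            columns.append(f"{view_name}__{feature_name}")
        else:
            columns.append(feature_name)
    return columns


def unexpected_columns(
    result_columns: Iterable[str],
    feature_refs: Iterable[str],
    full_feature_names: bool,
    passthrough: Set[str],
) -> Set[str]:
    """Columns that are neither requested features nor entity/timestamp columns."""
    allowed = set(expected_feature_columns(feature_refs, full_feature_names))
    return set(result_columns) - allowed - set(passthrough)


refs = ["my_odfv:feature_a"]
keys = {"driver_id", "event_timestamp"}
before_fix = ["driver_id", "event_timestamp", "feature_a", "feature_b", "feature_c"]
after_fix = ["driver_id", "event_timestamp", "feature_a"]
leaked_before = unexpected_columns(before_fix, refs, False, keys)
leaked_after = unexpected_columns(after_fix, refs, False, keys)
```

A test would then assert that the leaked set is empty for the offline result, and that the same holds for the keys of the online response.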

Backward Compatibility

  • ✅ Falls back to old behavior if metadata is unavailable
  • ✅ No breaking changes to existing functionality
  • ✅ Only affects ODFV feature projection

Impact

This fix ensures:

  • ✅ Consistent behavior between online and offline retrieval
  • ✅ No schema mismatches in ML pipelines
  • ✅ Leaner results - doesn't return unnecessary features
  • ✅ Matches user expectations - returns exactly what was requested

@jyejare jyejare requested review from a team as code owners March 23, 2026 08:28
@jyejare jyejare requested review from dmartinol, ejscribner and shuchu and removed request for a team March 23, 2026 08:28
@jyejare jyejare changed the title from "Fix ODFV output projection in offline retrieval (#6099)" to "fix: ODFV output projection in offline retrieval (#6099)" Mar 23, 2026
@jyejare jyejare marked this pull request as draft March 23, 2026 09:14

if metadata and metadata.features:
    for feature_ref in metadata.features:
        if ":" in feature_ref:

This is going to be brittle after my feature view version PR lands, as feature references will then support @vN syntax.
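One way to make the parsing less sensitive to that change is to parse references with a single pattern instead of checking for `":"`. The sketch below is only an illustration: the exact `@vN` grammar is not shown in this thread, so the `view@vN:feature` form, the `FeatureRef` tuple, and `parse_feature_ref` are all assumptions.

```python
import re
from typing import NamedTuple, Optional

# Assumed grammar: "<view>:<feature>" or "<view>@v<N>:<feature>".
_FEATURE_REF = re.compile(
    r"^(?P<view>[^:@]+)(?:@v(?P<version>\d+))?:(?P<feature>[^:]+)$"
)


class FeatureRef(NamedTuple):
    view: str
    version: Optional[int]
    feature: str


def parse_feature_ref(ref: str) -> FeatureRef:
    """Split a feature reference into view name, optional version, and feature name."""
    match = _FEATURE_REF.match(ref)
    if match is None:
        raise ValueError(f"Unrecognized feature reference: {ref!r}")
    version = match.group("version")
    return FeatureRef(
        view=match.group("view"),
        version=int(version) if version is not None else None,
        feature=match.group("feature"),
    )


plain = parse_feature_ref("my_odfv:feature_a")
versioned = parse_feature_ref("my_odfv@v2:feature_a")
```

Both references resolve to the view name `my_odfv`, so a mapping keyed on the ODFV name would keep working for versioned references under this assumed syntax.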

Ambient Code Bot added 3 commits March 23, 2026 20:40
This commit fixes issue feast-dev#6099 where offline retrieval (get_historical_features)
was returning ALL OnDemandFeatureView output features, even when only a subset
was requested, while online retrieval correctly returned only requested features.

Changes:
- Modified RetrievalJob.to_arrow() to filter ODFV outputs based on requested
  features from metadata, matching online retrieval behavior
- Added test_odfv_projection to verify the fix and prevent regression

Before this fix:
- Online: features=['odfv:feature_a'] -> returns feature_a only ✓
- Offline: features=['odfv:feature_a'] -> returns feature_a, feature_b, feature_c ✗

After this fix:
- Both online and offline return only the requested features ✓

This ensures schema consistency between training (offline) and serving (online)
pipelines, preventing downstream issues in ML workflows.

Fixes feast-dev#6099

- Fix empty list edge case: Use explicit dict key check instead of 'or'
  operator to avoid treating empty sets as falsy
- Use sets instead of lists for requested features to prevent duplicates
  and improve lookup performance (O(1) instead of O(n))

Some RetrievalJob implementations don't implement the metadata property
and raise NotImplementedError. Wrap metadata access in try-except to
gracefully handle this case and maintain backward compatibility.

Fixes CI test failure in test_retrieval_job_dataframe.py
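The two follow-up commits describe defensive patterns that can be illustrated in isolation. The code below is a sketch under assumptions, not the committed change: `select_features_buggy`, `select_features`, `requested_refs`, and `LegacyJob` are hypothetical names, and the metadata object is modeled with `SimpleNamespace`.

```python
from types import SimpleNamespace
from typing import Dict, List, Optional, Set


def select_features_buggy(
    requested: Dict[str, Set[str]], odfv_name: str, all_features: Set[str]
) -> List[str]:
    # Pitfall: `or` treats an empty set as missing and falls back to everything.
    return sorted(requested.get(odfv_name) or all_features)


def select_features(
    requested: Dict[str, Set[str]], odfv_name: str, all_features: Set[str]
) -> List[str]:
    # Explicit key check: an empty set stays empty; only a missing key falls back.
    if odfv_name in requested:
        return sorted(requested[odfv_name])
    return sorted(all_features)


class LegacyJob:
    """Stand-in for a retrieval job that does not implement `metadata`."""

    @property
    def metadata(self):
        raise NotImplementedError


def requested_refs(job) -> Optional[List[str]]:
    """Return the requested feature references, or None to signal the old behavior."""
    try:
        metadata = job.metadata
    except NotImplementedError:
        return None
    if metadata is None or not metadata.features:
        return None
    return list(metadata.features)


modern_job = SimpleNamespace(metadata=SimpleNamespace(features=["my_odfv:feature_a"]))
legacy_refs = requested_refs(LegacyJob())
modern_refs = requested_refs(modern_job)
```

Returning `None` rather than an empty list keeps "no metadata available" distinct from "nothing requested", which is the same distinction the explicit key check preserves.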
@jyejare jyejare force-pushed the fix/odfv-output-projection-6099 branch from 6dc5107 to a6bbfda Compare March 23, 2026 15:10
@jyejare jyejare marked this pull request as ready for review March 23, 2026 15:10

Development

Successfully merging this pull request may close these issues.

get_historical_features returns all ODFV output columns even when a single ODFV feature is requested