
RemoteOfflineStore does not support SQL string as entity_df in get_historical_features() #6236

@Witich

Description

Expected Behavior

get_historical_features() should accept a SQL string as entity_df. This is documented for, and supported by, the local offline stores (ClickHouse, PostgreSQL, BigQuery), and the type signature in RemoteOfflineStore already declares entity_df as Optional[Union[pd.DataFrame, str]].

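```python
from feast import FeatureStore

# Assumes a feature repo in the current directory configured with a remote offline store
store = FeatureStore(repo_path=".")

# Build the entity query from the registered data source's table reference
entity_sql = f"""
    SELECT driver_id, event_timestamp
    FROM {store.get_data_source("driver_hourly_stats_source").get_table_query_string()}
    WHERE event_timestamp BETWEEN '2021-01-01' AND '2021-12-31'
"""

training_df = store.get_historical_features(
    entity_df=entity_sql,
    features=["driver_hourly_stats:conv_rate"],
).to_df()
```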

Current Behavior

Passing a SQL string as entity_df to RemoteOfflineStore raises:

```
AttributeError: 'str' object has no attribute 'columns'
```

Two functions in feast/infra/offline_stores/remote.py assume that entity_df is always a DataFrame:

  1. _create_retrieval_metadata() (line 456) calls _get_entity_schema(entity_df), which accesses entity_df.columns.
  2. _put_parameters() (line 564) calls pa.Table.from_pandas(entity_df), which expects a pandas DataFrame.
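The error itself needs nothing Feast-specific to reproduce: a Python str simply has no .columns attribute. The sketch below uses only the standard library to show that error and the kind of isinstance check both call sites would need before touching DataFrame-only attributes. describe_entity_df is a hypothetical helper, not part of Feast, and SimpleNamespace stands in for a pandas DataFrame.

```python
from types import SimpleNamespace
from typing import Any, Dict


def describe_entity_df(entity_df: Any) -> Dict[str, Any]:
    """Hypothetical helper: classify entity_df before using DataFrame-only attributes."""
    if isinstance(entity_df, str):
        # A SQL query has no client-side schema; only the server can run it.
        return {"kind": "sql", "columns": [], "query": entity_df}
    if entity_df is None:
        # entity_df is Optional in the RemoteOfflineStore signature.
        return {"kind": "none", "columns": []}
    # Anything else is treated as DataFrame-like (exposes .columns).
    return {"kind": "dataframe", "columns": list(entity_df.columns)}


# The unguarded code path effectively does this with a SQL string:
try:
    "SELECT driver_id, event_timestamp FROM t".columns
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'str' object has no attribute 'columns'
print(describe_entity_df("SELECT driver_id, event_timestamp FROM t")["kind"])  # sql
frame_like = SimpleNamespace(columns=["driver_id", "event_timestamp"])
print(describe_entity_df(frame_like)["columns"])  # ['driver_id', 'event_timestamp']
```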

Steps to reproduce

  1. Deploy Feast with a remote offline store (Arrow Flight) backed by any store that supports a SQL entity_df (ClickHouse, PostgreSQL, etc.).
  2. Run the following from the client:

```python
from feast import FeatureStore

# config: a RepoConfig whose offline_store is of type "remote"
store = FeatureStore(config=config)

entity_sql = "SELECT id, event_timestamp FROM my_table WHERE event_timestamp > '2025-01-01'"
job = store.get_historical_features(entity_df=entity_sql, features=["my_fv:feature1"])
df = job.to_df()  # raises AttributeError
```

Specifications

  • Version: 0.61.0
  • Platform: Linux / macOS
  • Subsystem: feast.infra.offline_stores.remote (RemoteOfflineStore / Arrow Flight)

Possible Solution

Option A — pass SQL via api_parameters:

  • Client (RemoteOfflineStore.get_historical_features): if entity_df is a string, put it into api_parameters["entity_df_sql"] and pass entity_df=None to RemoteRetrievalJob
  • Server (OfflineServer.get_historical_features): if command contains entity_df_sql, forward it as entity_df to the local offline store
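Option A could look roughly like this. It is a sketch under assumptions, not Feast code: client_split_entity_df, build_command and server_resolve_entity_df are hypothetical names, the feature_refs key is illustrative, and the Flight descriptor command is assumed to be the api_parameters merged into a single JSON object.

```python
import json
import uuid
from typing import Any, Dict, Optional, Tuple


def client_split_entity_df(
    entity_df: Any, api_parameters: Dict[str, Any]
) -> Tuple[Optional[Any], Dict[str, Any]]:
    """Client side (hypothetical): route a SQL entity_df through api_parameters."""
    params = dict(api_parameters)  # copy so the caller's dict is not mutated
    if isinstance(entity_df, str):
        params["entity_df_sql"] = entity_df
        return None, params  # nothing left to upload as an Arrow table
    return entity_df, params


def build_command(api: str, api_parameters: Dict[str, Any]) -> bytes:
    """Assumed command shape: api_parameters merged into one JSON object."""
    command = {"command_id": str(uuid.uuid4()), "api": api, **api_parameters}
    return json.dumps(command).encode("utf-8")


def server_resolve_entity_df(command_bytes: bytes, uploaded: Optional[Any]) -> Any:
    """Server side (hypothetical): forward the SQL if present, else the uploaded table."""
    command = json.loads(command_bytes)
    return command.get("entity_df_sql", uploaded)


entity_sql = "SELECT driver_id, event_timestamp FROM driver_stats"
entity_df, params = client_split_entity_df(
    entity_sql, {"feature_refs": ["driver_hourly_stats:conv_rate"]}
)
payload = build_command("get_historical_features", params)
print(entity_df)  # None
print(server_resolve_entity_df(payload, uploaded=entity_df))  # the SQL string
```

Because the query is executed on the server by the backing offline store, it presumably has to be written in that store's SQL dialect and reference tables the server can reach.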

Option B — fix _create_retrieval_metadata and _put_parameters:

  • _create_retrieval_metadata: return metadata with empty keys/timestamps when entity_df is a string
  • _put_parameters: encode SQL string in a transport-compatible format (e.g., Flight descriptor command metadata)
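For Option B, the two guarded functions might be sketched as follows. Everything here is hypothetical: RetrievalMetadataSketch only assumes that Feast's RetrievalMetadata carries features, keys and a min/max event timestamp, and TinyFrame is a minimal stand-in for a pandas DataFrame so the example runs with the standard library alone. The put-side guard writes the SQL into the descriptor command instead of calling pa.Table.from_pandas.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class RetrievalMetadataSketch:
    """Stand-in for Feast's RetrievalMetadata; field names are assumed."""

    features: List[str]
    keys: List[str] = field(default_factory=list)
    min_event_timestamp: Optional[datetime] = None
    max_event_timestamp: Optional[datetime] = None


def create_retrieval_metadata_sketch(
    feature_refs: List[str], entity_df: Any, timestamp_col: str = "event_timestamp"
) -> RetrievalMetadataSketch:
    if isinstance(entity_df, str):
        # Keys and the timestamp range are unknown until the server runs the SQL.
        return RetrievalMetadataSketch(features=feature_refs)
    # DataFrame-like input: derive keys and the range from the data itself.
    keys = [c for c in entity_df.columns if c != timestamp_col]
    timestamps = entity_df[timestamp_col]
    return RetrievalMetadataSketch(
        features=feature_refs,
        keys=keys,
        min_event_timestamp=min(timestamps),
        max_event_timestamp=max(timestamps),
    )


def put_parameters_sketch(command: Dict[str, Any], entity_df: Any) -> Optional[Any]:
    """Return what would be uploaded with do_put, or None for a SQL entity_df."""
    if isinstance(entity_df, str):
        command["entity_df_sql"] = entity_df  # travels in the descriptor command
        return None
    return entity_df  # the real code would convert this to an Arrow table


class TinyFrame(dict):
    """Hypothetical DataFrame stand-in: a dict of columns exposing .columns."""

    @property
    def columns(self) -> List[str]:
        return list(self.keys())


sql_meta = create_retrieval_metadata_sketch(
    ["my_fv:feature1"], "SELECT id, event_timestamp FROM my_table"
)
print(sql_meta.keys, sql_meta.min_event_timestamp)  # [] None

frame = TinyFrame(id=[1, 2], event_timestamp=[datetime(2025, 1, 2), datetime(2025, 1, 1)])
df_meta = create_retrieval_metadata_sketch(["my_fv:feature1"], frame)
print(df_meta.keys, df_meta.min_event_timestamp)  # ['id'] 2025-01-01 00:00:00
```

The trade-off is that, for a SQL entity_df, the client-side metadata is empty, so any client code that relies on keys or the timestamp range would have to tolerate missing values.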
