feat: Making feast vector store with open ai search api compatible#6121
feat: Making feast vector store with open ai search api compatible#6121patelchaitany wants to merge 2 commits intofeast-dev:masterfrom
Conversation
e45f167 to
c8392a9
Compare
974d688 to
639a87e
Compare
Signed-off-by: Chaitany patel <[email protected]>
Signed-off-by: Chaitany patel <[email protected]>
7e8adfb to
3f541ad
Compare
| if requested_features is None: | ||
| requested_features = [] | ||
| if "distance" not in requested_features: | ||
| requested_features.append("distance") |
There was a problem hiding this comment.
🔴 RemoteOnlineStore mutates caller's requested_features list, causing duplicate 'distance' in response
In RemoteOnlineStore.retrieve_online_documents_v2, lines 351-354 mutate the requested_features parameter in-place by appending "distance". This list is the same object passed by reference from feature_store.py:_retrieve_from_online_store_v2 (line 3066). After the remote store call returns, feature_store.py:3089 builds features_to_request = requested_features + ["distance"], which now produces a list with "distance" appearing twice (since it was already appended). This causes _populate_response_from_feature_data at feature_store.py:3121-3128 to add a duplicate "distance" entry in the response metadata and results, producing a malformed OnlineResponse when using the remote online store.
| if requested_features is None: | |
| requested_features = [] | |
| if "distance" not in requested_features: | |
| requested_features.append("distance") | |
| if requested_features is None: | |
| requested_features = [] | |
| requested_features = list(requested_features) | |
| if "distance" not in requested_features: | |
| requested_features.append("distance") |
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
the request feature is modified after getting response as it is not returning the distance field in the response of the remoteonlinestore
| for table in tables_to_keep: | ||
| conn.execute( | ||
| f"CREATE TABLE IF NOT EXISTS {_table_id(project, table)} (entity_key BLOB, feature_name TEXT, value BLOB, vector_value BLOB, event_ts timestamp, created_ts timestamp, PRIMARY KEY(entity_key, feature_name))" | ||
| f"CREATE TABLE IF NOT EXISTS {_table_id(project, table)} (entity_key BLOB, feature_name TEXT, value BLOB, value_text TEXT, value_num REAL, vector_value BLOB, event_ts timestamp, created_ts timestamp, PRIMARY KEY(entity_key, feature_name))" | ||
| ) | ||
| conn.execute( | ||
| f"CREATE INDEX IF NOT EXISTS {_table_id(project, table)}_ek ON {_table_id(project, table)} (entity_key);" |
There was a problem hiding this comment.
🔴 SQLite store missing ALTER TABLE migration for new value_text and value_num columns
The SQLite store's update method (line 333) and SqliteTable.update (line 917) use CREATE TABLE IF NOT EXISTS with the new value_text TEXT and value_num REAL columns. For existing tables created before this PR (which only had entity_key, feature_name, value, vector_value, event_ts, created_ts), CREATE TABLE IF NOT EXISTS is a no-op and won't add the new columns. The online_write_batch method (lines 222, 246) now references these columns in INSERT statements, which will fail with a SQLite error on any pre-existing table. The Postgres store correctly handles this with an explicit ALTER TABLE ADD COLUMN IF NOT EXISTS at postgres.py:371-375, but the SQLite store has no equivalent migration.
(Refers to lines 331-337)
Prompt for agents
In sdk/python/feast/infra/online_stores/sqlite.py, the update() method at line 331-337 needs to add ALTER TABLE migration logic after the CREATE TABLE IF NOT EXISTS statement. After line 337, add SQLite ALTER TABLE statements to add the missing columns for existing tables:
try:
conn.execute(f"ALTER TABLE {_table_id(project, table)} ADD COLUMN value_text TEXT")
except Exception:
pass # Column already exists
try:
conn.execute(f"ALTER TABLE {_table_id(project, table)} ADD COLUMN value_num REAL")
except Exception:
pass # Column already exists
SQLite doesn't support IF NOT EXISTS on ALTER TABLE ADD COLUMN, so a try/except is the standard pattern. Similarly, add the same migration logic in the SqliteTable.update() method around line 930.
Was this helpful? React with 👍 or 👎 to provide feedback.
| if op_type == "eq": | ||
| return f"{key} == {_milvus_fmt(value)}" | ||
| elif op_type == "ne": | ||
| return f"{key} != {_milvus_fmt(value)}" | ||
| elif op_type in milvus_ops: | ||
| return f"{key} {milvus_ops[op_type]} {_milvus_fmt(value)}" | ||
| elif op_type in ("in", "nin"): | ||
| if not isinstance(value, list): | ||
| raise ValueError( | ||
| f"'{op_type}' filter requires a list value, got {type(value)}" | ||
| ) | ||
| formatted = [_milvus_fmt(v) for v in value] | ||
| kw = "not in" if op_type == "nin" else "in" | ||
| return f"{key} {kw} [{', '.join(formatted)}]" |
There was a problem hiding this comment.
🔴 Milvus filter key is interpolated into expression string without sanitization, enabling expression injection
In _translate_comparison_filter, the user-controlled filter_obj.key is directly interpolated into Milvus boolean expression strings (e.g., f"{key} == {_milvus_fmt(value)}" at line 565). While _milvus_fmt escapes the value, the key is not sanitized at all. A malicious filter key like category == 'x' or valid_field would produce category == 'x' or valid_field == 'value', effectively bypassing the intended filter logic. The Postgres and SQLite stores avoid this by using parameterized queries for the key (feature_name = %s / feature_name = ?), but the Milvus store's string-based expression API makes it vulnerable.
Prompt for agents
In sdk/python/feast/infra/online_stores/milvus_online_store/milvus.py, the _translate_comparison_filter method (lines 551-578) must sanitize the filter key before interpolating it into the Milvus expression string. Add a validation step at the start of the method that checks the key contains only alphanumeric characters and underscores (matching valid Milvus field names), and raises a ValueError otherwise. For example:
import re
if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', key):
raise ValueError(f"Invalid filter key: {key!r}. Keys must be valid field identifiers.")
This prevents expression injection via crafted key names while still allowing all legitimate field names.
Was this helpful? React with 👍 or 👎 to provide feedback.
What this PR does / why we need it:
This PR making the feast vector store api with open ai search api compatible so.
the current changes are creating an new rest api end point which is compatible with open ai search api and also it include an extra field in the metadata named features_to_retrieve(allows to retrieve specific feature) and content_field (which field include the main content or Chunk document)
what things are missing :
Which issue(s) this PR fixes:
#5615
Misc