Merged
3 changes: 2 additions & 1 deletion docs/getting-started/concepts/feast-types.md
@@ -9,7 +9,8 @@ Feast supports the following categories of data types:

- **Primitive types**: numerical values (`Int32`, `Int64`, `Float32`, `Float64`), `String`, `Bytes`, `Bool`, and `UnixTimestamp`.
- **Domain-specific primitives**: `PdfBytes` (PDF binary data for RAG/document pipelines) and `ImageBytes` (image binary data for multimodal pipelines). These are semantic aliases over `Bytes` and must be explicitly declared in schema — no backend infers them.
-- **Array types**: ordered lists of any primitive type, e.g. `Array(Int64)`, `Array(String)`.
+- **UUID types**: `Uuid` and `TimeUuid` for universally unique identifiers. Stored as strings at the proto level but deserialized to `uuid.UUID` objects in Python.
+- **Array types**: ordered lists of any primitive type, e.g. `Array(Int64)`, `Array(String)`, `Array(Uuid)`.
- **Set types**: unordered collections of unique values for any primitive type, e.g. `Set(String)`, `Set(Int64)`. Set types are not inferred by any backend and must be explicitly declared. They are best suited for online serving use cases.
- **Map types**: dictionary-like structures with string keys and values that can be any supported Feast type (including nested maps), e.g. `Map`, `Array(Map)`.
- **JSON type**: opaque JSON data stored as a string at the proto level but semantically distinct from `String` — backends use native JSON types (`jsonb`, `VARIANT`, etc.), e.g. `Json`, `Array(Json)`.
42 changes: 41 additions & 1 deletion docs/reference/type-system.md
@@ -24,6 +24,8 @@ Feast supports the following data types:
| `Bytes` | `bytes` | Binary data |
| `Bool` | `bool` | Boolean value |
| `UnixTimestamp` | `datetime` | Unix timestamp (nullable) |
| `Uuid` | `uuid.UUID` | UUID (any version) |
| `TimeUuid` | `uuid.UUID` | Time-based UUID (version 1) |

### Domain-Specific Primitive Types

@@ -52,6 +54,8 @@ All primitive types have corresponding array (list) types:
| `Array(Bytes)` | `List[bytes]` | List of binary data |
| `Array(Bool)` | `List[bool]` | List of booleans |
| `Array(UnixTimestamp)` | `List[datetime]` | List of timestamps |
| `Array(Uuid)` | `List[uuid.UUID]` | List of UUIDs |
| `Array(TimeUuid)` | `List[uuid.UUID]` | List of time-based UUIDs |

### Set Types

@@ -67,6 +71,8 @@ All primitive types (except `Map` and `Json`) have corresponding set types for s
| `Set(Bytes)` | `Set[bytes]` | Set of unique binary data |
| `Set(Bool)` | `Set[bool]` | Set of unique booleans |
| `Set(UnixTimestamp)` | `Set[datetime]` | Set of unique timestamps |
| `Set(Uuid)` | `Set[uuid.UUID]` | Set of unique UUIDs |
| `Set(TimeUuid)` | `Set[uuid.UUID]` | Set of unique time-based UUIDs |

**Note:** Set types automatically remove duplicate values. When converting from lists or other iterables to sets, duplicates are eliminated.

@@ -169,7 +175,7 @@ from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import (
    Int32, Int64, Float32, Float64, String, Bytes, Bool, UnixTimestamp,
-    Array, Set, Map, Json, Struct
+    Uuid, TimeUuid, Array, Set, Map, Json, Struct
)

# Define a data source
@@ -199,6 +205,8 @@ user_features = FeatureView(
Field(name="profile_picture", dtype=Bytes),
Field(name="is_active", dtype=Bool),
Field(name="last_login", dtype=UnixTimestamp),
Field(name="session_id", dtype=Uuid),
Field(name="event_id", dtype=TimeUuid),

# Array types
Field(name="daily_steps", dtype=Array(Int32)),
@@ -209,12 +217,16 @@ user_features = FeatureView(
Field(name="document_hashes", dtype=Array(Bytes)),
Field(name="notification_settings", dtype=Array(Bool)),
Field(name="login_timestamps", dtype=Array(UnixTimestamp)),
Field(name="related_session_ids", dtype=Array(Uuid)),
Field(name="event_chain", dtype=Array(TimeUuid)),

# Set types (unique values only — see backend caveats above)
Field(name="visited_pages", dtype=Set(String)),
Field(name="unique_categories", dtype=Set(Int32)),
Field(name="tag_ids", dtype=Set(Int64)),
Field(name="preferred_languages", dtype=Set(String)),
Field(name="unique_device_ids", dtype=Set(Uuid)),
Field(name="unique_event_ids", dtype=Set(TimeUuid)),

# Map types
Field(name="user_preferences", dtype=Map),
@@ -250,6 +262,34 @@ tag_list = [100, 200, 300, 100, 200]
tag_ids = set(tag_list) # {100, 200, 300}
```

### UUID Type Usage Examples

UUID types represent universally unique identifiers as `uuid.UUID` objects in Python. Use `Uuid` for UUIDs of any version (for example, random version 4 values) and `TimeUuid` for time-based version 1 UUIDs:

```python
import uuid

# Random UUID (version 4) — use Uuid type
session_id = uuid.uuid4() # e.g., UUID('16fd2706-8baf-433b-82eb-8c7fada847da')

# Time-based UUID (version 1) — use TimeUuid type
event_id = uuid.uuid1() # e.g., UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8')

# UUID values are returned as uuid.UUID objects from get_online_features().
# `store` is assumed to be an already-initialized FeatureStore.
response = store.get_online_features(
    features=["user_features:session_id"],
    entity_rows=[{"user_id": 1}],
)
result = response.to_dict()
# result["session_id"][0] is a uuid.UUID object

# UUID lists
related_sessions = [uuid.uuid4(), uuid.uuid4(), uuid.uuid4()]

# UUID sets (unique values)
unique_devices = {uuid.uuid4(), uuid.uuid4()}
```
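
Both `Uuid` and `TimeUuid` map to `uuid.UUID` in Python, so the only visible difference between them is the UUID version. The following is a minimal sketch, using only the standard library, of checking the version before assigning a value to a `TimeUuid` field. `is_time_uuid` is a hypothetical helper and not part of Feast, and this diff does not show whether Feast itself enforces the version.

```python
import uuid


def is_time_uuid(value: uuid.UUID) -> bool:
    """Return True when the value is a version 1 (time-based) UUID."""
    return value.version == 1


random_id = uuid.uuid4()  # version 4, suitable for a Uuid field
time_id = uuid.uuid1()    # version 1, suitable for a TimeUuid field

print(is_time_uuid(random_id))  # False
print(is_time_uuid(time_id))    # True
```

A check like this could run in a data-quality step before values are written, since a version 4 value is still a valid `uuid.UUID` and would not fail on type alone.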

### Map Type Usage Examples

Maps can store complex nested data structures:
12 changes: 12 additions & 0 deletions protos/feast/types/Value.proto
@@ -57,6 +57,12 @@ message ValueType {
JSON_LIST = 33;
STRUCT = 34;
STRUCT_LIST = 35;
UUID = 36;
TIME_UUID = 37;
UUID_LIST = 38;
TIME_UUID_LIST = 39;
UUID_SET = 40;
TIME_UUID_SET = 41;
}
}

@@ -96,6 +102,12 @@ message Value {
StringList json_list_val = 33;
Map struct_val = 34;
MapList struct_list_val = 35;
string uuid_val = 36;
string time_uuid_val = 37;
StringList uuid_list_val = 38;
StringList time_uuid_list_val = 39;
StringSet uuid_set_val = 40;
StringSet time_uuid_set_val = 41;
}
}
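
The new `Value` fields carry UUIDs as plain strings (`string`, `StringList`, `StringSet`), so every write turns `uuid.UUID` objects into text and every read parses them back. A rough sketch of that round trip, using ordinary Python lists as stand-ins for the proto messages; the function names are hypothetical and are not taken from the Feast codebase:

```python
import uuid
from typing import Iterable, List, Set


def uuids_to_strings(values: Iterable[uuid.UUID]) -> List[str]:
    """Encode UUIDs as canonical lowercase strings, as a string field would hold them."""
    return [str(v) for v in values]


def strings_to_uuid_list(raw: Iterable[str]) -> List[uuid.UUID]:
    """Decode a list-valued field back into uuid.UUID objects, keeping order."""
    return [uuid.UUID(s) for s in raw]


def strings_to_uuid_set(raw: Iterable[str]) -> Set[uuid.UUID]:
    """Decode a set-valued field; duplicates collapse because uuid.UUID is hashable."""
    return {uuid.UUID(s) for s in raw}


first, second = uuid.uuid4(), uuid.uuid4()
stored = uuids_to_strings([first, second, first])

print(strings_to_uuid_list(stored) == [first, second, first])  # True
print(strings_to_uuid_set(stored) == {first, second})          # True
```

Storing the canonical string form keeps the proto schema free of a dedicated UUID message, at the cost of a parse on every read.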

11 changes: 8 additions & 3 deletions sdk/python/feast/feature_store.py
@@ -2768,7 +2768,10 @@ def _doc_feature(x):
online_features_response=online_features_response,
data=requested_features_data,
)
-return OnlineResponse(online_features_response)
+feature_types = {
+    f.name: f.dtype.to_value_type() for f in requested_feature_view.features
+}
+return OnlineResponse(online_features_response, feature_types=feature_types)

def retrieve_online_documents_v2(
self,
@@ -3058,7 +3061,8 @@ def _retrieve_from_online_store_v2(
online_features_response.metadata.feature_names.val.extend(
features_to_request
)
-return OnlineResponse(online_features_response)
+feature_types = {f.name: f.dtype.to_value_type() for f in table.features}
+return OnlineResponse(online_features_response, feature_types=feature_types)

table_entity_values, idxs, output_len = utils._get_unique_entities_from_values(
entity_key_dict,
@@ -3081,7 +3085,8 @@ def _retrieve_from_online_store_v2(
data=entity_key_dict,
)

-return OnlineResponse(online_features_response)
+feature_types = {f.name: f.dtype.to_value_type() for f in table.features}
+return OnlineResponse(online_features_response, feature_types=feature_types)

def serve(
self,
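
The call sites above now pass a name-to-`ValueType` map into `OnlineResponse`. This diff does not show how `OnlineResponse` consumes that map, so the following is only an illustrative sketch of why it is needed: a string read back from storage cannot be told apart from a UUID without its declared type. `decode_column` and the string type tags are hypothetical stand-ins, not Feast APIs.

```python
import uuid
from typing import Any, Dict, List

# Hypothetical type tags standing in for feast.value_type.ValueType members.
UUID_TAGS = {"UUID", "TIME_UUID"}


def decode_column(name: str, values: List[Any], feature_types: Dict[str, str]) -> List[Any]:
    """Convert stored strings to uuid.UUID only when the declared type asks for it."""
    if feature_types.get(name) in UUID_TAGS:
        return [uuid.UUID(v) if isinstance(v, str) else v for v in values]
    return values


feature_types = {"session_id": "UUID", "country": "STRING"}
raw = {
    "session_id": ["16fd2706-8baf-433b-82eb-8c7fada847da", None],
    "country": ["NL", "DE"],
}
decoded = {name: decode_column(name, col, feature_types) for name, col in raw.items()}

print(type(decoded["session_id"][0]).__name__)  # UUID
print(decoded["country"])                       # ['NL', 'DE']
```

Missing values (`None`) pass through unchanged in this sketch, and columns without a UUID type are returned as-is.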
30 changes: 28 additions & 2 deletions sdk/python/feast/infra/online_stores/online_store.py
@@ -31,6 +31,7 @@
from feast.protos.feast.types.Value_pb2 import Value as ValueProto
from feast.repo_config import RepoConfig
from feast.stream_feature_view import StreamFeatureView
from feast.value_type import ValueType


class OnlineStore(ABC):
@@ -236,18 +237,21 @@ def get_online_features(

track_online_store_read(_time.monotonic() - _read_start)

feature_types = self._build_feature_types(grouped_refs)

if requested_on_demand_feature_views:
utils._augment_response_with_on_demand_transforms(
online_features_response,
feature_refs,
requested_on_demand_feature_views,
full_feature_names,
feature_types=feature_types,
)

utils._drop_unneeded_columns(
online_features_response, requested_result_row_names
)
-return OnlineResponse(online_features_response)
+return OnlineResponse(online_features_response, feature_types=feature_types)

def _check_versioned_read_support(self, grouped_refs):
"""Raise an error if versioned reads are attempted on unsupported stores."""
@@ -367,18 +371,40 @@ async def query_table(table, requested_features):

track_online_store_read(_time.monotonic() - _read_start)

feature_types = self._build_feature_types(grouped_refs)

if requested_on_demand_feature_views:
utils._augment_response_with_on_demand_transforms(
online_features_response,
feature_refs,
requested_on_demand_feature_views,
full_feature_names,
feature_types=feature_types,
)

utils._drop_unneeded_columns(
online_features_response, requested_result_row_names
)
-return OnlineResponse(online_features_response)
+return OnlineResponse(online_features_response, feature_types=feature_types)

@staticmethod
def _build_feature_types(
    grouped_refs: List,
) -> Dict[str, ValueType]:
    """Build a mapping of feature names to ValueType from grouped feature view refs.

    Includes both bare names and prefixed names (feature_view__feature) so that
    lookups succeed regardless of the full_feature_names setting.
    """
    feature_types: Dict[str, ValueType] = {}
    for table, requested_features in grouped_refs:
        table_name = table.projection.name_to_use()
        for field in table.features:
            if field.name in requested_features:
                vtype = field.dtype.to_value_type()
                feature_types[field.name] = vtype
                feature_types[f"{table_name}__{field.name}"] = vtype
    return feature_types

@abstractmethod
def update(
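
The behaviour of `_build_feature_types` can be tried out without a Feast installation. The sketch below reproduces its loop with `SimpleNamespace` stand-ins for feature views and fields. The plain `name` attribute and the string type tags are simplifying assumptions that replace `table.projection.name_to_use()` and `field.dtype.to_value_type()`.

```python
from types import SimpleNamespace
from typing import Dict, List, Tuple


def build_feature_types(grouped_refs: List[Tuple[SimpleNamespace, List[str]]]) -> Dict[str, str]:
    """Map both bare and view-prefixed feature names to their declared type."""
    feature_types: Dict[str, str] = {}
    for table, requested_features in grouped_refs:
        for field in table.features:
            if field.name in requested_features:
                feature_types[field.name] = field.dtype
                feature_types[f"{table.name}__{field.name}"] = field.dtype
    return feature_types


user_features = SimpleNamespace(
    name="user_features",
    features=[
        SimpleNamespace(name="session_id", dtype="UUID"),
        SimpleNamespace(name="event_id", dtype="TIME_UUID"),
        SimpleNamespace(name="is_active", dtype="BOOL"),  # not requested below
    ],
)

feature_types = build_feature_types([(user_features, ["session_id", "event_id"])])
for name, dtype in feature_types.items():
    print(name, dtype)
```

Only requested features appear in the result. If two feature views expose a feature with the same bare name, the later assignment overwrites the earlier bare-name entry, while the prefixed entries stay distinct.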
27 changes: 24 additions & 3 deletions sdk/python/feast/infra/online_stores/remote.py
@@ -13,6 +13,7 @@
# limitations under the License.
import json
import logging
import uuid as uuid_module
from collections import defaultdict
from datetime import datetime
from typing import Any, Callable, Dict, List, Literal, Optional, Sequence, Tuple
@@ -38,6 +39,17 @@
logger = logging.getLogger(__name__)


def _json_safe(val: Any) -> Any:
    """Convert uuid.UUID objects and sets to JSON-serializable form."""
    if isinstance(val, uuid_module.UUID):
        return str(val)
    if isinstance(val, set):
        return [str(v) if isinstance(v, uuid_module.UUID) else v for v in val]
    if isinstance(val, list):
        return [str(v) if isinstance(v, uuid_module.UUID) else v for v in val]
    return val


class RemoteOnlineStoreConfig(FeastConfigBaseModel):
"""Remote Online store config for remote online store"""

@@ -103,6 +115,16 @@ def _proto_value_to_transport_value(proto_value: ValueProto) -> Any:
if val_attr in ("map_list_val", "struct_list_val"):
    return [json.dumps(v) for v in feast_value_type_to_python_type(proto_value)]

# UUID types are stored as strings in proto — return them directly
# to avoid feast_value_type_to_python_type converting to uuid.UUID
# objects which are not JSON-serializable.
if val_attr in ("uuid_val", "time_uuid_val"):
    return getattr(proto_value, val_attr)
if val_attr in ("uuid_list_val", "time_uuid_list_val"):
    return list(getattr(proto_value, val_attr).val)
if val_attr in ("uuid_set_val", "time_uuid_set_val"):
    return list(getattr(proto_value, val_attr).val)

return feast_value_type_to_python_type(proto_value)

def online_write_batch(
@@ -128,9 +150,8 @@ def online_write_batch(
for join_key, entity_value_proto in zip(
    entity_key_proto.join_keys, entity_key_proto.entity_values
):
-    columnar_data[join_key].append(
-        feast_value_type_to_python_type(entity_value_proto)
-    )
+    val = feast_value_type_to_python_type(entity_value_proto)
+    columnar_data[join_key].append(_json_safe(val))

# Populate feature values – use transport-safe conversion that
# preserves JSON strings instead of parsing them into dicts.
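
The reason for `_json_safe` is that the standard `json` encoder rejects `uuid.UUID` objects and sets. The standalone sketch below mirrors the helper's logic (with the list and set branches merged, which does not change the result) and shows the failure it avoids. It is a simplified copy for illustration, not the code from `remote.py`.

```python
import json
import uuid
from typing import Any


def json_safe(val: Any) -> Any:
    """Turn UUIDs into strings and sets/lists into lists of JSON-friendly items."""
    if isinstance(val, uuid.UUID):
        return str(val)
    if isinstance(val, (set, list)):
        return [str(v) if isinstance(v, uuid.UUID) else v for v in val]
    return val


entity_id = uuid.UUID("16fd2706-8baf-433b-82eb-8c7fada847da")

try:
    json.dumps({"user_id": entity_id})
except TypeError as exc:
    # The default encoder has no rule for uuid.UUID.
    print(f"raw UUID rejected: {exc}")

payload = json.dumps({"user_id": json_safe(entity_id)})
print(payload)  # {"user_id": "16fd2706-8baf-433b-82eb-8c7fada847da"}
```

Sets come back as lists in arbitrary order, so a receiver that needs set semantics has to rebuild the set from the declared feature type.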
17 changes: 17 additions & 0 deletions sdk/python/feast/on_demand_feature_view.py
@@ -1,5 +1,6 @@
import copy
import functools
import uuid
import warnings
from types import FunctionType
from typing import Any, List, Optional, Union, cast
@@ -1162,6 +1163,9 @@ def _get_sample_values_by_type(self) -> dict[ValueType, list[Any]]:
# Special binary types
ValueType.PDF_BYTES: [pdf_sample],
ValueType.IMAGE_BYTES: [image_sample],
# UUID types
ValueType.UUID: [uuid.uuid4()],
ValueType.TIME_UUID: [uuid.uuid1()],
# List types
ValueType.BYTES_LIST: [[b"hello world"]],
ValueType.STRING_LIST: [["hello world"]],
@@ -1171,6 +1175,19 @@ def _get_sample_values_by_type(self) -> dict[ValueType, list[Any]]:
ValueType.FLOAT_LIST: [[1.0]],
ValueType.BOOL_LIST: [[True]],
ValueType.UNIX_TIMESTAMP_LIST: [[_utc_now()]],
ValueType.UUID_LIST: [[uuid.uuid4(), uuid.uuid4()]],
ValueType.TIME_UUID_LIST: [[uuid.uuid1(), uuid.uuid1()]],
# Set types
ValueType.BYTES_SET: [{b"hello world", b"foo bar"}],
ValueType.STRING_SET: [{"hello world", "foo bar"}],
ValueType.INT32_SET: [{1, 2}],
ValueType.INT64_SET: [{1, 2}],
ValueType.DOUBLE_SET: [{1.0, 2.0}],
ValueType.FLOAT_SET: [{1.0, 2.0}],
ValueType.BOOL_SET: [{True, False}],
ValueType.UNIX_TIMESTAMP_SET: [{_utc_now()}],
ValueType.UUID_SET: [{uuid.uuid4(), uuid.uuid4()}],
ValueType.TIME_UUID_SET: [{uuid.uuid1(), uuid.uuid1()}],
}

@staticmethod