feat: Support nested collection types (Array/Set of Array/Set) (#5947) #6132
@soooojinlee, this looks good. You just need to respond to the Devin feedback and update the branch, and then I can approve.
58dff41 to c1ce5d8
Addressed Devin review feedback and also removed the 2-level depth restriction for nested collection types; it now supports unbounded nesting depth, as discussed in #5947.
@soooojinlee, thanks so much for working on this. My question is: if nesting is unbounded, why not just implement it like map? Reduce to just
@nquinn408 thanks for the good feedback! I agree: the combinatorial approach doesn't scale well, and a map-like recursive approach makes much more sense. 👍
…e VALUE_LIST/VALUE_SET

Replace 4 combinatorial enum values (LIST_LIST=36, LIST_SET=37, SET_LIST=38, SET_SET=39) with 2 recursive enum values (VALUE_LIST=40, VALUE_SET=41) that use RepeatedValue to enable unlimited nesting depth. This is a breaking change for an unreleased feature, as suggested in PR feast-dev#6132 review.

Key changes:
- Proto: Remove 4 enum/oneof fields, add VALUE_LIST/VALUE_SET with reserved 36-39
- Python: Update ValueType enum, type system, serialization, field persistence
- JSON: Update proto_json encode/decode for new field names
- Tests: Rewrite all nested collection tests (204 tests passing)
- Docs: Update type-system.md for recursive design

Co-Authored-By: Claude Opus 4.6 <[email protected]>
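The recursive design described in this commit can be sketched roughly as follows. This is a minimal illustration, not Feast's generated proto code: the `Value` and `RepeatedValue` dataclasses, the `list_val`/`set_val` field names, and the `encode`/`decode` helpers are all hypothetical stand-ins. The point is that two recursive fields cover any nesting depth, where the combinatorial enums needed one value per pair of levels.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


# Hypothetical stand-ins for the proto messages (assumed shapes, not Feast's classes).
@dataclass
class Value:
    scalar: Any = None                           # leaf payload, e.g. an int or str
    list_val: Optional["RepeatedValue"] = None   # plays the role of VALUE_LIST
    set_val: Optional["RepeatedValue"] = None    # plays the role of VALUE_SET


@dataclass
class RepeatedValue:
    val: List[Value] = field(default_factory=list)


def encode(obj: Any) -> Value:
    """Recursively wrap lists and sets; anything else is treated as a leaf."""
    if isinstance(obj, (set, frozenset)):
        # Sort by repr only to make the encoded order deterministic in this sketch.
        items = sorted(obj, key=repr)
        return Value(set_val=RepeatedValue([encode(x) for x in items]))
    if isinstance(obj, (list, tuple)):
        return Value(list_val=RepeatedValue([encode(x) for x in obj]))
    return Value(scalar=obj)


def decode(value: Value) -> Any:
    """Inverse of encode; returns tuples and frozensets so every level stays hashable."""
    if value.list_val is not None:
        return tuple(decode(v) for v in value.list_val.val)
    if value.set_val is not None:
        return frozenset(decode(v) for v in value.set_val.val)
    return value.scalar
```

Because each level only refers to `RepeatedValue` again, a third or fourth level of nesting needs no new enum value or message type in this model.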
d9dd0fe to 26a7663
nquinn408 left a comment
@soooojinlee , this looks good.
    ValueType.VALUE_LIST: pyarrow.list_(pyarrow.list_(pyarrow.string())),
    ValueType.VALUE_SET: pyarrow.list_(pyarrow.list_(pyarrow.string())),
🟡 feast_value_type_to_pa returns hardcoded list(list(string)) for all VALUE_LIST/VALUE_SET regardless of actual inner type
The feast_value_type_to_pa mapping at sdk/python/feast/type_map.py:1671-1672 hardcodes pyarrow.list_(pyarrow.list_(pyarrow.string())) for both VALUE_LIST and VALUE_SET. This means any caller that uses feast_value_type_to_pa to build a PyArrow schema for nested collections (e.g., Array(Array(Int32))) would get list<list<string>> instead of list<list<int32>>. While get_pyarrow_schema_from_batch_source in offline_utils.py:287-288 has special handling to avoid this, other potential callers would silently get the wrong type. The correct type-aware conversion exists in from_feast_to_pyarrow_type (sdk/python/feast/types.py:375-376), but that function takes a FeastType, not a ValueType. This inconsistency could cause silent data corruption if any code path relies on feast_value_type_to_pa for nested types without the special-case override.
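The difference between a fixed placeholder and a type-aware conversion can be sketched as below. This is only an illustration under stated assumptions: it uses plain type strings instead of PyArrow objects so that it runs without dependencies, and `Primitive`, `Collection`, `PLACEHOLDER`, and `arrow_type_str` are hypothetical names, not Feast APIs. Sets are mapped to `list<...>` here, mirroring how the quoted mapping uses `pyarrow.list_` for both `VALUE_LIST` and `VALUE_SET`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Primitive:
    name: str                 # Arrow-style leaf name, e.g. "int32" or "string"


@dataclass(frozen=True)
class Collection:
    kind: str                 # "Array" or "Set"
    inner: "TypeNode"


TypeNode = Union[Primitive, Collection]

# What a hardcoded mapping would report for every nested collection.
PLACEHOLDER = "list<list<string>>"


def arrow_type_str(t: TypeNode) -> str:
    """Recurse into the declared inner type instead of returning a fixed placeholder."""
    if isinstance(t, Collection):
        return f"list<{arrow_type_str(t.inner)}>"
    return t.name


nested_ints = Collection("Array", Collection("Array", Primitive("int32")))
print(arrow_type_str(nested_ints))   # list<list<int32>>
```

A caller that only had the placeholder would see `list<list<string>>` for the same field, which is the mismatch the comment warns about.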
bf3b971 to 0778030
f9ee82d to 6d8b00e
nquinn408 left a comment
This looks good! Thanks for working on it!
…-dev#5947)

Add support for 2-level nested collection types: Array(Array(T)), Array(Set(T)), Set(Array(T)), and Set(Set(T)).

- Add 4 generic ValueType enums (LIST_LIST, LIST_SET, SET_LIST, SET_SET) backed by RepeatedValue proto messages
- Persist inner type info in Field tags (feast:nested_inner_type), following the existing Struct schema tag pattern
- Handle edge cases: empty inner collections, Set dedup at inner level, depth limit enforcement (2 levels max)
- Add proto/JSON/remote transport serialization support
- Add 25 unit tests covering all combinations and edge cases

Signed-off-by: Soojin Lee <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
- Fix remote online store read path to use declared feature types from FeatureView instead of ValueType.UNKNOWN, which fails for nested collection types (LIST_LIST, LIST_SET, SET_LIST, SET_SET)
- Add Nested Collection Types section to type-system.md with type table, usage examples, and empty-inner-collection→None limitation docs

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
…for nested collection types

- Add nested list handling in proto_json from_json_object (list of lists was raising ParseError since no branch matched list-typed elements)
- Fix pa_to_feast_value_type to recognize nested list PyArrow types (list<item: list<item: T>>) instead of crashing with KeyError
- Replace silent String fallback in _str_to_feast_type with ValueError to surface corrupted tag values instead of silently losing type info
- Strengthen test coverage: type str roundtrip, inner value verification, multi-value batch, proto JSON roundtrip, PyArrow nested type inference

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
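The "raise instead of silently falling back" behaviour and the type-string roundtrip mentioned in this commit can be sketched as follows. The string format (`Array(Set(Int32))`), the `parse_type` and `format_type` helpers, and the small `KNOWN_PRIMITIVES` subset are assumptions made for illustration; this is not the actual `_str_to_feast_type` implementation.

```python
from typing import Tuple, Union

# Illustrative subset of primitive type names (assumed for this sketch).
KNOWN_PRIMITIVES = {"String", "Int32", "Int64", "Float32", "Float64", "Bool"}

# A parsed type is either a primitive name or a (kind, inner) pair.
ParsedType = Union[str, Tuple[str, "ParsedType"]]


def parse_type(s: str) -> ParsedType:
    """Parse e.g. 'Array(Set(Int32))'; raise on anything unrecognized."""
    s = s.strip()
    for kind in ("Array", "Set"):
        prefix = kind + "("
        if s.startswith(prefix) and s.endswith(")"):
            return (kind, parse_type(s[len(prefix):-1]))
    if s in KNOWN_PRIMITIVES:
        return s
    # Surfacing the corrupted value is the point: a silent default would hide it.
    raise ValueError(f"Unrecognized type string: {s!r}")


def format_type(t: ParsedType) -> str:
    """Inverse of parse_type, used to check the string roundtrip."""
    if isinstance(t, tuple):
        kind, inner = t
        return f"{kind}({format_type(inner)})"
    return t
```

With a silent fallback, a misspelled tag such as `Arrray(Int32)` would quietly come back as a string type; here it fails loudly at parse time.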
Use getattr/CopyFrom instead of **dict unpacking for ProtoValue construction to satisfy mypy's strict type checking.

Signed-off-by: soojin <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
…n edge case

- Add __eq__/__hash__ to Array and Set so inner element types are compared (previously Array(Array(String)) == Array(Array(Int32)) was True)
- Fix nested collection detection in proto_json when first element is None by using any() fallback instead of only checking value[0]

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
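The equality fix described here amounts to comparing the inner element type recursively and keeping `__hash__` consistent with `__eq__`. A minimal sketch, assuming simplified stand-in classes (`Primitive`, `Array`, `Set`, and the `base_type` attribute are illustrative, not Feast's actual definitions):

```python
class Primitive:
    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Primitive) and self.name == other.name

    def __hash__(self) -> int:
        return hash(("Primitive", self.name))


class Array:
    def __init__(self, base_type: object) -> None:
        self.base_type = base_type

    def __eq__(self, other: object) -> bool:
        # Same wrapper kind AND equal inner type, compared recursively.
        return isinstance(other, Array) and self.base_type == other.base_type

    def __hash__(self) -> int:
        # Hash over the same fields used by __eq__ so equal types hash equally.
        return hash(("Array", self.base_type))


class Set:
    def __init__(self, base_type: object) -> None:
        self.base_type = base_type

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Set) and self.base_type == other.base_type

    def __hash__(self) -> int:
        return hash(("Set", self.base_type))


STRING, INT32 = Primitive("String"), Primitive("Int32")
```

Including the wrapper kind in the hash tuple also keeps `Array(T)` and `Set(T)` from colliding systematically when used as dictionary keys.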
… coverage

- Remove 2-level depth restriction from Array and Set constructors to support unbounded nesting per maintainer request
- Make _convert_nested_collection_to_proto() recursive for 3+ levels
- Update error message for nested type inference to guide users toward explicit Field dtype declaration
- Add 3+ level tests for Field roundtrip, str roundtrip, and PyArrow conversion

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
…e VALUE_LIST/VALUE_SET

Replace 4 combinatorial enum values (LIST_LIST=36, LIST_SET=37, SET_LIST=38, SET_SET=39) with 2 recursive enum values (VALUE_LIST=40, VALUE_SET=41) that use RepeatedValue to enable unlimited nesting depth. This is a breaking change for an unreleased feature, as suggested in PR feast-dev#6132 review.

Key changes:
- Proto: Remove 4 enum/oneof fields, add VALUE_LIST/VALUE_SET with reserved 36-39
- Python: Update ValueType enum, type system, serialization, field persistence
- JSON: Update proto_json encode/decode for new field names
- Tests: Rewrite all nested collection tests (204 tests passing)
- Docs: Update type-system.md for recursive design

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
…imize JSON nested list detection

- Add _parse_pa_type_str() to reconstruct PyArrow types from type strings for VALUE_LIST/VALUE_SET, avoiding lossy round-trip through placeholder
- Optimize proto_json nested list detection: only scan with any() when first element is None, avoiding O(n) scan for flat lists
- Add warning log for unrecognized PyArrow type strings

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
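The nested-list detection optimization in this commit can be sketched as a fast path plus a fallback scan. The helper name `looks_nested` is hypothetical and the real proto_json code may differ in detail; the sketch only shows the control flow the bullet describes.

```python
from typing import Any, Sequence


def looks_nested(values: Sequence[Any]) -> bool:
    """Return True if the sequence appears to be a list of lists."""
    if not values:
        return False
    first = values[0]
    if first is not None:
        # Fast path: the first element decides, so flat lists cost O(1).
        return isinstance(first, (list, tuple))
    # Fallback only when the first element is None: scan for any list-like item.
    return any(isinstance(v, (list, tuple)) for v in values)
```

Checking only `values[0]` would misclassify `[None, [1, 2]]` as flat, while always scanning with `any()` would cost O(n) on every flat list; this ordering avoids both.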
… clarify placeholder pyarrow type

- Add np.ndarray to isinstance check in _convert_nested_collection_to_proto to fix KeyError for 3+ level nesting during materialization (PyArrow produces np.ndarray, not Python list)
- Add comment clarifying VALUE_LIST/VALUE_SET placeholder in feast_value_type_to_pa

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: soojin <[email protected]>
6d8b00e to a4dde0f
# [0.61.0](v0.60.0...v0.61.0) (2026-04-07)

### Bug Fixes

* Add grpcio dependency group to transformation server Dockerfile ([2c2150a](2c2150a))
* Add https readiness check for rest-registry tests ([ea85e63](ea85e63))
* Add website build check for PRs and fix blog frontmatter YAML error ([#6079](#6079)) ([30a3a43](30a3a43))
* Added missing jackc/pgx/v5 entries ([94ad0e7](94ad0e7))
* Added MLflow metric charts across feature selection ([#6080](#6080)) ([a403361](a403361))
* Check duplicate names for feature view across types ([#5999](#5999)) ([95b9af8](95b9af8))
* Fix integration tests ([#6046](#6046)) ([02d5548](02d5548))
* Fix missing error handling for resource_counts endpoint ([d9706ce](d9706ce))
* Fix non-specific label selector on metrics service ([a1a160d](a1a160d))
* fix path feature_definitions.py ([7d7df68](7d7df68))
* Fix regstry Rest API tests intermittent failure ([d53a339](d53a339))
* Fixed IntegrityError on SqlRegistry ([#6047](#6047)) ([325e148](325e148))
* Fixed intermittent failures in get_historical_features ([c335ec7](c335ec7))
* Fixed pre-commit check ([114b7db](114b7db))
* Fixed the intermittent FeatureViewNotFoundException ([661ecc7](661ecc7))
* Fixed uv cache permission error for docker build on mac ([ad807be](ad807be))
* Fixes a `PydanticDeprecatedSince20` warning for trino_offline_store ([#5991](#5991)) ([abfd18a](abfd18a))
* Handle existing RBAC role gracefully in namespace registry ([b46a62b](b46a62b))
* Ignore ipynb files during apply ([#6151](#6151)) ([4ea123d](4ea123d))
* Integration test failures ([#6040](#6040)) ([9165870](9165870))
* Mount TLS volumes for init container ([080a9b5](080a9b5))
* **postgres:** Use end_date in synthetic entity_df for non-entity retrieval ([#6110](#6110)) ([088a802](088a802)), closes [#6066](#6066)
* Ray offline store tests are duplicated across 3 workflows ([54f705a](54f705a))
* Reenable tests ([#6036](#6036)) ([82ee7f8](82ee7f8))
* SSL/TLS mode by default for postgres connection ([4844488](4844488))
* Use commitlint pre-commit hook instead of a separate action ([35a81e7](35a81e7))

### Features

* Add Claude Code agent skills for Feast ([#6081](#6081)) ([1e5b60f](1e5b60f)), closes [#5976](#5976) [#6007](#6007)
* Add complex type support (Map, JSON, Struct) with schema validation ([#5974](#5974)) ([1200dbf](1200dbf))
* Add decimal to supported feature types ([#6029](#6029)) ([#6226](#6226)) ([cff6fbf](cff6fbf))
* Add feast apply init container to automate registry population on pod start ([#6106](#6106)) ([6b31a43](6b31a43))
* Add feature view versioning support to PostgreSQL and MySQL online stores ([#6193](#6193)) ([940e0f0](940e0f0)), closes [#6168](#6168) [#6169](#6169) [#2728](#2728)
* Add materialization, feature freshness, request latency, and push metrics to feature server ([2c6be18](2c6be18))
* Add metadata statistics to registry api ([ef1d4fc](ef1d4fc))
* Add non-entity retrieval support for ClickHouse offline store ([4d08ddc](4d08ddc)), closes [#5835](#5835)
* Add OnlineStore for MongoDB ([#6025](#6025)) ([bf4e3fa](bf4e3fa)), closes [golang/go#74462](golang/go#74462)
* Add Oracle DB as Offline store in python sdk & operator ([#6017](#6017)) ([9d35368](9d35368))
* Add RBAC aggregation labels to FeatureStore ClusterRoles ([daf77c6](daf77c6))
* Add ServiceMonitor auto-generation for Prometheus discovery ([#6126](#6126)) ([56e6d21](56e6d21))
* Add typed_features field to grpc write request (([#6117](#6117)) ([#6118](#6118)) ([eeaa6db](eeaa6db)), closes [#6116](#6116)
* Add UUID and TIME_UUID as feature types ([#5885](#5885)) ([#5951](#5951)) ([5d6e311](5d6e311))
* Add version indicators to lineage graph nodes ([#6187](#6187)) ([73805d3](73805d3))
* Add version tracking to FeatureView ([#6101](#6101)) ([ed4a4f2](ed4a4f2))
* Added Agent skills for AI Agents ([#6007](#6007)) ([99008c8](99008c8))
* Added CodeQL SAST scanning and detect-secrets pre-commit hook ([547b516](547b516))
* Added odfv transformations metrics ([8b5a526](8b5a526))
* Adding optional name to Aggregation (feast-dev[#5994](#5994)) ([#6083](#6083)) ([56469f7](56469f7))
* Created DocEmbedder class ([#5973](#5973)) ([0719c06](0719c06))
* Extended OIDC support to extract groups & namespaces and token injection with multiple methods ([#6089](#6089)) ([7c04026](7c04026))
* Feature Server High-Availability on Kubernetes ([#6028](#6028)) ([9c07b4c](9c07b4c)), closes [Hi#Availability](https://github.com/Hi/issues/Availability) [Hi#Availability](https://github.com/Hi/issues/Availability)
* **go:** Implement metrics and tracing for http and grpc servers ([#5925](#5925)) ([2b4ec9a](2b4ec9a))
* Horizontal scaling support to the Feast operator ([#6000](#6000)) ([3ec13e6](3ec13e6))
* Making feature view source optional (feast-dev[#6074](#6074)) ([#6075](#6075)) ([76917b7](76917b7))
* Replace ORJSONResponse with Pydantic response models for faster JSON serialization ([65cf03c](65cf03c))
* Support arm docker build ([#6061](#6061)) ([1e1f5d9](1e1f5d9))
* Support distinct count aggregation [[#6116](#6116)] ([3639570](3639570))
* Support HTTP in MCP ([#6109](#6109)) ([e72b983](e72b983))
* Support nested collection types (Array/Set of Array/Set) ([#5947](#5947)) ([#6132](#6132)) ([ab61642](ab61642))
* Support podAnnotations on Deployment pod template ([1b3cdc1](1b3cdc1))
* Use orjson for faster JSON serialization in feature server ([6f5203a](6f5203a))
* Utilize date partition column in BigQuery ([#6076](#6076)) ([4ea9b32](4ea9b32))

### Performance Improvements

* Online feature response construction in a single pass over read rows ([113fb04](113fb04))
* Optimize protobuf parsing in Redis online store ([#6023](#6023)) ([59dfdb8](59dfdb8))
* Optimize timestamp conversion in _convert_rows_to_protobuf ([33a2e95](33a2e95))
* Parallelize DynamoDB batch reads in sync online_read ([#6024](#6024)) ([9699944](9699944))
* Remove redundant entity key serialization in online_read ([d87283f](d87283f))
# [0.62.0](v0.61.0...v0.62.0) (2026-04-08)

### Bug Fixes

* Added missing jackc/pgx/v5 entries ([94ad0e7](94ad0e7))
* Fix missing error handling for resource_counts endpoint ([d9706ce](d9706ce))
* fix path feature_definitions.py ([7d7df68](7d7df68))
* Fix regstry Rest API tests intermittent failure ([d53a339](d53a339))
* Fixed intermittent failures in get_historical_features ([c335ec7](c335ec7))
* Fixed the intermittent FeatureViewNotFoundException ([661ecc7](661ecc7))
* Handle existing RBAC role gracefully in namespace registry ([b46a62b](b46a62b))
* Ignore ipynb files during apply ([#6151](#6151)) ([4ea123d](4ea123d))
* Mount TLS volumes for init container ([080a9b5](080a9b5))
* **postgres:** Use end_date in synthetic entity_df for non-entity retrieval ([#6110](#6110)) ([088a802](088a802)), closes [#6066](#6066)
* SSL/TLS mode by default for postgres connection ([4844488](4844488))
* Sync v0.61-branch so v0.61.0 tag is reachable from master ([af66878](af66878))

### Features

* Add Claude Code agent skills for Feast ([#6081](#6081)) ([1e5b60f](1e5b60f)), closes [#5976](#5976) [#6007](#6007)
* Add decimal to supported feature types ([#6029](#6029)) ([#6226](#6226)) ([cff6fbf](cff6fbf))
* Add feast apply init container to automate registry population on pod start ([#6106](#6106)) ([6b31a43](6b31a43))
* Add feature view versioning support to PostgreSQL and MySQL online stores ([#6193](#6193)) ([940e0f0](940e0f0)), closes [#6168](#6168) [#6169](#6169) [#2728](#2728)
* Add metadata statistics to registry api ([ef1d4fc](ef1d4fc))
* Add Oracle DB as Offline store in python sdk & operator ([#6017](#6017)) ([9d35368](9d35368))
* Add RBAC aggregation labels to FeatureStore ClusterRoles ([daf77c6](daf77c6))
* Add ServiceMonitor auto-generation for Prometheus discovery ([#6126](#6126)) ([56e6d21](56e6d21))
* Add typed_features field to grpc write request (([#6117](#6117)) ([#6118](#6118)) ([eeaa6db](eeaa6db)), closes [#6116](#6116)
* Add UUID and TIME_UUID as feature types ([#5885](#5885)) ([#5951](#5951)) ([5d6e311](5d6e311))
* Add version indicators to lineage graph nodes ([#6187](#6187)) ([73805d3](73805d3))
* Add version tracking to FeatureView ([#6101](#6101)) ([ed4a4f2](ed4a4f2))
* Added Agent skills for AI Agents ([#6007](#6007)) ([99008c8](99008c8))
* Added odfv transformations metrics ([8b5a526](8b5a526))
* Created DocEmbedder class ([#5973](#5973)) ([0719c06](0719c06))
* Extended OIDC support to extract groups & namespaces and token injection with multiple methods ([#6089](#6089)) ([7c04026](7c04026))
* Replace ORJSONResponse with Pydantic response models for faster JSON serialization ([65cf03c](65cf03c))
* Support distinct count aggregation [[#6116](#6116)] ([3639570](3639570))
* Support HTTP in MCP ([#6109](#6109)) ([e72b983](e72b983))
* Support nested collection types (Array/Set of Array/Set) ([#5947](#5947)) ([#6132](#6132)) ([ab61642](ab61642))
* Support podAnnotations on Deployment pod template ([1b3cdc1](1b3cdc1))
* Utilize date partition column in BigQuery ([#6076](#6076)) ([4ea9b32](4ea9b32))

### Performance Improvements

* Online feature response construction in a single pass over read rows ([113fb04](113fb04))
What this PR does / why we need it:
Add 2-level nested collection types: `Array(Array(T))`, `Array(Set(T))`, `Set(Array(T))`, `Set(Set(T))`

Changes
- Add 4 `ValueType` enums (`LIST_LIST`/`LIST_SET`/`SET_LIST`/`SET_SET`) with `RepeatedValue` fields
- `Array`/`Set` mutual nesting with 2-level depth limit, preserve inner type via `feast:nested_inner_type` tag
- Remote online store read path: use `feature_type_map` instead of `ValueType.UNKNOWN`
- Replace silent `String` fallback in `_str_to_feast_type` with `ValueError`

Which issue(s) this PR fixes:
Fixes #5947
Misc
The UUID type feature PR was developed first and uses ValueType enum values 36-41. This PR uses 36-39 for nested collection types. If this PR is merged first, the UUID branch will need to reassign its enum values to avoid conflicts.