feat: Support nested collection types (Array/Set of Array/Set) (#5947)#6132
feat: Support nested collection types (Array/Set of Array/Set) (#5947)#6132soooojinlee wants to merge 6 commits intofeast-dev:masterfrom
Conversation
|
@soooojinlee , this looks good. Just need to respond to Devin feedback and update branch and I can approve. |
…-dev#5947) Add support for 2-level nested collection types: Array(Array(T)), Array(Set(T)), Set(Array(T)), and Set(Set(T)). - Add 4 generic ValueType enums (LIST_LIST, LIST_SET, SET_LIST, SET_SET) backed by RepeatedValue proto messages - Persist inner type info in Field tags (feast:nested_inner_type), following the existing Struct schema tag pattern - Handle edge cases: empty inner collections, Set dedup at inner level, depth limit enforcement (2 levels max) - Add proto/JSON/remote transport serialization support - Add 25 unit tests covering all combinations and edge cases Signed-off-by: Soojin Lee <[email protected]> Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
- Fix remote online store read path to use declared feature types from FeatureView instead of ValueType.UNKNOWN, which fails for nested collection types (LIST_LIST, LIST_SET, SET_LIST, SET_SET) - Add Nested Collection Types section to type-system.md with type table, usage examples, and empty-inner-collection→None limitation docs Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
…for nested collection types - Add nested list handling in proto_json from_json_object (list of lists was raising ParseError since no branch matched list-typed elements) - Fix pa_to_feast_value_type to recognize nested list PyArrow types (list<item: list<item: T>>) instead of crashing with KeyError - Replace silent String fallback in _str_to_feast_type with ValueError to surface corrupted tag values instead of silently losing type info - Strengthen test coverage: type str roundtrip, inner value verification, multi-value batch, proto JSON roundtrip, PyArrow nested type inference Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
Use getattr/CopyFrom instead of **dict unpacking for ProtoValue construction to satisfy mypy's strict type checking. Signed-off-by: soojin <[email protected]> Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
…n edge case - Add __eq__/__hash__ to Array and Set so inner element types are compared (previously Array(Array(String)) == Array(Array(Int32)) was True) - Fix nested collection detection in proto_json when first element is None by using any() fallback instead of only checking value[0] Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
… coverage - Remove 2-level depth restriction from Array and Set constructors to support unbounded nesting per maintainer request - Make _convert_nested_collection_to_proto() recursive for 3+ levels - Update error message for nested type inference to guide users toward explicit Field dtype declaration - Add 3+ level tests for Field roundtrip, str roundtrip, and PyArrow conversion Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: soojin <[email protected]>
58dff41 to
c1ce5d8
Compare
|
Addressed Devin review feedback and also removed the 2-level depth restriction for nested collection types — now supports unbounded nesting depth as discussed in #5947. |
@soooojinlee , thanks so much for working on this. My question is if unbounded, why not just implement like map. Reduce to just |
What this PR does / why we need it:
Add 2-level nested collection types:
Array(Array(T)),Array(Set(T)),Set(Array(T)),Set(Set(T))Changes
LIST_LIST/LIST_SET/SET_LIST/SET_SET) withRepeatedValuefieldsArray/Setmutual nesting with 2-level depth limit, preserve inner type viafeast:nested_inner_typetagfeature_type_mapinstead ofValueType.UNKNOWNStringfallback in_str_to_feast_typewithValueErrorWhich issue(s) this PR fixes:
Fixes #5947
Misc
The UUID type feature PR was developed first and uses ValueType enum values 36-41. This PR uses 36-39 for nested collection types. If this PR is merged first, the UUID branch will need to reassign its enum values to avoid conflicts.