feat: Add DynamoDB in-place list update support for array-based features#5916
Conversation
a69361f to
45b218f
Compare
|
Reviewing the devin analysis |
c3d81ed to
eff9006
Compare
eff9006 to
8221600
Compare
franciscojavierarceo
left a comment
There was a problem hiding this comment.
one small nit but otherwise this lgtm
This PR adds support for in-place list updates (append/prepend) for array-based features in DynamoDB online store, addressing issue feast-dev#5687. Key changes: - Add update_online_store() and update_online_store_async() methods to FeatureStore for list operations on existing feature values - Implement read-modify-write pattern in DynamoDB store to handle list operations while maintaining compatibility with existing protobuf serialization format - Add comprehensive tests for list append, prepend, mixed operations, and new entity handling The implementation uses a read-modify-write approach because existing data is stored as serialized protobuf bytes, not native DynamoDB lists. This maintains backward compatibility with existing online_read methods. Signed-off-by: Anshi Shrivastava <[email protected]>
Per reviewer feedback, removed the synchronous update_online_store method and renamed update_online_store_async to update_online_store. Signed-off-by: Anshi Shrivastava <[email protected]>
c1cf89c to
9dd1ab9
Compare
|
@franciscojavierarceo -Thanks for the feedback on leaving the async version - updated. Feel free to merge, good to go. |
| ) | ||
|
|
||
| # Process each entity update | ||
| for entity_key, features, timestamp, _ in _latest_data_to_write(data): |
There was a problem hiding this comment.
🔴 Using _latest_data_to_write discards multiple list updates for the same entity in a batch
The update_online_store and update_online_store_async methods use _latest_data_to_write(data) to iterate over data, which deduplicates records by keeping only the latest record per entity key.
Click to expand
Problem
The _latest_data_to_write function at dynamodb.py:1166-1173 is designed to keep only the record with the latest timestamp for each entity:
def _latest_data_to_write(data):
as_hashable = ((d[0].SerializeToString(), d) for d in data)
sorted_data = sorted(as_hashable, key=lambda ah: (ah[0], ah[1][2]))
return (v for _, v in OrderedDict((ah[0], ah[1]) for ah in sorted_data).items())This is correct for normal writes (where you want to overwrite with the latest value), but incorrect for list append/prepend operations.
Impact
If a user tries to append multiple items to the same entity's list in a single batch call, only the item with the latest timestamp will actually be appended. For example:
- Appending
["tx1"]at timestamp T1 - Appending
["tx2"]at timestamp T2 - Appending
["tx3"]at timestamp T3
Only ["tx3"] will be appended because _latest_data_to_write discards the first two records.
Expected vs Actual
- Expected: All list updates in the batch should be processed sequentially
- Actual: Only the update with the latest timestamp per entity is processed
Recommendation: For list update operations, process all records in order without deduplication. Either remove the _latest_data_to_write call in update_online_store/update_online_store_async methods, or create a new iteration function that preserves all records while still sorting by timestamp to ensure correct ordering of appends.
Was this helpful? React with 👍 or 👎 to provide feedback.
…res (feast-dev#5916) Signed-off-by: yassinnouh21 <[email protected]>
…res (feast-dev#5916) Signed-off-by: yassinnouh21 <[email protected]>
# [0.60.0](v0.59.0...v0.60.0) (2026-02-17) ### Bug Fixes * Added a flag to correctly download the go binaries ([0f77135](0f77135)) * Adds mapping of date Trino's type into string Feast's type ([531e839](531e839)) * **ci:** Use uv run for pytest in master_only benchmark step ([#5957](#5957)) ([5096010](5096010)) * Disable materialized odfvs for historical retrieval ([#5880](#5880)) ([739d28a](739d28a)) * Fix linting and formatting issues ([#5907](#5907)) ([42ca14a](42ca14a)) * Make timestamp field handling compatible with Athena V3 ([#5936](#5936)) ([e2bad34](e2bad34)) * Support pgvector under non-default schema ([#5970](#5970)) ([c636cd4](c636cd4)) * unit tests not running on main branch ([#5909](#5909)) ([62fe664](62fe664)) * Update java dep which blocking release ([#5903](#5903)) ([a5b8186](a5b8186)) * Update the dockerfile with golang 1.24.12. ([#5918](#5918)) ([be1b522](be1b522)) * Use context.Background() in client constructors ([#5897](#5897)) ([984f93a](984f93a)) ### Features * Add blog post for PyTorch ecosystem announcement ([#5906](#5906)) ([d2eb629](d2eb629)) * Add blog post on Feast dbt integration ([#5915](#5915)) ([b3c8138](b3c8138)) * Add DynamoDB in-place list update support for array-based features ([#5916](#5916)) ([aa5973f](aa5973f)) * Add HTTP connection pooling for remote online store client ([#5895](#5895)) ([e022bf8](e022bf8)) * Add integration tests for dbt import ([#5899](#5899)) ([a444692](a444692)) * Add lazy initialization and feature service caching ([#5924](#5924)) ([b37b7d0](b37b7d0)) * Add multiple entity support to dbt integration ([#5901](#5901)) ([05a4fb5](05a4fb5)), closes [#5872](#5872) * Add PostgreSQL online store support for Go feature server ([#5963](#5963)) ([b8c6f3d](b8c6f3d)) * Add publish docker image of Go feature server. ([#5923](#5923)) ([759d8c6](759d8c6)) * Add Set as feature type ([#5888](#5888)) ([52458fc](52458fc)) * Added online server worker config support in operator ([#5926](#5926)) ([193c72a](193c72a)) * Added support for OpenLineage integration ([#5884](#5884)) ([df70d8d](df70d8d)) * Adjust ray offline store to support abfs(s) ADLS Azure Storage ([#5911](#5911)) ([d6c0b2d](d6c0b2d)) * Batch_engine config injection in feature_store.yaml through operator ([#5938](#5938)) ([455d56c](455d56c)) * Consolidate Python packaging - remove setup.py/setup.cfg, standardize on pyproject.toml and uv ([16696b8](16696b8)) * **go:** Add MySQL registry store support for Go feature server ([#5933](#5933)) ([19f9bb8](19f9bb8)) * Improve local dev experience with file-aware hooks and auto parallelization ([#5956](#5956)) ([839b79e](839b79e)) * Modernize precommit hooks and optimize test performance ([#5929](#5929)) ([ea7d4fa](ea7d4fa)) * Optimize container infrastructure for production ([#5881](#5881)) ([5ebdac8](5ebdac8)) * Optimize DynamoDB online store for improved latency ([#5889](#5889)) ([fcc8274](fcc8274))
# [0.60.0](feast-dev/feast@v0.59.0...v0.60.0) (2026-02-17) ### Bug Fixes * Added a flag to correctly download the go binaries ([0f77135](feast-dev@0f77135)) * Adds mapping of date Trino's type into string Feast's type ([531e839](feast-dev@531e839)) * **ci:** Use uv run for pytest in master_only benchmark step ([feast-dev#5957](feast-dev#5957)) ([5096010](feast-dev@5096010)) * Disable materialized odfvs for historical retrieval ([feast-dev#5880](feast-dev#5880)) ([739d28a](feast-dev@739d28a)) * Fix linting and formatting issues ([feast-dev#5907](feast-dev#5907)) ([42ca14a](feast-dev@42ca14a)) * Make timestamp field handling compatible with Athena V3 ([feast-dev#5936](feast-dev#5936)) ([e2bad34](feast-dev@e2bad34)) * Support pgvector under non-default schema ([feast-dev#5970](feast-dev#5970)) ([c636cd4](feast-dev@c636cd4)) * unit tests not running on main branch ([feast-dev#5909](feast-dev#5909)) ([62fe664](feast-dev@62fe664)) * Update java dep which blocking release ([feast-dev#5903](feast-dev#5903)) ([a5b8186](feast-dev@a5b8186)) * Update the dockerfile with golang 1.24.12. ([feast-dev#5918](feast-dev#5918)) ([be1b522](feast-dev@be1b522)) * Use context.Background() in client constructors ([feast-dev#5897](feast-dev#5897)) ([984f93a](feast-dev@984f93a)) ### Features * Add blog post for PyTorch ecosystem announcement ([feast-dev#5906](feast-dev#5906)) ([d2eb629](feast-dev@d2eb629)) * Add blog post on Feast dbt integration ([feast-dev#5915](feast-dev#5915)) ([b3c8138](feast-dev@b3c8138)) * Add DynamoDB in-place list update support for array-based features ([feast-dev#5916](feast-dev#5916)) ([aa5973f](feast-dev@aa5973f)) * Add HTTP connection pooling for remote online store client ([feast-dev#5895](feast-dev#5895)) ([e022bf8](feast-dev@e022bf8)) * Add integration tests for dbt import ([feast-dev#5899](feast-dev#5899)) ([a444692](feast-dev@a444692)) * Add lazy initialization and feature service caching ([feast-dev#5924](feast-dev#5924)) ([b37b7d0](feast-dev@b37b7d0)) * Add multiple entity support to dbt integration ([feast-dev#5901](feast-dev#5901)) ([05a4fb5](feast-dev@05a4fb5)), closes [feast-dev#5872](feast-dev#5872) * Add PostgreSQL online store support for Go feature server ([feast-dev#5963](feast-dev#5963)) ([b8c6f3d](feast-dev@b8c6f3d)) * Add publish docker image of Go feature server. ([feast-dev#5923](feast-dev#5923)) ([759d8c6](feast-dev@759d8c6)) * Add Set as feature type ([feast-dev#5888](feast-dev#5888)) ([52458fc](feast-dev@52458fc)) * Added online server worker config support in operator ([feast-dev#5926](feast-dev#5926)) ([193c72a](feast-dev@193c72a)) * Added support for OpenLineage integration ([feast-dev#5884](feast-dev#5884)) ([df70d8d](feast-dev@df70d8d)) * Adjust ray offline store to support abfs(s) ADLS Azure Storage ([feast-dev#5911](feast-dev#5911)) ([d6c0b2d](feast-dev@d6c0b2d)) * Batch_engine config injection in feature_store.yaml through operator ([feast-dev#5938](feast-dev#5938)) ([455d56c](feast-dev@455d56c)) * Consolidate Python packaging - remove setup.py/setup.cfg, standardize on pyproject.toml and uv ([16696b8](feast-dev@16696b8)) * **go:** Add MySQL registry store support for Go feature server ([feast-dev#5933](feast-dev#5933)) ([19f9bb8](feast-dev@19f9bb8)) * Improve local dev experience with file-aware hooks and auto parallelization ([feast-dev#5956](feast-dev#5956)) ([839b79e](feast-dev@839b79e)) * Modernize precommit hooks and optimize test performance ([feast-dev#5929](feast-dev#5929)) ([ea7d4fa](feast-dev@ea7d4fa)) * Optimize container infrastructure for production ([feast-dev#5881](feast-dev#5881)) ([5ebdac8](feast-dev@5ebdac8)) * Optimize DynamoDB online store for improved latency ([feast-dev#5889](feast-dev#5889)) ([fcc8274](feast-dev@fcc8274)) Signed-off-by: soojin <[email protected]>
Summary
Adds DynamoDB-specific list update functionality to avoid inefficient read-modify-write patterns for array-based features. This implements the solution requested in #5687.
Fixes #5687
Changes
update_online_store()andupdate_online_store_async()methods toDynamoDBOnlineStoreclassFeatureStoreclass for user-facing APIUpdateItemwithlist_appendexpressions for efficient array operationswrite_to_online_storebehaviorTesting
Usage Example