Support root_id for storage backends #775
Closed
Description
Some of our users work with deeply nested data, where they perform vector search on some nesting level, but are actually interested in retrieving the root level documents.
In memory this can be solved by traversing the nested structure on the fly, but with a database backend it is not doable: nested levels are only present in serialized form, so one would have to load everything into memory in order to be able to traverse the structure.
To tackle this, we propose the following:
- We create a function `get_root_doc(da, doc)` that returns the root document of `doc`. The implementation could be something similar to this:

  ```python
  def get_root_doc(da, doc):
      # da[...] flattens the DocumentArray, so documents on any nesting level can be looked up by id
      root_da_flat = da[...]
      result = doc
      # follow parent_id links upwards until a document without a parent (the root) is reached
      while result.parent_id:
          result = root_da_flat[result.parent_id]
      return result
  ```

- For storage backends, we expose an API that lets you search on some nesting level but retrieve documents on the root level: `da.find(..., return_root=True)`
- It works the following way:
  - When inserting a (batch of) Document(s), it calls `get_root_doc()` on each of them.
  - It stores the root document's `id` as a separate column in the database.
  - When searching with `return_root=True`, it performs the search, takes each result's stored `root_id`, and returns the root document based on that.
  - The level the user searches on needs to exist as a subindex (this is already the case), and the root level is always properly indexed anyway. The intermediate nesting levels can stay serialized.
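To make the proposed flow concrete, here is a minimal pure-Python sketch of it. It does not use the real storage backends: `Doc`, `ToyBackend`, `flatten` and `get_root_id` are hypothetical stand-ins, the "subindex" is a plain list of rows, and the search is a brute-force nearest-neighbour scan. It also assumes that the searched level is simply every document that carries an embedding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Document; only the fields needed here."""
    id: str
    parent_id: Optional[str] = None
    embedding: Optional[List[float]] = None
    chunks: List["Doc"] = field(default_factory=list)


def flatten(docs: List[Doc]) -> Dict[str, Doc]:
    """Map the id of every document, on every nesting level, to the document."""
    flat, stack = {}, list(docs)
    while stack:
        doc = stack.pop()
        flat[doc.id] = doc
        stack.extend(doc.chunks)
    return flat


def get_root_id(flat: Dict[str, Doc], doc: Doc) -> str:
    """Follow parent_id links upwards until a document without a parent is reached."""
    result = doc
    while result.parent_id:
        result = flat[result.parent_id]
    return result.id


class ToyBackend:
    """Hypothetical backend: a root table plus one subindex with a root_id column."""

    def __init__(self):
        self.roots = {}     # root id -> root Doc
        self.subindex = []  # rows of (Doc, root_id)

    def extend(self, docs: List[Doc]) -> None:
        # The root is resolved once, at insert time, while the batch is still in memory.
        flat = flatten(docs)
        for doc in docs:
            self.roots[doc.id] = doc
        for doc in flat.values():
            if doc.embedding is not None:
                self.subindex.append((doc, get_root_id(flat, doc)))

    def find(self, query: List[float], return_root: bool = False) -> Doc:
        # Brute-force nearest neighbour by squared Euclidean distance.
        def dist(row):
            return sum((a - b) ** 2 for a, b in zip(row[0].embedding, query))

        match, root_id = min(self.subindex, key=dist)
        # With return_root=True, only the stored root_id is needed; no traversal at query time.
        return self.roots[root_id] if return_root else match


leaf_a = Doc(id="a-1-1", parent_id="a-1", embedding=[0.0, 1.0])
root_a = Doc(id="a", chunks=[Doc(id="a-1", parent_id="a", chunks=[leaf_a])])
leaf_b = Doc(id="b-1-1", parent_id="b-1", embedding=[1.0, 0.0])
root_b = Doc(id="b", chunks=[Doc(id="b-1", parent_id="b", chunks=[leaf_b])])

backend = ToyBackend()
backend.extend([root_a, root_b])
match = backend.find([0.9, 0.1])
root = backend.find([0.9, 0.1], return_root=True)
print(match.id, root.id)  # prints: b-1-1 b
```

The point of the design is that the parent chain is walked only once, on insert, so a query with `return_root=True` never has to deserialize the intermediate nesting levels.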