feat: check if a document is already indexed #1633
Signed-off-by: maxwelljin2 <[email protected]>
docarray/index/backends/hnswlib.py
    rows = self._sqlite_cursor.fetchall()
    return len(rows) > 0
else:
    raise NotImplementedError
This does not seem like a proper exception.
This PR is still in progress; I'll change it later :) It should output a proper hint to users.
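A minimal sketch of what a more informative failure could look like, assuming the backend keeps one row per document in a SQLite table. The `Doc` class, the table name `docs`, the column `doc_id`, and the helper `doc_exists` are all hypothetical stand-ins for illustration, not DocArray's actual schema or API, and raising `TypeError` is only one possible choice of exception.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a document type with an ID."""

    id: str


def doc_exists(cursor: sqlite3.Cursor, item: object) -> bool:
    """Return True if a row with the item's ID is stored (hypothetical schema)."""
    doc_id = getattr(item, "id", None)
    if doc_id is None:
        # Tell the caller what went wrong instead of a bare NotImplementedError.
        raise TypeError(
            f"expected a document with an 'id' attribute, got {type(item).__name__}"
        )
    # LIMIT 1 lets SQLite stop at the first match instead of fetching all rows.
    cursor.execute("SELECT 1 FROM docs WHERE doc_id = ? LIMIT 1", (doc_id,))
    return cursor.fetchone() is not None
```

Selecting a single row with `LIMIT 1` avoids materializing every match just to test whether any exist, which is what `fetchall()` followed by `len(rows) > 0` does.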
docarray/index/abstract.py
    return False

if safe_issubclass(type(item), BaseDoc):
    docs = self._get_all_documents()
I don't think this is the right way to do it, right? I think you should call __contains__ on every subindex, and if one returns True, the result is True.
It's implemented in lines 1205-1207. I'll remove all _get_all_documents methods, so it will only look for sub-documents inside the DocArray. (So the meaning of this subindex_contains method would be similar to subindex_find.)
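The delegation idea discussed in this thread (ask every subindex, and report True as soon as one of them does) can be sketched as follows. `ToyIndex` and its fields are hypothetical stand-ins for illustration only; they are not DocArray's classes, and the real index may key subindices differently.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyIndex:
    """Hypothetical stand-in for a document index with nested subindices."""

    doc_ids: Set[str] = field(default_factory=set)
    subindices: Dict[str, "ToyIndex"] = field(default_factory=dict)

    def __contains__(self, doc_id: str) -> bool:
        # Check this level first.
        if doc_id in self.doc_ids:
            return True
        # Otherwise delegate: `doc_id in sub` calls sub.__contains__, so the
        # search recurses through every nesting level. any() stops at the
        # first subindex that reports a match.
        return any(doc_id in sub for sub in self.subindices.values())
```

Because each level only asks its own storage and its direct children, no level needs to load all documents to answer the question.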
This PR is designed to enable the indexer (for all supported backends) to check whether a document has already been indexed. We aim to accommodate various backends, including in_memory, hnswlib, elastic, qdrant, and weaviate. Because different backends store and index documents in distinct ways, the check needs to be tailored to each backend.
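One way to picture the per-backend tailoring is a shared `__contains__` that validates its input and then calls a storage-specific hook. This is only a sketch under assumptions: `ToyBaseIndex`, `_doc_exists`, `DictBackend`, `ListBackend`, and `Doc` are hypothetical names for illustration and do not reflect the actual class layout in DocArray.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a document type with an ID."""

    id: str


class ToyBaseIndex(ABC):
    """Hypothetical base class: shared checks here, storage lookup per backend."""

    @abstractmethod
    def _doc_exists(self, doc_id: str) -> bool:
        """Backend-specific lookup by document ID."""

    def __contains__(self, item: object) -> bool:
        # The input check is shared, so every backend reports the same error.
        if not isinstance(item, Doc):
            raise TypeError(f"expected Doc, got {type(item).__name__}")
        return self._doc_exists(item.id)


class DictBackend(ToyBaseIndex):
    """Toy backend that stores documents in a dict keyed by ID."""

    def __init__(self) -> None:
        self._store: Dict[str, Doc] = {}

    def index(self, doc: Doc) -> None:
        self._store[doc.id] = doc

    def _doc_exists(self, doc_id: str) -> bool:
        return doc_id in self._store


class ListBackend(ToyBaseIndex):
    """Toy backend that stores documents in a list and scans it."""

    def __init__(self) -> None:
        self._rows: List[Doc] = []

    def index(self, doc: Doc) -> None:
        self._rows.append(doc)

    def _doc_exists(self, doc_id: str) -> bool:
        return any(row.id == doc_id for row in self._rows)
```

With this shape, callers write `doc in index` for any backend, and only `_doc_exists` changes when the storage layer does.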
Progress: