refactor: count items from hnsw index#1765
refactor: count items from hnsw index#1765shobhit9957 wants to merge 16 commits intodocarray:mainfrom
Conversation
Signed-off-by: Shobhit Kumar <[email protected]>
docarray/index/backends/hnswlib.py
Outdated
| Get the number of documents using the HNSW method. | ||
| """ | ||
| # Access the first HNSW index from self._hnsw_indices | ||
| first_hnsw_index = self._hnsw_indices[0] |
There was a problem hiding this comment.
this is a dictionary, not a list, so u should not access it like this
|
Also, please remember that you need to signofff every commit so that we csn merge the PR |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #1765 +/- ##
==========================================
- Coverage 83.74% 82.70% -1.05%
==========================================
Files 134 134
Lines 8845 8850 +5
==========================================
- Hits 7407 7319 -88
- Misses 1438 1531 +93
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
|
Hey, i'm so sorry joan. for the mistakes... I'll remember to sign off my every commit from now. and i'll try to solve this issue and commit again(signed off). Thanks Joan! |
Signed-off-by: Shobhit Kumar <[email protected]>
Nothing to be sorrry about. This is about Open Source and supporting each other |
docarray/index/backends/hnswlib.py
Outdated
| first_hnsw_index = self._hnsw_indices[first_index_key] | ||
| hnsw_num_docs = ( | ||
| first_hnsw_index.get_document_count() | ||
| ) # Replace with actual method |
There was a problem hiding this comment.
this comment is not anymore needed
Signed-off-by: Shobhit Kumar <[email protected]>
|
Hey Joan, could you please check the commit, and if there are no issues in it, can you please merge it Joan. If there are any issues , please let me know Joan. I'll try to solve it. |
The tests are failing as seen here https://github.com/docarray/docarray/actions/runs/5978323747/job/16260526492?pr=1765 I believe the method you use does not exist, please check what is the appropiate method name |
Signed-off-by: Shobhit Kumar <[email protected]>
docarray/index/backends/hnswlib.py
Outdated
| first_index_key = next(iter(self._hnsw_indices)) # Get the first key | ||
| first_hnsw_index = self._hnsw_indices[first_index_key] | ||
| hnsw_num_docs = first_hnsw_index.get_document_count() | ||
| hnsw_num_docs = first_hnsw_index.element_count() |
There was a problem hiding this comment.
it is not a method, it is an attribute
There was a problem hiding this comment.
hey joan I've fixed this....can you please say me what is the issue again...so that the tests are failing...
There was a problem hiding this comment.
You can check what is faling here https://github.com/docarray/docarray/actions/runs/6003026515/job/16306992708?pr=1765
And try to run the tests urself
There was a problem hiding this comment.
Okey, I see that the test asserts the number of docs after deletion, and this HNSWlib element_count may not count deleted
Signed-off-by: Shobhit Kumar <[email protected]>
|
I have realised about some challenges of this implementation, would u be able to join our discord to have better way to discuss how to tackle them? |
|
hey thanks joan, for the invitation of the discord, for sure I'll join joan. Thanks joan! let's fix this issue! |
Signed-off-by: Shobhit Kumar <[email protected]>
docarray/index/backends/hnswlib.py
Outdated
|
|
||
| def _count_docs_in_hnswlib(self) -> int: | ||
| # Use HNSWlib's method to count documents | ||
| return self._hnsw_indices.element_count |
There was a problem hiding this comment.
remember from last iteration, this is a dictionary, not an index itself, u need to get the first index in the dictionary
Signed-off-by: Shobhit Kumar <[email protected]>
Signed-off-by: Shobhit Kumar <[email protected]>
Signed-off-by: Shobhit Kumar <[email protected]>
docarray/index/backends/hnswlib.py
Outdated
| self._num_docs = self._get_num_docs_hnsw() | ||
| return self._num_docs | ||
|
|
||
| def is_index_empty(self) -> bool: |
There was a problem hiding this comment.
this method needs to be defined at parent class
There was a problem hiding this comment.
Please remove this method here as it is not needed now
This reverts commit f12bf20.
Signed-off-by: Shobhit Kumar <[email protected]>
docarray/index/backends/hnswlib.py
Outdated
| self._num_docs = self._get_num_docs_hnsw() | ||
| return self._num_docs | ||
|
|
||
| def is_index_empty(self) -> bool: |
There was a problem hiding this comment.
Please remove this method here as it is not needed now
docarray/index/backends/hnswlib.py
Outdated
| """ | ||
| if self._num_docs == 0: | ||
| self._num_docs = self._get_num_docs_sqlite() | ||
| self._num_docs = self._get_num_docs_hnsw() |
docarray/index/backends/hnswlib.py
Outdated
| @@ -428,9 +426,17 @@ def num_docs(self) -> int: | |||
| Get the number of documents. | |||
| """ | |||
| if self._num_docs == 0: | |||
There was a problem hiding this comment.
Change this call to use the new method
Signed-off-by: Shobhit Kumar <[email protected]>
| :return: True if the index is empty, False otherwise. | ||
| """ | ||
| return self.num_docs() == 0 | ||
| if self.is_index_empty == 0: |
There was a problem hiding this comment.
this does not make sense, is_index_empty should not be compared to 0, it already gives a boolean
|
Let's take over again when I merge the first step of this epic #1801 |
|
Okay!, so now should I commit on #1801 ? |
No, you should create a new PR with the next steps. |
|
Okay!, got it. |
Fix: #1764