fix: del with ids #79

Merged
hanxiao merged 15 commits into main from fix-del-with-ids on Jan 28, 2022

Member

@samsja samsja commented Jan 26, 2022

The current version of docarray behaves incorrectly when deleting items by id.

@JohannesMessner and I added tests covering all of the del cases from the documentation, plus a fix for deleting by ids.


codecov bot commented Jan 26, 2022

Codecov Report

Merging #79 (3052494) into main (95d9f5a) will increase coverage by 0.13%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main      #79      +/-   ##
==========================================
+ Coverage   82.85%   82.98%   +0.13%     
==========================================
  Files          87       87              
  Lines        3815     3827      +12     
==========================================
+ Hits         3161     3176      +15     
+ Misses        654      651       -3     
Flag Coverage Δ
docarray 82.98% <100.00%> (+0.13%) ⬆️


Impacted Files Coverage Δ
docarray/array/storage/memory/backend.py 97.61% <100.00%> (+0.74%) ⬆️
docarray/array/storage/memory/getsetdel.py 97.87% <100.00%> (-0.05%) ⬇️
docarray/array/storage/memory/seqlike.py 100.00% <100.00%> (+3.22%) ⬆️
docarray/array/mixins/getitem.py 87.93% <0.00%> (+1.72%) ⬆️
docarray/array/mixins/delitem.py 86.66% <0.00%> (+2.22%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@samsja samsja force-pushed the fix-del-with-ids branch 2 times, most recently from d81bb73 to 5e09e7e Compare January 26, 2022 14:20
@samsja samsja marked this pull request as ready for review January 26, 2022 16:06
  def _del_docs_by_slice(self, _slice: slice):
      del self._data[_slice]
-     self._rebuild_id2offset()
+     self._needs_id2offset_rebuild = True
Contributor

change the _needs_id2offset_rebuild to a decorator?

Member Author

I changed it in the last commit. Could you review?
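
A minimal sketch of what the decorator suggestion could look like, not the code that was merged. `marks_id2offset_stale` and `TinyStore` are hypothetical names used only for illustration; `_del_docs_by_slice`, `_needs_id2offset_rebuild`, `_id2offset`, and `_id_to_index` are taken from the diff under review.

```python
from functools import wraps


def marks_id2offset_stale(func):
    """Hypothetical decorator: flag the id -> offset index as stale after `func`."""

    @wraps(func)
    def wrapper(self, *args, **kwargs):
        result = func(self, *args, **kwargs)
        self._needs_id2offset_rebuild = True
        return result

    return wrapper


class TinyStore:
    """Hypothetical stand-in for the in-memory backend; holds bare ids only."""

    def __init__(self, ids):
        self._data = list(ids)
        self._id_to_index = {}
        self._needs_id2offset_rebuild = True

    @property
    def _id2offset(self):
        # Rebuild lazily, only when a mutation has invalidated the index.
        if self._needs_id2offset_rebuild:
            self._id_to_index = {id_: i for i, id_ in enumerate(self._data)}
            self._needs_id2offset_rebuild = False
        return self._id_to_index

    @marks_id2offset_stale
    def _del_docs_by_slice(self, _slice: slice):
        del self._data[_slice]


store = TinyStore(["a", "b", "c", "d"])
offset_before = store._id2offset["d"]
store._del_docs_by_slice(slice(0, 2))
offsets_after = dict(store._id2offset)
```

The decorator keeps each mutating method free of bookkeeping, at the cost of hiding the invalidation from anyone reading the method body.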


  def _init_storage(
-     self, _docs: Optional['DocumentArraySourceType'] = None, copy: bool = False
+     self, _docs: Optional["DocumentArraySourceType"] = None, copy: bool = False
Member

single quote

Comment on lines +7 to +39
def docs():
    return DocumentArray([Document(id=f"{i}") for i in range(1, 10)])


@pytest.mark.parametrize(
    "to_delete",
    [
        0,
        1,
        4,
        -1,
        list(range(1, 4)),
        [2, 4, 7, 1, 1],
        slice(0, 2),
        slice(2, 4),
        slice(4, -1),
        [True, True, False],
        ...,
    ],
)
def test_del_all(docs, to_delete):
    doc_to_delete = docs[to_delete]
    del docs[to_delete]
    assert doc_to_delete not in docs


@pytest.mark.parametrize(
    ["deleted_ids", "expected_ids"],
    [
        (["1", "2", "3", "4"], ["5", "6", "7", "8", "9"]),
        (["2", "4", "7", "1"], ["3", "5", "6", "8", "9"]),
    ],
)
Member

  • use black -S (skip string normalization, which keeps single quotes)
  • or, better, install the pre-commit hook via
# at docarray root
pre-commit install

you will see this option auto-checked on every commit

Member Author

I already installed the pre-commit hook; this code is already formatted by black.

Member

@alaeddine-13 alaeddine-13 left a comment

In general, I think the operations that require rebuilding the id2offset are delete and insert. Extend and append don't really require it. If we append/extend and the id2offset is ready, we can update it efficiently without rebuilding; otherwise we don't update it and wait for the next rebuild.
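
The policy described above can be sketched roughly as follows. This is an illustration under stated assumptions, not docarray's implementation: `LazyIndexList`, its method names, and the `rebuilds` counter are all hypothetical, and the container stores bare ids rather than Documents.

```python
class LazyIndexList:
    """Hypothetical list of ids with a lazily rebuilt id -> offset map."""

    def __init__(self, ids=()):
        self._data = list(ids)
        self._id_to_index = {}
        self._stale = True
        self.rebuilds = 0  # counts full rebuilds, for illustration only

    @property
    def id2offset(self):
        if self._stale:
            self._id_to_index = {id_: i for i, id_ in enumerate(self._data)}
            self._stale = False
            self.rebuilds += 1
        return self._id_to_index

    def append(self, id_):
        self._data.append(id_)
        if not self._stale:
            # Index is ready: update it in O(1) instead of rebuilding.
            self._id_to_index[id_] = len(self._data) - 1
        # Otherwise leave it stale; the next lookup rebuilds everything.

    def extend(self, ids):
        for id_ in ids:
            self.append(id_)

    def insert(self, offset, id_):
        self._data.insert(offset, id_)
        self._stale = True  # later offsets shifted, so the map is invalid

    def delete_by_id(self, id_):
        del self._data[self.id2offset[id_]]
        self._stale = True  # same reason as insert


arr = LazyIndexList(["1", "2", "3"])
arr.id2offset           # first lookup builds the index
arr.append("4")         # incremental update, no rebuild
arr.delete_by_id("2")   # marks the index stale
final_offsets = dict(arr.id2offset)  # second rebuild happens here
```

Appending to a ready index costs one dictionary write, while delete and insert defer the full O(n) rebuild until the next lookup by id.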

@samsja samsja requested a review from alaeddine-13 January 27, 2022 10:31
Co-authored-by: AlaeddineAbdessalem <[email protected]>
@samsja samsja changed the title Fix del with ids fix: del with ids Jan 27, 2022
@samsja samsja requested a review from alaeddine-13 January 27, 2022 14:28
Member

@alaeddine-13 alaeddine-13 left a comment

LGTM

Comment on lines 75 to 77
elif isinstance(_docs, DocumentArray):
    self._data = _docs._data
    self._id_to_index = _docs._id2offset
Member

I think this part is wrong: not every DocumentArray has _id_to_index or _id2offset.

Member Author

I agree. That was there before, so it might be another bug? We should delete line 77, as the _id_to_index index will be built automatically on first call.

Member Author

Fixed in the last commit.

if copy:
    self._data = [Document(d, copy=True) for d in _docs]
    self._rebuild_id2offset()
elif isinstance(_docs, DocumentArray):
Member

Suggested change
- elif isinstance(_docs, DocumentArray):
+ elif isinstance(_docs, DocumentArrayInMemory):

Member Author

Fixed in the last commit.
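
A rough sketch of the two points raised in these threads: share `_data` only when the source is known to be the in-memory backend, and never copy the source's index, since it can be built on first lookup. All class names here (`BaseArray`, `OtherBackendArray`, `InMemoryArray`) are hypothetical stand-ins, not docarray classes, and the containers hold bare ids.

```python
class BaseArray:
    """Hypothetical common base; promises iteration and nothing else."""

    def __iter__(self):
        raise NotImplementedError


class OtherBackendArray(BaseArray):
    """Hypothetical non-memory backend with no `_data` or `_id_to_index`."""

    def __init__(self, ids):
        self._rows = tuple(ids)

    def __iter__(self):
        return iter(self._rows)


class InMemoryArray(BaseArray):
    def __init__(self, source=None):
        self._id_to_index = {}
        self._needs_rebuild = True  # the index is built on first lookup
        if source is None:
            self._data = []
        elif isinstance(source, InMemoryArray):
            # Safe: this exact backend is known to hold `_data`.
            self._data = source._data
        else:
            # Any other iterable, including other backends.
            self._data = list(source)

    def __iter__(self):
        return iter(self._data)

    @property
    def id2offset(self):
        if self._needs_rebuild:
            self._id_to_index = {id_: i for i, id_ in enumerate(self._data)}
            self._needs_rebuild = False
        return self._id_to_index


mem = InMemoryArray(OtherBackendArray(["x", "y"]))
shared = InMemoryArray(mem)
```

Checking against the base class instead would send `OtherBackendArray` down the branch that reads `source._data`, which raises `AttributeError` because that attribute does not exist there.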

Member

@hanxiao hanxiao left a comment

wrong logic in copy constructor

@github-actions github-actions bot added size/m and removed size/s labels Jan 27, 2022
@github-actions github-actions bot added size/s and removed size/m labels Jan 27, 2022
@hanxiao hanxiao dismissed their stale review January 28, 2022 09:42

changes applied already

@hanxiao hanxiao merged commit 2d5e93b into main Jan 28, 2022
@hanxiao hanxiao deleted the fix-del-with-ids branch January 28, 2022 09:43