Skip to content

feat: milvus document store#587

Merged
JohannesMessner merged 114 commits intomainfrom
feat-milvus
Nov 22, 2022
Merged

feat: milvus document store#587
JohannesMessner merged 114 commits intomainfrom
feat-milvus

Conversation

@JohannesMessner
Copy link
Copy Markdown
Member

@JohannesMessner JohannesMessner commented Oct 4, 2022

Goals

Add Milvus as a supported Document Store

ToDo

Outline:

  • file structure
  • implement getsetdel
    • optimized methods
  • implement seqlike
    • optimized methods
  • implement backend
  • implement find
  • implement init
  • documentation
    • docstrings
    • backend summary page
  • tests
    • tests for specifics
      • specific config params
      • loading behaviour (decorator)
  • update benchmark needs to happen in a separate PR after this is merged
  • add search_params in the configuration (and document it) document kwargs that can be passed to find etc.
  • Bug: limit does not work for filter-only find call
  • Issue: .extend() does not use bulk operations

Other stuff:

  • find better way to store offset2ids that does not require a new collection
  • use context manager to load and release database to/from memory

New dependencies:
This PR adds one new test dependency: pytest-mock.

Known limitations:

  • Cannot store documents with large-ish tensors. This is because str type is not yet supported in Milvus, so the serialized document has to to into a varchar column. If the tensor is too large, the base64 encoded document is too long for the maximum allowed varchar length.
  • due to slow collection load and release, the with da: context manager should be used whenever accessing more than a few documents in a non-batched way.
  • the mechanism that enables collection loading in the with da: context manager logs some errors to the command line; this is not an issue, since these backend errors are expected and handled properly, but it is irritating for the user. Currently investigating if Milvus offers a better API to tackle this issue. this is solved

@codecov
Copy link
Copy Markdown

codecov bot commented Oct 4, 2022

Codecov Report

Merging #587 (fea7c28) into main (f4b16ef) will decrease coverage by 1.46%.
The diff coverage is 91.97%.

@@            Coverage Diff             @@
##             main     #587      +/-   ##
==========================================
- Coverage   88.29%   86.83%   -1.47%     
==========================================
  Files         134      139       +5     
  Lines        6648     6920     +272     
==========================================
+ Hits         5870     6009     +139     
- Misses        778      911     +133     
Flag Coverage Δ
docarray 86.83% <91.97%> (-1.47%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
docarray/array/mixins/find.py 90.42% <ø> (ø)
docarray/array/qdrant.py 100.00% <ø> (ø)
docarray/array/storage/milvus/find.py 75.00% <75.00%> (ø)
docarray/array/document.py 86.20% <83.33%> (-0.34%) ⬇️
docarray/array/storage/milvus/seqlike.py 85.71% <85.71%> (ø)
docarray/array/storage/milvus/backend.py 93.28% <93.28%> (ø)
docarray/array/storage/milvus/getsetdel.py 98.21% <98.21%> (ø)
docarray/array/milvus.py 100.00% <100.00%> (ø)
docarray/array/storage/base/seqlike.py 89.47% <100.00%> (+2.63%) ⬆️
docarray/array/mixins/reduce.py 26.92% <0.00%> (-73.08%) ⬇️
... and 16 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@github-actions github-actions bot added size/m and removed size/xs labels Oct 4, 2022
@github-actions
Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@JohannesMessner
Copy link
Copy Markdown
Member Author

I need to update the PR to play nice with the new mode that disables offset2id

@github-actions
Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Co-authored-by: AlaeddineAbdessalem <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
@github-actions
Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@hanxiao
Copy link
Copy Markdown
Member

hanxiao commented Nov 16, 2022

image

This should be wrapped with ```console

Signed-off-by: Johannes Messner <[email protected]>
@github-actions
Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Copy link
Copy Markdown
Member

@samsja samsja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM what a big PR 😃

@JohannesMessner JohannesMessner dismissed JoanFM’s stale review November 22, 2022 08:48

addressed the requested changes

@github-actions
Copy link
Copy Markdown

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@JohannesMessner JohannesMessner merged commit e2d4d19 into main Nov 22, 2022
@JohannesMessner JohannesMessner deleted the feat-milvus branch November 22, 2022 08:52
@github-actions
Copy link
Copy Markdown

📝 Docs are deployed on https://ft-feat-milvus--jina-docs.netlify.app 🎉

@JohannesMessner JohannesMessner restored the feat-milvus branch November 23, 2022 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants