fix: calculate relevant docs on index instead of queries #950

Merged
samsja merged 5 commits into docarray:main from guenthermi:fix-num_relevant_documents_per_label-calculation
Dec 16, 2022

Conversation

@guenthermi
Contributor

Goals:

  • The embed_and_evaluate method calculates num_relevant_documents_per_label from the documents in self. That is only correct when self is matched against itself. When the index_data attribute is provided, the counts should be calculated on index_data instead.

  • Check and update the documentation, if required. See guide.
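
The counting issue described in the first goal can be illustrated with a minimal sketch. It does not use docarray itself: documents are modelled as plain dicts, and the helper names `count_relevant_per_label` and `recall_at_k` as well as the `label` key are hypothetical, chosen only for illustration.

```python
from collections import Counter


def count_relevant_per_label(queries, index=None, label_key="label"):
    """Count, per label, how many documents a query could retrieve.

    Assumption: documents are plain dicts, not docarray Documents.
    If an index is given, the counts come from the index; otherwise the
    queries are matched against themselves and are counted instead.
    """
    corpus = index if index is not None else queries
    return Counter(doc[label_key] for doc in corpus)


def recall_at_k(query_label, retrieved_labels, num_relevant, k):
    """Fraction of the relevant documents that appear in the top k."""
    hits = sum(1 for label in retrieved_labels[:k] if label == query_label)
    total = num_relevant.get(query_label, 0)
    return hits / total if total else 0.0


queries = [{"label": "cat"}, {"label": "dog"}]
index = [{"label": "cat"}] * 4 + [{"label": "dog"}] * 2

counted_on_queries = count_relevant_per_label(queries)       # cat=1, dog=1
counted_on_index = count_relevant_per_label(queries, index)  # cat=4, dog=2

# Labels of the top-3 matches returned for the "cat" query.
retrieved = ["cat", "cat", "dog"]

# Counting on the queries yields a recall above 1, which is impossible.
print(recall_at_k("cat", retrieved, counted_on_queries, k=3))  # 2.0
# Counting on the index yields a valid recall.
print(recall_at_k("cat", retrieved, counted_on_index, k=3))    # 0.5
```

With the counts taken from the queries, recall exceeds 1 because two relevant matches are divided by a single counted document; taking the counts from the index keeps the metric within [0, 1].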

Member

@JoanFM JoanFM left a comment

Please keep this as a draft until it is really ready for review.

match_batch_size=1,
limit=10,
)
print(res)
Member

Remove the print; there should be assertions instead.

@JoanFM JoanFM marked this pull request as draft December 15, 2022 15:38
Comment on lines +656 to +657
def test_embed_and_evaluate_on_real_data(two_embed_funcs, kwargs):
- metric_names = ['precision_at_k', 'reciprocal_rank']
+ metric_names = ['precision_at_k', 'reciprocal_rank', 'recall_at_k']

Maybe add a test case for exclude_self = True/False
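
A rough sketch of what such a test case might check, assuming that with exclude_self=True a query must not count as its own relevant document when a collection is matched against itself. The helper `relevant_counts` is hypothetical and stands in for the counting logic; it is not the docarray API.

```python
from collections import Counter


def relevant_counts(index_labels, exclude_self):
    """Per-label number of relevant documents for a self-matched collection.

    Assumption: every query is also part of the index, so with
    exclude_self=True each label count is reduced by one (the query itself).
    """
    counts = dict(Counter(index_labels))
    if exclude_self:
        counts = {label: n - 1 for label, n in counts.items()}
    return counts


def test_relevant_counts_exclude_self():
    labels = ["a", "a", "a", "b"]
    cases = [
        (False, {"a": 3, "b": 1}),
        (True, {"a": 2, "b": 0}),
    ]
    for exclude_self, expected in cases:
        assert relevant_counts(labels, exclude_self) == expected


test_relevant_counts_exclude_self()
```

Covering both values in one test makes the difference explicit: a label with a single document has no relevant matches left once the query itself is excluded.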

@codecov-commenter

codecov-commenter commented Dec 15, 2022

Codecov Report

Base: 85.21% // Head: 77.21% // Decreases project coverage by -7.99% ⚠️

Coverage data is based on head (017354d) compared to base (ceb16ec).
Patch coverage: 0.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #950      +/-   ##
==========================================
- Coverage   85.21%   77.21%   -8.00%     
==========================================
  Files         155      155              
  Lines        8048     8049       +1     
==========================================
- Hits         6858     6215     -643     
- Misses       1190     1834     +644     
| Flag | Coverage Δ |
|---|---|
| docarray | 77.21% <0.00%> (-8.00%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Impacted Files | Coverage Δ |
|---|---|
| docarray/array/mixins/evaluation.py | 8.86% <0.00%> (-0.06%) ⬇️ |
| docarray/array/mixins/reduce.py | 26.92% <0.00%> (-73.08%) ⬇️ |
| docarray/array/mixins/io/pushpull.py | 23.07% <0.00%> (-70.09%) ⬇️ |
| docarray/array/mixins/io/csv.py | 23.68% <0.00%> (-65.79%) ⬇️ |
| docarray/array/mixins/io/dataframe.py | 41.66% <0.00%> (-58.34%) ⬇️ |
| docarray/array/mixins/sample.py | 45.45% <0.00%> (-54.55%) ⬇️ |
| docarray/array/mixins/text.py | 50.00% <0.00%> (-50.00%) ⬇️ |
| docarray/array/mixins/io/common.py | 26.31% <0.00%> (-42.11%) ⬇️ |
| docarray/document/mixins/text.py | 56.00% <0.00%> (-42.00%) ⬇️ |
| docarray/array/mixins/plot.py | 27.27% <0.00%> (-40.70%) ⬇️ |
... and 46 more


@LMMilliken LMMilliken left a comment

LGTM!

@guenthermi guenthermi force-pushed the fix-num_relevant_documents_per_label-calculation branch from a94f76d to e1a74b9 Compare December 15, 2022 16:49
@JoanFM JoanFM marked this pull request as ready for review December 15, 2022 20:50
Signed-off-by: Michael Guenther <[email protected]>
@samsja samsja merged commit 67d2b7c into docarray:main Dec 16, 2022
6 participants