feat: multiple metrics in evaluate#643

Merged
JoanFM merged 26 commits into main from feat-multiple-metrics-in-evaluate
Oct 19, 2022
Conversation

@guenthermi
Contributor

Goals:

  • Allow users to pass multiple metrics to the evaluation function
  • Calculate all metric scores at once
  • Check and update the documentation if required (see guide)
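
As a rough sketch of the goal — computing several metric scores in one call and returning them keyed by name. The metric functions and `evaluate` signature below are simplified stand-ins, not docarray's actual implementation in `docarray/array/mixins/evaluation.py`:

```python
from typing import Callable, Dict, List

# Toy metric functions over a boolean relevance list of ranked matches.
def precision_at_k(relevant: List[bool], k: int = 3) -> float:
    return sum(relevant[:k]) / k

def recall_at_k(relevant: List[bool], k: int = 3) -> float:
    return sum(relevant[:k]) / max(sum(relevant), 1)

METRIC_FNS: Dict[str, Callable[..., float]] = {
    'precision_at_k': precision_at_k,
    'recall_at_k': recall_at_k,
}

def evaluate(relevant: List[bool], metrics: List[str], k: int = 3) -> Dict[str, float]:
    """Compute every requested metric in a single pass and return a dict of scores."""
    return {name: METRIC_FNS[name](relevant, k=k) for name in metrics}

scores = evaluate([True, False, True, True], metrics=['precision_at_k', 'recall_at_k'])
```

Returning a dict for any number of metrics keeps the return type uniform, which is the design point debated later in this thread.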

@guenthermi guenthermi linked an issue Oct 17, 2022 that may be closed by this pull request
@codecov

codecov bot commented Oct 17, 2022

Codecov Report

Merging #643 (096dbee) into main (5dbf1a3) will decrease coverage by 0.04%.
The diff coverage is 76.31%.

@@            Coverage Diff             @@
##             main     #643      +/-   ##
==========================================
- Coverage   86.50%   86.45%   -0.05%     
==========================================
  Files         133      133              
  Lines        6735     6755      +20     
==========================================
+ Hits         5826     5840      +14     
- Misses        909      915       +6     
Flag Coverage Δ
docarray: 86.45% <76.31%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
docarray/array/mixins/evaluation.py: 80.72% <76.31%> (-3.41%) ⬇️


@JoanFM JoanFM changed the title Feat multiple metrics in evaluate feat: multiple metrics in evaluate Oct 17, 2022
Member

@JoanFM JoanFM left a comment


is this a breaking change with respect to the return of evaluate?

@guenthermi
Contributor Author

> is this a breaking change with respect to the return of evaluate?

Yes. I could return only a single number when the user passes a single metric, but having different return types for different inputs is a bit strange.

@guenthermi
Contributor Author

depends on #617

@guenthermi guenthermi requested a review from JoanFM October 18, 2022 15:39
@alexcg1
Contributor

alexcg1 commented Oct 19, 2022

@guenthermi remember to tag @NicholasDunham for any future PRs that include docs (or docstrings)


@bwanglzu bwanglzu left a comment


Two comments:

  1. I don't like that `metrics='precision_at_k'` is supported; `metrics` should always accept a list, not a str.
  2. Can we provide an `all` option to evaluate all the metrics we have?

It's like what we do with callbacks: the user should always pass a list of callbacks, even if they only need one.
I'm expecting:

.evaluate(metrics=['precision_at_k'])

@guenthermi
Contributor Author

guenthermi commented Oct 19, 2022

> Two comments:
>
> 1. I don't like that `metrics='precision_at_k'` is supported; `metrics` should always accept a list, not a str.
> 2. Can we provide an `all` option to evaluate all the metrics we have?
>
> It's like what we do with callbacks: the user should always pass a list of callbacks, even if they only need one. I'm expecting:
>
> `.evaluate(metrics=['precision_at_k'])`

I don't care much whether we support passing only a string or not.
@JoanFM @samsja @JohannesMessner, do you have an opinion on this?

Having an `all` keyword is not a good idea, because the metric functions have different use cases; e.g., the NDCG metric should not be applied to a top-k ranking, while for other metrics this makes sense.

@JoanFM
Member

JoanFM commented Oct 19, 2022

> Two comments:
>
> 1. I don't like that `metrics='precision_at_k'` is supported; `metrics` should always accept a list, not a str.
> 2. Can we provide an `all` option to evaluate all the metrics we have?
>
> It's like what we do with callbacks: the user should always pass a list of callbacks, even if they only need one. I'm expecting:
>
> `.evaluate(metrics=['precision_at_k'])`

> I don't care much whether we support passing only a string or not. @JoanFM @samsja @JohannesMessner, do you have an opinion on this?
>
> Having an `all` keyword is not a good idea, because the metric functions have different use cases; e.g., the NDCG metric should not be applied to a top-k ranking, while for other metrics this makes sense.

I believe that to keep backwards compatibility we need to accept at least `metric=STR`, but I agree it is cleaner if the user has to pass a list. If it is complex, though, this seems like a minor issue to me.

@JoanFM
Member

JoanFM commented Oct 19, 2022

> Two comments:
>
> 1. I don't like that `metrics='precision_at_k'` is supported; `metrics` should always accept a list, not a str.
> 2. Can we provide an `all` option to evaluate all the metrics we have?
>
> It's like what we do with callbacks: the user should always pass a list of callbacks, even if they only need one. I'm expecting:
>
> `.evaluate(metrics=['precision_at_k'])`

> I don't care much whether we support passing only a string or not. @JoanFM @samsja @JohannesMessner, do you have an opinion on this?
>
> Having an `all` keyword is not a good idea, because the metric functions have different use cases; e.g., the NDCG metric should not be applied to a top-k ranking, while for other metrics this makes sense.

> I believe that to keep backwards compatibility we need to accept at least `metric=STR`, but I agree it is cleaner if the user has to pass a list. If it is complex, though, this seems like a minor issue to me.

Agreed, `all` is not good.

@bwanglzu

Makes sense, ignore my `all` suggestion.

@guenthermi
Contributor Author

> Two comments:
>
> 1. I don't like that `metrics='precision_at_k'` is supported; `metrics` should always accept a list, not a str.
> 2. Can we provide an `all` option to evaluate all the metrics we have?
>
> It's like what we do with callbacks: the user should always pass a list of callbacks, even if they only need one. I'm expecting:
>
> `.evaluate(metrics=['precision_at_k'])`

> I don't care much whether we support passing only a string or not. @JoanFM @samsja @JohannesMessner, do you have an opinion on this?
>
> Having an `all` keyword is not a good idea, because the metric functions have different use cases; e.g., the NDCG metric should not be applied to a top-k ranking, while for other metrics this makes sense.

> I believe that to keep backwards compatibility we need to accept at least `metric=STR`, but I agree it is cleaner if the user has to pass a list. If it is complex, though, this seems like a minor issue to me.

We could transform a string into a single-element list in the deprecation decorator and adjust the documentation, so I would not say it is complex.
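
The suggested shim can be sketched roughly as follows. This is a minimal, hypothetical sketch — the names `normalize_metrics` and the dummy `evaluate` body are not docarray's actual deprecation decorator:

```python
import functools
import warnings

def normalize_metrics(func):
    """Backwards-compat shim: if `metrics` arrives as a single string,
    warn and wrap it into a one-element list before delegating."""
    @functools.wraps(func)
    def wrapper(*args, metrics=None, **kwargs):
        if isinstance(metrics, str):
            warnings.warn(
                "Passing `metrics` as a string is deprecated; "
                "pass a list of metric names instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            metrics = [metrics]
        return func(*args, metrics=metrics, **kwargs)
    return wrapper

@normalize_metrics
def evaluate(metrics=None):
    # Stand-in for the real evaluation: one dummy score per metric name.
    return {name: 0.0 for name in metrics}
```

With this shim, `evaluate(metrics='precision_at_k')` and `evaluate(metrics=['precision_at_k'])` return the same dict, so old call sites keep working while the list form becomes the documented one.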

@JoanFM
Member

JoanFM commented Oct 19, 2022

> Two comments:
>
> 1. I don't like that `metrics='precision_at_k'` is supported; `metrics` should always accept a list, not a str.
> 2. Can we provide an `all` option to evaluate all the metrics we have?
>
> It's like what we do with callbacks: the user should always pass a list of callbacks, even if they only need one. I'm expecting:
>
> `.evaluate(metrics=['precision_at_k'])`

> I don't care much whether we support passing only a string or not. @JoanFM @samsja @JohannesMessner, do you have an opinion on this?
>
> Having an `all` keyword is not a good idea, because the metric functions have different use cases; e.g., the NDCG metric should not be applied to a top-k ranking, while for other metrics this makes sense.

> I believe that to keep backwards compatibility we need to accept at least `metric=STR`, but I agree it is cleaner if the user has to pass a list. If it is complex, though, this seems like a minor issue to me.

> We could transform a string into a single-element list in the deprecation decorator and adjust the documentation, so I would not say it is complex.

Then go for it.


@bwanglzu bwanglzu left a comment


LGTM!

Member

@samsja samsja left a comment


I think we can make the documentation a bit better for evaluation. @NicholasDunham what do you think?

Member

@samsja samsja left a comment


The docstring for evaluate is not reported because of the decorator.

old:
[Screenshot from 2022-10-19 13-55-51]

new:
[Screenshot from 2022-10-19 13-54-52]
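
The docstring problem samsja screenshots is the classic decorator pitfall: a wrapper function shadows the wrapped function's metadata unless it is copied over. A generic Python sketch of cause and fix (not the PR's actual code; decorator names here are illustrative):

```python
import functools

def plain_decorator(func):
    # No metadata copying: the wrapper's (absent) docstring shadows func's.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def wrapped_decorator(func):
    @functools.wraps(func)  # copies __doc__, __name__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@plain_decorator
def evaluate_a():
    """Compute evaluation metrics."""

@wrapped_decorator
def evaluate_b():
    """Compute evaluation metrics."""

assert evaluate_a.__doc__ is None  # docstring lost to the plain wrapper
assert evaluate_b.__doc__ == "Compute evaluation metrics."
```

Documentation generators like Sphinx read `__doc__`, so decorating without `functools.wraps` makes docstrings disappear from the rendered API reference.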

@guenthermi
Copy link
Copy Markdown
Contributor Author

guenthermi commented Oct 19, 2022

> I think that we can make the documentation a bit better for evaluation. @NicholasDunham what do u think?

Maybe we can rewrite the documentation in another PR. There are also some older features of the evaluate function that are not yet documented, and I don't have more time for this today.

@samsja
Member

samsja commented Oct 19, 2022

> Maybe we can rewrite the documentation in another PR. There are also some old features of the evaluate functions which are not yet documented and today, I have not more time to do this.

Okay, then let's open an issue for improving this documentation, and please align with @NicholasDunham on how to refactor it.

@guenthermi
Contributor Author

> Maybe we can rewrite the documentation in another PR. There are also some old features of the evaluate functions which are not yet documented and today, I have not more time to do this.

> Okay then lets open an issue for improving this documentation and please align with @NicholasDunham on how do refactor it

I created #651 for the documentation changes.

@github-actions

📝 Docs are deployed on https://ft-feat-multiple-metrics-in-evaluate--jina-docs.netlify.app 🎉

@JoanFM JoanFM merged commit 135c007 into main Oct 19, 2022
@JoanFM JoanFM deleted the feat-multiple-metrics-in-evaluate branch October 19, 2022 12:58

Projects

None yet

Development

Successfully merging this pull request may close these issues.

allow evaluate function to calculate multiple metrics at once

6 participants